Copyright

Library of Congress Cataloging-in-Publication Data

Das, Abhijit.
     Public-key cryptography : theory and practice / Abhijit Das, C. E. Veni Madhavan.
               p. cm.
     Includes bibliographical references and index.
     ISBN: 978-8131708323 (pbk.)
  1. Public key cryptography. 2. Telecommunication—Security
measures-Mathematics. 3. Computers-Access control-Mathematics. I. Madhavan,
C. E. Veni. II. Title.     TK5102.94.D37 2009
     005.8'2-dc22
                                                                           2009012766

Copyright © 2009 Dorling Kindersley (India) Pvt. Ltd.

Licensees of Pearson Education in South Asia

This book is sold subject to the condition that it shall not, by way of trade or otherwise, be lent, resold, hired out, or otherwise circulated without the publisher’s prior written consent in any form of binding or cover other than that in which it is published and without a similar condition including this condition being imposed on the subsequent purchaser. Without limiting the rights under copyright reserved above, no part of this publication may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording or otherwise), without the prior written permission of both the copyright owner and the above-mentioned publisher of this book.

ISBN 9788131708323

Head Office: 482 FIE, Patparganj, Delhi 110 092, India

Registered Office: 14 Local Shopping Centre, Panchsheel Park, New Delhi 110 017, India

Printed in India.

Pearson Education Inc., Upper Saddle River, NJ
Pearson Education Ltd., London
Pearson Education Australia Pty, Limited, Sydney
Pearson Education Singapore, Pte. Ltd
Pearson Education North Asia Ltd, Hong Kong
Pearson Education Canada, Ltd., Toronto
Pearson Educacion de Mexico, S.A. de C.V.
Pearson Education-Japan, Tokyo
Pearson Education Malaysia, Pte. Ltd.

Preface

I can’t understand why a person will take a year to write a novel when he can easily buy one for a few dollars.

—Fred Allen

The first question that we, like most authors, faced is: “Why another book?” Textbooks on public-key cryptography (or on cryptography in general) are plentiful [37, 74, 113, 114, 145, 152, 153, 194, 209, 262, 283, 288, 291, 296]. In the presence of all these books, writing another may sound like a waste of energy and effort.

Fortunately, we have a substantial answer. Most cryptography textbooks today, even many of the celebrated ones, essentially take a narrative approach. While such an approach may be suitable for beginners at the undergraduate level, it misses the finer details of this rapidly growing area of applied mathematics. That public-key cryptography is mathematical is hard to deny, and a mathematical subject is best treated mathematically.

This is precisely the point that this book addresses: it proceeds in a canonically mathematical way while developing cryptographic concepts. The mathematics involved is often not so simple (which is perhaps why other textbooks do not bother to mention it), but we maintain mathematical rigour as far as possible. A typical feature of this book is that it does not rely on anything other than the reader’s mathematical intuition; it develops all the mathematical abstractions from scratch. Although computer science and mathematics students nowadays undergo some courses on discrete structures somewhere in their curricula, we do not assume this; instead we develop the algebra starting at the level of set operations. Simpler structures like groups, rings and fields are followed by more complex concepts like finite fields, algebraic curves, number fields and p-adic numbers. The resulting (long) compilation of abstract mathematical tools should relieve cryptography students and researchers from consulting many mathematics books for the background concepts. We are happy to offer this self-sufficient treatment complete with proofs and other details. The only place where we had to be somewhat sketchy is the discussion of elliptic and hyperelliptic curves. The mathematics here is too vast to fit in a few pages, and we opted for a deliberate simplification of these topics.

A big problem with discrete mathematics is that many of its proofs are existential. In order to make things work in a practical environment, however, one must study the algorithmic side of algebra and number theory. This is what our book does next. While many algorithmic issues in this area are settled favourably, there remain problems whose best known algorithmic complexities are still poor. Some of these so-called computationally difficult problems are used to build secure public-key cryptosystems. The security of these systems is assumed (rather than proven), and so we deal extensively with the algorithms known to date for solving these difficult problems. It is precisely here that the mathematics developed in the earlier chapters is put to use, to a great extent.

Chapter 5 is the culmination of all these mathematical and algorithmic studies: the design of public-key systems for achieving various cryptographic goals. With the theoretical base developed in the earlier chapters, Chapter 5 turns out to be an easy chapter. This is our way of looking at the problem, namely a formal bottom–up approach. We claim to be different from most textbooks in this regard. Our discussion of mathematics is not for its own sake, but to develop the foundation of cryptographic primitives.

We then turn to some purely practical and implementation-related issues of public-key cryptography. Standards proposed by organizations such as IEEE and RSA Security Inc. promote the interoperable use of crypto primitives in Internet applications. We then look at some small applications of the crypto basics. Some indirect ways of cryptanalysis are described next. These techniques (side-channel and backdoor attacks) give the book a strong practical flavour in tandem with its otherwise formal appearance.

As an eleventh-hour decision, we added a final chapter to the book, on quantum computation and its implications for public-key cryptography. Although somewhat theoretical at this point, quantum computation has important ramifications for public-key cryptography. The mathematics behind quantum mechanics and computation is not developed in the earlier chapters, which only highlights the distinctive nature of this chapter; it might well be titled “cryptography of the future”.

This schematic description perhaps makes it clear that the book is best suited as a graduate-level textbook. A one- or two-semester graduate or advanced undergraduate course can be based on its contents. Self-study is also possible at an advanced graduate or research level, but is expected to be difficult at the undergraduate level. We highlight the importance of classroom teaching if an undergraduate course is to be based on this textbook.

We have rated the items in the book by their levels of difficulty and/or mathematical sophistication. Unstarred items can be covered even in undergraduate courses. Items marked with a single star are suitable for a second course or a second reading. Doubly starred items, on the other hand, are research-level material and can be pursued only in really advanced courses or for research. The inclusion of a good amount of these advanced topics marks another distinction of this book from other available textbooks.

The book comes with plenty of exercises, with a twofold motivation. First, they help the readers deepen their understanding of the material discussed in the text. Second, some of these exercises build additional theory that we omit from the text proper. We occasionally make use of these additional topics in proving and/or explaining results in the text. We do not classify the exercises into easy and difficult ones, but provide hints, some of them quite explicit, for the intellectually challenging parts. We collect the hints in an appendix near the end of the book and leave the marker [H] at the appropriate locations in the statements of the exercises. This practice prevents a reader from accidentally seeing a hint; only when stuck need the reader look at the hints at the end. We believe that the exercises, together with our discussion of algorithms and implementation issues, will offer serious students many ways to carry out substantial implementation work to further their research and development in cryptography.

Every chapter ends with annotated references for further study. We do not claim to be encyclopaedic in this respect. Instead we mention only those references that, we feel, are directly related to the topics dealt with in the respective chapters.

As a trade-off between bulk and coverage, we had to leave many issues untouched. For example, constraints of space prevented us from presenting symmetric-key cryptography in detail. In view of its importance today, however, we include brief discussions of block ciphers, stream ciphers and hash functions in an appendix. We also do not discuss the formal security of public-key protocols. The issues related to provable security are, at a minimum, theoretically important in the study of cryptography, but are left out here; only a brief discussion of the implications of complexity theory for the security of public-key protocols is included in another appendix. The Handbook of Applied Cryptography [194] by Menezes et al. can supplement this book for learning symmetric techniques, whereas the book by Delfs and Knebl [74] or those by Goldreich [113, 114] can be consulted for formal security issues.

We are indebted to everybody whose criticism, encouragement and support made this project possible. Special thanks go to Bimal Roy, Chandan Mazumdar, C. Pandurangan, Debdeep Mukhopadhyay, Dipanwita Roychowdhury, Gagan Garg, Hartmut Wiebe, H. V. Kumar Swamy, Indranil Sengupta, Kapil Paranjape, Manindra Agarwal, Palash Sarkar, Rajesh Pillai, Rana Barua, R. Balasubramanian, Sanjay Barman, Shailesh, Satrajit Ghosh, Souvik Bhattacherjee, Srihari Vavilapalli, Subhamoy Maitra, Surjyakanta Mohapatro, and Uwe Storch. This book has been tested in postgraduate courses at the Indian Institute of Science, Bangalore, and at the Indian Institute of Technology Kharagpur. We sincerely thank all our students for pointing out many errors and suggesting several improvements. We express our deep gratitude to our family members for their constant understanding and moral support. We are also indebted to our institutes for providing the wonderful intellectual climate for completing this work.

A. D.

C. E. V. M.

Notations

Any time you are stuck on a problem, introduce more notation.

—Chris Skinner [Plenary Lecture, Aug 1997, Topics in Number Theory, Penn State]

General
|a|  absolute value of real number a
min S  minimum of elements of set S
max S  maximum of elements of set S
exp(a)  e^a, where e is the base of natural logarithms
log x  logarithm of x with respect to some unspecified base (like 10)
ln x  log_e x, the natural logarithm of x
lg x  log_2 x
log^k x  (log x)^k (similarly, ln^k x = (ln x)^k and lg^k x = (lg x)^k)
:=  is defined as (or “is assigned the value” in code snippets)
i  the imaginary unit, that is, √–1
z̄  complex conjugate (x – iy) of the complex number z = x + iy
δij  Kronecker delta
(a_s a_{s–1} . . . a_0)_b  b-ary representation of a non-negative integer
binomial coefficient, equals n(n – 1) ··· (n – r + 1)/r!
⌊x⌋  floor of real number x
⌈x⌉  ceiling of real number x
[a, b]  closed interval, that is, the set of real numbers x in the range a ≤ x ≤ b
(a, b)  open interval, that is, the set of real numbers x in the range a < x < b
L(t, α, c)  expression of the form exp((c + o(1))(ln t)^α (ln ln t)^(1–α))
L_t[c]  abbreviation for L(t, 1/2, c) (denoted also as L[c] if t is understood)
Bit-wise operations (on bit strings a, b)
NAND  negation of AND
NOR  negation of OR
XOR  exclusive OR
a ⊕ b  bit-wise exclusive OR (XOR) of a and b
a AND b  bit-wise AND of a and b
a OR b  bit-wise inclusive OR of a and b
LSk(a)  left shift of a by k bits
RSk(a)  right shift of a by k bits
LRk(a)  left rotate (cyclic left shift) of a by k bits
RRk(a)  right rotate (cyclic right shift) of a by k bits
ā  bit-wise complement of a
a ‖ b  concatenation of a and b
Sets
∅  empty set
#A  cardinality of set A
a ∈ A  a is an element of set A
A ⊆ B  set A is contained in set B
A ⊈ B  set A is not contained in set B
A ⊊ B  set A is properly contained in set B
A ∪ B  union of sets A and B
A ⊔ B  disjoint union of sets A and B
A ∩ B  intersection of sets A and B
A \ B  difference of sets A and B
Ā  complement of set A (in a bigger set)
A × B  (Cartesian) product of sets A and B
ℕ  set of all natural numbers, that is, {1, 2, 3, . . .}
ℕ₀  set of all non-negative integers, that is, {0, 1, 2, . . .}
ℤ  set of all integers, that is, {. . . , –2, –1, 0, 1, 2, . . .}
ℙ  set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .}
ℚ  set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0}
ℚ*  set of all non-zero rational numbers
ℝ  set of all real numbers
ℝ*  set of all non-zero real numbers
ℝ≥0  set of all non-negative real numbers
ℂ  set of all complex numbers
ℂ*  set of all non-zero complex numbers
ℤn  ring of integers modulo n, can be represented by the set {0, 1, . . . , n – 1}
ℤn*  group of units in ℤn, can be represented as {a | 0 ≤ a < n, gcd(a, n) = 1}
𝔽q  finite field of cardinality q
𝔽q*  multiplicative group of 𝔽q, that is, 𝔽q \ {0}
𝒪K  ring of integers of number field K
𝒪K*  group of units of 𝒪K
ℤp  ring of p-adic integers
ℚp  field of p-adic numbers
Up  group of units of ℤp
Functions and relations
f : A → B  f is a function from set A to set B
f : A ↪ B  f is an injective function from set A to set B
f : A ↠ B  f is a surjective function from set A to set B
a ↦ b  a is mapped to b (by a function)
f ∘ g  composition of functions f and g (applied from right to left)
f⁻¹  inverse of bijective function f
Ker f  kernel of function (homomorphism) f
Im f  image of function f
~  equivalent to
[a]  equivalence class of a
Groups
aH  coset in a multiplicative group
a + H  coset in an additive group
HK  internal direct product of (sub)groups H and K
H × K  external direct product of (sub)groups H and K
[G : H]  index of subgroup H in group G
G/H  quotient group
G1 ≅ G2  groups G1 and G2 are isomorphic
ord G  order (that is, cardinality) of group G
ordG a  order of element a in group G
Exp G  exponent of group G
Z(G)  centre of group G
C(a)  centralizer of group element a
GLn(K)  general linear group over field K (of n × n matrices)
SLn(K)  special linear group over field K (of n × n matrices)
Gtors  torsion subgroup of G
Rings
char A  characteristic of ring A
A × B  direct product of rings A and B
A*  multiplicative group of units of ring A
⟨S⟩  for ring A, the ideal generated by S ⊆ A
⟨a⟩  for ring A, the principal ideal generated by a ∈ A, also written as aA and Aa
a ≡ b (mod 𝔞)  a is congruent to b modulo ideal 𝔞, that is, a – b ∈ 𝔞
A ≅ B  rings A and B are isomorphic
A/𝔞  quotient ring (modulo ideal 𝔞)
a|b  a divides b (in some ring)
vp(a)  multiplicity of prime p in element a
p^k ‖ a  k = vp(a)
nilradical of ring A
Ared  reduction of ring A, equals A modulo its nilradical
gcd(a, b)  greatest common divisor of elements a and b
lcm(a, b)  least common multiple of elements a and b
𝔞 + 𝔟  sum of ideals 𝔞 and 𝔟
𝔞 ∩ 𝔟  intersection of ideals 𝔞 and 𝔟
𝔞𝔟  product of ideals 𝔞 and 𝔟
√𝔞  root (or radical) of ideal 𝔞
Q(A)  total quotient ring of ring A (quotient field of A, if A is an integral domain)
S⁻¹A  localization of ring A at multiplicative set S
A𝔭  localization of ring A at prime ideal 𝔭
𝒪K  ring of integers of number field K
N(𝔞)  norm of ideal 𝔞 (in a Dedekind domain)
CRT  Chinese remainder theorem
ED  Euclidean domain
DD  Dedekind domain
DVD (or DVR)  discrete valuation domain (or ring)
PID  principal ideal domain
UFD  unique factorization domain
Fields
char K  characteristic of field K
K*  multiplicative group of units of field K, that is, K \ {0}
K̄  algebraic closure of field K
[K : F]  degree of the field extension F ⊆ K
K[a]  {f(a) | f(X) ∈ K[X]}
K(a)  {f(a)/g(a) | f(X), g(X) ∈ K[X], g(a) ≠ 0}
Aut K  group of automorphisms of field K
AutF K  for field extension F ⊆ K, group of F-automorphisms of K (also Gal(K|F))
FixF H  for field extension F ⊆ K, fixed field of subgroup H of AutF K
𝔽q  finite field of cardinality q
𝔽q*  multiplicative group of units of 𝔽q, that is, 𝔽q \ {0}
Tr  trace function
TrK|F (a)  for field extension F ⊆ K, trace of a ∈ K over F
N  norm function
NK|F (a)  for field extension F ⊆ K, norm of a ∈ K over F
Frobenius automorphism, a ↦ a^q
𝒪K  ring of integers of number field K
𝒪K*  group of units of 𝒪K
ΔK  discriminant of number field K
ℤp  ring of p-adic integers
ℚp  field of p-adic numbers
Up  group of units of ℤp
| |p  p-adic norm on ℚp
Integers
a quot b  quotient of Euclidean division of a by b ≠ 0
a rem b  remainder of Euclidean division of a by b ≠ 0
a|b  a divides b in ℤ, that is, b = ca for some c ∈ ℤ
vp(a)  multiplicity of prime p in non-zero integer a
gcd(a, b)  greatest common divisor of integers a and b (not both zero)
lcm(a, b)  least common multiple of integers a and b
a ≡ b (mod n)  a is congruent to b modulo n
a⁻¹ (mod n)  multiplicative inverse of a modulo n (given that gcd(a, n) = 1)
φ(n)  Euler’s totient function
Legendre (or Jacobi) symbol
[a]n  coset of a modulo n
ordn a  multiplicative order of a modulo n (given that gcd(a, n) = 1)
μ(n)  Möbius function
π(x)  number of primes between 1 and positive real number x
Li(x)  Gauss’ Li function
ψ(x, y)  fraction of positive integers ≤ x, that are y-smooth
ζ(s)  Riemann zeta function
RH  Riemann hypothesis
ERH  extended Riemann hypothesis
Mn  2^n – 1 (Mersenne number)
2^32, the standard radix for representation of multiple-precision integers
Polynomials
A[X1, . . . , Xn]  polynomial ring in indeterminates X1, . . . , Xn over ring A
A(X1, . . . , Xn)  ring of rational functions in indeterminates X1, . . . , Xn over ring A
deg f  degree of polynomial f
lc f  leading coefficient of polynomial f
minpolyα,K(X)  minimal polynomial of α over field K, belongs to K[X]
cont f  content of polynomial f
pp f  primitive part of polynomial f
f′(X)  formal derivative of polynomial f(X)
Δ(f)  discriminant of polynomial f
the polynomial
μm  group of m-th roots of unity
Φm  m-th cyclotomic polynomial
Vector spaces, modules and matrices
dimK V  dimension of vector space V over field K
Span S  span of subset S of a vector space
HomK(V, W)  set of all K-linear transformations V → W
EndK(V)  set of all K-linear transformations V → V
M/N  quotient vector space or module
M ≅ N  vector spaces or modules M and N are isomorphic
∏ Mi  direct product of modules Mi, i ∈ I
⊕ Mi  direct sum of modules Mi, i ∈ I
At  transpose of matrix (or vector) A
A⁻¹  inverse of matrix A
Rank T  rank of matrix or linear transformation T
RankA M  rank of A-module M
Null T  nullity of matrix or linear transformation T
(M : N)  for A-module M and submodule N, the ideal {a ∈ A | aM ⊆ N} of A
AnnA(M)  annihilator of A-module M, same as (M : 0)
Tors M  torsion submodule of M
A[S]  A-algebra generated by set S
⟨v, w⟩  inner product of two real vectors v and w
Algebraic curves
𝔸n  n-dimensional affine space over field K
ℙn  n-dimensional projective space over field K
(x1, . . . , xn)  homogeneous coordinates of a point in 𝔸n
[x0, x1, . . . , xn]  projective coordinates of a point in ℙn
f^(h)  homogenization of polynomial f
C(K)  set of K-rational points on curve C defined over field K
K[C]  ring of polynomial functions on curve C defined over K
K(C)  field of rational functions on curve C defined over K
[P]  point P on a curve in formal sums
ordP (r)  order of rational function r at point P
DivK (C)  group of divisors on curve C defined over field K
Div⁰K (C)  group of divisors of degree 0 on curve C defined over field K
DivK(r)  divisor of a rational function r
PrinK(C)  group of principal divisors on curve C defined over field K
JK(C)  Jacobian of curve C defined over field K
PicK(C)  Picard group of curve C (equals DivK(C)/PrinK(C))
Pic⁰K(C)  Div⁰K(C)/PrinK(C), same as Jacobian
∞  point at infinity on an elliptic or a hyperelliptic curve
Δ(E)  discriminant of elliptic curve E
j(E)  j-invariant of elliptic curve E
E(K)  group of points on elliptic curve E defined over field K
P + Q  sum of two points P, Q on an elliptic curve
mP  m-th multiple (that is, m-fold sum) of point P
ψm, , fm  m-th division polynomials
t  trace of Frobenius of elliptic curve
EK[m]  group of m-torsion points in E(K)
E[m]  abbreviation for EK̄[m]
em  Weil pairing (a map E[m] × E[m] → μm)
Div(a, b)  representation of reduced divisor on hyperelliptic curve by polynomials a, b
Probability and statistics
Pr(E)  probability of event E
Pr(E1|E2)  conditional probability of event E1 given event E2
E(X)  expectation of random variable X
Var(X)  variance of random variable X
σX  standard deviation of random variable X (equals √Var(X))
Cov(X, Y)  covariance of random variables X, Y
ρX,Y  correlation coefficient of random variables X, Y
Computational complexity
f = O(g)  big-Oh notation: f is of the order of g
f = Ω(g)  big-Omega notation: g is of the order of f
f = Θ(g)  big-Theta notation: f and g have the same order
f = o(g)  small-oh notation: f is of strictly smaller order than g
f = ω(g)  small-omega notation: f is of strictly larger order than g
f = O~(g)  soft-Oh notation: f = O(g log^k g) for real constant k ≥ 0
problem P1 is polynomial-time reducible to problem P2
P1 ≡ P2  problems P1 and P2 are polynomial-time equivalent
Intractable problems
CVP  closest vector problem
DHP  (finite field) Diffie–Hellman problem
DLP  (finite field) discrete logarithm problem
ECDHP  elliptic curve Diffie–Hellman problem
ECDLP  elliptic curve discrete logarithm problem
HECDHP  hyperelliptic curve Diffie–Hellman problem
HECDLP  hyperelliptic curve discrete logarithm problem
GIFP  general integer factorization problem
IFP  integer factorization problem
QRP  quadratic residuosity problem
RSAIFP  RSA integer factorization problem
RSAKIP  RSA key inversion problem
RSAP  RSA problem
SQRTP  modular square root problem
SSP  subset sum problem
SVP  shortest vector problem
Algorithms
ADH  Adleman, DeMarrais and Huang’s algorithm
AES  advanced encryption standard
AKS  Agarwal, Kayal and Saxena’s deterministic primality test
BSGS  Shanks’ baby-step–giant-step method
CBC  cipher-block chaining mode
CFB  cipher feedback mode
CSM  cubic sieve method
CSPRBG  cryptographically strong pseudorandom bit generator
CvA  Chaum and Van Antwerpen’s undeniable signature scheme
DDF  distinct-degree factorization
DES  data encryption standard
DH  Diffie–Hellman key exchange
DPA  differential power analysis
DSA  digital signature algorithm
DSS  digital signature standard
ECB  electronic codebook mode
ECDSA  elliptic curve digital signature algorithm
ECM  elliptic curve method
E-D-E  encryption–decryption–encryption scheme of triple encryption
EDF  equal-degree factorization
EG  Eschenauer and Gligor’s scheme
FEAL  fast data encipherment algorithm
FFS  Feige, Fiat and Shamir’s zero-knowledge protocol
GKR  Gennaro, Krawczyk and Rabin’s RSA-based undeniable signature scheme
GNFSM  general number field sieve method
GQ  Guillou and Quisquater’s zero-knowledge protocol
HFE  cryptosystem based on hidden field equations
ICM  index calculus method
IDEA  international data encryption algorithm
KLCHKP  braid group cryptosystem
L3  Lenstra–Lenstra–Lovász algorithm
LFSR  linear feedback shift register
LSM  linear sieve method
LUC  cryptosystem based on Lucas sequences
MOV  Menezes, Okamoto and Vanstone’s reduction
MPQSM  multiple polynomial quadratic sieve method
MQV  Menezes–Qu–Vanstone key exchange
NFSM  number field sieve method
NR  Nyberg–Rueppel signature algorithm
NTRU  Hoffstein, Pipher and Silverman’s encryption algorithm
NTRUSign  NTRU signature algorithm
OAEP  optimal asymmetric encryption procedure
OFB  output feedback mode
PAP  pretty awful privacy
PGP  pretty good privacy
PH  Pohlig–Hellman method
PRBG  pseudorandom bit generator
PSS  probabilistic signature scheme
QSM  quadratic sieve method
RSA  Rivest, Shamir and Adleman’s algorithm
SAFER  secure and fast encryption routine
Satoh–FGH  point counting algorithm on elliptic curves over fields of characteristic 2
SDSA  shortened digital signature algorithm
SEA  Schoof, Elkies and Atkin’s algorithm for point counting on elliptic curves
SETUP  secretly embedded trapdoor with universal protection
SFF  square-free factorization
SHA  secure hash algorithm
SmartASS  algorithm for computing discrete logs in anomalous elliptic curves
SNFSM  special number field sieve method
SPA  simple power analysis
TWINKLE  the Weizmann Institute key location engine
TWIRL  the Weizmann Institute relation locator
XCM  xedni calculus method
XSL  extended sparse linearization attack
XTR  efficient and compact subgroup trace representation
ZK  zero-knowledge
Quantum computation
|ψ〉  ket notation for vector ψ
〈φ|ψ〉  inner product of vectors |ψ〉 and |φ〉
‖ψ‖  norm of vector |ψ〉 (equals √〈ψ|ψ〉)
ℋn  n-dimensional Hilbert space (over ℂ)
|0〉, |1〉, . . . , |n – 1〉  orthonormal basis of ℋn
cbit  classical bit
qubit  quantum bit
⊗  tensor product of Hilbert spaces
F  Fourier transform
H  Hadamard transform
I  Identity transform
X  Exchange transform
Z  Z transform
Computational primitives
ulong  32-bit unsigned integer data type (unsigned long)
ullong  64-bit unsigned integer data type (unsigned long long)
a := b  assignment operator (returns the value assigned)
+, –, ×, /, %  arithmetic operators
++, – –  increment and decrement operators
a ◊= b  a := a ◊ b for a binary operator ◊
=, ≠, >, <, ≥, ≤  comparison operators
1  True as a condition
if  conditional statement: if (condition) ···
if-else  conditional statement: if (condition) ··· , else ···
while  while loop: while (condition) ···
do  do loop: do ··· while (condition)
for  for loop: for (range of values) ···
{···}  block of statements
, or . or new-line  statement terminator
/* ··· */  comment
return  return from this routine
Miscellaneous
end of (visible or invisible) proof
end of item (like example, definition, assumption)
[H]  hint available in Appendix D

1. Overview

1.1  Introduction
1.2  Common Cryptographic Primitives
1.3  Public-key Cryptography
1.4  Some Cryptographic Terms
     Chapter Summary

Aller Anfang ist schwer: All beginnings are difficult.

—German proverb

Defendit numerus: There is safety in numbers.

—Anonymous

The ability to quote is a serviceable substitute for wit.

—W. Somerset Maugham

1.1. Introduction

It is rather difficult to give a precise definition of cryptography. Loosely speaking, it is the science (or art or technology) of preventing unauthorized parties from accessing sensitive data. Secure transmission of messages over a public channel is the first, simplest and oldest example of a cryptographic protocol. For assessing the security of such protocols, one studies their possible weak points, namely the strategies for breaking them. This study is commonly referred to as cryptanalysis. Finally, the study of both cryptography and cryptanalysis together is known as cryptology.

Cryptology = Cryptography + Cryptanalysis

The science of cryptology is rather old. It developed naturally as and when human beings felt the need for privacy and secrecy. The rapid deployment of the Internet in recent years demands that we look into the subject with renewed interest. Newer requirements tailored to Internet applications keep cropping up, and as a result newer methods, protocols and algorithms keep appearing. The most startling discoveries include the key-exchange protocol of Diffie and Hellman in 1976 and the RSA cryptosystem of Rivest, Shamir and Adleman in 1978. These opened up a new branch of cryptology, namely public-key cryptology. Historically, public-key technology came earlier than the Internet, but it is the latter that makes extensive use of the former.

This book is an attempt to introduce the reader to the vast and interesting branch of public-key cryptology. One of the most distinguishing features of public-key cryptology is that it involves a fair amount of abstract mathematics, which often stands in the way of a complete understanding for the uninitiated reader. This book tries to bridge that gap: we develop the required mathematics in necessary and sufficient detail.

This chapter is an overview of the topics that the rest of the book deals with. We start with a description of the most common cryptographic protocols. Then we introduce the public-key paradigm and discuss the source of its security. We use certain mathematical terms and notations throughout this chapter. If the reader is not already familiar with these terms, there is nothing to worry about. As we have just claimed, we will introduce the mathematics in the later chapters. The exposition of this chapter is expected to give the reader an overview of the area of public-key cryptography and also the requisite motivation for learning the mathematical tools that follow.

1.2. Common Cryptographic Primitives

As claimed at the outset of this chapter, it is rather difficult to give a precise definition of the term cryptography. The best way to understand it is by examples. In this section, we briefly describe the common problems that cryptography deals with.

1.2.1. The Classical Problem: Secure Transmission of Messages

To start with, we introduce the legendary figures of cryptography: Alice, Bob and Carol. Alice wants to send a message to Bob over a public communication channel like the Internet and wants to ensure that nobody other than Bob can make out the meaning of the message. A third party like Carol, who has access to the communication channel, can intercept the message. But the message should be wrapped or transformed before transmission in such a way that knowledge of some secret piece of information is needed to unwrap or transform back the message. It is Bob who has this information, but not Carol (nor Dorothy nor Emily nor . . .).

It is expedient to point out here that Alice, Bob and Carol need not be human beings. They can stand for organizations (like banks) or, more correctly, for computers or computer programs run by individuals or organizations. It is, therefore, customary to call them parties, entities or subjects instead of persons or characters. In the cryptology jargon, Carol goes by several interchangeable names: adversary, eavesdropper, opponent, intruder, attacker and enemy are the most common ones. When a message transmission like the one just mentioned is involved, Alice is called the sender and Bob the receiver of the message.

It is a natural strategy to put the message in a box and lock the box using a key, called the encryption key. A matching decryption key is needed to unlock the box and retrieve the message. The process of putting the message in the box is commonly called encoding and that of locking the box is called encryption. The reverse processes, namely unlocking the box and taking the message out of the box are respectively called decryption and decoding. This is precisely the classical encryption–decryption protocol of cryptography.[1]

[1] Some people prefer to use the terms enciphering and deciphering in place of the words encryption and decryption respectively.

In the world of electronic communication, a message M is usually a bit string, and encoding, encryption, decryption and decoding are well-defined transformations of bit strings. If we denote by fe the transformation consisting of encoding and encryption, then we get a new bit string C = fe(M, Ke), where Ke stands for the encryption key. This bit string C is sent over the communication channel. After Bob receives C, he uses the reverse transformation fd (decryption followed by decoding) to get the original message M back; that is, M = fd(C, Kd). Note that the decryption key Kd is needed as an argument to fd. If Carol does not know Kd, she cannot compute M. We conventionally call M the plaintext message and C the ciphertext message.

The encoding and decoding operations do not make use of keys and can be performed by anybody. (It should not be difficult to put a letter in or take a letter out of an unlocked box!) One might then wonder why it is necessary to do these transformations instead of applying the encryption and decryption operations directly on M and C respectively. With whatever we have discussed so far, we cannot give a full answer to this question. For the answer, we will need to wait until we reach the later chapters. We only mention here that the encryption algorithms often require as input some mathematical entities (like integers or elements of a field) which are logically not bit strings. But that’s not all! As we see later, the additional transformations often add to the security of the protocols. On the other hand, for a general discussion, it is often unnecessary to start from the encoding process and end at the decoding process. As a result, we will assume, unless otherwise stated, that M is the input to the encryption routine and the output of the decryption routine, in which case fe and fd stand for the encryption and decryption functions only.

Symmetric-key or secret-key cryptography

In the simplest form of locking mechanism, one has Ke = Kd. That is, the same key, called the symmetric key or the secret key, is used for both encryption and decryption. Common examples of such symmetric-key algorithms include DES (Data Encryption Standard) together with its various modifications like Triple DES and DES-X, IDEA (International Data Encryption Algorithm), SAFER (Secure And Fast Encryption Routine), FEAL (Fast data Encipherment Algorithm), Blowfish, RC5 and AES (Advanced Encryption Standard). We will not describe all these algorithms in this book. Interested readers can consult the abundant literature on them.

Asymmetric-key or public-key cryptography

The biggest disadvantage of using a secret-key system is that Alice and Bob must agree upon the key Ke = Kd secretly, for example, by personal contact or over a secure channel. This is a serious limitation and is often impractical, or even impossible. Another drawback of secret-key systems is that every pair of parties needs its own key for communication. Thus, if there are n entities communicating over a network, the number of keys is of the order of n². Also, each entity has to remember O(n) keys for communicating with the other entities. In practice, an entity does not communicate with every other entity on the network, yet the total number of keys to be remembered by an entity can still be quite high.
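A quick count makes this key-management burden concrete. The sketch below (plain Python, with illustrative party counts only) compares the roughly n²/2 pairwise secret keys against the n key pairs needed in a public-key setting.

```python
# Number of secret keys needed for pairwise secure channels among n
# parties, versus the number of key pairs in a public-key setting.

def pairwise_keys(n):
    # every unordered pair of parties shares one secret key
    return n * (n - 1) // 2

def public_key_pairs(n):
    # each party publishes a single key pair, regardless of n
    return n

for n in (10, 100, 1000):
    print(n, pairwise_keys(n), public_key_pairs(n))
```

For 1000 parties, pairwise keying already requires 499500 distinct secret keys, while the public-key setting needs only 1000 key pairs.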

Both these problems can be avoided by using what is called an asymmetric-key or a public-key protocol. In such a protocol, each entity decides a key pair (Ke, Kd), makes the encryption key Ke public and keeps the decryption key Kd secret. Ke is also called the public key and Kd the private key. Anybody who wants to send a message to Bob gets Bob’s public key, encrypts the message with the key, and sends the ciphertext to Bob. Upon receiving the ciphertext, Bob uses his private key to decrypt the message. One may view such a lock as a self-locking padlock. Anybody can lock a box with a self-locking padlock, but opening it requires a key which only Bob possesses.

The security of such a system rests on the difficulty of computing the private key Kd from the public key Ke. It is apparent that Ke and Kd are inverses of each other in a sense, because the former is used to generate C from M and the latter to recover M from C. This is where mathematics comes into the picture. We mention a few possible constructions of key pairs in the next section; the rest of the book deals with an in-depth study of these public-key protocols.

Attractive as they look, public-key protocols have a serious drawback, namely that they are orders of magnitude slower than their secret-key counterparts. This is a concern when huge amounts of data need to be encrypted and decrypted. The shortcoming can be overcome by using secret-key and public-key protocols in tandem as follows: Alice generates a secret key (say, for AES), encrypts the message with the secret key and the secret key with Bob’s public key, and sends both the encrypted message and the encrypted secret key. Bob first decrypts the encrypted secret key using his private key and then uses the recovered secret key to decrypt the message. Since secret keys are usually short bit strings (most commonly of length 128 bits), the slow performance of the public-key algorithms causes little trouble. At the same time, Alice and Bob are relieved of any prior secret meeting or communication for agreeing on the secret key. Moreover, neither Alice nor Bob needs to remember the secret key. For every session of message transmission, a random secret key can be generated and destroyed when the communication is over.
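This hybrid arrangement can be sketched in a few lines. In the toy Python below, a 16-bit textbook-RSA key pair (no padding) transports the session key, and a SHA-256-based XOR keystream stands in for a real symmetric cipher such as AES; every parameter here is for illustration only, not a secure configuration.

```python
import hashlib
from secrets import randbelow

# Bob's toy RSA key pair: p = 61, q = 53, n = 3233, phi(n) = 3120,
# e*d = 17*2753 = 1 (mod 3120).  Real moduli are 2048 bits or more.
n, e, d = 3233, 17, 2753

def xor_cipher(key: bytes, data: bytes) -> bytes:
    # a keystream derived from the session key; XOR is its own inverse.
    # This is a stand-in for a real symmetric cipher such as AES.
    stream = hashlib.sha256(key).digest()
    while len(stream) < len(data):
        stream += hashlib.sha256(stream).digest()
    return bytes(a ^ b for a, b in zip(data, stream))

# Alice: pick a random session key, encrypt the message with it,
# and encrypt the session key with Bob's public key.
session_key = randbelow(n - 2) + 2                     # integer in [2, n-1]
ciphertext = xor_cipher(session_key.to_bytes(2, 'big'), b"attack at dawn")
wrapped_key = pow(session_key, e, n)                   # RSA-encrypt the key

# Bob: unwrap the session key with his private key, then decrypt.
recovered_key = pow(wrapped_key, d, n)
plaintext = xor_cipher(recovered_key.to_bytes(2, 'big'), ciphertext)
print(plaintext)   # b'attack at dawn'
```

Only the short session key pays the cost of the slow public-key operation; the bulk data goes through the fast symmetric cipher.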

1.2.2. Key Exchange

There is an alternative method by which Alice and Bob can exchange secret information (like AES keys) over a public communication channel. Let us first see how this can be done in the physical lock-and-key scenario. Alice generates a secret, puts it in a box, locks the box with her own key and sends it to Bob. Bob, upon receiving the locked box, adds a second lock to it and sends the doubly locked box back to Alice. Alice then removes her lock and again sends the box to Bob. Finally, Bob uses his key to unlock the box and retrieve the secret. A third party (Carol) that can access the box during the three communications finds it locked by Alice or Bob or both. Since Carol does not possess the keys to these locks, she cannot open the box to discover the secret.

This process can be abstractly described as follows: Alice and Bob first independently generate key pairs (AKe, AKd) and (BKe, BKd) respectively. Alice then sends AKe to Bob and Bob sends BKe to Alice. The private keys AKd and BKd are not disclosed. They also agree upon a function g with which Alice computes gA = g(AKd, BKe) and Bob computes gB = g(BKd, AKe). If gA = gB, then this common value can be used as a shared secret between Alice and Bob.

Our intruder Carol knows g and taps the values of AKe and BKe. So the function g should be such that a knowledge of these values alone does not suffice for the computation of gA = gB. One of the private keys AKd or BKd is needed for the computation. Since (AKe, AKd) and (BKe, BKd) are key pairs, it is assumed that private keys are difficult to compute from the knowledge of the corresponding public keys.

Such a technique of exchanging secret values over an insecure channel is called a key-exchange or a key-agreement protocol. It is important to point out here that such a protocol is usually based on the public-key paradigm; that is to say, we do not know secret-key counterparts for a key-exchange protocol. Since a shared secret between the communicating parties is usually short, the low speed of public-key algorithms is really not a concern in this case.
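The abstract protocol above, instantiated in the multiplicative group of integers modulo a prime (the Diffie–Hellman setting treated later in the book), looks as follows; the tiny prime p = 23 with generator g = 5 is purely illustrative, as real deployments use primes of 2048 bits or more.

```python
from secrets import randbelow

# Key agreement sketched in the multiplicative group mod a prime p
# with generator g.  Here the function g of the text is
# g(AKd, BKe) = BKe^AKd mod p.
p, g = 23, 5

a_priv = randbelow(p - 2) + 1          # Alice's private key AKd
b_priv = randbelow(p - 2) + 1          # Bob's private key BKd
a_pub = pow(g, a_priv, p)              # AKe, sent to Bob
b_pub = pow(g, b_priv, p)              # BKe, sent to Alice

gA = pow(b_pub, a_priv, p)             # Alice computes g(AKd, BKe)
gB = pow(a_pub, b_priv, p)             # Bob computes g(BKd, AKe)
assert gA == gB                        # both hold g^(a*b) mod p
```

Carol sees only a_pub and b_pub; without one of the private exponents (that is, without solving a discrete logarithm), she cannot compute the shared value.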

1.2.3. Digital Signatures

A digital signature is yet another application of the public-key paradigm. Suppose Alice wants to sign a message M in such a way that the signature S can be verified by anybody but nobody other than Alice would be able to generate the signature S on the message M. This can be achieved as follows: Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. She now uses the decryption function fd to generate the signature, that is, S = fd(M, Kd). The signature S is then made public. Anybody who has access to Alice’s public key Ke applies the reverse transformation fe to get back the message M = fe(S, Ke).

If Carol signs the message M with a different key K′d, she generates the signature S′ = fd(M, K′d). Now, since K′d and Ke are not matching keys, verification using Ke gives M′ = fe(S′, Ke), which is different from M. If we assume that M is a message written in a human-readable language (like English), then M′ would generally look like a meaningless sequence of characters, neither English nor any sensible string to a human reader. The signature verifier would then immediately conclude that this is a case of forged signature.

Such a scheme of generating digital signatures is called a signature scheme with message recovery. It is obvious that this is the same as our encrypt–decrypt scheme with the sequence of encryption and decryption steps reversed. If the message M to be signed is quite long, using this algorithm calls for a large execution time both for signature generation and for verification. It is, therefore, customary to use another variant of signature schemes called signature schemes with appendix that we describe now.

Instead of applying the decryption transform directly on M, Alice first computes a short representative H(M) of her message M. Her signature now becomes the pair S = (M, σ), where σ = fd(H(M), Kd). Typically, a hash function (see Section 1.2.6) is used to compute the representative H(M) from M and is assumed to be public knowledge. Now anybody can verify the signature by checking whether the equality H(M) = fe(σ, Ke) holds. If a key different from Kd is used to generate the signature, one would (in general) get a value σ′ ≠ σ, and the forgery will be detected by observing that H(M) ≠ fe(σ′, Ke).
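A sketch of signing with appendix, using hashlib for H and a toy textbook-RSA key pair for fd and fe; the 16-bit modulus, and the reduction of the hash value below it, are artifacts of the illustrative parameters only.

```python
import hashlib

# Toy RSA parameters: p = 61, q = 53, n = 3233, e*d = 1 (mod 3120).
n, e, d = 3233, 17, 2753

def H(message: bytes) -> int:
    # short representative of the message; the reduction mod n is
    # needed only because the toy modulus is so small
    return int.from_bytes(hashlib.sha256(message).digest(), 'big') % n

M = b"pay Bob 100 rupees"
sigma = pow(H(M), d, n)            # Alice signs: sigma = fd(H(M), Kd)

# Anybody can verify with the public key: H(M) == fe(sigma, Ke)?
assert H(M) == pow(sigma, e, n)
```

Only the short value H(M), rather than the whole of M, goes through the slow private-key operation, which is the point of the appendix variant.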

1.2.4. Entity Authentication

By entity authentication, we mean a process in which one entity, called the claimant, proves its identity to another entity, called the verifier. Entity-authentication techniques thus aim to prevent impersonation of an entity by an intruder. Both secret-key and public-key techniques are used in entity-authentication schemes.

The simplest example of an entity-authentication scheme is the use of passwords, as when a user (the claimant) tries to gain access to some resources on a computer (the verifier) by proving its identity using a password. Password schemes are mostly based on secret-key techniques. For example, the UNIX password system is based on encrypting the zero message (a string of 64 zero bits) using a repeated application of a variant of the DES algorithm with 64 bits of the user input (the password) as the key. Password-based authentication schemes are fixed and time-invariant and are often called weak authentication schemes.

We see applications of public-key techniques in challenge–response authentication schemes (also called strong authentication schemes). Assume that an entity, Alice, wants to prove her identity to another entity, Bob. Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. Now, Bob chooses a random message M, encrypts M using Alice’s public key—that is, computes C = fe(M, Ke)—and sends C to Alice. Alice, upon reception of C, decrypts it using her private key Kd; that is, she regenerates M = fd(C, Kd) and sends M to Bob. Bob compares this value of M with the one he generated, and if a match occurs, Bob becomes sure that the entity who is claiming to be Alice possesses the knowledge of Alice’s private key. If Carol uses any private key other than Kd for the decryption, she gets a message M′ different from M and thereby cannot prove to Bob her identity as Alice. This is how this scheme prevents impersonation of Alice by Carol.

Entity authentication is often carried out using another interesting technique called zero-knowledge proof. In such a protocol, the verifier (or any third party listening to the conversation) gains no knowledge of the secret possessed by the claimant, but develops the desired confidence that the claimant indeed possesses the secret. We provide here an informal example explaining zero-knowledge proofs.

Let us think of a circular cave as shown in Figure 1.1. The cave has two exits, left and right, denoted by L and R respectively. The cave also has a door inside it, which is invisible from outside the cave. Alice (A) wants to prove to Bob (B) that she possesses a key to this door without showing him the key or the process of unlocking the door with the key. Bob stations himself somewhere outside the exits of the cave. Alice enters the cave and randomly chooses the left or right wing of the cave (and goes there). She does not disclose this choice to Bob, because Bob is not allowed to know the session secrets either. Once Alice is in place, Bob makes a random choice from L and R and asks Alice (using cell phones or by shouting loudly) to come out of the cave via the chosen exit. Suppose Bob challenges Alice to use L. If Alice is in the left wing, she can come out of the cave using L. If Alice is in the right wing, she must use her secret key to open the central door, cross to the left wing and then go out via exit L. If Alice does not possess the secret key, she succeeds in obeying Bob’s directive only with probability one half. If this procedure is repeated t times, then the probability that Alice succeeds on all occasions without possessing the secret key is (1/2)^t = 1/2^t. By choosing t appropriately, Bob can make the probability of accepting a false claim arbitrarily small. For example, if t = 20, then the chance is less than one in a million that Alice can establish a false claim.

Figure 1.1. Zero-knowledge proofs


Thus, if Alice succeeds every time, Bob gains the desired confidence that Alice actually possesses the secret. However, during this entire process, Bob can obtain no information regarding Alice’s secrets (the key and the choices of wings). Another important aspect of this interaction is that Alice has no way of predicting Bob’s questions, preventing impostors (of Alice) from fooling Bob.
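The probability argument can be checked by simulation. The Python sketch below plays the cave game for a cheating Alice who holds no key and must simply hope that Bob’s challenge matches the wing she already picked:

```python
import random

# A cheating prover (no key) survives one round only when Bob's random
# challenge happens to match the wing she already chose: probability 1/2.

def cheater_survives(t: int) -> bool:
    # one full run of t challenge rounds; any mismatch exposes the cheat
    return all(random.choice('LR') == random.choice('LR') for _ in range(t))

t = 20
trials = 100_000
cheats = sum(cheater_survives(t) for _ in range(trials))
# (1/2)^20 is below one in a million, so cheats is almost surely 0 here
print(cheats / trials)

# the exact false-acceptance probability for t = 20
assert 0.5 ** 20 < 1e-6
```

Raising t shrinks the false-acceptance probability geometrically while telling Bob nothing about the key itself.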

1.2.5. Secret Sharing

Suppose that a secret piece of information is to be distributed among n entities in such a way that n – 1 (or fewer) entities are unable to reconstruct the secret. All of the n entities must participate to reveal the secret. As usual, let us assume that the secret is an l-bit string. A simple strategy would be to break the string into n parts and provide each entity with one part. This method is, however, not really attractive, because it gives each entity partial information about the secret. Thus, for example, if a 256-bit string is distributed equally among 16 entities, any 15 of them working together can reconstruct the secret by trying only 2^16 = 65536 possibilities for the unknown 16 bits.

We now describe an alternative strategy that does not suffer from this drawback. Once again, we break the secret string into n parts and consider the parts as integers a_0, . . . , a_{n–1}. We construct the monic polynomial f(x) = x^n + a_{n–1}x^{n–1} + · · · + a_1x + a_0 and give the integers f(1), f(2), . . . , f(n) to the entities. When all of the entities cooperate, the linear system of equations f(i) = i^n + a_{n–1}i^{n–1} + · · · + a_1i + a_0, 1 ≤ i ≤ n, can be solved for the unknown coefficients a_0, . . . , a_{n–1} which, in turn, reveal the secret. On the other hand, if n – 1 or fewer entities cooperate, they get an underdetermined system of equations in n unknowns, from which the actual solution is not readily available.

The secret-sharing problem can be generalized in the following way: distribute a secret among n parties in such a way that any m or more of the parties can reconstruct the secret (for some m ≤ n), whereas any m – 1 or fewer parties cannot do the same. A polynomial of degree m as in the above example readily adapts to this generalized situation.
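One standard realization of this (m, n)-threshold generalization is Shamir’s secret-sharing scheme, which hides the secret as the constant term of a random polynomial of degree m – 1 over a prime field and recovers it by Lagrange interpolation at x = 0; the sketch below (with illustrative parameters) follows that route rather than the monic-polynomial example above.

```python
import random

p = 2**127 - 1                   # a Mersenne prime comfortably above the secret

def make_shares(secret: int, m: int, n: int):
    # random polynomial of degree m-1 with constant term = secret
    coeffs = [secret] + [random.randrange(p) for _ in range(m - 1)]
    def f(x):
        return sum(c * pow(x, k, p) for k, c in enumerate(coeffs)) % p
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    # Lagrange interpolation at x = 0 over GF(p)
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % p
                den = den * (xi - xj) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret

shares = make_shares(secret=123456789, m=3, n=5)
assert reconstruct(shares[:3]) == 123456789   # any 3 of the 5 shares suffice
assert reconstruct(shares[2:]) == 123456789
```

Working over a prime field also removes the partial-information leak of the naive splitting: any m – 1 shares are consistent with every possible secret.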

1.2.6. Hashing

A function which converts bit strings of arbitrary lengths to bit strings of a fixed (finite) length is called a hash function. Hash functions play a crucial role in cryptography. We have already seen an application of one in designing a digital signature scheme with appendix. If H is a hash function, a pair of distinct input values (strings) x1 and x2 for which H(x1) = H(x2) is called a collision for H. For any hash function H, collisions must exist, since H maps an infinite set to a finite set. However, for cryptographic purposes we want collisions to be difficult to find. More specifically, a cryptographic hash function H should satisfy the following desirable properties:

First pre-image resistance

Except for a small set of hash values y it should be difficult to find an input x with H(x) = y. We exclude a small set of values, because an adversary might prepare (and maintain) a list of pairs (x, H(x)) for certain values of x of her choice. If the given value of y is the second coordinate of one pair in her list, she can produce the corresponding input value x easily.

Second pre-image resistance

Given a pair (x, H(x)), it should be difficult to find an input x′ different from x with H(x) = H(x′).

Collision resistance

It should be difficult to find two different input strings x, x′ with H(x) = H(x′).

The output of a hash function is also called a message digest, and hash functions may be used with or without a secret key. Popular examples of unkeyed hash functions are SHA-1, MD5 and MD2, whereas keyed hash functions include HMAC and CBC-MAC.
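The need for a sufficiently long output can be demonstrated directly. The sketch below truncates SHA-256 to 16 bits and finds a collision by brute force; by the birthday bound a collision typically appears after only a few hundred inputs, and the pigeonhole principle guarantees one within 65537 distinct inputs.

```python
import hashlib

def h16(data: bytes) -> bytes:
    # SHA-256 truncated to 16 bits: only 2^16 possible outputs
    return hashlib.sha256(data).digest()[:2]

seen = {}
collision = None
for i in range(100_000):
    x = str(i).encode()
    y = h16(x)
    if y in seen:
        collision = (seen[y], x)      # two distinct inputs, same digest
        break
    seen[y] = x

x1, x2 = collision
assert x1 != x2 and h16(x1) == h16(x2)
print(x1, x2, h16(x1).hex())
```

The same brute-force search against the full 256-bit output would take about 2^128 trials, which is why no collision in SHA-256 itself has ever been published.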

1.2.7. Certification

So far we have seen several protocols which are based on the use of public keys of remote entities, but have never questioned the authenticity of public keys. In other words, it is necessary to ascertain that a public key is really owned by a remote entity. Public-key certificates are used to that effect. These are data structures that bind public-key values to entities. This binding is achieved by having a trusted certification authority digitally sign each certificate.

Typically, a certificate is issued for a period of validity. However, a certificate may become invalid before its date of expiry for several reasons, such as a possible or suspected compromise of the private key. Under such circumstances, the certification authority must revoke the certificate and maintain a list, called a certificate revocation list (CRL), of revoked certificates. When Alice verifies the authenticity of Bob’s public-key certificate by checking the digital signature of the authority and does not find the certificate in the CRL, she gains the desired confidence in using Bob’s public key.

The X.509 public-key infrastructure (PKIX) specifies Internet standards for certificates and CRLs.

1.3. Public-key Cryptography

In this section, we give a short introduction to the realization of public-key cryptosystems. More specifically, we list some of the computationally intensive mathematical problems and describe how the (apparent) intractability of these problems can be used for designing key pairs. We use some mathematical terms that we will introduce later in this book.

1.3.1. The Mathematical Problems

The security of the public-key cryptosystems is based on the presumed difficulty of solving certain mathematical problems.

The integer factorization problem (IFP)

Given the product n = pq of two distinct prime integers p and q, find p and q.

The discrete logarithm problem (DLP)

Let G be a finite cyclic (multiplicatively written) group with cardinality n and a generator g. Given an element a of G, find an integer x (or the unique integer x with 0 ≤ x ≤ n – 1) such that a = g^x in G. Three different types of groups are commonly used for cryptographic applications: the multiplicative group of a finite field, the group of rational points on an elliptic curve over a finite field, and the Jacobian of a hyperelliptic curve over a finite field. By an abuse of notation, we often refer to the DLP over finite fields simply as the DLP, whereas the DLP on elliptic and hyperelliptic curves is referred to as the elliptic curve discrete logarithm problem (ECDLP) and the hyperelliptic curve discrete logarithm problem (HECDLP) respectively.

The Diffie–Hellman problem (DHP)

Let G and g be as above. Given the elements g^a and g^b of G, compute the element g^(ab). As in the case of the DLP, the DHP can be posed in the multiplicative group of a finite field, the group of rational points on an elliptic curve and the Jacobian of a hyperelliptic curve.

We show in the next section how (the intractability of) these problems can be exploited to create key pairs for various cryptosystems. These computational problems are termed difficult, intractable, infeasible or intensive in the sense that there are no known algorithms to solve these problems in time polynomially bounded by the input size. The best-known algorithms are subexponential or even fully exponential in some cases. This means that if the input size is chosen to be sufficiently large, then it is infeasible to compute the private key from a knowledge of the public key in a reasonable amount of time. This, in turn, implies (not provably, but as the current state of the art stands) that encryption or signature verification can be done rather quickly (in polynomial time), but the converse process of decryption or signature generation cannot be done in feasible time, unless one knows the private key. As a result, encryption (or signature verification) is called a trapdoor one-way function, that is, a function which is easy to compute but for which the inverse is computationally infeasible, unless some additional information (the trapdoor) is available.

It is, however, not known that these problems are really computationally infeasible, that is, there is no proof of the fact that these problems cannot be solved in polynomial time. As a result, the public-key cryptographic systems based on these problems are not provably secure.

1.3.2. Realization of Key Pairs

In RSA and similar cryptosystems, one generates two (distinct) suitably large primes p and q and computes the product n = pq. Then φ(n) = (p – 1)(q – 1), where φ denotes Euler’s totient function. One then chooses a random integer e with gcd(e, φ(n)) = 1. There exists an integer d such that ed ≡ 1 (mod φ(n)). The integer e is used as the public key, whereas the integer d is used as the private key.

If the IFP can be solved fast, one can also compute φ(n) easily, and subsequently d can be computed from e using the (polynomial-time) extended GCD algorithm. This is why[2] we say that the RSA cryptosystem derives its security from the intractability of the IFP.

[2] The problem of factoring n = pq is polynomial-time equivalent to computing φ(n) = (p – 1)(q – 1).

In order to see how RSA encryption and decryption work, let the plaintext message be encoded as an integer m with 2 ≤ m < n. The ciphertext message is generated (as an integer) as c ≡ m^e (mod n). Decryption is analogous, that is, m ≡ c^d (mod n). The correctness of the algorithm follows from the fact that ed ≡ 1 (mod φ(n)). It is, however, not proved that one has to know d or φ(n) or the factorization of n in order to decrypt an RSA-encrypted message. But at present no better methods are known.
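The whole recipe fits in a few lines of Python; the primes below are toy values for illustration (real keys use primes of about 1024 bits each, and real RSA pads the message before exponentiation).

```python
from math import gcd

# Textbook RSA following the recipe above: n = p*q,
# phi(n) = (p-1)(q-1), e coprime to phi(n), d = e^{-1} mod phi(n).
p, q = 61, 53
n = p * q                       # 3233, the public modulus
phi = (p - 1) * (q - 1)         # 3120
e = 17                          # public key
assert gcd(e, phi) == 1
d = pow(e, -1, phi)             # private key via the extended GCD

m = 65                          # plaintext encoded as 2 <= m < n
c = pow(m, e, n)                # encryption: c = m^e mod n
assert pow(c, d, n) == m        # decryption: m = c^d mod n
```

The three-argument pow performs fast modular exponentiation, and pow(e, -1, phi) computes the modular inverse using the polynomial-time extended GCD algorithm mentioned above.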

Let us now consider the discrete logarithm problem. Let G be a finite cyclic multiplicative group (such as those mentioned above) in which it is easy to multiply two elements, but difficult to compute discrete logarithms. Let g be a generator of G. In order to set up a random key pair over such a group, one chooses the private key as a random integer d, 2 ≤ d < n, where n is the cardinality of G. The public key e is then computed as the element e = g^d of G.

Applications of encryption–decryption schemes based on the key pair (g^d, d) are given in Chapter 5. For now, we only remark that many such schemes (like the ElGamal scheme) derive their security from the DHP instead of the DLP, whereas other schemes (like the Nyberg–Rueppel scheme) do so from the DLP. It is assumed that these two problems are computationally equivalent (at least for the groups of our interest). Obviously, a solution of the DLP yields a solution of the DHP too: recover b from g^b and compute g^(ab) = (g^a)^b. The reverse implication is not clear.
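This reduction from the DHP to the DLP can be sketched with a brute-force discrete logarithm, feasible here only because the illustrative group is tiny:

```python
# Reduction DHP <= DLP in the multiplicative group mod a toy prime:
# given g^a and g^b, a DLP solver recovers b, after which
# g^(ab) = (g^a)^b is a single exponentiation.
p, g = 23, 5                             # 5 generates the group mod 23

def dlp(h: int) -> int:
    # exhaustive-search discrete log: find x with g^x = h (mod p)
    for x in range(p - 1):
        if pow(g, x, p) == h:
            return x
    raise ValueError("not in the group generated by g")

a, b = 6, 15
ga, gb = pow(g, a, p), pow(g, b, p)      # the publicly visible values

b_recovered = dlp(gb)                    # solve the DLP instance
dh_secret = pow(ga, b_recovered, p)      # then the DHP is easy
assert dh_secret == pow(g, a * b, p)
```

For cryptographic group sizes the loop in dlp becomes infeasible, which is exactly the gap these systems rely on; no comparably simple reduction in the other direction is known.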

1.3.3. Public-key Cryptanalysis

As we pointed out earlier, (most of) the public-key cryptosystems are not provably secure, in the sense that they are based on the apparent difficulty of solving certain computational problems. It is expedient to know how difficult these problems are. No non-trivial complexity-theoretic statements are available for these problems, and so it is worthwhile to study the algorithms known to date for solving them. Unfortunately, many of these cryptanalytic algorithms are much more complicated than the algorithms for building the corresponding cryptographic systems. One needs to acquire more mathematical machinery in order to understand (and augment) them. We devote Chapter 4 to a detailed discussion of these algorithms.

In specific situations, one need not always use these computationally intensive algorithms. Access to a party’s decryption equipment may allow an adversary to gain partial or complete information about the private key by watching a decryption process. For example, an adversary (say, the superuser) might have the capability to read the contents of the memory holding a private key during some decryption process. For another possibility, think of RSA decryption which involves a modular exponentiation. If the standard square-and-multiply algorithm (Algorithm 3.9) is used for this purpose and the adversary can tap some hardware details (like machine cycles or power fluctuations) during a decryption process, she can guess a significant number of the bits in the private key. Such attacks, often called side-channel attacks, are particularly relevant for cryptographic applications based on smart cards.

A cryptographic system is (believed to be) strong if and only if there are no good known mechanisms to break it. It is, therefore, for the sake of security that we must study cryptanalysis. Cryptography and cryptanalysis are deeply intertwined and a complete study of one must involve the other.

1.4. Some Cryptographic Terms

In cryptology, there are different models of attacks or attackers.

1.4.1. Models of Attacks

So far we have assumed that an adversary can only read messages during transmission over a channel. Such an adversary is called a passive adversary. An active adversary, on the other hand, can mutilate or delete messages during transmission and/or generate false messages. An attack mounted by an active (resp.[3] a passive) adversary is called an active (resp. a passive) attack. In this book, we will mostly concentrate on passive attacks.

[3] Throughout the book, resp. stands for respectively.

1.4.2. Models of Passive Attacks

A two-party communication involves transmission of ciphertext messages over a communication channel. A passive attacker can read these ciphertext messages. In practice, however, an attacker might have more control over the choice of ciphertext and/or plaintext messages. Based on these capabilities of the attacker we have the following types of attacks.

Ciphertext-only attack

This is the weakest model of the adversary. Here the attacker has no control over the choice of the ciphertext messages that flow through the channel, nor over the corresponding plaintext messages. Using only these ciphertext messages, the attacker has to obtain a private key and/or the plaintext corresponding to a new ciphertext message.

Known-pair attack

In this kind of attack (also called a known-plaintext or known-ciphertext attack), the attacker uses her knowledge of some plaintext–ciphertext pairs. If many such pairs are available, she can use them to deduce a pattern from which she can subsequently gain some information about a new plaintext whose ciphertext is available. In a public-key scheme, the adversary can generate as many such pairs as she wants, because generating such a pair requires only a knowledge of the receiver’s public key. Thus a public-key encryption scheme must provide sufficient security against known-plaintext attacks.

Chosen-plaintext attack

In this kind of attack, the attacker knows some plaintext–ciphertext pairs in which the plaintexts are chosen by the attacker. As discussed earlier, such an attack is easily mountable for a public-key encryption scheme.

Adaptive chosen-plaintext attack

This is similar to the chosen-plaintext attack with the additional possibility that the attacker chooses the plaintexts in the known plaintext–ciphertext pairs sequentially and adaptively based on the knowledge of the previous pairs. This kind of attack can be easily mounted on public-key encryption systems.

Chosen-ciphertext attack

The attacker has knowledge of some plaintext–ciphertext pairs in which the ciphertexts are chosen by the attacker. Such an attack is not directly mountable on a public-key scheme, since obtaining a plaintext from a chosen ciphertext requires knowledge of the private key. However, if the attacker has access to the receiver’s decryption equipment, the machine can divulge the plaintexts corresponding to the ciphertexts that the attacker supplies to the machine. In this context, we assume that the machine does not reveal the private key itself, that is, it has the key stored secretly somewhere in its hardware which the attacker cannot directly access. However, the attacker can run the machine to know the plaintexts corresponding to the ciphertexts of her choice. Later (when the attacker no longer has access to the decryption equipment) the known pairs may be exploited to obtain information about the plaintext corresponding to a new ciphertext.

Adaptive chosen-ciphertext attack

This is similar to the chosen-ciphertext attack with the additional possibility that the attacker chooses the ciphertexts in the known pairs sequentially and adaptively based on her knowledge of the previously generated plaintext–ciphertext pairs. This attack is mountable in a scenario described in connection with chosen-ciphertext attacks.

For a digital signature scheme, there are equivalent names for these types of attacks. The attacker is assumed to have access to the public key of the signer, because this key is used for signature verification. An attempt to forge signatures based only on the knowledge of this verification key is called a key-only attack. The adversary may additionally possess knowledge of some message–signature pairs. An attack based on this knowledge is called a known-pair or known-message or known-signature attack. If the messages are chosen by the adversary, we call the attack a chosen-message attack. If the adversary generates the sequence of messages in a chosen-message attack adaptively (based on the previously generated message–signature pairs), we have an adaptive chosen-message attack. An (adaptive or non-adaptive) chosen-message attack can be mounted, if the attacker gains access to the signer’s signature generation equipment, or if the signer is willing to sign arbitrary messages provided by the adversary.

The attacker can choose some signatures and generate the corresponding messages by encrypting them with the signer’s public key. The private-key operation on these messages generates the signatures chosen by the attacker. This gives chosen-signature and adaptive chosen-signature attacks on a digital signature scheme. Now the adversary cannot directly control the messages to sign. On the other hand, such an attack is easily mountable, because it utilizes only some public knowledge (the signer’s public key). Indeed, one may treat chosen-signature attacks as variants of key-only attacks.

1.4.3. Public Versus Private Algorithms

So far, we have assumed that all the parties connected to a network know the algorithms used in a cryptographic scheme. The security of the scheme is based on the difficulty of obtaining some secret information (the secret or private key).

It, however, remains possible that two parties communicate using an algorithm unknown to other entities. Top-secret communications (for example, during wars or diplomatic transactions) often use private cryptographic algorithms. In this book, we will not deal with such techniques. Our attention is focused mostly on Internet applications in which public knowledge of the algorithms is of paramount importance (for the sake of universal applicability and convenience).

In short, this book deals with a world in which only publicly known public-key algorithms are deployed and in which adversaries are usually passive. A restricted model of the world though it may be, it is general and useful enough to concentrate on. Let us begin our journey!

Chapter Summary

This chapter provides an overview of the problems that cryptology deals with. The first and oldest cryptographic primitive is encryption for secure transmission of messages. Some other primitives are key exchange, digital signature, authentication, secret sharing, hashing, and digital certificates. We then highlight the difference between symmetric (secret-key) and asymmetric (public-key) cryptography. The relevance of some computationally intractable mathematical problems in public-key cryptography is discussed next, and the working of a prototype public-key cryptosystem (RSA) is explained. We finally discuss different models of attacks on cryptosystems.

Some people think that cryptology also deals with intrusion, viruses and Trojan horses. We emphasize that this is not the case. Data and network security is the branch that deals with these topics; cryptography is a part of this branch, but not the other way round. Imagine that your house is to be secured against theft. First, you need a good lock—that is cryptography. However, a lock does nothing to prevent a thief from entering the house by breaking the window panes. A bad butler who leaks secrets of the house to the outside world also does not come under the jurisdiction of the lock. Securing your house requires adopting sufficient safeguards against all these possibilities of theft. In this book, we will study only the technology of manufacturing and breaking locks.

2. Mathematical Concepts

2.1  Introduction
2.2  Sets, Relations and Functions
2.3  Groups
2.4  Rings
2.5  Integers
2.6  Polynomials
2.7  Vector Spaces and Modules
2.8  Fields
2.9  Finite Fields
2.10 Affine and Projective Curves
2.11 Elliptic Curves
2.12 Hyperelliptic Curves
2.13 Number Fields
2.14 p-adic Numbers
2.15 Statistical Methods
     Chapter Summary
     Suggestions for Further Reading

Young man, in mathematics you don’t understand things, you just get used to them.

—John von Neumann

Mathematics contains much that will neither hurt one if one does not know it nor help one if one does know it.

—J. B. Mencken

Mathematics is the Queen of Science but she isn’t very pure; she keeps having babies by handsome young upstarts and various frog princes.

—Donald Kingsbury

2.1. Introduction

In this chapter, we introduce the basic mathematical concepts that one should know in order to understand the public-key cryptographic protocols and the corresponding cryptanalytic algorithms described in the later chapters. If the reader is already familiar with these concepts, she may quickly browse through the chapter in order to know about our notations and conventions.

This chapter is meant for cryptology students and as such does not describe the mathematical topics in their full generality. It is our intention only to state (and, if possible, prove) the relevant results that would be useful for the rest of the book. For further study, we urge the reader to consult the books suggested at the end of this chapter.

2.2. Sets, Relations and Functions

Sets are absolutely basic entities used throughout the present-day study of mathematics. Unfortunately, however, we cannot define sets. Loosely speaking, a set is an (unordered) collection of objects. But we run into difficulty with this definition for collections that are too big. Of course, infinite sets like the set of all integers or real numbers are not too big. However, a collection of all sets is too big to be called a set. (Also see Exercise 2.6.) It is, therefore, customary to have an axiomatic definition of sets. That is to say, a collection qualifies to be a set if it satisfies certain axioms. We do not go into the details of this axiomatic definition, but state the axioms as properties of sets. Luckily enough, we won’t have a chance in the rest of this book to deal with collections that are not sets. So the reader can, for the time being, have faith in the above (wrong) identification of a set as a collection.

An object in a set A is commonly called an element of A. By the notation a ∈ A, we mean that a is an element of the set A. Often a set A can be represented explicitly by writing down its elements within curly brackets or braces. For example, A = {2, 3, 5, 7} denotes the set consisting of the elements 2, 3, 5, 7 which are incidentally all the (positive) prime numbers less than 10. We often use the ellipsis sign (. . .) to denote an infinite (or even a finite) set. For example, ℙ = {2, 3, 5, 7, 11, . . .} would denote the set of all (positive) prime numbers. (We prove later that ℙ is an infinite set.) Alternatively, we often describe a set by mentioning the properties of the elements of the set. For example, the set ℙ can also be described as ℙ = {p | p is a positive prime number}.

Some frequently occurring sets are denoted by special symbols. We list a few of them here.

ℕ    The set of all natural numbers, that is, {1, 2, 3, . . .}
ℕ0   The set of all non-negative integers, that is, {0, 1, 2, . . .}
ℤ    The set of all integers, that is, {. . . , –2, –1, 0, 1, 2, . . .}
ℙ    The set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .}
ℚ    The set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0}
ℚ*   The set of all non-zero rational numbers
ℝ    The set of all real numbers
ℝ*   The set of all non-zero real numbers
ℂ    The set of all complex numbers
ℂ*   The set of all non-zero complex numbers
∅    The empty set

The cardinality of a set A is the number of elements in A. We use the symbol #A to denote the cardinality of A. If #A is finite, we call A a finite set. Otherwise A is said to be infinite. The empty set has cardinality zero.

2.2.1. Set Operations

Let A and B be two sets. We say that A is a subset of B and denote this as A ⊆ B, if all elements of A are in B. Two sets A and B are equal (that is, A = B) if and only if A ⊆ B and B ⊆ A. A is said to be a proper subset of B (denoted A ⊊ B), if A ⊆ B and A ≠ B (that is, B ⊄ A).

The union of A and B is the set whose elements are either in A or in B (or both). This set is denoted by A ∪ B. The intersection of A and B is the set consisting of elements that are common to A and B. The intersection of A and B is denoted by A ∩ B. If A ∩ B = ∅, then we say that A and B are disjoint. In that case, the union A ∪ B is also called a disjoint union and is denoted by A ⊔ B. (For a generalization, see Exercise 2.7.) The difference of A and B, denoted A \ B, is the set whose elements are in A but not in B. If A is understood from the context and B ⊆ A, then we denote A \ B by B̄ and refer to B̄ as the complement of B (in A). The product A × B of two sets A and B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B.
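These operations have direct counterparts for finite sets in most programming languages. Here is a small illustrative sketch in Python (the example sets A and B are our own):

```python
from itertools import product

A = {2, 3, 5, 7}
B = {1, 2, 3, 4}

union = A | B                    # elements in A or in B (or both)
intersection = A & B             # elements common to A and B
difference = A - B               # elements in A but not in B
cartesian = set(product(A, B))   # all ordered pairs (a, b)

assert union == {1, 2, 3, 4, 5, 7}
assert intersection == {2, 3}
assert difference == {5, 7}
assert len(cartesian) == len(A) * len(B)   # #(A × B) = #A · #B
```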

The notions of union, intersection and product of sets can be readily extended to an arbitrary family of sets. Let Ai, i ∈ I, be a family of sets indexed by I. In this case, we denote the union and intersection of Ai, i ∈ I, by ∪i∈I Ai and ∩i∈I Ai respectively. The product of Ai, i ∈ I, is denoted by ∏i∈I Ai. When Ai = A for all i ∈ I, we denote the product also as AI. If, in addition, I is a finite set of cardinality n, then the product AI is also written as An.

2.2.2. Relations

A relation ρ on a set A is a subset of A × A. For (a, b) ∈ ρ, we usually say a ρ b implying that a is related by ρ to b. Common examples are the standard relations =, ≠, ≤, <, ≥, > on ℤ (or ℚ or ℝ).

A relation ρ on a set A is called reflexive, if a ρ a for all a ∈ A. For example, =, ≤ and ≥ are reflexive relations on ℤ, but the relations ≠, <, > are not.

A relation ρ on A is called symmetric, if a ρ b implies b ρ a. On the other hand, ρ is called anti-symmetric if a ρ b and b ρ a imply a = b. For example, = is symmetric and anti-symmetric, <, ≤, > and ≥ are anti-symmetric but not symmetric, ≠ is symmetric but not anti-symmetric.

A relation ρ on A is called transitive if a ρ b and b ρ c imply a ρ c. For example, =, <, ≤, >, ≥ are all transitive, but ≠ is not transitive.

An equivalence relation is one which is reflexive, symmetric and transitive. For example, = is an equivalence relation on ℤ, but none of the other relations mentioned above (≠, <, ≥ and so on) is an equivalence relation on ℤ.

A partition of a set A is a collection of pairwise disjoint subsets Ai, i ∈ I, of A, such that A = ∪i∈I Ai, that is, A is the union of Ai, i ∈ I, and for i, j ∈ I, i ≠ j, we have Ai ∩ Aj = ∅. The following theorem establishes an important connection between equivalence relations and partitions.

Theorem 2.1.

An equivalence relation on a set A produces a partition of A. Conversely, every partition of a set A corresponds to an equivalence relation on A.

Proof

Let ρ be an equivalence relation on a set A. For a ∈ A, let us denote [a] = {b ∈ A | a ρ b}. Clearly, a ∈ [a], since a ρ a (by reflexivity). Now we show that for a, b ∈ A, either [a] = [b] or [a] ∩ [b] = ∅. Assume that [a] ∩ [b] ≠ ∅. Choose c ∈ [a]. By construction, a ρ c. Now choose d ∈ [a] ∩ [b]. Then a ρ d and b ρ d. By symmetry, d ρ b, so that by transitivity a ρ b, that is, b ρ a. But a ρ c. Hence, once again by transitivity, b ρ c, that is, c ∈ [b]. Thus [a] ⊆ [b]. Similarly [b] ⊆ [a].

Conversely, let Ai, i ∈ I, be a partition of A. Define a relation ρ on A such that a ρ b if and only if a and b are in the same subset Ai for some i ∈ I. It is easy to see that ρ is an equivalence relation on A.

The subset [a] of A defined in the proof of the above theorem is called the equivalence class of a with respect to the equivalence relation ρ.
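The construction used in the proof, collecting each element into its class [a], can be sketched for a finite set in Python. The relation below (congruence modulo 3) and the helper name are our own illustrative choices:

```python
def equivalence_classes(A, related):
    """Group the elements of a finite iterable A into the classes [a]
    of an equivalence relation, given as a predicate related(a, b)."""
    classes = []
    for a in A:
        for cls in classes:
            # a belongs to cls iff it is related to any one representative
            if related(a, next(iter(cls))):
                cls.add(a)
                break
        else:
            classes.append({a})   # a starts a new class
    return classes

# congruence modulo 3 on {0, ..., 9}
classes = equivalence_classes(range(10), lambda a, b: a % 3 == b % 3)
assert sorted(sorted(c) for c in classes) == [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```

Note that comparing with a single representative of each class suffices only because the relation is assumed to be an equivalence relation.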

An anti-symmetric and transitive relation is called a partial order (or simply an order). All of the relations =, ≤, <, ≥, > are partial orders on ℤ (but ≠ is not). A partial order ρ on A is called a total order or a linear order or a simple order, if for every a, b ∈ A, a ≠ b, either a ρ b or b ρ a. For example, if we take A = {1, 2, 3} and the relation ρ = {(1, 2), (1, 3)}, then ρ is a partial order but not a total order (because it does not specify a relation between 2 and 3). On the other hand, ρ′ = {(1, 2), (1, 3), (2, 3)} is a total order. A set with a partial (resp. total) order is often called a partially ordered (resp. totally ordered or linearly ordered or simply ordered) set.

2.2.3. Functions

Let A and B be two sets (not necessarily distinct). A function or a map f from A to B, denoted f : A → B, assigns to each a ∈ A some element b ∈ B. In this case, we write b = f(a) or f maps a ↦ b and say that b is the image of a (under f). For example, if A = B = ℝ, then the assignment a ↦ a2 is a function. On the other hand, the assignment a ↦ √a (the non-negative square root) is not a function from ℝ to ℝ, because it is not defined for negative values of a. However, if A = ℝ and B = ℂ, then the assignment a ↦ √a (with non-negative real and imaginary parts) is a function.

The function f : A → A assigning a ↦ a for all a ∈ A is called the identity map on A and is usually denoted by idA. On the other hand, if f : A → B maps all the elements of A to a fixed element of B, then f is said to be a constant function. A function which is not constant is called a non-constant function.

A function f : A → B that maps different elements of A to different elements of B is called injective or one-one. In other words, f is injective if and only if f(a) = f(a′) implies a = a′. The function ℝ → ℝ given by a ↦ a2 is not injective, since f(–a) = f(a) for all a ∈ ℝ. On the other hand, the function ℤ → ℤ given by a ↦ 2a is injective. An injective map f : A → B is sometimes denoted by the special symbol f : A ↪ B.

The image of a function f : A → B is defined to be the subset {f(a) | a ∈ A} of B. It is denoted by f(A) or by Im f. The function f is said to be surjective or onto or a surjection, if Im f = B, that is, every element b of B has at least one preimage a ∈ A (which means f(a) = b). As an example, the function ℤ → ℤ given by a ↦ a/2 (if a is even) and by a ↦ (a – 1)/2 (if a is odd) is surjective, whereas the function ℤ → ℤ that maps a ↦ |a| (the absolute value) is not surjective. A surjective map f : A → B is sometimes denoted by the special symbol f : A ↠ B.

A map f : A → B is called bijective or a bijection, if it is both injective and surjective. For example, the identity map on a set is bijective. Another example of a bijective function is the map ℕ → ℙ that maps a to the ath prime.

Let f : A → B and g : B → C be functions. The composition of f and g is the function from A to C that takes a ↦ g(f(a)). It is denoted by g ∘ f, that is, (g ∘ f)(a) = g(f(a)). Note that in the notation g ∘ f one applies f first and then g. The notion of composition of functions can be extended to more than two functions. In particular, if f : A → B, g : B → C and h : C → D are functions, then (h ∘ g) ∘ f and h ∘ (g ∘ f) are the same function from A to D, so that we can unambiguously write this as h ∘ g ∘ f.
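For maps on finite sets, injectivity, surjectivity and composition can all be checked by brute force. A small Python sketch (the helper names and example maps are our own):

```python
def is_injective(f, A):
    """f maps different elements of A to different images."""
    images = [f(a) for a in A]
    return len(set(images)) == len(images)

def is_surjective(f, A, B):
    """Every element of B has at least one preimage in A."""
    return {f(a) for a in A} == set(B)

def compose(g, f):
    """(g o f)(a) = g(f(a)); note that f is applied first."""
    return lambda a: g(f(a))

A = list(range(-3, 4))
square = lambda a: a * a
double = lambda a: 2 * a

assert not is_injective(square, A)       # since square(-a) == square(a)
assert is_injective(double, A)
assert compose(square, double)(3) == 36  # (2 * 3)^2
```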

2.2.4. The Axioms of Mathematics

The study of mathematics is based on certain axioms. We state four of these axioms. It is not possible to prove the axioms independently, but it can be shown that they are equivalent in the sense that each of them can be proved, if any of the others is assumed to be true.

Let A be a partially ordered set under the relation ≼. An element a ∈ A is called maximal (resp. minimal), if there is no element b ∈ A, b ≠ a, that satisfies a ≼ b (resp. b ≼ a). Let B be a non-empty subset of A. Then an upper bound (resp. a lower bound) for B is an element a ∈ A such that b ≼ a (resp. a ≼ b) for all b ∈ B. If an upper bound (resp. a lower bound) a of B is an element of B, then a is called a last element or a largest element or a maximum element (resp. a first element or a least element or a smallest element or a minimum element) of B. By antisymmetry, it follows that a first (resp. last) element of B, if existent, is unique. A chain of A is a totally ordered (under ≼) subset of A.

Consider the sets ℕ, ℤ and ℝ with the natural order ≤. None of these sets contains a maximal element. ℕ contains a minimal element, namely 1, but ℤ and ℝ do not contain minimal elements. The subset {2, 4, 6, . . .} of even natural numbers has two lower bounds in ℕ, namely 1 and 2, of which 2 is the first element of the subset.

A totally ordered set A is said to be well ordered (and the relation ≼ is called a well order), if every non-empty subset B of A contains a first element.

Axiom 2.1. Zermelo’s well-ordering principle

Every set A can be well ordered, that is, there is a relation ≼ which well orders A.

The set ℕ is well ordered under the natural relation ≤. The set ℤ can be well ordered by the relation ≼ defined by the enumeration 0 ≼ 1 ≼ –1 ≼ 2 ≼ –2 ≼ 3 ≼ –3 ≼ · · ·. A well ordering of ℝ is not known.

Axiom 2.2. Zorn’s lemma

Let A be a partially ordered set. If every chain of A has an upper bound (in A), then A has at least one maximal element.

To illustrate Zorn’s lemma, consider any non-empty set A and define P(A) to be the set of all subsets of A. P(A) is called the power set of A and is partially ordered under containment ⊆. A chain of P(A) is a set of subsets Ai of A such that for all i, j, either Ai ⊆ Aj or Aj ⊆ Ai. Clearly, the union ∪i Ai is an upper bound of the chain. Then Zorn’s lemma guarantees that P(A) has at least one maximal element. In this case, the maximal element, namely A, is unique. If A is finite, then for the set of all proper subsets of A, a maximal element (under the partial order ⊆) exists by Zorn’s lemma, but is not unique, if #A > 1.

Axiom 2.3. Hausdorff’s maximal principle

Let ≼ be a partial order on a set A. Then there is a maximal chain B of A, that is, if C is any chain of A with B ⊆ C ⊆ A, then C = B.

Finally, let A be a set and P′(A) = P(A) \ {∅}, that is, P′(A) is the set of all non-empty subsets of A. A choice function of A is a function f : P′(A) → A such that for every B ∈ P′(A) we have f(B) ∈ B.

Axiom 2.4. Axiom of choice

Every set has a choice function.

Exercise Set 2.2

2.1
  1. Let G = (V, E) be an undirected graph. Define a relation ρ on the vertex set V of G by: u ρ v if and only if there is a path from u to v. Show that ρ is an equivalence relation on V. What are the equivalence classes for this relation?

  2. Let G = (V, E) be a directed acyclic graph. Define the relation ρ on V as in part 1. Show that ρ is a partial order on V. When is ρ a total order?

2.2 Let f : A → B and g : B → A be functions. Show that if f ∘ g = idB, then g is injective and f is surjective. In particular, f (and also g) is bijective, if f ∘ g = idB and g ∘ f = idA. In this case, we call g the inverse of f and denote this as g = f–1. Show by examples that both the conditions f ∘ g = idB and g ∘ f = idA are necessary for f to be bijective.
2.3 Let f : A → B be a map from a finite set A to a finite set B. Prove that
  1. #A ≤ #B, if f is injective,

  2. #A ≥ #B, if f is surjective, and

  3. #A = #B, if f is bijective.

2.4Let A be a finite set and let f : AA be a map. Show that the following conditions are equivalent.
  1. f is injective.

  2. f is surjective.

  3. f is bijective.

Show by examples that this equivalence need not hold, if A is an infinite set.

2.5 Let A and B be two arbitrary sets, f : A → B a map, A′ ⊆ A and B′ ⊆ B. We define f(A′) = {f(a) | a ∈ A′} and f–1(B′) = {a ∈ A | f(a) ∈ B′}. Show that:
  1. If A′ ⊆ A″ ⊆ A, then f(A′) ⊆ f(A″).

  2. If B′ ⊆ B″ ⊆ B, then f–1(B′) ⊆ f–1(B″).

  3. f–1(f(A′)) ⊇ A′.

  4. f(f–1(B′)) ⊆ B′.

  5. f(f–1(f(A′))) = f(A′).

  6. f–1(f(f–1(B′))) = f–1(B′).

2.6

Russell’s paradox A collection C is called ordinary, if C is not a member of C. A collection which is not ordinary is called extraordinary. Show that the collection of all ordinary collections is neither ordinary nor extraordinary.

2.7 Let Ai, i ∈ I, be a family of sets (not necessarily pairwise disjoint). For each i ∈ I, consider the set Bi = Ai × {i}. Show that the sets Bi, i ∈ I, are pairwise disjoint. The union ∪i∈I Bi is called the disjoint union of Ai, i ∈ I.

2.3. Groups

So far we have studied sets as unordered collections. However, things start getting interesting if we define one or more binary operations on sets. Such operations define structures on sets, and we compare different sets in light of their respective structures. Groups are the first (and simplest) examples of sets with binary operations.

Definition 2.1.

A binary operation on a set A is a map from A × A to A. If ◊ is a binary operation on A, it is customary to write a ◊ a′ to denote the image of (a, a′) (under ◊).

For example, addition, subtraction and multiplication are all binary operations on ℤ (or ℚ or ℝ). Subtraction is not a binary operation on ℕ, since, for example, 2 – 3 is not an element of ℕ. Division is not a binary operation on ℚ, since division by zero is not defined. Division is a binary operation on ℚ*.

2.3.1. Definition and Basic Properties

Definition 2.2.

A group[1] (G, ◊) is a set G together with a binary operation ◊ on G satisfying the following three conditions:

[1] In binary operations and algebras generally there is a morass of terminology which reflects on the literacy of the promulgators. Starting for example with a poor choice, namely “group”, we now have “semigroup” (why?), “loop” (why?), “groupoid”, and “partial groupoid”. . . .Among other poor choices are “ring”, “field”, “ideal”, “category theory”, and “universal algebra”. “Ideal” was used by Dedekind in a sense which made sense to mathematicians of that day but it does not today. “Field” can best be labeled as ridiculous. As to categories of category theory, the concept of category is too broad for that reduction. It is not good taste to take such a term and place it in restricted surroundings.

—Preston C. Hammer

  1. Associativity (a ◊ b) ◊ c = a ◊ (b ◊ c) for all a, b, c ∈ G.

  2. Identity element There exists a (unique) element e ∈ G such that e ◊ a = a ◊ e = a for all a ∈ G. The element e is called the identity of G.

  3. Inverse For each a ∈ G, there exists a (unique) element b ∈ G such that a ◊ b = b ◊ a = e. The element b is called the inverse of a.

    If, in addition, we assume that

  4. Commutativity a ◊ b = b ◊ a for all a, b ∈ G,

    then G is called a commutative or an Abelian group.

A group (G, ◊) is also written in short as G, when the operation ◊ is understood from the context. More often than not, the operation ◊ is either addition (+) or multiplication (·) in which cases we also say that G is respectively an additive or a multiplicative group. For a multiplicative group, we often omit the multiplication sign and denote a · b simply as ab. The identity in an additive group is usually denoted by 0, whereas that in a multiplicative group by 1. The inverse of an element a in these cases is denoted respectively by –a and a–1. Groups written additively are usually Abelian, but groups written multiplicatively need not be so.

Note that associativity allows us to write a ◊ b ◊ c unambiguously to represent (a ◊ b) ◊ c = a ◊ (b ◊ c). More generally, if a1, . . . , an ∈ G, then a1 ◊ ··· ◊ an represents a unique element of the group irrespective of how we insert brackets to compute the element a1 ◊ ··· ◊ an.

Example 2.1.
  1. The set ℤ is an Abelian group under addition. The identity is 0 and the inverse of a is –a. Note, however, that ℤ is not a group under multiplication, because though it contains the multiplicative identity 1, no element of ℤ other than ±1 has a multiplicative inverse in ℤ.

  2. The set ℚ* of non-zero rational numbers is a group under multiplication. The identity is 1 = 1/1 and the inverse of a/b is b/a.

  3. For a set A, the set of all bijective functions A → A is a group under composition of functions. The identity element is idA and the inverse of f is denoted by f–1. (See also Exercise 2.2.) This group is not Abelian in general.

  4. The set of all m × n matrices with entries from ℝ is a group under matrix addition. On the other hand, the set GL(n, ℝ) of all n × n invertible matrices over ℝ is a group under matrix multiplication and is called the general linear group. Note that GL(n, ℝ) is another example of a group that is not Abelian (for n > 1).

  5. A group G is called finite, if G as a set consists of (only) finitely many elements. Finite groups play an extremely important role in cryptography. Here is our first example of finite groups: Let n be an integer ≥ 2. The set

    ℤn := {0, 1, . . . , n – 1}

    is a group under addition modulo n (that is, add (and subtract) two elements in ℤn as integers and if the result is not in ℤn, take the remainder of division by n). For this group, the identity element is 0 and –a = n – a for a ≠ 0 and –0 = 0. (See Example 2.3 for a formal definition of ℤn.)

  6. For an integer n ≥ 2, define the set

    ℤn* := {a ∈ ℤn | gcd(a, n) = 1}.

    If n is prime, then ℤn* = {1, 2, . . . , n – 1}. The set ℤn* is a group under multiplication modulo n with identity 1. We need a little more machinery than introduced so far in order to prove that every element of ℤn* has a multiplicative inverse modulo n. The other group axioms are easy to check.
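For a small modulus, the group axioms for ℤn and ℤn* can be verified exhaustively. A Python sketch with n = 10 (our own choice of example):

```python
from math import gcd

n = 10
Zn = set(range(n))                                    # {0, 1, ..., 9}
Zn_star = {a for a in range(1, n) if gcd(a, n) == 1}  # {1, 3, 7, 9}

# (Zn, + mod n): closure and additive inverses (-a = n - a, -0 = 0)
assert all((a + b) % n in Zn for a in Zn for b in Zn)
assert all((a + (n - a) % n) % n == 0 for a in Zn)

# (Zn*, * mod n): closure and existence of multiplicative inverses
assert all((a * b) % n in Zn_star for a in Zn_star for b in Zn_star)
assert all(any((a * b) % n == 1 for b in Zn_star) for a in Zn_star)

print(sorted(Zn_star))   # [1, 3, 7, 9]
```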

Proposition 2.1.

Let (G, ◊) be a group and let a, b, c ∈ G. Then a ◊ b = a ◊ c implies b = c. Similarly, a ◊ c = b ◊ c implies a = b. These statements are commonly known as the (left and right) cancellation laws.

Proof

We prove only the left cancellation law. The proof of the other law is similar. Let e denote the identity of G and d the inverse of a. Then b = e ◊ b = (d ◊ a) ◊ b = d ◊ (a ◊ b) = d ◊ (a ◊ c) = (d ◊ a) ◊ c = e ◊ c = c.

2.3.2. Subgroups, Cosets and Quotient Groups

Definition 2.3.

Let (G, ◊) be a group. Then a subset H of G is called a subgroup of G, if H is a group under the operation ◊ inherited from G. For a subset H of G to be a subgroup, it is necessary and sufficient that H is non-empty and closed under the operation ◊ and under inverse. Any subgroup of an Abelian group is also Abelian.

Example 2.2.
  1. For any group G with identity element e, the subsets {e} and G are subgroups of G. They are called the trivial subgroups of G.

  2. For an integer n ≥ 2, the set of all integral multiples of n is an additive subgroup of ℤ and is denoted by nℤ.

  3. The set SL(n, ℝ) consisting of all n × n real matrices of determinant 1 is a subgroup of GL(n, ℝ) and is commonly referred to as the special linear group.

  4. Note that though ℤn in Example 2.1 is a subset of ℤ, it is not a subgroup of ℤ, since it is not closed under the addition of ℤ. It is a group under addition modulo n, which is not the same as integer addition.

Let (G, ◊) be a group. For subsets A and B of G, we denote by AB the set {a ◊ b | a ∈ A, b ∈ B}. In particular, if A = {a} (resp. B = {b}), then AB is denoted by aB (resp. Ab). Note that the sets AB and BA are not necessarily equal. If G is Abelian, then AB = BA.

Definition 2.4.

Let (G, ◊) be a group, H a subgroup of G and a ∈ G. The set aH is called the left coset of a with respect to H and the set Ha is called the right coset of a with respect to H. If G is Abelian, then a left coset is naturally a right coset and vice versa. In that case, we call aH (or Ha) simply a coset.

From now onward, we consider left cosets only and call them cosets. If the underlying group is Abelian, then the two notions coincide. The theory of right cosets can be developed in parallel, but we choose to omit that here. For simplicity, we also assume that the group G is a multiplicative group, so that the operation ◊ is replaced by · (or by mere juxtaposition).

Proposition 2.2.

Let G be a (multiplicative) group and H a subgroup of G. Then, the cosets aH, a ∈ G, partition G. Two cosets aH and bH are equal if and only if a–1b ∈ H. There is a bijective map from aH to bH for every a, b ∈ G.

Proof

We define a relation ~ on G such that a ~ b if and only if a–1b ∈ H. Clearly, a ~ a. Now a ~ b implies a–1b ∈ H, so that b–1a = (a–1b)–1 ∈ H (see Exercise 2.8), that is, b ~ a. Finally, a ~ b and b ~ c imply a ~ c, since a–1c = (a–1b)(b–1c). Thus ~ is an equivalence relation on G and hence by Theorem 2.1 produces a partition of G. We now show that the equivalence class [a] of a ∈ G is the coset aH. This follows from the chain of equivalences: b ∈ [a] if and only if a–1b = h for some h ∈ H, if and only if b = ah for some h ∈ H, if and only if b ∈ aH.

Now we define a map φ : aH → bH by ah ↦ bh for every h ∈ H. The map φ is clearly surjective. Injectivity of φ follows from the left cancellation law (Proposition 2.1). Hence φ is bijective.

The following theorem is an important corollary to the last proposition.

Theorem 2.2. Lagrange’s theorem

Let G be a finite group and H a subgroup of G. Then, the cardinality of G is an integral multiple of the cardinality of H.

Proof

From Proposition 2.2, the cosets form a partition of G and there is a bijective map from one coset to another. Hence by Exercise 2.3 all cosets have the same cardinality. Finally, note that H is the coset of the identity element.

Definition 2.5.

Let G be a group and H a subgroup of G. The number of distinct cosets of H in G is called the index of H in G and is denoted by [G : H]. If G is finite, then [G : H] = #G/#H.
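A small Python sketch (with G = ℤ12 and H = {0, 4, 8}, our own choice of example) that lists the distinct cosets and confirms #G = #H · [G : H]:

```python
n = 12
G = set(range(n))                  # the additive group Z_12
H = {0, 4, 8}                      # the subgroup generated by 4

# distinct cosets a + H (frozensets so they can be collected in a set)
cosets = {frozenset((a + h) % n for h in H) for a in G}

assert len(G) == len(H) * len(cosets)   # Lagrange: #G = #H * [G : H]
print(sorted(sorted(c) for c in cosets))
# [[0, 4, 8], [1, 5, 9], [2, 6, 10], [3, 7, 11]]
```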

Definition 2.6.

Let H be a subgroup of a (multiplicative) group G. Then H is called a normal subgroup of G, if (aH)(bH) = (ab)H for all a, b ∈ G. It is clear that any subgroup H of an Abelian group G satisfies this condition and hence is normal.

If H is a normal subgroup of a group G, then the cosets aH, a ∈ G, form a group with multiplication defined by (aH)(bH) = (ab)H. This group is called the quotient group of G with respect to H and is denoted by G/H.

Example 2.3.
  1. Let n be an integer ≥ 2. The subgroup nℤ of (ℤ, +) (Example 2.2) is normal, since ℤ is Abelian. The coset of a ∈ ℤ is the set a + nℤ = {a + kn | k ∈ ℤ}. The quotient group ℤ/nℤ is denoted as ℤn and is essentially the same as the group {0, 1, . . . , n – 1} with the operation of addition modulo n (Example 2.1).

  2. For any group G with identity e, the trivial subgroups G and {e} are normal. G/G is a group with a single element, whereas G/{e} is essentially the same as the group G.

2.3.3. Homomorphisms

Definition 2.7.

Let (G, ◊) and (G′, ⊙) be groups. A function f : G → G′ is called a homomorphism (of groups), if f(a ◊ b) = f(a) ⊙ f(b) for all a, b ∈ G, that is, if f commutes with the group operations of G and G′.

A group homomorphism f : G → G′ is called an isomorphism, if there exists a group homomorphism g : G′ → G such that g ∘ f = idG and f ∘ g = idG′. It can be easily seen that a homomorphism f : G → G′ is an isomorphism if and only if f is bijective as a function.[2] If there exists an isomorphism f : G → G′, we say that the groups G and G′ are isomorphic and write G ≅ G′.

[2] If f : GG′ is a bijective homomorphism, its inverse f–1 : G′ → G is bijective as a function. However, it is not obvious that f–1 has to be a group homomorphism. We are lucky here; f–1 is.

A homomorphism f from G to itself is called an endomorphism (of G). An endomorphism which is also an isomorphism is called an automorphism. The set of all automorphisms of a group G is a group under function composition. We denote this group by Aut G.

Example 2.4.
  1. The canonical inclusion a ↦ a/1 is a group homomorphism from (ℤ, +) to (ℚ, +). More generally, if H is a subgroup of G, then the inclusion map h ↦ h for all h ∈ H is a group homomorphism from H to G. In particular, the identity map on any group G is an automorphism of G (and is the identity element of the group Aut G).

  2. For a (multiplicative) group G and a normal subgroup H, the map G → G/H that takes a ∈ G to its coset aH is a surjective group homomorphism. It is called the canonical surjection of G onto G/H. For example, the map that takes a to its remainder of division by n (≥ 2) is a canonical surjection from the additive group ℤ onto the quotient group ℤn. (Also see Examples 2.1, 2.2 and 2.3.)

  3. The map that takes a complex number z = a + ib to its conjugate z̄ = a – ib is a group automorphism of both (ℂ, +) and (ℂ*, ·).
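The canonical surjection of item 2 can be spot-checked numerically. A Python sketch with n = 7 (our own choice), verifying the homomorphism property and surjectivity over a small range:

```python
n = 7
f = lambda a: a % n   # candidate homomorphism (Z, +) -> (Z_7, + mod 7)

# f(a + b) = f(a) + f(b) (mod n), spot-checked over a small range
for a in range(-25, 25):
    for b in range(-25, 25):
        assert f(a + b) == (f(a) + f(b)) % n

# f is surjective onto {0, 1, ..., 6}
assert {f(a) for a in range(-25, 25)} == set(range(n))
```

Python's `%` operator returns a non-negative remainder even for negative arguments, which is exactly the "remainder of division by n" used in the text.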

Proposition 2.3.

Let f be a group homomorphism from (G, ◊) to (G′, ⊙). Let e and e′ denote the identity elements of G and G′ respectively. Then f(e) = e′. If a, b ∈ G and c, d ∈ G′ satisfy a ◊ b = e, c ⊙ d = e′ and f(a) = c, then f(b) = d.

Proof

We have e′ ⊙ f(e) = f(e) = f(e ◊ e) = f(e) ⊙ f(e), so that by right cancellation f(e) = e′. To prove the second assertion we note that c ⊙ d = e′ = f(e) = f(a ◊ b) = f(a) ⊙ f(b) = c ⊙ f(b). Thus, by left cancellation, f(b) = d.

Definition 2.8.

With the notations of the last proposition we define the kernel of f to be the following subset of G:

Ker f := {a ∈ G | f(a) = e′}.

We also define the image of f to be the subset

Im f := {f(a) | a ∈ G}

of G′. Then we have the following important theorem.

Theorem 2.3. Isomorphism theorem

Ker f is a normal subgroup of G, Im f is a subgroup of G′, and G/ Ker f ≅ Im f.

Proof

In order to simplify notations, let us assume that G and G′ are multiplicatively written groups. For u, v ∈ Ker f, we have f(uv–1) = f(u)(f(v))–1 = e′, that is, uv–1 ∈ Ker f. By Exercise 2.8, Ker f is a subgroup of G. We now show that it is normal. Note that for a ∈ G and u ∈ Ker f we have f(aua–1) = f(a)f(u)f(a–1) = e′, that is, aua–1 ∈ Ker f, since f(u) = e′ and f(a–1) = f(a)–1. By Exercise 2.10, Ker f is a normal subgroup of G. Now let a′ = f(a) and b′ = f(b) be arbitrary elements of Im f. Then, f(ab–1) = a′(b′)–1, that is, a′(b′)–1 ∈ Im f. Thus, by Exercise 2.8, Im f is a subgroup of G′.

Now define a map φ : G/Ker f → Im f that takes a Ker f ↦ f(a). Let a Ker f = b Ker f. Then by Proposition 2.2, a–1b ∈ Ker f, that is, b = au for some u ∈ Ker f. But then f(b) = f(au) = f(a)f(u) = f(a)e′ = f(a). This shows that the map φ is well-defined. It is easy to check that φ is a group homomorphism. Now φ(a Ker f) = φ(b Ker f) implies f(a) = f(b), that is, f(a–1b) = e′, that is, a–1b ∈ Ker f, that is, a Ker f = b Ker f. Thus φ is injective. It is clearly surjective. Thus φ is bijective and hence an isomorphism from G/Ker f to Im f.

2.3.4. Generators and Orders

Definition 2.9.

Let G be a group. In this section, we assume, unless otherwise stated, that G is multiplicatively written and has identity e. Let ai, i ∈ I, be a family of elements of G. Consider the subset H of G defined as

H := {ai1^(±1) ai2^(±1) ··· air^(±1) | r ≥ 0 and i1, . . . , ir ∈ I},

with the empty product (corresponding to r = 0) being treated as e. It is easy to check that H is a subgroup of G and contains all ai, i ∈ I. We call H the subgroup generated by ai, i ∈ I, or say that the elements ai, i ∈ I, generate H. H is called finitely generated, if it is generated by finitely many elements. In particular, H is called cyclic, if it is generated by a single element. If H is cyclic and generated by g ∈ H, then g is called a generator or a primitive element of H. Note that, in general, a cyclic subgroup has more than one generator (Exercise 2.47).

Example 2.5.
  1. The additive groups ℤ and ℤn are generated by 1 and hence are cyclic. The multiplicative group ℤn* is cyclic if and only if n is 2, 4, pr or 2pr, where p is an odd prime and r ∈ ℕ (see Exercise 2.50). A generator of ℤn* for such an n is often called a primitive root modulo n.

  2. The group (ℚ*, ·) is generated by the “primes” p/1, p ∈ ℙ, and –1.

  3. Let G be a multiplicative group (not necessarily Abelian) with identity e and let a ∈ G. Then the subgroup H generated by a is the set of elements of the form ar, r ∈ ℤ, and is always Abelian. If H is finite, then the elements ar, r ∈ ℕ0, cannot be all distinct, that is, as = at for some s, t ∈ ℕ0, s > t. Then as–t = e, where s – t > 0. Now a–1 = as–t–1 and, more generally, a–k = ak(s–t–1). Thus we may consider H to consist of non-negative powers of a only. Let n := min {r ∈ ℕ | ar = e}. It is easy to see that H = {ar | r = 0, . . . , n – 1}.
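For ℤn* these notions are easy to compute by brute force. A Python sketch (the helper name mult_order is ours) that finds all generators, that is, the primitive roots, of the cyclic group ℤ11*:

```python
from math import gcd

def mult_order(a, n):
    """Smallest positive r with a^r = 1 (mod n); a must be a unit mod n."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

n = 11                                    # prime, so Z_11^* is cyclic
units = [a for a in range(1, n) if gcd(a, n) == 1]
generators = [a for a in units if mult_order(a, n) == len(units)]
print(generators)   # primitive roots modulo 11: [2, 6, 7, 8]
```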

Definition 2.10.

Let G be a finite group with identity e. The order of G is defined to be the cardinality of the set G and is denoted by ord G. The order of an element a ∈ G is the cardinality of the subgroup of G generated by a and is denoted by ordG a or simply by ord a, when G is understood from the context.

With these notations we prove the following important proposition.

Proposition 2.4.

The order m := ordG a of a ∈ G is the smallest of the positive integers r for which ar = e. If n = ord G, then n is an integral multiple of m. In particular, an = e.

Proof

Let H be the (cyclic) subgroup of G generated by a. Then by Example 2.5, H = {ar | r = 0, . . . , m – 1} and m is the smallest of the positive integers r for which ar = e. By Lagrange’s theorem (Theorem 2.2), n is an integral multiple of m. That is, n = km for some k ∈ ℕ. But then an = (am)k = ek = e.
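Proposition 2.4 can be checked exhaustively for a small group. A Python sketch with G = ℤ13* (our own choice of example):

```python
p = 13
group = list(range(1, p))        # the multiplicative group Z_13^*
n = len(group)                   # ord G = p - 1 = 12

def order(a):
    """Smallest positive r with a^r = 1 (mod p)."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

for a in group:
    m = order(a)
    assert n % m == 0            # ord G is an integral multiple of ord a
    assert pow(a, n, p) == 1     # hence a^n = e
```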

Lemma 2.1.

Let G be a finite cyclic group. Then any subgroup of G is also cyclic.

Proof

Let G be generated by g and ord G = n. Then G = {gr | r = 0, . . . , n – 1}. The subgroup {e} of G is clearly cyclic. For an arbitrary subgroup H ≠ {e} of G, define k := min {r ∈ ℕ | gr ∈ H}. Now take any gr ∈ H and write r = qk + δ, where q and δ are respectively the quotient and remainder of division of r by k with 0 ≤ δ < k. Then gr = (gk)qgδ and so gδ = (gk)–qgr ∈ H. The minimality of k implies that δ = 0, that is, gr = (gk)q. Hence every element of H is a power of gk, that is, H is cyclic with generator gk.

Proposition 2.5.

Let G be a finite cyclic multiplicative group with identity e and let H be a subgroup of G of order m. Then an element a ∈ G is an element of H if and only if am = e.

Proof

If a ∈ H, then am = e by Proposition 2.4. Conversely, assume that am = e, but a ∉ H. Let K be the subgroup of G generated by the elements of H and by a. By Lemma 2.1, K is cyclic. By assumption, K contains more than m elements (since H ∪ {a} ⊆ K). But every element of K has order dividing m, a contradiction.

Finite cyclic groups play a crucial role in public-key cryptography. To see how, let G be a group which is finite, cyclic with generator g and multiplicatively written. Given r ∈ ℕ, one can compute g^r using ≤ 2 lg r + 2 group multiplications (see Algorithms 3.9 and 3.10). This means that if it is easy to multiply elements of G, then it is also easy to compute g^r. On the other hand, there are certain groups for which it is very difficult to find the integer r from the knowledge of g and g^r, even when one is certain that such an integer exists. This is the basic source of security in many cryptographic protocols, like those based on finite fields, elliptic and hyperelliptic curves.
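
The bound of about 2 lg r + 2 multiplications comes from the square-and-multiply method (the book's Algorithms 3.9 and 3.10; the sketch below is an assumed left-to-right variant, not the book's code): each bit of r costs one squaring plus, when the bit is 1, one extra multiplication.

```python
def power_mod(g, r, n):
    """Left-to-right square-and-multiply: return (g^r mod n, multiplications used)."""
    result, mults = 1, 0
    for bit in bin(r)[2:]:              # scan the bits of r, most significant first
        result = (result * result) % n; mults += 1
        if bit == '1':
            result = (result * g) % n; mults += 1
    return result, mults

r = 1000003
val, mults = power_mod(3, r, 2**31 - 1)
assert val == pow(3, r, 2**31 - 1)
assert mults <= 2 * r.bit_length()      # roughly 2 lg r + 2 multiplications
```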

*2.3.5. Sylow’s Theorem

Sylow’s theorem is a powerful tool for studying the structure of finite groups. Recall that if G is a finite group of order n and if H is a subgroup of G of order m, then by Lagrange’s theorem m divides n. But given any divisor m′ of n, there need not exist a subgroup of G of order m′. However, for certain special values of m′, we can prove the existence of subgroups of order m′. Sylow’s theorem considers the case that m′ is a power of a prime.

Definition 2.11.

Let G be a finite group of cardinality n and let p be a prime. If n = p^r for some r ∈ ℕ, we call G a p-group. More generally, let p be a prime divisor of n. Then a p-subgroup of G is a subgroup H of G such that H is a p-group. If H is a p-subgroup of G with cardinality p^r for some r ∈ ℕ, then p^r divides n. Moreover, if p^(r+1) does not divide n, then H is called a p-Sylow subgroup of G.

We shortly prove that p-Sylow subgroups always exist. Before doing that, we prove a simpler result.

Theorem 2.4. Cauchy’s theorem

Let G be a finite group and p a prime dividing ord G. Then G has a subgroup of order p.

Proof

Let n := ord G. Note that if we can find an element a ∈ G such that ord a = p, then the subgroup generated by a is the desired subgroup. To do that, consider the set S consisting of all p-tuples (a1, . . . , ap) with ai ∈ G such that a1 . . . ap = e. S consists of n^(p–1) elements, since we can choose a1, . . . , ap–1 arbitrarily and independently from G and for each such choice of a1, . . . , ap–1 the value of ap = (a1 . . . ap–1)^(–1) gets fixed. Since p divides n, it follows that p divides #S too. Now we define a relation ~ on S by (a1, . . . , ap) ~ (b1, . . . , bp) if and only if (b1, . . . , bp) = (ai, . . . , ap, a1, . . . , ai–1) for some i ∈ {1, . . . , p} (that is, (b1, . . . , bp) is a cyclic shift of (a1, . . . , ap)). It is easy to see that ~ is an equivalence relation on S. The equivalence class of (a1, . . . , ap) contains 1 or p elements depending on whether a1 = · · · = ap or not. Let r and s be the number of equivalence classes containing 1 and p elements of S respectively. Then r + sp = #S = n^(p–1), so that p divides r. Since the equivalence class of (e, . . . , e) contains only one element, we must have r ≥ 1, and hence r ≥ p. This, in turn, proves the existence of a ∈ G, a ≠ e, such that (a, . . . , a) ∈ S. But then a^p = e.
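
Cauchy's theorem can be verified directly in small groups. This sketch (an illustration, not the book's) searches (ℤ/nℤ)* for an element of prime order p, which the theorem guarantees whenever p divides the group order.

```python
from math import gcd

def element_of_order(p, n):
    """Find a in (Z/nZ)* with ord a = p exactly (p prime), or None if absent."""
    for a in range(2, n):
        # a^p = 1 and a != 1 force ord a = p, since ord a divides the prime p
        if gcd(a, n) == 1 and pow(a, p, n) == 1:
            return a
    return None

# (Z/13Z)* has order 12 = 2^2 * 3; Cauchy guarantees elements of order 2 and 3.
assert element_of_order(2, 13) == 12   # 12^2 = 144 = 1 (mod 13)
assert element_of_order(3, 13) == 3    # 3^3 = 27 = 1 (mod 13)
```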

Now we are in a position to prove the general theorem.

Theorem 2.5. Sylow’s theorem

Let G be a finite group of order n and let p be a prime dividing n. Then there exists a p-Sylow subgroup of G.

Proof

We proceed by induction on n. If n = p, then G itself is a p-Sylow subgroup of G. So we assume n > p and write n = p^r m, where p does not divide m. If r = 1, then the theorem follows from Cauchy’s theorem (Theorem 2.4). So we assume r > 1 and consider the class equation of G, namely, #G = #Z(G) + ∑[G : C(a)], where the sum is over a set of pairwise non-conjugate elements a ∉ Z(G) (see Exercise 2.16). If p does not divide [G : C(a)] for some a ∉ Z(G), then #C(a) = #G/[G : C(a)] = p^r m′ < #G for some m′ < m (the full power p^r must divide #C(a)). By induction, C(a) has a p-Sylow subgroup, which is also a p-Sylow subgroup of G. On the other hand, if p divides [G : C(a)] for all a ∉ Z(G), then p divides #Z(G), as can be easily seen from the class equation. We apply Cauchy’s theorem to Z(G) to obtain a subgroup H of Z(G) with #H = p. By Exercise 2.16(b), H is a normal subgroup of G, and we consider the canonical surjection μ : G → G/H. Since #(G/H) = p^(r–1) m < n and r > 1, by induction G/H has a p-Sylow subgroup, say K. But then μ^(–1)(K) is a p-Sylow subgroup of G.

Note that if H is a p-Sylow subgroup of G and g ∈ G, then gHg^(–1) is also a p-Sylow subgroup of G. The converse is also true, that is, if H and H′ are two p-Sylow subgroups of G, then there exists a g ∈ G such that H′ = gHg^(–1). We do not prove this assertion here, but mention the following important consequence of it. If G is Abelian, then H′ = gHg^(–1) = gg^(–1)H = H, that is, there is only one p-Sylow subgroup of G. If G is Abelian with ord G = p1^(r1) · · · pt^(rt) for pairwise distinct primes pi and ri ∈ ℕ, then G is the internal direct product of its pi-Sylow subgroups, i = 1, . . . , t (Exercises 2.17 and 2.19).

Exercise Set 2.3

2.8 Let G be a multiplicatively written group (not necessarily Abelian). Prove the following assertions.
  1. For all elements a, b ∈ G, we have (ab)^(–1) = b^(–1)a^(–1) and (a^(–1))^(–1) = a.

  2. A subset H of G is a subgroup of G if and only if H is non-empty and ab^(–1) ∈ H for all a, b ∈ H.

2.9 Let G be a multiplicatively written group and let H and K be subgroups of G. Show that:
  1. H ∩ K is a subgroup of G.

  2. H ∪ K is a subgroup of G if and only if H ⊆ K or K ⊆ H.

  3. HK := {hk | h ∈ H, k ∈ K} is a subgroup of G if and only if HK = KH. In particular, if K is normal in G, then HK is a subgroup of G.

  4. G × G is a group and H × K is a subgroup of G × G.

  5. If g ∈ G, then gHg^(–1) is a subgroup of G.

2.10
  1. Let G be a multiplicatively written group and H a subgroup of G. Show that the following conditions are equivalent:

    1. H is a normal subgroup of G.

    2. ghg^(–1) ∈ H for all g ∈ G and h ∈ H.

    3. gHg^(–1) = H for all g ∈ G.

    4. gH = Hg for all g ∈ G.

  2. Show that if [G : H] = 2, then H is normal.

2.11 Let G be a (multiplicative) group.
  1. Second isomorphism theorem Let H and K be subgroups of G and let K be normal in G. Show that H/(H ∩ K) ≅ (HK)/K. [H]

  2. Third isomorphism theorem Let H and K be normal subgroups of G with H ⊆ K. Show that G/K ≅ (G/H)/(K/H) (where K/H denotes the image of K in G/H). [H]

2.12
  1. Show that the only automorphisms of the group (ℤ, +) are the identity map and the map that sends a ↦ –a.

  2. Show that the group of automorphisms of (ℚ, +) is isomorphic to (ℚ*, ·).

2.13 Let H be the subgroup of G generated by ai, i ∈ I. Show that H is the smallest subgroup of G that contains all of ai, i ∈ I.
2.14 Let f : G → G′ be a homomorphism of (multiplicative) groups. Show that:
  1. If H is a subgroup of G, then H′ := f(H) is a subgroup of G′. If f is surjective and H is normal, then H′ is also normal.

  2. If H′ is a subgroup of G′, then H := f^(–1)(H′) is a subgroup of G. If H′ is normal, then H is also normal.

  3. Correspondence theorem Let H be a normal subgroup of G. Then the subgroups (resp. normal subgroups) of G/H are in one-to-one correspondence with the subgroups (resp. normal subgroups) of G that contain H. [H]

2.15 Let G be a cyclic group. Show that G is isomorphic to ℤ or to ℤn for some n ∈ ℕ, depending on whether G is infinite or finite.
2.16 Let G be a finite (multiplicative) group (not necessarily Abelian).
  1. We define the centre of G to be the set Z(G) := {a ∈ G | ag = ga for all g ∈ G}. Show that Z(G) is a subgroup of G.

  2. If H ⊆ Z(G) is a subgroup of G, show that H is a normal subgroup of G.

  3. The centralizer of a ∈ G is defined to be the set C(a) := {g ∈ G | ga = ag}. Show that C(a) is a subgroup of G. Show also that C(a) = G if and only if a ∈ Z(G).

  4. Define a relation ~ on G by a ~ b if and only if b = gag^(–1) for some g ∈ G. Show that ~ is an equivalence relation on G. We say that the elements a and b of G are conjugate, if the equivalence classes [a] and [b] are the same. The equivalence classes are called the conjugacy classes of G.

  5. Show that the cardinality of the conjugacy class of a ∈ G is equal to the index [G : C(a)].

  6. Deduce the class equation of G, that is, #G = #Z(G) + ∑[G : C(a)], where the sum is over a set of pairwise non-conjugate elements a ∉ Z(G).

2.17 Let G be a (multiplicative) Abelian group with identity e and order p1^(e1) · · · pr^(er), where pi are distinct primes and ei ∈ ℕ. For each i, let Hi be the pi-Sylow subgroup of G. Show that:
  1. G = H1 · · · Hr. [H]

  2. Every element g ∈ G can be written uniquely as g = h1 · · · hr with hi ∈ Hi. Moreover, in that case we have ord_G g = (ord_H1 h1) · · · (ord_Hr hr).

  3. G is cyclic if and only if all of H1, . . . , Hr are cyclic.

2.18 Let G be a finite (multiplicative) Abelian group with identity e. Assume that for every n ∈ ℕ there are at most n elements x of G satisfying x^n = e. Show that G is cyclic. [H]
2.19 Let G be a (multiplicative) group and let H1, . . . , Hr be normal subgroups of G. If G = H1 · · · Hr and every element g ∈ G can be written uniquely as g = h1 · · · hr with hi ∈ Hi, then G is called the internal direct product of H1, . . . , Hr. (For example, if G is finite and Abelian, then by Exercise 2.17 it is the internal direct product of its Sylow subgroups.) Show that:
  1. If G is finite, it is the internal direct product of normal subgroups H1, . . . , Hr if and only if G = H1 · · · Hr and Hi ∩ Hj = {e} for all i, j, i ≠ j.

  2. If G is the internal direct product of the normal subgroups H1, . . . , Hr, then G is isomorphic to the (external) direct product H1 × · · · × Hr. [H]

2.20 Let Hi, i = 1, . . . , r, be finite Abelian groups of orders mi and let H := H1 × · · · × Hr be their direct product. Show that H is cyclic if and only if each Hi is cyclic and m1, . . . , mr are pairwise coprime.

2.4. Rings

So far we have studied algebraic structures with only one operation. Now we study rings, which are sets with two (compatible) binary operations, conventionally denoted by + and ·. One can, of course, adopt more general notations for these operations. However, that generality does not pay much and only complicates matters, so we stick to the conventions.

2.4.1. Definition and Basic Properties

Definition 2.12.

A ring (R, +, ·) (or R in short) is a set R together with two binary operations + and · on R such that the following conditions are satisfied. As in the case of multiplicative groups we write ab for a · b.

  1. Additive group The set R is an Abelian group under +. The additive identity is denoted by 0.

  2. · is associative (ab)c = a(bc) for every a, b, c ∈ R.

  3. · is commutative ab = ba for every a, b ∈ R.

  4. Multiplicative identity There is an element (denoted by 1) in R such that a · 1 = 1 · a = a for every a ∈ R. The element 1 is called the identity of R.

  5. Distributivity The operation · is distributive over +, that is, a(b + c) = ab + ac and (a + b)c = ac + bc for every a, b, c ∈ R.

Notice that it is more conventional to define a ring as an algebraic structure (R, +, ·) that satisfies conditions (1), (2) and (5) only. A ring (by the conventional definition) is called a commutative ring (resp. a ring with identity), if it (additionally) satisfies condition (3) (resp. (4)). As per our definition, a ring is always a commutative ring with identity. Rings that are not commutative or that do not contain the identity element are not used in the rest of the book. So let us be happy with our unconventional definition of a ring.[3]

[3] Cool! But what’s circular in a ring? Historically, such algebraic structures were introduced by Hilbert to designate a Zahlring (a number ring, see Section 2.13). If α is an algebraic integer (Definition 2.95) and we take a Zahlring of the form ℤ[α] and consider the powers α, α^2, α^3, . . . , we eventually get an α^d which can be expressed as a linear combination of the previous (that is, smaller) powers of α. This is perhaps the reason that prompted Hilbert to call such structures “rings”. Also see Footnote 1.

We do not rule out the possibility that 0 = 1 in R. In that case, for any a ∈ R, we have a = a · 1 = a · 0 = 0 (see Proposition 2.6), that is to say, the set R consists of the single element 0. In this case, R is called the zero ring and is denoted (by an abuse of notation) by 0.

Finally, note that R is, in general, not a group under multiplication. This is because we do not expect a ring R to contain the multiplicative inverse of every element of R. Indeed the multiplicative inverse of the element 0 exists if and only if R = 0.

Example 2.6.
  1. The sets ℤ, ℚ, ℝ and ℂ are all rings under usual addition and multiplication. Each of ℚ, ℝ and ℂ contains the multiplicative inverse of every non-zero element, whereas the only elements in ℤ that have multiplicative inverses are ±1.

  2. Let ℤn denote the set {0, 1, . . . , n – 1} for an integer n ≥ 2. Then ℤn is a ring under addition and multiplication modulo n. The additive identity is 0 and the multiplicative identity is 1. Later we see a more formal definition of this ring. Recall from Example 2.1 how we have defined the groups ℤn and ℤn* under addition and multiplication modulo n. These groups have a connection with the ring ℤn, as we will shortly see.

  3. Let R be a ring and S a set. The set of all functions S → R is a ring under pointwise addition and multiplication of functions (that is, if f and g are two such functions, then we define (f + g)(a) := f(a) + g(a) and (fg)(a) := f(a)g(a) for every a ∈ S). The additive (resp. multiplicative) identity in this ring is the constant function 0 (resp. 1).

  4. Let R be a ring. The set R[X] of all polynomials in one indeterminate X and with coefficients from R is a ring. The identity elements in R[X] are the constant polynomials 0 and 1. The addition and multiplication operations in R[X] are the standard ones on polynomials. For a non-zero polynomial f ∈ R[X], the largest non-negative integer d for which the coefficient of X^d is non-zero is called the degree of the polynomial f and is denoted by deg f. The coefficient of X^(deg f) in f is called the leading coefficient of f and is denoted by lc(f). The degree of the zero polynomial is conventionally taken to be –∞. A non-zero polynomial with leading coefficient 1 is called a monic polynomial.

    More generally, for n ∈ ℕ one can define the ring R[X1, . . . , Xn] of multivariate polynomials over R. Polynomial rings are of paramount importance in algebra and number theory. We devote Section 2.6 to a study of these rings.

    We also define the ring R(X) of rational functions over R, which consists of elements of the form f/g with f, g ∈ R[X], g ≠ 0. More generally, the set of elements f/g with f, g ∈ R[X1, . . . , Xn], g ≠ 0, is a ring denoted R(X1, . . . , Xn).

  5. Let Ri, i ∈ I, be a family of rings, and R the product of the sets Ri, i ∈ I, that is, the set of all ordered tuples (ai), ai ∈ Ri, indexed by I. For tuples (ai) and (bi), define the sum (ai) + (bi) := (ai + bi) and the product (ai)(bi) := (aibi). It is easy to see that R is a ring with identity elements 0 = (0) and 1 = (1). It is called the direct product of the rings Ri, i ∈ I. If I is of finite cardinality n and if Ri = A for all i ∈ I, then R is denoted in short by A^n.
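
The ring ℤn of item 2 is easy to experiment with. The following sketch (an illustration, not from the book) verifies the axioms of Definition 2.12 exhaustively in ℤ6.

```python
# Exhaustively check the ring axioms of Definition 2.12 in Z_6 (a sketch).
n = 6
R = range(n)
add = lambda a, b: (a + b) % n
mul = lambda a, b: (a * b) % n

assert all(mul(mul(a, b), c) == mul(a, mul(b, c)) for a in R for b in R for c in R)          # associativity
assert all(mul(a, b) == mul(b, a) for a in R for b in R)                                     # commutativity
assert all(mul(a, 1) == a for a in R)                                                        # identity
assert all(mul(a, add(b, c)) == add(mul(a, b), mul(a, c)) for a in R for b in R for c in R)  # distributivity
print("Z_6 satisfies the ring axioms")
```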

Proposition 2.6.

Let R be a ring. For all a, b ∈ R, we have:

  1. a · 0 = 0 · a = 0

  2. a(–b) = (–a)b = –ab

  3. (–a)(–b) = ab

Proof

  1. a · 0 = a · (0 + 0) = a · 0 + a · 0, so that a · 0 = 0. Similarly, 0 · a = 0.

  2. By (1), 0 = a · 0 = a(b + (–b)) = ab + a(–b), that is, a(–b) = –ab. Similarly, (–a)b = –ab.

  3. (–a)(–b) = –(a(–b)) = –(–ab) = ab.

Definition 2.13.

Let R be a ring.

  1. An element a ∈ R is called a zero-divisor of R, if ab = 0 for some b ∈ R, b ≠ 0. By this definition, 0 is a zero-divisor of R, unless R = 0. The elements 0, 3, 5, 6, 9, 10 and 12 are all the zero-divisors of ℤ15.

  2. An element a ∈ R is called a unit of R, if there exists an element b ∈ R such that ab = 1. The elements 1 and –1 are units in any ring. It is easy to see that an element cannot be simultaneously a zero-divisor and a unit. The set of all units in a ring R is denoted by R* and is a group under the multiplication of the ring R (see Exercise 2.21), called the multiplicative group or the group of units of R. The multiplicative group of the ring ℤn (Example 2.6) is ℤn*.

  3. An element a ∈ R is called nilpotent, if a^k = 0 for some k ∈ ℕ. By this definition, 0 is a nilpotent element in any ring. It is also evident that every nilpotent element in a non-zero ring is a zero-divisor. An example of a non-zero nilpotent element is 2 in the ring ℤ4, since 2^2 = 4 ≡ 0 (mod 4).

  4. An element a ∈ R is called idempotent, if a^2 = a. In every ring, 0 and 1 are idempotent. The element 6 is idempotent in ℤ15, since 6^2 = 36 ≡ 6 (mod 15). It is easy to check that 0 is the only element in a ring that is both nilpotent and idempotent.
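
All four notions of this definition can be computed exhaustively in ℤn. This sketch (not from the book) recovers, for example, the zero-divisors of ℤ15 listed in item 1.

```python
def classify(n):
    """Zero-divisors, units, nilpotents and idempotents of Z_n (Definition 2.13)."""
    R = range(n)
    zero_divisors = [a for a in R if any((a * b) % n == 0 for b in R if b != 0)]
    units = [a for a in R if any((a * b) % n == 1 for b in R)]
    nilpotents = [a for a in R if any(pow(a, k, n) == 0 for k in range(1, n + 1))]
    idempotents = [a for a in R if (a * a) % n == a]
    return zero_divisors, units, nilpotents, idempotents

zd, units, nil, idem = classify(15)
assert zd == [0, 3, 5, 6, 9, 10, 12]    # as stated in Definition 2.13(1)
assert set(zd) & set(units) == set()    # no element is both a zero-divisor and a unit
```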

Definition 2.14.

Let R be a ring.

  1. R is called an integral domain (or simply a domain), if R ≠ 0 and if R contains no non-zero zero-divisors. Examples of integral domains are ℤ, ℚ, ℝ, ℂ and ℤp for prime p. On the other hand, 3 · 5 = 0 in ℤ15, so ℤ15 is not an integral domain.

  2. R is called a field, if R ≠ 0 and if R* = R \ {0}, that is, if every non-zero element of R is a unit. This means that in a field one can divide any element by any non-zero element. The most common fields are ℚ, ℝ and ℂ. Note that ℤ is not a field, since, for example, 2 does not have a multiplicative inverse in ℤ.

  3. A field R with #R finite is called a finite field. The simplest examples of finite fields are the fields ℤp for prime integers p. In fact, it is easy to see that ℤn is a field if and only if n is a prime. Finite fields are widely used for building various cryptographic protocols. See Section 2.9 for a detailed study of finite fields.

Corollary 2.1.

A field is an integral domain.

Proof

Recall from Definition 2.13 that an element in a ring cannot be simultaneously a unit and a zero-divisor.

Definition 2.15.

Let R be a non-zero ring. The characteristic of R, denoted char R, is the smallest positive integer n such that 1 + 1 + · · · + 1 (n times) = 0. If no such integer exists, then we take char R = 0.

ℤ, ℚ, ℝ and ℂ are rings of characteristic zero. If R is a non-zero finite ring, then the elements 1, 1 + 1, 1 + 1 + 1, · · · cannot be all distinct. This shows that there are positive integers m and n, m < n, such that 1 + 1 + · · · + 1 (n times) = 1 + 1 + · · · + 1 (m times). But then 1 + 1 + · · · + 1 (n – m times) = 0. Thus any non-zero finite ring has positive (that is, non-zero) characteristic. If char R = t is positive, then for any a ∈ R one has a + a + · · · + a (t times) = (1 + 1 + · · · + 1)a = 0.
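
For instance, the characteristic of ℤn is n. The following sketch (not from the book) computes it by repeatedly adding 1, exactly as in Definition 2.15.

```python
def char_Zn(n):
    """Characteristic of Z_n: add 1 to itself until the sum becomes 0 modulo n."""
    s, t = 1 % n, 1
    while s != 0:
        s, t = (s + 1) % n, t + 1
    return t

assert char_Zn(2) == 2 and char_Zn(12) == 12   # char Z_n = n for every n >= 2
```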

In what follows, we will often denote by n the element 1 + 1 + · · · + 1 (n times) of any ring. One should not confuse this with the integer n. One can similarly identify a negative integer –n with the ring element –(1 + 1 + · · · + 1) (n times) = (–1) + (–1) + · · · + (–1) (n times).

Proposition 2.7.

Let R be an integral domain of positive characteristic p. Then p is a prime.

Proof

If p is composite, then we can write p = mn with 1 < m < p and 1 < n < p. But then p = mn = 0 (in R). Since R is an integral domain, we must have m = 0 or n = 0 (in R). This contradicts the minimality of p.

2.4.2. Subrings, Ideals and Quotient Rings

Just as we studied subgroups of groups, it is now time to study subrings of rings. It turns out, however, that subrings are not as important for the study of rings as are the subsets called ideals. In fact, it is ideals (and not subrings) that help us construct quotient rings. This does not mean that ideals are “normal” subrings! In fact, ideals are, in general, not subrings at all, and conversely. The formal definitions are waiting!

Definition 2.16.

Let R be a ring. A subset S of R is called a subring of R, if S is a ring under the ring operations of R. In this case, one calls R a superring or a ring extension of S.

If R and S are both fields, then S is often called a subfield of R and R a field extension (or simply an extension) of S. In that case, one also says that SR is a field extension or that R is an extension over S.

ℤ is a subring of ℚ, ℝ and ℂ, whereas ℚ ⊆ ℝ and ℝ ⊆ ℂ are field extensions.

We demand that a ring always contains the multiplicative identity (Definition 2.12). This implies that if S is a subring of R, then for all integers n, the elements n · 1 are also in S (though they need not be pairwise distinct). Similarly, if R and S are fields, then S contains all the elements of the form mn^(–1) for m, n ∈ ℤ with n ≠ 0 in S (cf. Exercise 2.26). Thus 2ℤ, the set of all even integers, is not a subring of ℤ, though it is a subgroup of (ℤ, +) (Example 2.2).

Definition 2.17.

Let R be a ring. A subset 𝔞 of R is called an ideal of R, if 𝔞 is an additive subgroup of (R, +) and if ra ∈ 𝔞 for all r ∈ R and a ∈ 𝔞.[4]

[4] Kummer introduced the concept of ideal numbers. Later Dedekind reformulated Kummer’s notion of ideal numbers to define what we now know as ideals.

In this book, we will use Gothic letters (usually lower case) like 𝔞, 𝔟, 𝔠, 𝔪, 𝔭 to denote ideals.[5]

[5] Mathematicians always run out of symbols. Many believe if it is Gothic, it is just ideal!

The condition for being an ideal is in one sense more stringent than that for being a subring, in that an ideal has to be closed under multiplication by any element of the entire ring. On the other hand, we do not demand that an ideal necessarily contain the identity element 1. In fact, 2ℤ is an ideal of ℤ that is not a subring. Conversely, ℤ is a subring of ℚ but not an ideal of ℚ. Subrings and ideals are different things.

Example 2.7.
  1. Let R be any ring. The subset {0} is an ideal of R, called the zero ideal and denoted also by 0. Similarly, the entire ring R is an ideal of R and is called the unit ideal. Note that if an ideal 𝔞 contains a unit u of R, then 1 = u^(–1)u is also in 𝔞 and so a = a · 1 ∈ 𝔞 for every a ∈ R. It follows that an ideal 𝔞 of R is the unit ideal if and only if 𝔞 contains a unit, a justification for the name.

  2. The integral multiples of an integer n form an ideal of ℤ denoted by nℤ. More generally, for any ring R and for any a ∈ R, the set {ra | r ∈ R} is an ideal of R and is denoted by Ra or aR or 〈a〉. Such an ideal is called a principal ideal. (See also Definition 2.18.)

  3. Let R be a ring and let 𝔞i, i ∈ I, be a family of ideals of R. The intersection ∩ 𝔞i is an ideal of R. The set of finite sums of the form a1 + · · · + ak (where each aj belongs to some 𝔞i) is an ideal of R. It is called the sum of the ideals 𝔞i, i ∈ I, and is denoted by ∑ 𝔞i. The union ∪ 𝔞i is, in general, not an ideal of R. In fact, the sum ∑ 𝔞i is the smallest ideal that contains (the set) ∪ 𝔞i.

Proposition 2.8.

The only ideals of a field are the zero ideal and the unit ideal.

Proof

By definition, every non-zero element of a field is a unit. So a non-zero ideal of a field contains a unit and is therefore the unit ideal (Example 2.7(1)).

Definition 2.18.

Let R be a ring and ai, i ∈ I, a family of elements of R. The ideal generated by ai, i ∈ I, is defined to be the sum ∑ Rai of the principal ideals Rai. We denote this ideal as 〈ai | i ∈ I〉. In this case, we also say that the ideal is generated by ai, i ∈ I. If I is finite, then we say that the ideal is finitely generated. In particular, if #I = 1, then the ideal is a principal ideal (see Example 2.7).

An integral domain every ideal of which is principal is called a principal ideal domain or PID in short. A ring every ideal of which is finitely generated is called Noetherian. Thus principal ideal domains are Noetherian.

Note that an ideal may have different generating sets of varying cardinalities. For example, the unit ideal in any ring is principal, since it is generated by 1. The integers 2 and 3 generate the unit ideal of ℤ, since 1 = (–1) · 2 + 1 · 3. However, neither 2 nor 3 individually generates the unit ideal of ℤ. Indeed, using Bézout’s relation (Proposition 2.16) one can show that for every n ∈ ℕ there is a (minimal) generating set of the unit ideal of ℤ that contains exactly n integers. Interested readers may try to construct such generating sets as an (easy) exercise.
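
Bézout's relation mentioned above is effective: extended Euclidean division, sketched below (an illustration, not the book's algorithm), produces the coefficients showing that ⟨2, 3⟩ is the unit ideal of ℤ.

```python
def ext_gcd(a, b):
    """Extended Euclid: return (g, u, v) with g = gcd(a, b) = u*a + v*b."""
    if b == 0:
        return (a, 1, 0)
    g, u, v = ext_gcd(b, a % b)
    return (g, v, u - (a // b) * v)

g, u, v = ext_gcd(2, 3)
assert g == 1 and 2 * u + 3 * v == 1   # so 1 lies in <2, 3>, the unit ideal of Z
```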

Theorem 2.6.

ℤ is a principal ideal domain.

Proof

The zero ideal is generated by 0. Let 𝔞 be a non-zero ideal of ℤ and let a be the smallest positive integer contained in 𝔞. We claim that 𝔞 = aℤ. Clearly, aℤ ⊆ 𝔞. For the converse, take b ∈ 𝔞. We can write b = aq + r, where q and r are the quotient and the remainder of (Euclidean) division of b by a. Now r = b – aq ∈ 𝔞 and since 0 ≤ r < a, by the choice of a we must have r = 0, so that b = aq ∈ aℤ.

A very similar argument proves the following theorem. The details are left to the reader. Also see Exercise 2.31.

Theorem 2.7.

If K is a field, then K[X] is a principal ideal domain.

We now prove a very important theorem:

Theorem 2.8. Hilbert’s basis theorem

If R is a Noetherian ring, then so is the polynomial ring R[X1, . . . , Xn] for n ∈ ℕ. In particular, the polynomial rings ℤ[X1, . . . , Xn] and K[X1, . . . , Xn] are Noetherian, where K is a field.

Proof

Using induction on n, we can reduce to the case n = 1. So we prove that if R is Noetherian, then R[X] is also Noetherian. Let 𝔞 be a non-zero ideal of R[X]. Assume that 𝔞 is not finitely generated. Then we can inductively choose non-zero polynomials f1, f2, f3, · · · from 𝔞 such that for each i the polynomial fi is one having the smallest degree in 𝔞 \ 〈f1, . . . , fi–1〉. Let di := deg fi. Then d1 ≤ d2 ≤ d3 ≤ · · ·. Let ai denote the leading coefficient of fi. Consider the ideal 𝔟 := 〈a1, a2, a3, . . .〉 in R. By hypothesis, 𝔟 is finitely generated, say, 𝔟 = 〈a1, . . . , ar〉. This, in particular, implies that ar+1 = u1a1 + · · · + urar for some u1, . . . , ur ∈ R. But then the polynomial fr+1 – (u1X^(dr+1–d1) f1 + · · · + urX^(dr+1–dr) fr) belongs to 𝔞 \ 〈f1, . . . , fr〉, is non-zero and has degree < dr+1, a contradiction to the choice of fr+1. Thus 𝔞 must be finitely generated.

Two particular types of ideals are very important in algebra.

Definition 2.19.

Let R be a ring.

  1. An ideal 𝔭 of R is called a prime ideal, if 𝔭 ≠ R and if ab ∈ 𝔭 implies a ∈ 𝔭 or b ∈ 𝔭 for a, b ∈ R. The second condition is equivalent to saying that if a ∉ 𝔭 and b ∉ 𝔭, then the product ab ∉ 𝔭. For a prime integer p, the principal ideal pℤ of ℤ is prime. On the other hand, for a composite integer n the ideal nℤ of ℤ is not prime. For example, 3 ∉ 15ℤ and 5 ∉ 15ℤ, but the product 3 · 5 = 15 ∈ 15ℤ.

  2. An ideal 𝔪 of R is called a maximal ideal, if 𝔪 ≠ R and if for any ideal 𝔞 satisfying 𝔪 ⊆ 𝔞 ⊆ R we have 𝔞 = 𝔪 or 𝔞 = R. This means that there are no non-unit ideals of R properly containing 𝔪. All the ideals pℤ of ℤ for prime integers p are maximal ideals (Corollary 2.3). Next consider the polynomial ring R := ℤ[X] and the principal ideal 〈X〉 of R. It is easy to see that 〈X〉 ⊊ 〈X, 2〉 ⊊ R. Thus 〈X〉 is not maximal.

Prime and maximal ideals can be characterized by some nice equivalent criteria. See Proposition 2.9.

Definition 2.20.

Let R be a ring and 𝔞 an ideal of R. Then 𝔞 is a subgroup of the group (R, +). Since (R, +) is Abelian, 𝔞 is a normal subgroup (Definition 2.6). Thus the cosets a + 𝔞, a ∈ R, form an additive Abelian group. We define multiplication on these cosets as (a + 𝔞)(b + 𝔞) := ab + 𝔞. It is easy to check that this multiplication is well-defined. Furthermore, the set of these cosets, denoted R/𝔞, becomes a ring under this addition and multiplication. The ring R/𝔞 is called the quotient ring of R with respect to 𝔞.

We say that two elements a, b ∈ R are congruent modulo an ideal 𝔞 (of R) and write a ≡ b (mod 𝔞), if a – b ∈ 𝔞. Thus a ≡ b (mod 𝔞) if and only if a and b lie in the same coset of 𝔞, that is, a + 𝔞 = b + 𝔞.

Example 2.8.
  1. For any ring R, the quotient ring R/0 is essentially the same as R and the quotient ring R/R is the zero ring.

  2. The ring ℤn of Example 2.6 is formally defined to be the quotient ring ℤ/nℤ. Convince yourself that both these definitions are equivalent.

Proposition 2.9.

Let R be a ring and 𝔞 an ideal of R.

  1. 𝔞 is a prime ideal of R if and only if R/𝔞 is an integral domain.

  2. 𝔞 is a maximal ideal of R if and only if R/𝔞 is a field.

Proof

  1. Let a, b ∈ R be arbitrary. Then 𝔞 is prime ⇔ ab ∈ 𝔞 implies a ∈ 𝔞 or b ∈ 𝔞 ⇔ (a + 𝔞)(b + 𝔞) = 0 in R/𝔞 implies a + 𝔞 = 0 or b + 𝔞 = 0 ⇔ R/𝔞 is an integral domain.

  2. Let 𝔞 be a maximal ideal. Choose b ∈ R with b ∉ 𝔞, that is, b + 𝔞 ≠ 0. Consider the ideal 𝔞 + Rb. Since 𝔞 is maximal and 𝔞 ⊊ 𝔞 + Rb, we must have 𝔞 + Rb = R. This means that a + cb = 1 for some a ∈ 𝔞 and c ∈ R. Then (c + 𝔞)(b + 𝔞) = cb + 𝔞 = (1 – a) + 𝔞 = 1 + 𝔞, which implies that b + 𝔞 is a unit in R/𝔞. That is, R/𝔞 is a field.

    Conversely, let R/𝔞 be a field. Consider any ideal 𝔟 of R with 𝔞 ⊊ 𝔟 ⊆ R. Choose any b ∈ 𝔟 \ 𝔞. Then b + 𝔞 ≠ 0 in R/𝔞. By hypothesis, there exists c ∈ R such that (c + 𝔞)(b + 𝔞) = 1 + 𝔞, that is, 1 – cb ∈ 𝔞 ⊆ 𝔟. Hence 1 = (1 – cb) + cb ∈ 𝔟, that is, 𝔟 = R.

The last proposition in conjunction with Corollary 2.1 indicates:

Corollary 2.2.

Maximal ideals are prime.

Corollary 2.3.

For every prime p, the quotient ring ℤ/pℤ is a field. In particular, pℤ is a maximal ideal of ℤ.

Proof

Since pℤ is a prime ideal of ℤ, the quotient ℤ/pℤ is an integral domain. But ℤ/pℤ is finite, so by Exercise 2.25 it is a field.
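
Corollary 2.3 can be seen concretely: every non-zero class of ℤ/pℤ has a multiplicative inverse, and since a^(p–1) = 1 for non-zero a (Proposition 2.4 applied to the group of units), that inverse is a^(p–2) mod p. A sketch (not from the book) for p = 13:

```python
# Inverses in the field Z/13Z: a^(p-2) inverts a, because a^(p-1) = 1.
p = 13
for a in range(1, p):
    inv = pow(a, p - 2, p)
    assert (a * inv) % p == 1
print("every non-zero class of Z/13Z is a unit")
```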

2.4.3. Homomorphisms

Recall how we have defined homomorphisms of groups. In a similar manner, we define homomorphisms of rings. A ring homomorphism is a map from one ring to another, which respects addition, multiplication and the identity element. More precisely:

Definition 2.21.

Let R and S be rings. A map f : R → S is called a (ring) homomorphism, if f(a + b) = f(a) + f(b) and f(ab) = f(a)f(b) for all a, b ∈ R, and if f(1) = 1. A homomorphism f : R → S is called an isomorphism, if there exists a homomorphism g : S → R such that g ∘ f = idR and f ∘ g = idS. As in the case of groups, bijectivity of f as a function is both necessary and sufficient for a homomorphism f : R → S to be an isomorphism. If f : R → S is an isomorphism, we write R ≅ S and say that R is isomorphic to S or that R and S are isomorphic.

A homomorphism f : RR is called an endomorphism of R. An automorphism is a bijective endomorphism.

Example 2.9.
  1. For any ring extension R ⊆ S, the canonical inclusion a ↦ a is a homomorphism from R to S. In particular, the identity map on any ring is an automorphism.

  2. Let R be a ring and 𝔞 an ideal of R. The canonical surjection R → R/𝔞 that takes a ↦ a + 𝔞 is a ring homomorphism.

  3. Let R be a ring and let a ∈ R. The map R[X] → R that takes f(X) ↦ f(a) is a ring homomorphism and is called the substitution homomorphism.

  4. The map ℤ → ℤ taking n ↦ –n is not a ring homomorphism, since it maps 1 to –1 (and does not satisfy f(ab) = f(a)f(b) for all a, b ∈ ℤ).

  5. The map ℂ → ℂ that maps z = a + ib to its conjugate a – ib is an automorphism of the field ℂ.

Proposition 2.10.

Let f : R → S be a ring homomorphism.

  1. If a ∈ R is a unit, then f(a) is a unit in S and f(a^(–1)) = (f(a))^(–1).

  2. Let 𝔟 be an ideal in S. Then f^(–1)(𝔟) is an ideal in R. If 𝔟 is prime, then f^(–1)(𝔟) is also prime.

Proof

  1. If ab = 1, then f(a)f(b) = f(ab) = f(1) = 1.

  2. For a, a′ ∈ f^(–1)(𝔟) and b, b′ ∈ 𝔟 with f(a) = b and f(a′) = b′, we have f(a – a′) = b – b′ ∈ 𝔟 and f(ra) = f(r)b ∈ 𝔟 for every r ∈ R. Thus f^(–1)(𝔟) is an ideal of R. If aa′ ∈ f^(–1)(𝔟), then f(a)f(a′) = f(aa′) ∈ 𝔟. If 𝔟 is prime (in which case f^(–1)(𝔟) and 𝔟 are proper ideals of R and S respectively), then f(a) ∈ 𝔟 or f(a′) ∈ 𝔟. But then a ∈ f^(–1)(𝔟) or a′ ∈ f^(–1)(𝔟).

The ideal f^(–1)(𝔟) of the above proposition is called the contraction of 𝔟 and is often denoted by 𝔟 ∩ R. If R ⊆ S and f is the inclusion homomorphism, then f^(–1)(𝔟) is indeed the set-theoretic intersection 𝔟 ∩ R.

Definition 2.22.

Let f : R → S be a ring homomorphism. The set {a ∈ R | f(a) = 0} is called the kernel of f and is denoted by Ker f. The set {f(a) | a ∈ R} is called the image of f and is denoted by f(R) or Im f.

Theorem 2.9. Isomorphism theorem

With the notations of the last definition, Ker f is an ideal of R, Im f is a subring of S and R/ Ker f ≅ Im f.

Proof

Consider the map R/Ker f → Im f that takes a + Ker f ↦ f(a). It is easy to verify that this map is a well-defined ring homomorphism and is bijective. The details are left to the reader. Also see Theorem 2.3.

Definition 2.23.

Two ideals 𝔞 and 𝔟 of a ring R are called relatively prime or coprime if 𝔞 + 𝔟 = R, that is, if there exist a ∈ 𝔞 and b ∈ 𝔟 with a + b = 1.

Theorem 2.10. Chinese remainder theorem (CRT)

Let R be a ring and n ∈ ℕ. Let 𝔞1, . . . , 𝔞n be ideals in R such that for all i, j, i ≠ j, the ideals 𝔞i and 𝔞j are relatively prime. Then R/(𝔞1 ∩ · · · ∩ 𝔞n) is isomorphic to the direct product (R/𝔞1) × · · · × (R/𝔞n).

Proof

The assertion is obvious for n = 1. So assume that n ≥ 2 and define the map φ : R/(𝔞1 ∩ · · · ∩ 𝔞n) → (R/𝔞1) × · · · × (R/𝔞n) by a + (𝔞1 ∩ · · · ∩ 𝔞n) ↦ (a + 𝔞1, . . . , a + 𝔞n) for all a ∈ R. Since 𝔞1 ∩ · · · ∩ 𝔞n ⊆ 𝔞i for all i, the map φ is well-defined. It is easy to see that φ is a ring homomorphism. In order to show that φ is injective, we let φ(a + (𝔞1 ∩ · · · ∩ 𝔞n)) = 0. This means that a + 𝔞i = 𝔞i, that is, a ∈ 𝔞i for all i. Then a ∈ 𝔞1 ∩ · · · ∩ 𝔞n, that is, a + (𝔞1 ∩ · · · ∩ 𝔞n) = 0. The trickier part is to prove that φ is surjective. Let (a1 + 𝔞1, . . . , an + 𝔞n) ∈ (R/𝔞1) × · · · × (R/𝔞n). Let us consider the ideal 𝔞i + ∩j≠i 𝔞j for each i. For a given i, there exist for each j ≠ i elements αj ∈ 𝔞i and βj ∈ 𝔞j with αj + βj = 1. Multiplying these equations shows that we have a γi ∈ 𝔞i such that γi + δi = 1, where δi := ∏j≠i βj ∈ ∩j≠i 𝔞j. (This shows that 𝔞i + ∩j≠i 𝔞j = R for all i.) Now consider the element a := a1δ1 + · · · + anδn. Since δi ≡ 1 (mod 𝔞i) and δj ∈ 𝔞i for j ≠ i, it follows that a ≡ ai (mod 𝔞i) for all i, that is, φ(a + (𝔞1 ∩ · · · ∩ 𝔞n)) = (a1 + 𝔞1, . . . , an + 𝔞n).

In Section 2.5, we will see an interesting application of this theorem. Notice that the injectivity of φ in the last proof does not require the pairwise coprimality of the ideals 𝔞i; only the surjectivity of φ requires this condition.
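
For R = ℤ with 𝔞i = niℤ and pairwise coprime ni, the surjectivity argument in the proof is constructive. The sketch below (an illustration, not the book's algorithm) builds the element a = a1δ1 + · · · + anδn explicitly.

```python
def crt(residues, moduli):
    """Return the unique a modulo N = prod(moduli) with a = r_i (mod n_i),
    assuming the moduli are pairwise coprime (Theorem 2.10 for R = Z)."""
    N = 1
    for n in moduli:
        N *= n
    a = 0
    for r, n in zip(residues, moduli):
        m = N // n
        # m * (m^(-1) mod n) plays the role of delta_i in the proof:
        # it is 1 modulo n and 0 modulo every other modulus
        a += r * m * pow(m, -1, n)
    return a % N

assert crt([2, 3, 2], [3, 5, 7]) == 23   # 23 = 2 (mod 3) = 3 (mod 5) = 2 (mod 7)
```

(The call pow(m, -1, n) for a modular inverse needs Python 3.8 or later.)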

2.4.4. Factorization in Rings

Now we introduce the concept of divisibility in a ring. We also discuss an important type of rings known as unique factorization domains. This study is a natural generalization of that of the rings ℤ and K[X], K a field.

Definition 2.24.

Let R be a ring and a, b, p ∈ R. Also let K be a field.

  1. We say that a divides b and write a|b, if there exists an element c ∈ R such that b = ac. If a does not divide b, we write a∤b. In ℤ, for example, –31|899, since 899 = (–31) · (–29). By this definition, any element divides 0, whereas 0 divides no element other than 0.

  2. It is easy to see that a|b and b|a if and only if b = ca for some unit c ∈ R*. In that case, we say that a and b are associates of each other. The relation of being associate is an equivalence relation on R (or R \ {0}), as can be easily verified. The only associates of a ∈ ℤ, a ≠ 0, are ±a, since ±1 are the only units in ℤ. Two non-zero polynomials f and g of K[X] are associates if and only if f = αg for some α ∈ K*.

  3. A non-zero non-unit p ∈ R is called a prime, if p|ab implies either p|a or p|b. One can check easily that p is prime if and only if the principal ideal 〈p〉 = pR is a prime ideal.

  4. A non-zero non-unit p ∈ R is called irreducible, if p = ab implies that either a or b is a unit.

Note that for ℤ the concepts of prime and irreducible elements coincide. This is indeed true for any PID (Proposition 2.12). Thus our conventional definition of a prime integer p > 0 as one which has only 1 and p as (positive) divisors tallies with the definition of irreducible elements above. For the ring K[X], on the other hand, it is more customary to talk about irreducible polynomials instead of prime polynomials; they are the same thing anyway.

Proposition 2.11.

Let R be an integral domain and p ∈ R a prime. Then p is irreducible.

Proof

Let p = ab. Then p|(ab), so that by hypothesis p|a or p|b. If p|a, then a = up for some u ∈ R. Hence p = ab = upb, that is, (1 − ub)p = 0. Since R is an integral domain and p ≠ 0, we have 1 − ub = 0, that is, ub = 1, that is, b is a unit. Similarly, p|b implies that a is a unit.

Proposition 2.12.

Let R be a PID. An element p ∈ R is prime if and only if p is irreducible.

Proof

[if] Let p be irreducible, but not prime. Then there are a, b ∈ R such that a ∉ 〈p〉 and b ∉ 〈p〉, but ab ∈ 〈p〉. Consider the ideal 〈p〉 + 〈a〉. Since R is a PID, 〈p〉 + 〈a〉 = 〈α〉 for some α ∈ R, and p ∈ 〈α〉 gives p = cα for some c ∈ R. By hypothesis, p is irreducible, so that either c or α is a unit. If c is a unit, 〈p〉 = 〈α〉 = 〈p〉 + 〈a〉, that is, a ∈ 〈p〉, a contradiction. So α is a unit. Then 〈p〉 + 〈a〉 = R, which implies that there are elements u, v ∈ R such that up + va = 1. Similarly, there are elements u′, v′ ∈ R such that u′p + v′b = 1. Multiplying these two equations gives (uu′p + uv′b + u′va)p + (vv′)ab = 1. Now ab ∈ 〈p〉, so that ab = wp for some w ∈ R. But then (uu′p + uv′b + u′va + vv′w)p = 1, which shows that p is a unit, a contradiction.

[only if] Immediate from Proposition 2.11.

Definition 2.25.

An integral domain R is called a unique factorization domain, or a UFD in short, if every non-zero element a ∈ R can be written as a product a = up1 · · · pr, where u is a unit and p1, . . . , pr are prime elements (not necessarily distinct) of R. Moreover, such a factorization is unique up to permutation of the primes p1, . . . , pr and up to multiplication of the primes by units. This factorization can also be written as a = u q1^(α1) · · · qs^(αs), where u is a unit, q1, . . . , qs are pairwise non-associate primes and αi > 0 for i = 1, . . . , s. Some authors also use the term factorial ring or factorial domain in order to describe a UFD.

If p ∈ R is a prime and a ∈ R, a ≠ 0, then the multiplicity of p in a is the non-negative integer v such that p^v | a, but p^(v+1) ∤ a. This integer v is denoted by vp(a). It is clear from the definition that for every a ∈ R, a ≠ 0, there exist only finitely many non-associate primes p for which vp(a) > 0.

Proposition 2.13.

Let R be a UFD. An element p ∈ R is prime if and only if p is irreducible.

Proof

The only if part is immediate from Proposition 2.11. For proving the if part, let p = up1 · · · pr (with u a unit and the pi primes in R) be irreducible. If r = 0, p is a unit, a contradiction. If r > 1, then p can be written as the product of the two non-units up1 · · · pr−1 and pr, again a contradiction. So r = 1, that is, p is an associate of the prime p1 and is hence itself prime.

A classical example of an integral domain that is not a UFD is ℤ[√−5]. In this ring, we have two essentially different factorizations of 6 into irreducible elements: 6 = 2 · 3 = (1 + √−5)(1 − √−5). The failure of irreducible elements to be primes in such rings is a serious defect, and not an easy one to patch up!

Theorem 2.11.

A PID is a UFD.

Proof

Let R be a PID and a ∈ R, a ≠ 0. We show that a has a factorization of the form a = up1 · · · pr, where u is a unit and p1, . . . , pr are prime elements of R. If a is a unit, we are done. So assume that a =: a0 is a non-unit and consider the ideal 〈a0〉. Since 〈a0〉 ≠ R, there is a maximal (and hence prime) ideal 〈p1〉 containing 〈a0〉 (Exercise 2.23). Then p1 is a prime that divides a0. Let a0 = a1p1. We have 〈a0〉 ⊆ 〈a1〉. If 〈a1〉 is the unit ideal, we are done. Otherwise we choose as before a prime p2 dividing a1 and with a1 = a2p2 get the ideal 〈a2〉 properly containing 〈a1〉. Repeating this process we can generate a strictly ascending chain 〈a0〉 ⊊ 〈a1〉 ⊊ 〈a2〉 ⊊ · · · of ideals of R. Since R is a PID and hence Noetherian, this process must stop after finitely many steps (Exercise 2.33).

The converse of the above theorem is not necessarily true. For example, the polynomial ring K[X1, . . . , Xn] over a field K is a UFD for every n ≥ 1, but not a PID for n ≥ 2.

Divisibility in a UFD can be rephrased in terms of prime factorizations. Let R be a UFD and let the non-zero elements a, b ∈ R have the prime factorizations a = u p1^(α1) · · · pr^(αr) and b = u′ p1^(β1) · · · pr^(βr) with units u, u′, pairwise non-associate primes p1, . . . , pr and with αi ≥ 0 and βi ≥ 0. Then a|b if and only if αi ≤ βi for all i = 1, . . . , r. This notion leads to the following definitions.

Definition 2.26.

Let R be a UFD and let a, b ∈ R have prime factorizations as in the last paragraph. Any associate of p1^(min(α1,β1)) · · · pr^(min(αr,βr)) is called a greatest common divisor of a and b and is denoted by gcd(a, b). Clearly, gcd(a, b) is unique up to multiplication by units of R. Similarly, any associate of p1^(max(α1,β1)) · · · pr^(max(αr,βr)) is called a least common multiple of a and b and is denoted by lcm(a, b). lcm(a, b) is again unique up to multiplication by units of R. The gcd of a ≠ 0 and 0 is taken to be an associate of a, whereas gcd(0, 0) is undefined. On the other hand, lcm(a, 0) is defined to be 0 for any a ∈ R.

It is clear that these definitions of gcd and lcm can be readily generalized for any arbitrary finite number of elements.

Corollary 2.4.

Let R be a UFD and a, b ∈ R not both zero. Then gcd(a, b) · lcm(a, b) is an associate of ab.

Proof

Immediate from the definitions.

Corollary 2.5.

Let R be a UFD and a, b, c ∈ R with a|bc. If gcd(a, c) = 1, then a|b.

Proof

Consider the prime factorizations of a, b and c.

For a PID, the gcd and lcm have equivalent characterizations.

Proposition 2.14.

Let R be a PID and a, b be non-zero elements of R. Let d be a gcd of a and b. Then 〈d〉 = 〈a〉 + 〈b〉. If f is an lcm of a and b, then 〈f〉 = 〈a〉 ∩ 〈b〉.

Proof

Let 〈a〉 + 〈b〉 = 〈c〉. We show that c and d are associates. There exist u, v ∈ R such that ua + vb = c. Since d|a and d|b, we have d|c. On the other hand, a ∈ 〈c〉, so that c|a. Similarly c|b. Considering the prime factorizations of a and b, one can then readily verify that c|d. The proof for the second part is similar and is left to the reader.

A direct corollary to the last proposition is the following.

Corollary 2.6.

Let R be a PID, a, b ∈ R (not both zero) and d a gcd of a and b. Then there are elements u, v ∈ R such that ua + vb = d. In particular, the ideals 〈a〉 and 〈b〉 are relatively prime if and only if gcd(a, b) is a unit. In that case, we also say that the elements a and b are relatively prime or coprime.

This completes our short survey of factorization in rings. Note that ℤ and K[X] (for a field K) are PIDs and hence UFDs. Thus all the results we have proved in this section apply equally well to both these rings. It is because of this (and not a mere coincidence) that these two rings enjoy many common properties. Our abstract treatment thus saves us the duplicate effort of proving the same results once for integers (Section 2.5) and once more for polynomials (Section 2.6).

Exercise Set 2.4

2.21For a non-zero ring R, prove the following assertions:
  1. A unit of R is not a zero-divisor.

  2. The product of two units of R is again a unit.

  3. The product of two non-units of R is again a non-unit.

  4. The element 0 is not a unit in R.

  5. The element 1 is always a unit in R.

  6. If a is a unit and ab = ac, then b = c.

Let K be a field. What are the units in the polynomial ring K[X]? In K[X1, . . . , Xn]? In the ring K(X) of rational functions? In K(X1, . . . , Xn)?

2.22

Binomial theorem Let R be a ring, a, b ∈ R and n ∈ ℕ. Show that

(a + b)^n = Σ_{i=0}^{n} C(n, i) a^i b^(n−i),

where C(n, i) := n!/(i!(n − i)!) are the binomial coefficients.

2.23Show that every non-zero ring has a maximal (and hence prime) ideal. More generally, show that every non-unit ideal of a non-zero ring is contained in a maximal ideal. [H]
2.24Let R be a ring.
  1. Show that the set of all nilpotent elements of R is an ideal of R. This ideal is called the nilradical of R.

  2. Show that the quotient ring of R by its nilradical has no non-zero nilpotent elements. (This ring is called the reduction of R and is often written as Rred. If the nilradical of R is the zero ideal, then we say that R is reduced. Thus Rred is always reduced.)

  3. Show that the nilradical of R is the intersection of the prime ideals of R. [H]

2.25Show that a finite integral domain R is a field. [H]
2.26Let R be a ring of characteristic 0. Show that:
  1. R contains infinitely many elements.

  2. If R is an integral domain, then R contains as subring an isomorphic copy of ℤ.

  3. If R is a field, then R contains as subfield an isomorphic copy of ℚ.

2.27Let f : R → S be a ring-homomorphism and let 𝔞 and 𝔟 be ideals in R and S respectively. Find examples to corroborate the following statements.
  1. Let a ∈ R be such that f(a) is a unit in S. Then a need not be a unit in R.

  2. The set f(𝔞) need not be an ideal of S.

  3. If 𝔟 is maximal, then f^(−1)(𝔟) need not be maximal.

2.28Let K be a field.
  1. Show that a homomorphism from K to any non-zero ring is injective.

  2. Let L be another field and let f : K → L and g : L → K be homomorphisms such that g ∘ f = idK. Show that f and g are isomorphisms.

2.29
  1. Show that a ring R is an integral domain if and only if 0 is a prime ideal of R.

  2. Give an example of a reduced ring that is not an integral domain. (Note that an integral domain is always reduced.)

2.30Let R be a ring and let 𝔞 and 𝔟 be ideals of R with 𝔞 ⊆ 𝔟. Show that 𝔟/𝔞 is an ideal of R/𝔞 and that (R/𝔞)/(𝔟/𝔞) ≅ R/𝔟. [H]
2.31An integral domain R is called a Euclidean domain (ED) if there is a map ν : R \ {0} → ℕ ∪ {0} satisfying the following two conditions:
  1. ν(a) ≤ ν(ab) for all a, b ∈ R \ {0}.

  2. For every a, b ∈ R with b ≠ 0, there exist (not necessarily unique) q, r ∈ R such that a = qb + r with r = 0 or ν(r) < ν(b).

Show that:

  1. ℤ is a Euclidean domain with ν(a) = |a| for a ≠ 0.

  2. The polynomial ring K[X] over a field K is a Euclidean domain with ν(a) = deg a for a ≠ 0.

  3. For d = −2, −1, 2, 3, the ring

    ℤ[√d] := {a + b√d | a, b ∈ ℤ}

    is a Euclidean domain with ν(a + b√d) := |a² − db²|, a, b ∈ ℤ, not both 0.

  4. A Euclidean domain is a PID (and hence a UFD).

2.32Let R be a ring and 𝔞 ⊆ R an ideal. Consider the set

√𝔞 := {a ∈ R | a^n ∈ 𝔞 for some n ∈ ℕ}.

Show that √𝔞 is an ideal of R. It is called the radical or root of 𝔞. If 𝔞 = √𝔞, then 𝔞 is called a radical or a root ideal. For arbitrary ideals 𝔞 and 𝔟 of R, prove the following assertions.

  1. 𝔞 ⊆ √𝔞.

  2. √(√𝔞) = √𝔞.

  3. If 𝔞 ⊆ 𝔟, then √𝔞 ⊆ √𝔟.

  4. If 𝔞 is a prime ideal, then √𝔞 = 𝔞.

  5. √𝔞 = R if and only if 𝔞 = R.

  6. √(𝔞𝔟) = √(𝔞 ∩ 𝔟) = √𝔞 ∩ √𝔟.

  7. √(𝔞 + 𝔟) = √(√𝔞 + √𝔟).

  8. The nilradical of R equals √〈0〉.

2.33Let R be a ring. An ascending chain of ideals is a sequence 𝔞0 ⊆ 𝔞1 ⊆ 𝔞2 ⊆ · · · of ideals of R. The ascending chain is called stationary, if there is some n0 ∈ ℕ such that 𝔞n = 𝔞n0 for all n ≥ n0. Show that the following conditions are equivalent. [H]
  1. R is Noetherian (that is, every ideal of R is finitely generated).

  2. Every ascending chain of ideals in R is stationary.

  3. Every non-empty set of ideals of R has a maximal element.

2.34
  1. Let R be an integral domain. Define the set S := R × (R \ {0}). Define a relation ~ on S as (a, b) ~ (c, d) if and only if ad = bc. Show that ~ is an equivalence relation on S. Let us denote the equivalence class of (a, b) ∈ S by a/b and the set of all equivalence classes of S under ~ by K.

  2. Now define (a/b)+(c/d) := (ad+bc)/(bd) and (a/b)·(c/d) := (ac)/(bd). Show that these definitions make K a field. This field is called the quotient field of R and is denoted as Q(R). This process resembles the formation of rational numbers from the integers. Indeed, Q(ℤ) = ℚ.

2.5. Integers

The set ℤ of integers is the main object of study in this section. We use many results from previous sections to derive properties of integers. Recall that ℤ is a PID and hence a UFD.

2.5.1. Divisibility

The notions of divisibility, prime and relatively prime integers, gcd and lcm of integers are essentially the same as discussed in connection with a PID or a UFD. We avoid repeating the definitions here, but concentrate on other useful properties of integers, not covered so far. We only mention that whenever we talk about a prime integer, or the gcd or lcm of two or more integers, we will usually refer to a non-negative integer. This convention makes primes, gcds and lcms unique.

Theorem 2.12.

There are infinitely many prime integers.

Proof

Let n ∈ ℕ be arbitrary and let p1, p2, . . . , pn be n distinct primes. The (non-zero non-unit) integer q := p1p2 · · · pn + 1 is divisible by none of p1, . . . , pn and hence must have a prime divisor pn+1 different from p1, . . . , pn. The result then follows by induction on n (and the fact that the set of primes is non-empty).

Theorem 2.13.

For an integer a and an integer b ≠ 0, there exist unique integers q and r such that a = qb + r with 0 ≤ r < |b|.

Proof

Let r be the smallest non-negative element of the set {a − cb | c ∈ ℤ} and let q be the corresponding value of c. Then these integers q and r satisfy the desired properties. To prove the uniqueness, let a = q1b + r1 = q2b + r2, where 0 ≤ r1 < |b| and 0 ≤ r2 < |b|. But then (q2 − q1)b = r1 − r2 with −|b| < r1 − r2 < |b|. Since b|(r1 − r2), we must then have r1 − r2 = 0, that is, r1 = r2, which, in turn, implies that q1 = q2.

The integers q and r in the above theorem are respectively called the quotient and the remainder of Euclidean division of a by b and are denoted respectively by a quot b and a rem b. Do not confuse Euclidean division with division (that is, the inverse of multiplication) in the field ℚ. Euclidean division is the basis of the Euclidean gcd algorithm. More specifically:
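The quot and rem operations above can be sketched in a few lines (our illustration, not the text's). Note that Python's built-in `//` and `%` floor toward minus infinity, so a small correction is needed when b < 0 to enforce 0 ≤ r < |b|.

```python
def quot_rem(a, b):
    """Euclidean division: return (q, r) with a == q*b + r and 0 <= r < |b|."""
    if b == 0:
        raise ZeroDivisionError("b must be non-zero")
    q, r = a // b, a % b          # Python floors toward -infinity
    if r < 0:                     # can only happen when b < 0
        q, r = q + 1, r - b
    return q, r

assert quot_rem(17, 5) == (3, 2)
assert quot_rem(-17, 5) == (-4, 3)    # -17 = (-4)*5 + 3
assert quot_rem(17, -5) == (-3, 2)    # 17 = (-3)*(-5) + 2
```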

Proposition 2.15.

For integers a, b with b ≠ 0, let r be the remainder of Euclidean division of a by b. Then gcd(a, b) = gcd(b, r).

Proof

Clearly, 〈a〉 + 〈b〉 = 〈r〉 + 〈b〉. Now use Proposition 2.14.
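Proposition 2.15 justifies the familiar Euclidean gcd algorithm: repeatedly replace (a, b) by (b, a rem b) until the second entry vanishes. A minimal sketch (ours, not the text's):

```python
def gcd(a, b):
    """Euclidean algorithm, based on gcd(a, b) = gcd(b, a rem b)."""
    a, b = abs(a), abs(b)          # convention: return the non-negative gcd
    while b != 0:
        a, b = b, a % b
    return a

assert gcd(899, -31) == 31         # 899 = (-31)*(-29), cf. the example above
assert gcd(0, 7) == 7              # gcd(0, a) is an associate of a
```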

Proposition 2.16.

Let a and b be two integers, not both zero, and let d be the (positive) gcd of a and b. Then there are integers u and v such that d = ua + vb. (Such an equality is called a Bézout relation.) Furthermore, if a and b are both non-zero and (|a|, |b|) ≠ (1, 1), then u and v can be so chosen that |u| < |b| and |v| < |a|.

Proof

The existence of u and v follows immediately from Proposition 2.14. If a = qb, then u = 0 and v = 1 is a suitable choice. So assume that a ∤ b and b ∤ a, in which case d < |a| and d < |b|. We may assume, without loss of generality, that a and b are positive. First note that if (u, v) satisfies the Bézout relation, then for any k ∈ ℤ the pair (u + kb, v − ka) also satisfies the same relation. So we may replace v by its remainder of Euclidean division by a and may assume |v| < a. But then |u|a − b < |u|a − d ≤ |ua − d| = |v|b ≤ (a − 1)b, which implies |u| < b.

The notions of the gcd and of the Bézout relation can be generalized to any finite number of integers a1, . . . , an as

gcd(a1, . . . , an) = gcd(· · · (gcd(gcd(a1, a2), a3) · · ·), an) = u1a1 + · · · + unan

for some integers u1, . . . , un (provided that all the gcds mentioned are defined).
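The coefficients u and v of a Bézout relation can be computed alongside the gcd by the extended Euclidean algorithm, sketched below (our code, not the text's):

```python
def ext_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) = u*a + v*b (a Bezout relation)."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r    # remainders, as in the plain algorithm
        old_u, u = u, old_u - q * u    # carry the coefficients of a along
        old_v, v = v, old_v - q * v    # carry the coefficients of b along
    return old_r, old_u, old_v

d, u, v = ext_gcd(240, 46)
assert d == 2 and u * 240 + v * 46 == 2
```

The invariant maintained by the loop is old_r = old_u·a + old_v·b, so the final remainder comes with its Bézout coefficients for free.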

2.5.2. Congruences

Since ℤ is a PID, congruence modulo a non-zero ideal of ℤ can be rephrased in terms of congruence modulo a positive integer as follows.

Definition 2.27.

Let n ∈ ℕ. Two integers a and b are said to be congruent modulo n, denoted a ≡ b (mod n), if n|(a − b), that is, if the remainders of Euclidean division of a and b by n are the same. In terms of ideals, this is the same as a ≡ b (mod 〈n〉) (see Definition 2.20). Congruence is an equivalence relation on ℤ, the equivalence classes being the cosets of the ideal 〈n〉 = nℤ of ℤ.

By an abuse of notation, we often denote the equivalence class [a] of a ∈ ℤ simply by a. The following are some basic properties of congruent integers.

Proposition 2.17.

Let n ∈ ℕ, a ≡ b (mod n) and c ≡ d (mod n). Then:

  1. a ± c ≡ b ± d (mod n).

  2. ac ≡ bd (mod n).

  3. For any polynomial f(X) ∈ ℤ[X], we have f(a) ≡ f(b) (mod n).

  4. If n′|n, then a ≡ b (mod n′).

  5. If m|a and m|b, then a/m ≡ b/m (mod n/gcd(n, m)).

Proof

(1) and (2) follow from the consideration of the quotient ring ℤ/nℤ. (3) follows from repeated applications of (1) and (2). For the proof of (4), consider a − b = kn and n = k′n′ for some k, k′ ∈ ℤ. For proving (5), take a − b = kn = lm. Then m/gcd(n, m) divides k(n/gcd(n, m)). Since m/gcd(n, m) and n/gcd(n, m) are coprime, by Corollary 2.5 l′ := k/(m/gcd(n, m)) is an integer and we have a/m − b/m = l = kn/m = l′(n/gcd(n, m)).

Let n1, . . . , nr ∈ ℕ with gcd(ni, nj) = 1 for i ≠ j. Then lcm(n1, . . . , nr) = n1 · · · nr, and by the Chinese remainder theorem (Theorem 2.10), we have ℤ/(n1 · · · nr)ℤ ≅ (ℤ/n1ℤ) × · · · × (ℤ/nrℤ).

This implies that, given integers a1, . . . , ar, there exists an integer x unique modulo n1 · · · nr such that x satisfies the following congruences simultaneously:

x ≡ a1 (mod n1)
x ≡ a2 (mod n2)
  ⋮
x ≡ ar (mod nr)

We now give a procedure for constructing the integer x explicitly. Define N := n1 · · · nr and Ni := N/ni for 1 ≤ i ≤ r. Then for each i we have gcd(ni, Ni) = 1 and, therefore, there are integers ui and vi with uini + viNi = 1. Then x ≡ a1v1N1 + · · · + arvrNr (mod N) is the desired solution.
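This explicit construction can be coded directly (a sketch of ours; the function name `crt` is not from the text):

```python
from math import prod

def crt(residues, moduli):
    """Solve x = a_i (mod n_i) for pairwise coprime moduli, as in the text:
    with N = n_1...n_r and N_i = N/n_i, pick v_i with v_i*N_i = 1 (mod n_i);
    then x = sum of a_i*v_i*N_i, reduced mod N."""
    N = prod(moduli)
    x = 0
    for a_i, n_i in zip(residues, moduli):
        N_i = N // n_i
        v_i = pow(N_i, -1, n_i)   # inverse exists since gcd(n_i, N_i) = 1
        x += a_i * v_i * N_i
    return x % N

x = crt([2, 3, 2], [3, 5, 7])
assert x == 23
assert all(x % n == a for a, n in zip([2, 3, 2], [3, 5, 7]))
```

(The three-argument `pow` with exponent −1 computes a modular inverse and requires Python 3.8 or later.)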

Let n ∈ ℕ. We now study the multiplicative group (ℤ/nℤ)* of the ring ℤ/nℤ. We say that an integer a has a multiplicative inverse modulo n, if [a] ∈ (ℤ/nℤ)*, or, equivalently, if there is an integer b with ab ≡ 1 (mod n). The following proposition is an important characterization of the elements of (ℤ/nℤ)*.

Proposition 2.18.

(The equivalence class of) an integer a belongs to (ℤ/nℤ)* if and only if gcd(a, n) = 1.

Proof

[if] By Proposition 2.16, there exist integers u and v such that ua + vn = 1. But then ua ≡ 1 (mod n).

[only if] If ua ≡ 1 (mod n) for some integer u, then ua + vn = 1 for some integer v, which implies that the gcd of a and n divides 1 and hence is equal to 1.
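In code, the inverse promised by this proposition comes from a Bézout relation ua + vn = 1; Python 3.8+ exposes exactly this via the three-argument `pow` (our illustration, not the text's):

```python
# a is invertible mod n iff gcd(a, n) = 1; from u*a + v*n = 1, u is the
# inverse of a modulo n.
assert pow(7, -1, 30) == 13          # 7*13 = 91 = 3*30 + 1

try:
    pow(6, -1, 30)                   # gcd(6, 30) = 6 != 1: no inverse exists
except ValueError:
    pass                             # Python signals the failure this way
```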

Definition 2.28.

The cardinality of (ℤ/nℤ)* is denoted by φ(n). By Proposition 2.18, φ(n) is equal to the number of integers between 0 and n − 1 (both inclusive) that are relatively prime to n. The function φ : ℕ → ℕ is called Euler’s totient function. For example, for a prime p we have (ℤ/pℤ)* = {1, . . . , p − 1}, so φ(p) = p − 1.

The following two theorems are immediate consequences of Proposition 2.4.

Theorem 2.14. Euler’s theorem

Let n ∈ ℕ and a ∈ ℤ with gcd(a, n) = 1. Then

a^φ(n) ≡ 1 (mod n).

Theorem 2.15. Fermat’s little theorem

Let p be a prime and a ∈ ℤ with gcd(a, p) = 1. Then

a^(p−1) ≡ 1 (mod p).

For any integer b, one has b^p ≡ b (mod p).

Theorem 2.16. Wilson’s theorem

For every prime p, we have (p – 1)! ≡ –1 (mod p).

Proof

The result holds for p = 2. So assume that p is an odd prime. Since ℤ/pℤ is a field, Fermat’s little theorem gives the factorization

Equation 2.1

X^(p−1) − 1 ≡ (X − 1)(X − 2) · · · (X − (p − 1)) (mod p).

Comparing the constant terms on the two sides proves Wilson’s theorem.

The structure of the group (ℤ/pℤ)*, p prime, can be easily deduced from Fermat’s little theorem. This gives us the following important result.

Proposition 2.19.

For a prime p, the group (ℤ/pℤ)* is cyclic.

Proof

For every divisor d of p − 1, we have X^(p−1) − 1 = (X^d − 1)f(X) for some f(X) ∈ ℤ[X] with deg f = p − 1 − d. By Congruence 2.1, X^(p−1) − 1 has p − 1 roots modulo p. Since ℤ/pℤ is a field, f(X) (mod p) cannot have more than p − 1 − d roots (Proposition 2.25), and it follows that X^d − 1 has exactly d roots modulo p. In particular, if q^e divides p − 1 for a prime q and an integer e ≥ 1, then there exist exactly q^e elements of (ℤ/pℤ)* of order dividing q^e and exactly q^(e−1) elements of order dividing q^(e−1), that is, there are q^e − q^(e−1) > 0 elements of (ℤ/pℤ)* of order q^e. If p − 1 = q1^(e1) · · · qr^(er) is the canonical prime factorization of p − 1 (with each ei ≥ 1), by the above argument there exists an element gi of (ℤ/pℤ)* of order qi^(ei) for each i = 1, . . . , r. It is now easy to check that g1 · · · gr has order q1^(e1) · · · qr^(er) = p − 1.
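Cyclicity can be observed directly for a small prime by computing element orders (a brute-force sketch of ours, fine for small p, not an efficient generator-finding method):

```python
def multiplicative_order(a, p):
    """Order of a in (Z/pZ)*, p prime, by repeated multiplication."""
    x, k = a % p, 1
    while x != 1:
        x = x * a % p
        k += 1
    return k

# (Z/11Z)* is cyclic of order 10: some element attains the full order,
# and 2 is one such generator.
assert max(multiplicative_order(a, 11) for a in range(1, 11)) == 10
assert multiplicative_order(2, 11) == 10
```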

Euler’s totient function plays an extremely important role in number theory (and cryptology). We now describe a method for computing it.

Lemma 2.2.

If n and n′ are relatively prime positive integers, then φ(nn′) = φ(n)φ(n′).

Proof

If a is invertible modulo nn′, then clearly it is invertible modulo both n and n′. Conversely, suppose ua ≡ 1 (mod n) and u′a′ ≡ 1 (mod n′). By the Chinese remainder theorem there are integers x and α, unique modulo nn′, satisfying x ≡ u (mod n), x ≡ u′ (mod n′), α ≡ a (mod n) and α ≡ a′ (mod n′). But then xα ≡ 1 (mod nn′). Therefore, #(ℤ/nn′ℤ)* = #(ℤ/nℤ)* · #(ℤ/n′ℤ)*, whence the lemma follows.

Lemma 2.3.

If p is a prime and e ∈ ℕ, then φ(p^e) = p^e − p^(e−1) = p^e(1 − 1/p).

Proof

Integers between 0 and p^e − 1 that are relatively prime to p^e are precisely those that are not multiples of p, and there are p^e − p^(e−1) of them.

Proposition 2.20.

Let n = p1^(e1) · · · pr^(er) be the prime factorization of a positive integer n, with pairwise distinct primes p1, . . . , pr and with ei > 0. Then

φ(n) = n(1 − 1/p1) · · · (1 − 1/pr).

Proof

Immediate from Lemmas 2.2 and 2.3.
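Proposition 2.20 translates directly into a small totient routine; the sketch below (ours, not the text's) factors n by trial division and multiplies in a factor (1 − 1/p) for each distinct prime p found:

```python
def phi(n):
    """Euler's totient via phi(n) = n * prod(1 - 1/p) over the distinct
    prime divisors p of n (trial-division factoring; fine for small n)."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            while n % p == 0:      # strip all copies of this prime
                n //= p
            result -= result // p  # multiply result by (1 - 1/p), exactly
        p += 1
    if n > 1:                      # one prime factor > sqrt(original n) left
        result -= result // n
    return result

assert phi(1) == 1 and phi(97) == 96   # phi(p) = p - 1 for a prime p
assert phi(360) == 96                  # 360 = 2^3 * 3^2 * 5
```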

By Proposition 2.18, the linear congruence ax ≡ 1 (mod n) is solvable for x if and only if gcd(a, n) = 1. In such a case, the solution is unique modulo n. Now, let us concentrate on the solutions of the general linear congruence:

ax ≡ b (mod n).

Theorem 2.17 characterizes the solutions of this congruence.

Theorem 2.17.

Let d := gcd(a, n). Then the congruence ax ≡ b (mod n) is solvable for x if and only if d|b. A solution of the congruence, if existent, is unique modulo n/d.

Proof

[if] By Proposition 2.17, the given congruence is equivalent to (a/d)x ≡ b/d (mod n/d). Since gcd(a/d, n/d) = 1, the congruence (a/d)x′ ≡ 1 (mod n/d) is solvable for x′. Then a solution for x is x ≡ (b/d)x′ (mod n/d).

[only if] There exists an integer k such that ax + kn = b. This shows that d|b.

To prove the uniqueness, let x and x′ be two integers satisfying the given congruence. But then a(x − x′) ≡ 0 (mod n), that is, (a/d)(x − x′) ≡ 0 (mod n/d), that is, x − x′ ≡ 0 (mod n/d), since gcd(a/d, n/d) = 1.

The last theorem implies that if d|b, then the congruence axb (mod n) has d solutions modulo n. These solutions are given by ξ + r(n/d), r = 0, . . . , d – 1, where ξ is the solution modulo n/d of the congruence (a/d)ξ ≡ b/d (mod n/d).
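The full solution procedure of Theorem 2.17 can be sketched as follows (our code; the function name is an assumption, not from the text):

```python
from math import gcd

def solve_linear_congruence(a, b, n):
    """All solutions mod n of a*x = b (mod n): empty if d = gcd(a, n) does
    not divide b, else the d solutions xi + r*(n/d), r = 0, ..., d-1."""
    d = gcd(a, n)
    if b % d != 0:
        return []
    nd = n // d
    # unique solution mod n/d of (a/d)*xi = b/d (mod n/d)
    xi = (b // d) * pow(a // d, -1, nd) % nd
    return [xi + r * nd for r in range(d)]

assert solve_linear_congruence(6, 4, 10) == [4, 9]   # 6*4 = 24 = 4 (mod 10)
assert solve_linear_congruence(6, 5, 10) == []       # gcd(6, 10) = 2 does not divide 5
```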

2.5.3. Quadratic Residues

In this section, we consider quadratic congruences, that is, congruences of the form ax² + bx + c ≡ 0 (mod n). We start with the simple case n = p, a prime. We assume further that p is odd, so that 2 has a multiplicative inverse mod p. Since we are considering quadratic congruences, we are interested only in those integers a for which gcd(a, p) = 1. In that case, a also has a multiplicative inverse mod p and the above congruence can be written as y² ≡ α (mod p), where y ≡ x + b(2a)^(−1) (mod p) and α ≡ b²(4a²)^(−1) − c·a^(−1) (mod p). This motivates us to provide Definition 2.29.

Definition 2.29.

Let p be an odd prime and a an integer with gcd(a, p) = 1. We say that a is a quadratic residue modulo p, if the congruence x² ≡ a (mod p) has a solution (for x). Otherwise we say that a is a quadratic non-residue modulo p.

If a is a quadratic residue modulo an odd prime p, then the congruence x² ≡ a (mod p) has exactly two solutions. If ξ is one solution, the other solution is p − ξ. It is, therefore, evident that there are exactly (p − 1)/2 quadratic residues and exactly (p − 1)/2 quadratic non-residues modulo p. For example, the quadratic residues modulo p = 11 are 1 = 1² = 10², 3 = 5² = 6², 4 = 2² = 9², 5 = 4² = 7² and 9 = 3² = 8². The quadratic non-residues modulo 11 are, therefore, 2, 6, 7, 8 and 10. We treat 0 neither as a quadratic residue nor as a quadratic non-residue.
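The residues modulo 11 listed above can be reproduced by squaring every non-zero element (a one-line brute-force sketch of ours):

```python
def quadratic_residues(p):
    """The (p-1)/2 quadratic residues modulo an odd prime p, by squaring
    every non-zero residue; each residue appears as x^2 and (p-x)^2."""
    return sorted({x * x % p for x in range(1, p)})

assert quadratic_residues(11) == [1, 3, 4, 5, 9]
```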

Definition 2.30.

Let p be an odd prime and a an integer with gcd(a, p) = 1. The Legendre symbol (a/p) is defined as: (a/p) := 1, if a is a quadratic residue modulo p, and (a/p) := −1, if a is a quadratic non-residue modulo p.

Proposition 2.21.

Let p be an odd prime and a and b integers coprime to p.

  1. Euler’s criterion: (a/p) ≡ a^((p−1)/2) (mod p).

  2. (ab/p) = (a/p)(b/p).

  3. (1/p) = 1, (a²/p) = 1 and (−1/p) = (−1)^((p−1)/2).

  4. If a ≡ b (mod p), then (a/p) = (b/p). In particular, if r is the remainder of Euclidean division of a by p, then (a/p) = (r/p).

Proof

If a is a quadratic residue modulo p, then a ≡ b² (mod p) for some integer b (coprime to p) and by Fermat’s little theorem we have a^((p−1)/2) ≡ b^(p−1) ≡ 1 (mod p). Conversely, the polynomial X^(p−1) − 1 = (X^((p−1)/2) − 1)(X^((p−1)/2) + 1) has p − 1 (distinct) roots mod p (again by Fermat’s little theorem). We have just seen that no quadratic residue is a root of X^((p−1)/2) + 1. Since ℤ/pℤ is a field, the (p − 1)/2 roots of X^((p−1)/2) − 1 are precisely all the quadratic residues modulo p; the non-residues are then exactly the roots of X^((p−1)/2) + 1. This proves Euler’s criterion. The other statements are immediate consequences of this.
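Euler's criterion gives an immediate way to evaluate the Legendre symbol with one modular exponentiation (our sketch, not the text's):

```python
def legendre(a, p):
    """Legendre symbol via Euler's criterion: a^((p-1)/2) mod p equals
    1 for residues and p-1 (i.e. -1) for non-residues; p an odd prime,
    gcd(a, p) = 1 assumed."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

# the residues mod 11 are exactly 1, 3, 4, 5, 9, as listed in the text
assert [a for a in range(1, 11) if legendre(a, 11) == 1] == [1, 3, 4, 5, 9]
```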

Euler’s criterion gives us a nice way to check if a given integer is a quadratic residue modulo an odd prime. While this is much faster than the brute-force strategy of enumerating all the quadratic residues, it is still not the best solution, because it involves a modular exponentiation. We can, however, employ a gcd-like procedure for a faster computation. The development of this method demands further results which are otherwise interesting in themselves as well. The first important result is known as the law of quadratic reciprocity (Theorem 2.18 below). Gauss was the first to prove it and he deemed the result so important that he gave eight proofs for it. At present about two hundred published proofs of this law exist in the literature. We go in the classical way, that is, the Gaussian way, because the proof, though somewhat long, is elementary.

Lemma 2.4. Gauss

Let p be an odd prime and a an integer with gcd(a, p) = 1. Let us denote t := (p − 1)/2. For an integer i, let ri be the unique integer with ri ≡ ia (mod p) and −t ≤ ri ≤ t. Let n be the number of i, 1 ≤ i ≤ t, for which ri is negative. Then (a/p) = (−1)^n.

Proof

It is easy to check that ri ≢ ±rj (mod p) for all i ≠ j with 1 ≤ i, j ≤ t. Thus |ri|, i = 1, . . . , t, are precisely (a permuted version of) the integers 1, . . . , t. Thus a^t · t! = (1a)(2a) · · · (ta) ≡ r1r2 · · · rt ≡ (−1)^n |r1||r2| · · · |rt| = (−1)^n t! (mod p). Canceling t! and using Proposition 2.21(1) gives the desired result.
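Gauss's lemma is itself computable: count the i ≤ (p − 1)/2 whose least absolute residue of ia is negative. The sketch below (ours) cross-checks it against Euler's criterion for a small prime:

```python
def legendre_gauss(a, p):
    """Legendre symbol via Gauss's lemma: r_i is negative exactly when
    (i*a mod p) exceeds t = (p-1)/2; the symbol is (-1)^n."""
    t = (p - 1) // 2
    n = sum(1 for i in range(1, t + 1) if (i * a) % p > t)
    return (-1) ** n

# agreement with Euler's criterion for every a coprime to p = 23
p = 23
for a in range(1, p):
    assert legendre_gauss(a, p) == (1 if pow(a, (p - 1) // 2, p) == 1 else -1)
```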

Definition 2.31.

Let . The largest integer smaller than or equal to x is called the floor of x and is denoted by ⌊x⌋. Similarly, the smallest integer larger than or equal to x is called the ceiling of x and is denoted by ⌈x⌉.

Corollary 2.7.

With the notations of Lemma 2.4, we have n ≡ Σ_{j=1}^{t} ⌊2ja/p⌋ (mod 2). If a is odd, then n ≡ Σ_{j=1}^{t} ⌊ja/p⌋ (mod 2). In particular, (2/p) = (−1)^((p²−1)/8), that is, 2 is a quadratic residue mod p if and only if p ≡ ±1 (mod 8).

Proof

Since 2⌊ja/p⌋ is even and ⌊2ja/p⌋ = 2⌊ja/p⌋ + ⌊2(ja rem p)/p⌋, it follows that if rj > 0, then ⌊2ja/p⌋ is even, and if rj < 0, then ⌊2ja/p⌋ is odd. Therefore, n ≡ Σ_{j=1}^{t} ⌊2ja/p⌋ (mod 2).

If a is odd, p + a is even. Also 4 is a quadratic residue modulo p. So (2/p)(a/p) = (2a/p) = (4 · ((p + a)/2) / p) = ((p + a)/2 / p) = (−1)^m, where, by the first part, m ≡ Σ_{j=1}^{t} ⌊j(p + a)/p⌋ = Σ_{j=1}^{t} j + Σ_{j=1}^{t} ⌊ja/p⌋ ≡ (p² − 1)/8 + Σ_{j=1}^{t} ⌊ja/p⌋ (mod 2), since Σ_{j=1}^{t} j = t(t + 1)/2 = (p² − 1)/8. Putting a = 1 gives (2/p) = (−1)^((p²−1)/8) and, therefore, n ≡ m + (p² − 1)/8 ≡ Σ_{j=1}^{t} ⌊ja/p⌋ (mod 2).

Theorem 2.18. Law of quadratic reciprocity

Let p and q be distinct odd primes. Then (p/q)(q/p) = (−1)^(((p−1)/2)·((q−1)/2)).

Proof

By Corollary 2.7, (p/q) = (−1)^m and (q/p) = (−1)^n, where m = Σ_{x=1}^{s} ⌊px/q⌋, n = Σ_{y=1}^{t} ⌊qy/p⌋, s = (q − 1)/2 and t = (p − 1)/2. So we are done, if we can show that m + n = st. Consider the set S := {(x, y) | 1 ≤ x ≤ s, 1 ≤ y ≤ t} of cardinality st. Now S is the disjoint union of S1 and S2, where S1 := {(x, y) ∈ S | qy < px} and S2 := {(x, y) ∈ S | qy > px}. (Note that we cannot have px = qy.) It is easy to see that #S1 = m and #S2 = n.

To demonstrate how we can use the results deduced so far, let us compute (360/997). Since 360 = 2³ · 3² · 5, we have

(360/997) = (2/997)³ (3/997)² (5/997) = (2/997)(5/997) = (−1)(997/5) = (−1)(2/5) = (−1)(−1) = 1,

using (2/997) = −1 (since 997 ≡ 5 (mod 8)), quadratic reciprocity with 5 ≡ 1 (mod 4), 997 ≡ 2 (mod 5), and (2/5) = −1.

Thus 360 is a quadratic residue modulo 997. The apparent attractiveness of this method is offset by the fact that it demands the factorization of several integers, and as such it does not lead to a practical algorithm. We indeed need further machinery in order to have an efficient algorithm. First, we define a generalization of the Legendre symbol.

Definition 2.32.

Let a, b be integers with b > 0 and odd. We define the Jacobi symbol (a/b) as: (a/b) := 1, if b = 1; (a/b) := 0, if gcd(a, b) > 1; and (a/b) := (a/p1) · · · (a/pt) otherwise,

where, in the last case, p1, . . . , pt are all the prime factors of b (not necessarily all distinct) and each (a/pi) is a Legendre symbol.

Note that if (a/b) = −1, then a is not a quadratic residue mod b. However, the converse is not always true, that is, (a/b) = 1 does not necessarily imply that a is a quadratic residue modulo b (example: a = 2 and b = 9). Of course, if b is an odd prime and if gcd(a, b) = 1, the Legendre and Jacobi symbols coincide in value and meaning.

The Jacobi symbol enjoys many properties similar to the Legendre symbol.

Proposition 2.22.

For integers a, a′ and positive odd integers b, b′, we have:

  1. (aa′/b) = (a/b)(a′/b),

  2. (a/bb′) = (a/b)(a/b′), and

  3. if a ≡ a′ (mod b), then (a/b) = (a′/b). In particular, if r is the remainder of Euclidean division of a by b, then (a/b) = (r/b).

Proof

Immediate from the definition and Proposition 2.21.

Theorem 2.19.
  1. For an odd positive integer b, we have (−1/b) = (−1)^((b−1)/2) and (2/b) = (−1)^((b²−1)/8).

  2. If a is another odd positive integer with gcd(a, b) = 1, then (a/b)(b/a) = (−1)^(((a−1)/2)·((b−1)/2)).

Proof

  1. Let b = p1 · · · ps, where the pi are odd primes (not necessarily distinct). Then by definition (−1/b) = (−1/p1) · · · (−1/ps) = (−1)^m, where m = Σ_{i=1}^{s} (pi − 1)/2. Now for odd integers x and y one has (xy − 1)/2 ≡ (x − 1)/2 + (y − 1)/2 (mod 2). Repeated applications of this prove that m ≡ (b − 1)/2 (mod 2). To prove that (2/b) = (−1)^((b²−1)/8), we proceed in a similar manner and note that for odd integers x and y one has ((xy)² − 1)/8 ≡ (x² − 1)/8 + (y² − 1)/8 (mod 2).

  2. If a = q1 · · · qt with odd primes q1, . . . , qt, then by definition

    (a/b)(b/a) = Π_{i=1}^{s} Π_{j=1}^{t} (qj/pi)(pi/qj),

    where from Theorem 2.18 it follows that (a/b)(b/a) = (−1)^μ with μ = Σ_{i=1}^{s} Σ_{j=1}^{t} ((pi − 1)/2)((qj − 1)/2) ≡ ((a − 1)/2)((b − 1)/2) (mod 2), the last congruence following as in Part (1).

Now, we can calculate (a/b) without factoring b as follows.
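The gcd-like procedure alluded to here can be sketched as follows (our code, reconstructing the standard method from Theorem 2.19, not necessarily the book's exact listing): strip factors of 2 using (2/b) = (−1)^((b²−1)/8), flip the symbol by reciprocity, and reduce, exactly as in a Euclidean gcd computation.

```python
def jacobi(a, b):
    """Jacobi symbol (a/b) for odd b > 0, computed without factoring b."""
    assert b > 0 and b % 2 == 1
    a %= b
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if b % 8 in (3, 5):      # (2/b) = -1 exactly when b = +-3 (mod 8)
                result = -result
        a, b = b, a                  # reciprocity for odd coprime arguments
        if a % 4 == 3 and b % 4 == 3:
            result = -result
        a %= b
    return result if b == 1 else 0   # b > 1 at the end means gcd(a, b) > 1

assert jacobi(360, 997) == 1         # the worked example above
assert jacobi(2, 9) == 1             # yet 2 is not a square modulo 9
```

Each iteration at least halves one argument, so the running time is comparable to that of the Euclidean gcd algorithm.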

2.5.4. Some Assorted Topics

So far, we have studied some elementary properties of integers. Number theory is, however, one of the oldest and widest branches of mathematics. Various complex-analytic and algebraic tools have been employed to derive more complicated properties of integers. In Section 2.13, we give a short introductory exposition to algebraic number theory. Here, we mention a collection of useful results from analytic number theory. The proofs of these analytic results would lead us too far away and hence are omitted here. Inquisitive (and/or cynical) readers may consult textbooks on analytic number theory for the details missing here.

The prime number theorem

The famous prime number theorem gives an asymptotic estimate of the density of primes smaller than or equal to a positive real number. Gauss conjectured this result in 1791. Many mathematicians tried to prove it during the 19th century and came up with partial results. Riemann made reasonable progress towards proving the theorem, but could not furnish a complete proof before he died in 1866. It is interesting to mention here that a good portion of the theory of analytic functions (also called holomorphic functions) in complex analysis was developed during these attempts to prove the prime number theorem. The first complete proof of the theorem (based mostly on the ideas of Riemann and Chebyshev) was given independently by the French mathematician Hadamard and by the Belgian mathematician de la Vallée Poussin in 1896. Their proof is regarded as one of the major achievements of modern mathematics. People started believing that any proof of the prime number theorem has to be analytic. Erdös and Selberg destroyed this belief by independently providing the first elementary proof of the theorem in 1949. Here (and elsewhere in mathematics), the adjective elementary refers to something which does not depend on results from analysis or algebra. Caution: Elementary is not synonymous with easy !

Theorem 2.20. Prime Number Theorem

Let π(x) denote the number of primes less than or equal to a real number x > 0. As x → ∞, we have π(x) ~ x/ln x (that is, the ratio π(x)/(x/ln x) → 1). In particular, the density π(n)/n of primes among the natural numbers ≤ n asymptotically approaches 1/ln n as n → ∞. It also follows that the n-th prime is approximately equal to n ln n.

Though the prime number theorem provides an asymptotic estimate (that is, one for x → ∞), for finite values of x (for example, for the values of x in the cryptographic range) it does give good approximations for π(x). Table 2.1 lists π(x) against the rounded values of x/ ln x for x equal to small powers of 10.

Table 2.1. Approximations to π(x)
x       π(x)        x/ln x      x/(ln x − 1)  Li(x)
10^3    168         145         169           178
10^4    1229        1086        1218          1246
10^5    9592        8686        9512          9630
10^6    78,498      72,382      78,030        78,628
10^7    664,579     620,421     661,458       664,918
10^8    5,761,455   5,428,681   5,740,304     5,762,209

Given the prime number theorem, it follows that π(x) is also asymptotic to x/(ln x − ξ) for any fixed real ξ. It turns out that ξ = 1 is the best choice. Gauss’ Li function is also an asymptotic estimate for π(x), where for real x > 0 one defines:

Li(x) := ∫_0^x dt/ln t (the integral taken as a principal value).

Gauss conjectured that Li(x) asymptotically equals π(x). The prime number theorem is, in fact, equivalent to this conjecture. Furthermore, de la Vallée Poussin proved that Li(x) is a better approximation to π(x) than x/(ln x – ξ) for any real ξ. Table 2.1 also lists x/(ln x – 1) and Li(x) against the actual values of π(x).

The asymptotic formula does not, by itself, give explicit bounds on the error π(x) − (x/ln x) for finite x. It has been shown by Dusart [83] that (x/ln x) + 0.992(x/ln² x) ≤ π(x) ≤ (x/ln x) + 1.2762(x/ln² x) for all x > 598.
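The first three columns of Table 2.1 are easy to reproduce; the sketch below (ours, not the text's) counts primes with a simple sieve and compares against the two elementary estimates:

```python
from math import log

def prime_pi(x):
    """pi(x): the number of primes <= x, by a sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"                       # 0 and 1 are not prime
    for p in range(2, int(x ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytearray(len(sieve[p * p :: p]))
    return sum(sieve)

x = 10 ** 6
assert prime_pi(x) == 78498                        # the pi(x) column
assert round(x / log(x)) == 72382                  # the x/ln x column
assert round(x / (log(x) - 1)) == 78030            # the x/(ln x - 1) column
```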

Density of smooth integers

Integers having only small prime divisors play an interesting role in cryptography and in number theory in general.

Definition 2.33.

Let y > 0. An integer x is called y-smooth (or simply smooth, if y is understood from the context), if all the prime divisors of x are ≤ y. We denote by ψ(x, y) the fraction of positive integers ≤ x that are y-smooth.

The following theorem gives an asymptotic estimate for ψ(x, y).

Theorem 2.21.

Let x, y ∈ ℕ with x > y and let u := ln x/ln y. For u → ∞ and y ≥ ln² x we have the asymptotic formula:

ψ(x, y) = u^(−u+o(u)) = e^(−(1+o(1))u ln u).

In Theorem 2.21, the notation g(u) = o(f(u)) implies that the ratio g(u)/f(u) tends to 0 as u approaches ∞. See Definition 3.1 for more details. An interesting special case of the formula for ψ(x, y) will be used quite often in this book and is given as Corollary 4.1 in Chapter 4.

Like the prime number theorem, Theorem 2.21 gives only an asymptotic estimate, but it is a good approximation for finite values of x, y and u (that is, for the values of practical interest). The most important implication of this theorem is that the density of y-smooth integers in the set {1, . . . , x} is a very sensitive function of u = ln x/ln y and decreases very rapidly as u increases (that is, for a fixed y, as x increases). For example, if y = 15,485,863, the millionth prime, then a random integer ≤ 2^250 is y-smooth with probability approximately 2.12 × 10^(–11), whereas a random integer ≤ 2^500 is y-smooth with probability approximately 2.23 × 10^(–28). (These figures are computed neglecting the o(u) term in the expression for ψ(x, y).) In other words, smaller integers have a higher probability of being smooth (that is, y-smooth for a given y).
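The probabilities quoted above can be reproduced by dropping the o(u) term, so that ψ(x, y) ≈ u^(–u); a small sketch:

```python
from math import log

y = 15_485_863                    # the millionth prime
for bits in (250, 500):
    u = bits * log(2) / log(y)    # u = ln x / ln y for x = 2^bits
    print(bits, u ** -u)          # density estimate u^(-u), o(u) term dropped
```

For bits = 250 this prints a value close to 2.12 × 10^(–11), and for bits = 500 a value close to 2.23 × 10^(–28), matching the figures in the text.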

The extended Riemann hypothesis

The Riemann hypothesis (RH) is one of the deepest unsolved problems in mathematics. An extended version of this hypothesis has important bearings on the solvability of certain computational problems in polynomial time.

Definition 2.34.

The Euler zeta function ζ(s) is defined for a complex variable s with Re s > 1 as

ζ(s) := Σ_{n≥1} 1/n^s = 1 + 1/2^s + 1/3^s + · · · .

The reader may already be familiar with the results ζ(2) = π^2/6 and ζ(4) = π^4/90, and with the fact that the series diverges at s = 1. Riemann (analytically) extended the Euler zeta function to all complex values of s (except at s = 1, where the function has a simple pole). This extended function, called the Riemann zeta function, is known to have zeros at s = –2, –4, –6, . . . . These are called the trivial zeros of ζ(s). It can be proved that all non-trivial zeros of ζ(s) must lie in the so-called critical strip 0 ≤ Re s ≤ 1, and are symmetric about the critical line Re s = 1/2.
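Partial sums of the zeta series converge (slowly, for s close to 1) to the familiar values ζ(2) = π^2/6 and ζ(4) = π^4/90; for instance:

```python
from math import pi

def zeta_partial(s, terms=1_000_000):
    """Partial sum of the zeta series sum_{n>=1} 1/n^s for real s > 1."""
    return sum(n ** -s for n in range(1, terms + 1))

print(zeta_partial(2), pi ** 2 / 6)   # tail of the s = 2 series is about 1/terms
print(zeta_partial(4), pi ** 4 / 90)
```

The s = 2 sum agrees with π^2/6 to about six digits; the s = 4 sum converges much faster.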

Conjecture 2.1. Riemann hypothesis (RH)

All non-trivial zeros of ζ(s) lie on the critical line.

In 1900, Hilbert asserted that proving or disproving the RH is one of the most important problems confronting 20th-century mathematicians. It remains just as important to the mathematicians of the 21st century.

In 1901, von Koch proved that the RH is equivalent to the formula:

Conjecture 2.2. An equivalent form of the Riemann hypothesis

π(x) = Li(x) + O(x^(1/2) ln x)

Here the order notation f(x) = O(g(x)) means that |f(x)/g(x)| remains bounded by a constant for all sufficiently large x (see Definition 3.1).

Hadamard and de la Vallée Poussin proved that

π(x) = Li(x) + O(x e^(–α √(ln x)))

for some positive constant α. While this estimate was sufficient to prove the prime number theorem, the tighter bound of Conjecture 2.2 continues to remain unproved.

Theorem 2.22. Dirichlet’s theorem on primes in arithmetic progression

Let a, b ∈ ℕ be coprime. The set {a + kb | k ∈ ℕ} contains an infinite number of primes.

Dirichlet’s theorem is a powerful generalization of Theorem 2.12 (which corresponds to a = b = 1). One can accordingly generalize the notation π(x) as follows:

Definition 2.35.

Let a, b ∈ ℕ with gcd(a, b) = 1. By π_{a,b}(x), we denote the number of primes in the set {a + kb | k ∈ ℕ} that are ≤ x.

The prime number theorem generalizes to the estimate:

π_{a,b}(x) ~ (1/φ(b)) · (x/ln x),

where φ is Euler’s totient function. The RH now generalizes to:

Conjecture 2.3. Extended Riemann hypothesis (ERH)

For a, b ∈ ℕ with gcd(a, b) = 1,

π_{a,b}(x) = (1/φ(b)) Li(x) + O(x^(1/2) ln x).

Some authors use the expression Generalized Riemann hypothesis (GRH) in place of ERH. Taking b = 1 demonstrates that the ERH implies the RH. The ERH also implies the following:

Conjecture 2.4.

The smallest positive quadratic non-residue modulo a prime p is < 2 ln^2 p.
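This conjectured bound is easy to probe experimentally. The sketch below (the function name `least_qnr` is ours) finds the least positive quadratic non-residue using Euler’s criterion a^((p–1)/2) ≡ –1 (mod p) and compares it with 2 ln^2 p for a few primes:

```python
from math import log

def least_qnr(p):
    """Least a >= 2 that is a quadratic non-residue modulo the odd prime p,
    found with Euler's criterion: a^((p-1)/2) ≡ -1 (mod p)."""
    a = 2
    while pow(a, (p - 1) // 2, p) != p - 1:
        a += 1
    return a

for p in (1009, 10007, 999983):
    n = least_qnr(p)
    bound = 2 * log(p) ** 2
    print(p, n, round(bound, 1))
    assert n < bound          # consistent with Conjecture 2.4
```

As is typical, the least non-residue is tiny compared with the ERH-based bound.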

Exercise Set 2.5

2.35
  1. Show that any integer n ≥ 3 satisfies n^2 = a^2 – b^2 for some a, b ∈ ℕ.

  2. Show that for any integer n ≥ 2 the integer n^4 + 4^n is composite.

2.36 Let n ∈ ℕ and let S be a subset of {1, 2, ..., 2n} of cardinality n + 1. Show that: [H]
  1. There exist x, y ∈ S such that x – y = 1.

  2. There exist x, y ∈ S such that x – y = n.

  3. There exist distinct x, y ∈ S such that x is a multiple of y.

  4. There exist distinct x, y ∈ S such that x is relatively prime to y.

2.37 Show that for any n ∈ ℕ, n > 1, the rational number 1 + 1/2 + 1/3 + · · · + 1/n is not an integer. [H]
2.38
  1. Show that the Mersenne number M_n := 2^n – 1 is prime only if n is prime.

  2. Show that the Fermat number 2^n + 1 is prime only if n = 2^t for some integer t ≥ 0.

2.39 Let n ≥ 2 be a natural number. A complete residue system modulo n is a set of n integers a_1, . . . , a_n such that a_i ≢ a_j (mod n) for i ≠ j. Similarly, a reduced residue system modulo n is a set of φ(n) integers b_1, . . . , b_φ(n) such that gcd(b_i, n) = 1 for all i = 1, . . . , φ(n) and b_i ≢ b_j (mod n) for i ≠ j. Show that:
  1. If {a_1, . . . , a_n} is a complete residue system modulo n, the equivalence classes of a_1, . . . , a_n (modulo the ideal 〈n〉) constitute the set ℤ_n. In other words, given any integer a, there exists a unique i, 1 ≤ i ≤ n, for which a ≡ a_i (mod n).

  2. If {b_1, . . . , b_φ(n)} is a reduced residue system modulo n, then the equivalence classes of b_1, . . . , b_φ(n) constitute the set ℤ_n^*. In other words, given any integer b coprime to n, there exists a unique i, 1 ≤ i ≤ φ(n), for which b ≡ b_i (mod n).

  3. If {a_1, . . . , a_n} is a complete residue system modulo n, then for any integer a coprime to n, the integers aa_1, . . . , aa_n constitute a complete residue system modulo n. For example, if n is odd, then {2, 4, 6, . . . , 2n} is a complete residue system modulo n.

  4. If {b_1, . . . , b_φ(n)} is a reduced residue system modulo n, then for any integer b coprime to n, the integers bb_1, . . . , bb_φ(n) constitute a reduced residue system modulo n.

  5. For n > 2, the integers 1^2, 2^2, . . . , n^2 do not constitute a complete residue system modulo n. [H]

  6. If p is an odd prime and if {a_1, . . . , a_p} and {b_1, . . . , b_p} are two complete residue systems modulo p, then {a_1b_1, . . . , a_pb_p} is not a complete residue system modulo p. [H]

2.40 Prove that the decimal expansion of any rational number a/b is recurring, that is, (eventually) periodic. (A terminating expansion may be viewed as one with recurring 0.) [H]
2.41 Let p be an odd prime. Show that the congruence x^2 ≡ –1 (mod p) is solvable if and only if p ≡ 1 (mod 4). [H]
2.42 Let n ∈ ℕ.
  1. Show that if n > 2, then φ(n) is even.

  2. Show that if n is odd, then φ(n) = φ(2n).

  3. Find out all the values of n for which φ(n) = 12.

2.43For , show that .
2.44 Let n > 2 and gcd(a, n) = 1. Let h be the multiplicative order of a modulo n (that is, the order of a in the group ℤ_n^*). Show that:
  1. a^i ≡ a^j (mod n) if and only if i ≡ j (mod h).

  2. The multiplicative order of a^l modulo n is h/gcd(h, l).

  3. If a is a primitive element of ℤ_n^* (that is, if h = φ(n)), then 1, a, a^2, . . . , a^(h–1) is a reduced residue system modulo n.

  4. If gcd(b, n) = 1 and b has multiplicative order k modulo n and if gcd(h, k) = 1, then the multiplicative order of ab modulo n is hk.

2.45 Devise a criterion for the solvability of ax^2 + bx + c ≡ 0 (mod p), where p is an odd prime and gcd(a, p) = 1. [H]
2.46 Let p be a prime and r ∈ ℕ. An integer a with gcd(a, p) = 1 is called an r-th power residue modulo p, if the congruence x^r ≡ a (mod p) has a solution. Show that a is an r-th power residue modulo p if and only if a^((p–1)/gcd(r, p–1)) ≡ 1 (mod p). This is a generalization of Euler’s criterion for quadratic residues.
2.47 Let G be a finite cyclic group of cardinality n. Show that G is isomorphic to the additive group ℤ_n and that there are exactly φ(n) generators (that is, primitive elements) of G.
2.48 Let m, n ∈ ℕ with m|n. Show that the canonical (surjective) ring homomorphism ℤ_n → ℤ_m induces a surjective group homomorphism ℤ_n^* → ℤ_m^* of the respective groups of units. (Note that every ring homomorphism f : A → B induces a group homomorphism f^* : A^* → B^*, where A^* and B^* are the groups of units of A and B respectively. Even when f is surjective, f^* need not be surjective, in general. As an example consider the canonical surjection ℤ → ℤ_p for a prime p > 3.)
2.49 In this exercise, we investigate which of the groups ℤ_{p^e}^* is cyclic for a prime p and e ∈ ℕ.
  1. Show that ℤ_2^* and ℤ_4^* are cyclic, but ℤ_8^* is not cyclic. Conclude that ℤ_{2^e}^* is not cyclic for e ≥ 3. [H] More specifically, show that for e ≥ 3 the multiplicative group ℤ_{2^e}^* is the direct product of two cyclic subgroups generated by –1 and 5 respectively.

  2. Show that if p is an odd prime and e ∈ ℕ, then ℤ_{p^e}^* is cyclic. [H]

2.50 Show that the multiplicative group ℤ_n^*, n ≥ 2, is cyclic if and only if n = 2, 4, p^e or 2p^e, where p is an odd prime and e ∈ ℕ. [H]

2.6. Polynomials

Unless otherwise stated, in this section we denote by K an arbitrary field and by K[X] the ring of polynomials in one indeterminate X with coefficients from K. Since K[X] is a PID, it enjoys many properties similar to those of ℤ. To start with, we take a look at these properties. Then we introduce the concept of algebraic elements and discuss how irreducible polynomials can be used to construct (algebraic) extensions of fields. When no confusion is likely, we denote a polynomial f(X) simply by f.

2.6.1. Elementary Properties

Since K[X] is a PID and hence a UFD, every polynomial in K[X] can be written essentially uniquely as a product of prime polynomials. Conventionally, prime polynomials are more commonly referred to as irreducible polynomials. As in the case of ℤ, the ring K[X] contains an infinite number of irreducible elements: if K is infinite, then {X – a | a ∈ K} is an infinite set of irreducible polynomials of K[X], and if K is finite, then, as we will see later, there is an irreducible polynomial of degree d in K[X] for every d ∈ ℕ.

It is important to note here that the concept of irreducibility of a polynomial is very much dependent on the field K. If K ⊆ L is a field extension, then a polynomial in K[X] is naturally an element of L[X] also. A polynomial which is irreducible over K need not continue to remain so over L. For example, the polynomial X^2 – 2 is irreducible over ℚ, but reducible over ℝ, since X^2 – 2 = (X – √2)(X + √2), √2 being a real number but not a rational number. As a second example, the polynomial X^2 + 1 is irreducible over both ℚ and ℝ, but not over ℂ. In fact, we will show shortly that an irreducible polynomial in K[X] of degree > 1 becomes reducible over a suitable extension of K.

For polynomials f(X), g(X) ∈ K[X] with g(X) ≠ 0, there exist unique polynomials q(X) and r(X) in K[X] such that f(X) = q(X)g(X) + r(X) with r(X) = 0 or deg r(X) < deg g(X). The polynomials q(X) and r(X) are respectively called the quotient and remainder of polynomial division of f(X) by g(X) and can be obtained by the so-called long division procedure. We use the notations: q(X) = f(X) quot g(X) and r(X) = f(X) rem g(X).
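The long division procedure is straightforward to implement. A sketch for the case K = ℤ_p with p prime, representing a polynomial by its list of coefficients, lowest degree first (the representation and helper name are ours, not the book’s):

```python
def poly_divmod(f, g, p):
    """Quotient and remainder of f divided by g over the field Z_p (p prime).
    Polynomials are lists of coefficients, lowest degree first."""
    f = f[:]                                   # work on a copy
    inv = pow(g[-1], -1, p)                    # inverse of g's leading coefficient
    quot = [0] * max(len(f) - len(g) + 1, 1)
    while len(f) >= len(g) and any(f):
        shift = len(f) - len(g)
        c = f[-1] * inv % p
        quot[shift] = c
        for i, gi in enumerate(g):             # subtract c * X^shift * g
            f[shift + i] = (f[shift + i] - c * gi) % p
        while len(f) > 1 and f[-1] == 0:       # drop leading zeros
            f.pop()
    return quot, f

# (X^3 + 2X + 5) = X * (X^2 + 1) + (X + 5) over Z_7
q, r = poly_divmod([5, 2, 0, 1], [1, 0, 1], 7)
print(q, r)
```

The call prints the quotient [0, 1] (that is, X) and the remainder [5, 1] (that is, X + 5).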

Whenever we talk about the gcd of two non-zero polynomials, we usually refer to the monic gcd, that is, a polynomial with leading coefficient 1. This makes the gcd of two polynomials unique. We have gcd(f(X), g(X)) = gcd(g(X), r(X)), where r(X) = f(X) rem g(X). This gives rise to an algorithm (similar to the Euclidean gcd algorithm for integers) for computing the gcd of two polynomials. Bézout relations also hold for polynomials. More specifically:
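The Euclidean algorithm for polynomials then takes the same shape as for integers; a sketch over K = ℤ_p (coefficient lists, lowest degree first; the helper names are ours):

```python
def poly_rem(f, g, p):
    """Remainder of f modulo g over Z_p; coefficient lists, lowest degree first."""
    f = f[:]
    inv = pow(g[-1], -1, p)
    while len(f) >= len(g) and any(f):
        c = f[-1] * inv % p
        shift = len(f) - len(g)
        for i, gi in enumerate(g):
            f[shift + i] = (f[shift + i] - c * gi) % p
        while len(f) > 1 and f[-1] == 0:
            f.pop()
    return f

def poly_gcd(f, g, p):
    """Monic gcd via the Euclidean algorithm: gcd(f, g) = gcd(g, f rem g)."""
    while any(g):
        f, g = g, poly_rem(f, g, p)
    inv = pow(f[-1], -1, p)
    return [c * inv % p for c in f]            # normalize to a monic polynomial

# X^2 - 1 = (X - 1)(X + 1) and X^2 + X - 2 = (X - 1)(X + 2) share X - 1 over Z_7
print(poly_gcd([6, 0, 1], [5, 1, 1], 7))
```

The result [6, 1] is the monic polynomial X + 6 ≡ X – 1 over ℤ_7.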

Proposition 2.23.

Let f(X), g(X) ∈ K[X], not both zero, and let d(X) be the (monic) gcd of f(X) and g(X). Then there are polynomials u(X), v(X) ∈ K[X] such that d(X) = u(X)f(X) + v(X)g(X). (Such an equality is called a Bézout relation.) Furthermore, if f(X) and g(X) are non-zero and not both constant, then u(X) and v(X) can be so chosen that deg u(X) < deg g(X) and deg v(X) < deg f(X).[6]

[6] Recall that the degree of the zero polynomial is taken to be –∞.

Proof

Similar to the proof of Proposition 2.16.

The concept of congruence can be extended to polynomials, namely, if f(X) ∈ K[X] is non-zero, then two polynomials g(X), h(X) ∈ K[X] are said to be congruent modulo f(X), denoted g(X) ≡ h(X) (mod f(X)), if f(X)|(g(X) – h(X)), that is, if there exists u(X) ∈ K[X] with g(X) – h(X) = u(X)f(X), or equivalently, if g(X) rem f(X) = h(X) rem f(X).

The principal ideals 〈f(X)〉 of K[X] play an important role (as do the ideals 〈n〉 of ℤ). Let us investigate the structure of the quotient ring R := K[X]/〈f(X)〉 for a non-constant polynomial f(X) ∈ K[X]. If r(X) denotes the remainder of division of g(X) ∈ K[X] by f(X), then it is clear that the residue classes of g(X) and r(X) are the same in R. On the other hand, two polynomials g(X), h(X) ∈ K[X] with deg g(X) < deg f(X) and deg h(X) < deg f(X) represent the same residue class in R if and only if g(X) = h(X). Thus elements of R are uniquely representable as polynomials of degrees < deg f(X). In other words, we may represent the ring R as the set {g(X) ∈ K[X] | deg g(X) < deg f(X)} together with addition and multiplication modulo the polynomial f(X). The ring R contains all the constant polynomials a ∈ K, that is, the field K is canonically embedded in R. In general, R is not a field. The next theorem gives the criterion for R to be a field.

Theorem 2.23.

For a non-constant polynomial , the ring K[X]/〈f(X)〉 is a field if and only if f(X) is irreducible in K[X].

Proof

If f(X) is reducible over K, then we can write f(X) = g(X)h(X) for some polynomials g(X), with 1 ≤ deg g < deg f and 1 ≤ deg h < deg f. Then both g and h represent non-zero elements in K[X]/〈f(X)〉, whose product is 0, that is, K[X]/〈f(X)〉 has non-zero zero divisors.

Conversely, if f(X) is irreducible over K and if g(X) is a non-zero polynomial of degree < deg f(X), then gcd(f(X), g(X)) = 1, so that by Proposition 2.23 there exist polynomials u(X), v(X) ∈ K[X] with u(X)f(X) + v(X)g(X) = 1 and deg v(X) < deg f(X). Thus we see that v(X)g(X) ≡ 1 (mod f(X)), that is, g(X) has a multiplicative inverse modulo f(X).

Let L := K[X]/〈f(X)〉 with f(X) irreducible over K. Then K ⊆ L is a field extension. If deg f(X) = 1, then L is isomorphic to K. If deg f(X) ≥ 2, then L is a proper extension of K. This gives us a useful and important way of representing the extension field L, given a representation for K. (For example, see Section 2.9.)
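As a tiny illustration of Theorem 2.23, take K = ℤ_3 and f(X) = X^2 + 1, which is irreducible over ℤ_3 (the squares modulo 3 are 0 and 1, so –1 ≡ 2 is not a square). The quotient ring is then a field with 9 elements, which the following brute-force sketch confirms:

```python
p = 3

def mul(u, v):
    """(u0 + u1*X)(v0 + v1*X) reduced modulo X^2 + 1 (so X^2 ≡ -1) over Z_3."""
    return ((u[0] * v[0] - u[1] * v[1]) % p,
            (u[0] * v[1] + u[1] * v[0]) % p)

elems = [(a, b) for a in range(p) for b in range(p)]
for z in elems:
    if z != (0, 0):
        inverses = [w for w in elems if mul(z, w) == (1, 0)]
        assert len(inverses) == 1          # every non-zero element is a unit
print("Z_3[X]/<X^2+1> is a field with", len(elems), "elements")
```

Each pair (a_0, a_1) stands for the residue class of a_0 + a_1X, exactly the representation by polynomials of degree < deg f discussed above.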

2.6.2. Roots of Polynomials

The study of the roots of a polynomial is the central objective in algebra. We now derive some elementary properties of roots of polynomials.

Definition 2.36.

Let f ∈ K[X]. An element a ∈ K is said to be a root of f, if f(a) = 0.

Proposition 2.24.

Let and . Then f(X) = (Xa)q(X) + f(a) for some . In particular, a is a root of f(X) if and only if Xa divides f(X).

Proof

Polynomial division of f(X) by Xa gives f(X) = (Xa)q(X) + r(X) with deg r(X) < deg(Xa) = 1. Thus r(X) is a constant polynomial. Let us denote r(X) by . Substituting X = a gives f(a) = r.

Proposition 2.25.

A non-zero polynomial f ∈ K[X] with d := deg f can have at most d roots in K.

Proof

We proceed by induction on d. The result clearly holds for d = 0. So assume that d ≥ 1 and that the result holds for all polynomials of degree d – 1. If f has no roots in K, we are done. So assume that f has a root, say, a ∈ K. By Proposition 2.24, we have f(X) = (X – a)g(X) for some g(X) ∈ K[X]. Clearly, deg g = d – 1 and so by the induction hypothesis g has at most d – 1 roots. Since K is a field (and hence does not contain non-zero zero divisors), it follows that the roots of f are precisely a and the roots of g. This establishes the induction step.

In the last proof, the only property of the field K that we have used is that K contains no non-zero zero divisors. This is, however, true for every integral domain. Thus Proposition 2.25 continues to hold if K is any integral domain (not necessarily a field). However, if K is not an integral domain, the proposition is not necessarily true. For example, if ab = 0 with a ≠ 0, b ≠ 0 and a ≠ b, then the polynomial X^2 + (b – a)X has at least three roots: 0, a and a – b.
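Concretely, in ℤ_6 we have 2 · 3 = 0; taking a = 2 and b = 3, the quadratic X^2 + (b – a)X = X^2 + X acquires more roots than its degree:

```python
# Roots of X^2 + X in Z_6, a ring with zero divisors (2 * 3 = 0)
roots = [x for x in range(6) if (x * x + x) % 6 == 0]
print(roots)
```

This prints [0, 2, 3, 5]: besides 0 and a – b ≡ 5 (mod 6), the zero divisors contribute a = 2 (and even 3), so the degree-2 polynomial has four roots.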

For a field extension K ⊆ L and for a polynomial f ∈ K[X], we may think of the roots of f in L, since f ∈ L[X] too. Clearly, all the roots of f in K are also roots of f in L. However, the converse is not true in general. For example, the only roots of X^4 – 1 in ℝ are ±1, whereas the roots of the same polynomial in ℂ are ±1, ±i. Indeed we have the following important result.

Proposition 2.26.

For any non-constant polynomial f ∈ K[X], there exists a field extension K′ of K such that f has a root in K′.

Proof

If f has a root in K, taking K′ = K proves the proposition. So we assume that f has no root in K (which implies that deg f ≥ 2). In principle, we do not require f to be irreducible. But if we consider a non-constant factor g of f, irreducible over K, we see that the roots of g in any extension L of K are roots of f in L too. Thus we may replace f by g and assume, without loss of generality, that f is irreducible. We construct the field extension K′ := K[X]/〈f〉 of K and denote the equivalence class of X in K′ by α. (One also writes x, X or [X] to denote this equivalence class.) It is clear that f(α) = 0 in K′, that is, α is a root of f(X) in K′.

We say that the field K′ in the proof of the last proposition is obtained by adjoining the root α of f and denote this as K′ = K(α). We can write f(X) = (X – α)f_1(X), where f_1(X) ∈ K′[X] and deg f_1 = (deg f) – 1. Now there is a field extension K″ of K′, in which f_1 has a root. Proceeding in this way we prove the following result.

Proposition 2.27.

A non-constant polynomial f in K[X] with deg f = d has d roots (not necessarily all distinct) in some field extension L of K.

If a polynomial f ∈ K[X] of degree d ≥ 1 has all its roots α_1, . . . , α_d in L, then f(X) = a(X – α_1) · · · (X – α_d) for some a ∈ L (actually a ∈ K, the leading coefficient of f). In this case, we say that f splits (completely or into linear factors) over L.

Definition 2.37.

Let f ∈ K[X] be a non-constant polynomial. A minimal (with respect to inclusion) field extension of K over which f splits completely is called a splitting field of f over K.[7] This is a minimal field which contains K and all the roots of f.

[7] It is necessary to use the phrase “over K” in this definition. X^2 + 1, treated as a polynomial in ℚ[X], has the splitting field ℚ(i), whereas the same polynomial, treated as an element of ℝ[X], has the splitting field ℂ (see Equation (2.3) on p 74).

Every non-constant polynomial f ∈ K[X] has a splitting field L over K. Quite importantly, this field L is unique in some sense. This allows us to call L the splitting field of f instead of a splitting field of f. We discuss these topics further in Section 2.8.

Definition 2.38.

Let f be a non-constant polynomial in K[X] and let α be a root of f (in some extension of K). The largest natural number n for which (X – α)^n | f(X) is called the multiplicity of the root α (in f). If n = 1 (resp. n > 1), then α is called a simple (resp. multiple) root of f. If all the roots of f are simple, then we call f a square-free polynomial. It is easy to see that f is square-free only if f is not divisible by the square of a non-constant polynomial in K[X]. The reverse implication also holds if char K = 0 or if K is a finite field (or, more generally, if K is a perfect field—see Exercise 2.76).

The notion of multiplicity can be extended to a non-root β of f by setting the multiplicity of β to zero.

2.6.3. Algebraic Elements and Extensions

Here we assume, unless otherwise stated, that KL is a field extension.

Definition 2.39.

An element α ∈ L is said to be algebraic over K, if there exists a non-constant polynomial f ∈ K[X] with f(α) = 0. If an element α ∈ L is not algebraic over K, we say that α is transcendental over K. Thus an element transcendental over K is a root of no non-constant polynomial in K[X]. A field extension K ⊆ L is called an algebraic extension, if every element of L is algebraic over K. A non-algebraic extension is also called a transcendental extension. If K ⊆ L is a transcendental extension, there exists at least one element α ∈ L which is transcendental (that is, not algebraic) over K.

Example 2.10.
  1. Every element a ∈ K is algebraic over K, since it is a root of the non-constant polynomial X – a ∈ K[X].

  2. The element √2 ∈ ℝ is algebraic over ℚ, since it is a root of the polynomial X^2 – 2 ∈ ℚ[X].

  3. The well-known real numbers e and π are transcendental over ℚ. (We are not going to prove this.) Of course, the concept of algebraic and transcendental elements is heavily dependent on the field K. For example, e and π, being elements of ℝ, are algebraic over ℝ.

  4. A complex number z = a + ib, where i = √(–1) and a, b ∈ ℝ, is a root of the polynomial X^2 – 2aX + (a^2 + b^2) ∈ ℝ[X] and hence is algebraic over ℝ. Therefore, the field extension ℝ ⊆ ℂ is algebraic.

  5. The extension ℚ ⊆ ℝ is transcendental, since ℝ contains elements (like e and π) that are transcendental over ℚ.

Definition 2.40.

Let α ∈ L be algebraic over K. A non-constant polynomial f ∈ K[X] of least positive degree with f(α) = 0 is called a minimal polynomial of α over K.

Proposition 2.28.

Let α ∈ L be algebraic over K. A minimal polynomial f of α over K is irreducible over K. If h ∈ K[X] is a polynomial with h(α) = 0, then f|h. In particular, any two minimal polynomials f and g of α satisfy g(X) = cf(X) for some non-zero c ∈ K.

Proof

Let f = f_1f_2 for some non-constant polynomials f_1, f_2 ∈ K[X]. Since K is a field and 0 = f(α) = f_1(α)f_2(α), we have f_1(α) = 0 or f_2(α) = 0. But deg f_1 < deg f and deg f_2 < deg f, a contradiction to the choice of f.

Using polynomial division one can write h(X) = q(X)f(X) + r(X) for some polynomials q, r ∈ K[X] with r = 0 or deg r < deg f. Now h(α) = 0 implies r(α) = 0. Since deg r < deg f, by the choice of f we must then have r(X) = 0, that is, f|h.

Finally, if f and g are two minimal polynomials of α over K, then f|g and g|f and it follows that g(X) = cf(X) for some unit c of K[X]. But the only units of K[X] are the non-zero elements of K.

By Proposition 2.28, a monic minimal polynomial f of α over K is uniquely determined by α and K. It is, therefore, customary to define the minimal polynomial of α over K to be this (unique) monic polynomial. Unless otherwise stated, we will stick to this revised definition and write f(X) = minpolyα, K(X).

Example 2.11.
  1. For α ∈ K, we have minpoly_{α,K}(X) = X – α.

  2. A complex number z = a + ib, a, b ∈ ℝ, b ≠ 0, is not a root of a linear polynomial over ℝ, but is a root of the quadratic polynomial f(X) = X^2 – 2aX + (a^2 + b^2) ∈ ℝ[X]. Therefore, minpoly_{z,ℝ}(X) = f(X), that is, f is irreducible over ℝ.

Proposition 2.29.

For a field K, the following conditions are equivalent.

  1. Every proper field extension KL is transcendental (that is, K has no algebraic extensions other than itself).

  2. Every non-constant polynomial in K[X] has a root in K.

  3. Every non-constant polynomial in K[X] splits in K.

  4. Every non-constant irreducible polynomial in K[X] is of degree 1.

Proof

[(a)⇒(b)] Consider a non-constant irreducible polynomial f ∈ K[X] and the field extension L = K[X]/〈f〉 of K. We have seen that L contains a root of f. We will prove in Section 2.8 that such an extension is algebraic (Corollary 2.11). Hence (a) implies that L = K, that is, K contains a root of f.

[(b)⇒(c)] Let f ∈ K[X] be a non-constant polynomial. By (b), f has a root, say, α_1 ∈ K. Thus f(X) = (X – α_1)f_1(X) for some f_1 ∈ K[X] with deg f_1 = (deg f) – 1. If f_1 is a constant polynomial, we are done. Otherwise, we find as above α_2 ∈ K and f_2 ∈ K[X] with f_1(X) = (X – α_2)f_2(X) and with deg f_2 = (deg f) – 2. Proceeding in this way proves (c).

[(c)⇒(d)] Obvious.

[(d)⇒(a)] Let α (in some extension of K) be algebraic over K and let f := minpoly_{α,K}(X). Since f is irreducible, by (d) deg f = 1, that is, f(X) = X – α, so that α ∈ K.

Definition 2.41.

A field K satisfying the equivalent conditions of Proposition 2.29 is called an algebraically closed field. For an arbitrary field K, a minimal algebraically closed field containing K is called an algebraic closure of K.

We will see in Section 2.8 that an algebraic closure of every field exists and is unique in some sense. The algebraic closure of an algebraically closed field K is K itself. We end this section with the following well-known theorem. We will not prove the theorem in this book, because every known proof of it uses some kind of complex analysis which this book does not deal with.

Theorem 2.24. Fundamental theorem of algebra

The field is algebraically closed.

ℝ is not algebraically closed, since the proper extension ℝ ⊆ ℂ is algebraic (see Example 2.10). Indeed, ℂ is the algebraic closure of ℝ.

Exercise Set 2.6

2.51 Let R be a ring and f, g ∈ R[X]. Show that:
  1. deg(f + g) ≤ max(deg f, deg g), with equality holding if deg f ≠ deg g.

  2. deg(fg) ≤ deg f + deg g, with equality holding if R is an integral domain.

  3. If R is an integral domain, then R[X] is an integral domain too. More generally, if R is an integral domain, then R[X_1, . . . , X_n] is also an integral domain for all n ∈ ℕ.

2.52 Let f, g ∈ R[X], where R is an integral domain. Show that if f(a_i) = g(a_i) for i = 1, . . . , n, where n > max(deg f, deg g) and where a_1, . . . , a_n are distinct elements of R, then f = g. In particular, if f(a) = g(a) for an infinite number of a ∈ R, then f = g.
2.53

Lagrange’s interpolation formula Let K be a field and let a_0, . . . , a_n be distinct elements of K. Show that for b_0, . . . , b_n ∈ K (not necessarily all distinct), there exists a unique polynomial f ∈ K[X] of degree ≤ n such that f(a_i) = b_i for all i = 0, . . . , n. [H]

2.54

Polynomials over a UFD Let R be a UFD. For a non-zero polynomial f ∈ R[X], a gcd of the coefficients of f is called a content of f and is denoted by cont f. One can then write f = (cont f)f_1, where f_1 ∈ R[X] with cont f_1 = 1. f_1 is called a primitive part of f and is often denoted as pp f. It is clear that cont f and pp f are unique up to multiplication by units of R. If for a non-zero polynomial f ∈ R[X] the content cont f is a unit of R (or, equivalently, if f and pp f are associates), then f is called a primitive polynomial. Show that for two non-zero polynomials f, g ∈ R[X] the elements cont(fg) and (cont f)(cont g) are associates in R. In particular, the product of two primitive polynomials is again primitive.

2.55 Let R be a UFD. Show that a non-constant primitive polynomial f ∈ R[X] is irreducible over R if and only if f is irreducible over Q(R), where Q(R) denotes the quotient field of R (see Exercise 2.34).
2.56
  1. Eisenstein’s criterion Let R be a UFD and f(X) = a_nX^n + · · · + a_1X + a_0 ∈ R[X] with a_n ≠ 0. Suppose that there is a prime p ∈ R such that p does not divide a_n, p divides a_i for all i, 0 ≤ i ≤ n – 1, and p^2 does not divide a_0. Show that f is irreducible over R.

  2. As an application of Eisenstein’s criterion, show that for a prime p ∈ ℕ the polynomial X^(p–1) + · · · + X + 1 is irreducible in ℚ[X]. [H]

2.57Let KL be a field extension and f1, . . . , fn non-constant polynomials in K[X]. Show that each fi, i = 1, . . . , n, splits over L if and only if the product f1 · · · fn splits over L.
2.58 Show that the irreducible polynomials in ℝ[X] have degrees ≤ 2. [H]
2.59Show that a finite field (that is, a field with finite cardinality) is not algebraically closed. In particular, the algebraic closure of a finite field is infinite.
2.60 A complex number z is called an algebraic number, if z is algebraic over ℚ. An algebraic number z is called an algebraic integer, if z is a root of a monic polynomial in ℤ[X]. Show that:
  1. If z is an algebraic number, then mz is an algebraic integer for some m ∈ ℕ.

  2. If z ∈ ℚ is an algebraic integer, then z ∈ ℤ.

  3. If z ∈ ℂ is an algebraic integer, then for any integer n ∈ ℤ the complex numbers nz and z + n are algebraic integers.

2.61 Let K be a field and f(X) = a_0 + a_1X + · · · + a_dX^d ∈ K[X]. The formal derivative f′ of f is defined to be the polynomial f′(X) := a_1 + 2a_2X + · · · + da_dX^(d–1). Show that:
  1. (f + g)′ = f′ + g′ and (fg)′ = f′g + fg′ for any f, g ∈ K[X].

  2. If char K = 0, then f′ = 0 if and only if f is a constant polynomial.

  3. If char K = p > 0, then f′ = 0 if and only if f(X) = g(X^p) for some g ∈ K[X].

  4. f (≠ 0) has no multiple roots (in any extension field of K), that is, f is square-free, if and only if gcd(f, f′) = 1.

  5. Let f be a (non-constant) irreducible polynomial over K. Show that if char K = 0, then f has no multiple roots. On the other hand, if char K = p > 0, show that f has multiple roots if and only if f(X) = g(X^p) for some g ∈ K[X]. (However, if K = ℤ_p, then by Fermat’s little theorem g(X^p) = g(X)^p, which contradicts the fact that f(X) is irreducible. Therefore, in this case f cannot have multiple roots.)

2.62 Let f ∈ K[X] be a non-constant polynomial of degree d and let α_1, . . . , α_d be the roots of f (in some extension field of K). The quantity Δ(f) := ∏_{1≤i<j≤d} (α_i – α_j)^2 is called the discriminant of f. Prove the following assertions:
  1. Δ(f) = 0 if and only if f has a multiple root.

  2. .

  3. Δ(X^2 + aX + b) = a^2 – 4b.

  4. Δ(X^3 + aX + b) = –(4a^3 + 27b^2).

2.7. Vector Spaces and Modules

Vector spaces and linear transformations between them are the central objects of study in linear algebra. In this section, we investigate the basic properties of vector spaces. We also generalize the concept of vector spaces to get another useful class of objects called modules. A module which also carries a (compatible) ring structure is referred to as an algebra. Study of algebras over fields (or more generally over rings) is of importance in commutative algebra, algebraic geometry and algebraic number theory.

2.7.1. Vector Spaces

Unless otherwise specified, K denotes a field in this section.

Definition 2.42.

A vector space V over a field K (or a K-vector space, in short) is an (additively written) Abelian group V together with a multiplication map · : K × V → V called the scalar multiplication map, such that the following properties are satisfied for every a, b ∈ K and x, y ∈ V.

  1. a · (x + y) = a · x + a · y,

  2. (a + b) · x = a · x + b · x,

  3. 1 · x = x,

  4. a · (b · x) = (ab) · x,

where ab denotes the product of a and b in the field K. When no confusions are likely, we omit the scalar multiplication sign · and write a · x simply as ax.

Example 2.12.
  1. Any field K is trivially a K-vector space with the scalar multiplication being the same as the field multiplication. More generally, if K ⊆ L is a field extension, then L is a K-vector space.

  2. For n ∈ ℕ, the product K^n = K × · · · × K (n factors) is a K-vector space under the scalar multiplication map a(x_1, . . . , x_n) := (ax_1, . . . , ax_n). For arbitrary K-vector spaces V_1, . . . , V_n, we can analogously define the product V_1 × · · · × V_n.

  3. The polynomial ring K[X] (or K[X1, . . . , Xn]) is a K-vector space (with the natural scalar multiplication).

Corollary 2.8.

Let V be a K-vector space. For every a ∈ K and x ∈ V, we have:

  1. 0 · x = 0.

  2. a · 0 = 0.

  3. (–a) · x = a · (–x) = –(a · x).

Proof

Easy verification.

Definition 2.43.

Let V be a vector space over K and S a subset of V. We say that S is a generating set or a set of generators of V (over K), or that S generates V (over K), if every element x ∈ V can be written as a finite linear combination x = a_1x_1 + · · · + a_nx_n for some n ∈ ℕ (depending on x) and with a_i ∈ K and x_i ∈ S for 1 ≤ i ≤ n. A generating set S of V is called minimal, if no proper subset of S generates V. If V has a finite generating set, then V is called finitely generated or finite-dimensional.

Example 2.13.
  1. Consider the field extension L := K[X]/〈f(X)〉 of K, where f is an irreducible polynomial in K[X] of degree n. If α denotes the equivalence class of X in L, then every element of L can be written as a_(n–1)α^(n–1) + · · · + a_1α + a_0 with a_i ∈ K for 0 ≤ i ≤ n – 1. Thus {1, α, . . . , α^(n–1)} is a generating set of L over K. In particular, L is finitely generated over K.

  2. The K-vector space K^n is generated by the unit vectors e_i, 1 ≤ i ≤ n, defined as e_i := (0, . . . , 0, 1, 0, . . . , 0) (1 in the i-th position). Thus K^n is also finitely generated over K.

  3. {1, X, X^2, · · ·} is an infinite generating set of the polynomial ring K[X] regarded as a K-vector space. K[X] is not finitely generated over K.

    It is not difficult to show that the generating sets discussed in these examples are minimal.

Definition 2.44.

A subset S of a K-vector space V is called linearly independent (over K), if whenever a_1x_1 + · · · + a_nx_n = 0 for some n ∈ ℕ, a_i ∈ K and distinct x_i ∈ S, 1 ≤ i ≤ n, we have a_1 = · · · = a_n = 0. If S is not linearly independent, it is called linearly dependent. If S is linearly independent (resp. dependent), then we also say that the elements of S are linearly independent (resp. dependent). A maximal linearly independent subset of V is a linearly independent subset S ⊆ V with the property that S ∪ {x} is linearly dependent for any x ∈ V \ S.

If 0 ∈ S, then S is linearly dependent, since a · 0 = 0 for any non-zero a ∈ K. One can easily check that all the generating sets of Example 2.13 are linearly independent too. This is, however, not a mere coincidence, as the following result demonstrates.

Theorem 2.25.

A subset S of a K-vector space V is a minimal generating set for V if and only if S is a maximal linearly independent set of V.

Proof

[if] Given a maximal linearly independent subset S of V, we first show that S is a generating set for V. Take any non-zero x ∈ V; if x ∈ S, it is trivially generated, so assume x ∉ S. By the maximality of S, the set S ∪ {x} is linearly dependent, that is, there exists a linear relation of the form a_0x + a_1x_1 + · · · + a_nx_n = 0, a_i ∈ K, x_i ∈ S, with some a_i ≠ 0. The linear independence of S forces a_0 ≠ 0, and so x = –a_0^(–1)(a_1x_1 + · · · + a_nx_n) is a finite linear combination of elements of S. Thus S generates V. Now, we show that S is minimal. Assume otherwise, that is, S′ := S \ {y} generates V for some y ∈ S. Since S is linearly independent, y ≠ 0. For some m ∈ ℕ, b_i ∈ K and y_i ∈ S′, we then have y = b_1y_1 + · · · + b_my_m, a contradiction to the linear independence of S.

[only if] Given a minimal generating set S of V, we first show that S is linearly independent. Assume not, that is, a1x1 + · · · + anxn = 0 for some ai ∈ K and distinct xi ∈ S with some ai, say a1, non-zero. But then x1 = –(a2/a1)x2 – · · · – (an/a1)xn and, therefore, S \ {x1} also generates V, a contradiction to the minimality of S. Thus S is linearly independent. Now choose a non-zero y ∈ V \ S. Since S generates V, we can write y = b1y1 + · · · + bmym, m ≥ 1, bi ∈ K and yi ∈ S, that is, 1·y – b1y1 – · · · – bmym = 0, that is, S ∪ {y} is linearly dependent.

Definition 2.45.

Let V be a K-vector space. A minimal generating set S of V is called a basis of V over K (or a K-basis of V). By Theorem 2.25, S is a basis of V if and only if S is a maximal linearly independent subset of V. Equivalently, S is a basis of V if and only if S is a generating set of V and is linearly independent.

Any element of a vector space can be written uniquely as a finite linear combination of elements of a basis, since two different ways of writing the same element contradict the linear independence of the basis elements.

A K-vector space V may have many K-bases. For example, the elements 1, aX + b, (aX + b)2, · · · form a K-basis of K[X] for any a, b ∈ K, a ≠ 0. However, what is unique in any basis of a given K-vector space V is the cardinality[8] of the basis, as shown in Theorem 2.26.

[8] Two sets (finite or not) S1 and S2 are said to be of the same cardinality, if there exists a bijective map S1S2.

For the sake of simplicity, we sometimes assume that V is a finitely generated K-vector space. This assumption simplifies certain proofs greatly. But it is important to highlight here that, unless otherwise stated, all the results continue to remain valid without the assumption. For example, it is a fact that every vector space has a basis. For finitely generated vector spaces, this is a trivial statement to prove, whereas without our assumption we need to use arguments that are not so simple. (A possible proof follows from Exercise 2.63 with U = {0}.)

Theorem 2.26.

Let V be a K-vector space. Then any K-basis of V has the same cardinality.

Proof

We assume that V is finitely generated. Let S = {x1, . . . , xn} be a minimal finite generating set, that is, a basis, of V. Let T be another basis of V. Assume that m := #T > n. (We might even have m = ∞.) We can choose distinct elements y1, . . . , yn ∈ T. Note that xi and yj are non-zero. Now we can write y1 = a1x1 + · · · + anxn for some (unique) ai ∈ K, with some ai ≠ 0. Renumbering x1, . . . , xn, if necessary, we may assume that a1 ≠ 0. Then x1 = (1/a1)(y1 – a2x2 – · · · – anxn). It follows that y1, x2, . . . , xn generate V. In particular, we can write y2 = b1y1 + b2x2 + · · · + bnxn, bi ∈ K, with some bi ≠ 0. If b2 = · · · = bn = 0, then y1, y2 are linearly dependent, a contradiction. So bi ≠ 0 for some i, 2 ≤ i ≤ n. Again we may renumber x2, . . . , xn, if necessary, to assume that b2 ≠ 0. Then x2 = (1/b2)(y2 – b1y1 – b3x3 – · · · – bnxn), that is, y1, y2, x3, . . . , xn generate V. Proceeding in this way we can show that y1, . . . , yn generate V, a contradiction to the minimality of T as a generating set. Thus we must have m ≤ n. In particular, m is finite. Now reversing the roles of S and T we can likewise prove that n ≤ m.

Theorem 2.26 holds even when V is not finitely generated. We omit the proof for this case here.

Definition 2.46.

Let V be a K-vector space. The cardinality of any K-basis of V is called the dimension of V over K and is denoted by dimK V (or by dim V, if K is understood from the context). We call V finite-dimensional (resp. infinite-dimensional), if dimK V is finite (resp. infinite).

For example, dimK Kn = n for every n ≥ 1, and dimK K[X] = ∞.

Definition 2.47.

Let V be a K-vector space. A subgroup U of V, which is closed under the scalar multiplication of V, is again a K-vector space and is called a (vector) subspace of V. In this case, we have dimK U ≤ dimK V (Exercise 2.63).

Example 2.14.

Let V be a vector space over K.

  1. The subsets {0} and V are trivially subspaces of V.

  2. Let S be any subset of V (not necessarily linearly independent). Then the set U of all finite linear combinations a1x1 + · · · + anxn (ai ∈ K, xi ∈ S) is a vector subspace of V. We say that U is spanned or generated by S, or that S generates or spans U, or that U is the span of S. This is often denoted by U = 〈S〉 or by U = Span S. If S is linearly independent, then S is a basis of U.

Definition 2.48.

Let V and W be K-vector spaces. A map f : V → W is called a homomorphism (of vector spaces) or a linear transformation or a linear map over K, if

f(ax + by) = af(x) + bf(y)

for all a, b ∈ K and x, y ∈ V. Equivalently, f is a linear map over K if and only if f(x + y) = f(x) + f(y) and f(ax) = af(x) for all a ∈ K and x, y ∈ V. The set of all K-linear maps V → W is denoted by HomK(V, W). HomK(V, W) is a K-vector space under the definitions (f + g)(x) := f(x) + g(x) and (af)(x) := af(x) for all f, g ∈ HomK(V, W), a ∈ K and x ∈ V. A K-linear transformation V → V is called a K-endomorphism of V. The set of all K-endomorphisms of V is denoted by EndK V. A bijective[9] homomorphism (resp. endomorphism) is called an isomorphism (resp. automorphism).

[9] As in Footnote 2, we continue to be lucky here: The inverse of a bijective linear transformation is again a linear transformation.

Theorem 2.27.

Let V and W be K-vector spaces. Then V and W are isomorphic if and only if dimK V = dimK W.

Proof

If dimK V = dimK W and S and T are bases of V and W respectively, then there exists a bijection f : S → T. One can extend f to a linear map g : V → W as g(a1x1 + · · · + anxn) := a1f(x1) + · · · + anf(xn) for ai ∈ K and xi ∈ S. One can readily verify that g is an isomorphism. Conversely, if g : V → W is an isomorphism and S is any basis of V, then g(S) := {g(x) | x ∈ S} is clearly a basis of W.

Corollary 2.9.

A K-vector space V with n := dimK V < ∞ is isomorphic to Kn.

Let V be a K-vector space and U a subspace. As in Section 2.3 we construct the quotient group V/U. This group can be given a K-vector space structure under the scalar multiplication map a(x + U) := ax + U, a ∈ K, x ∈ V. If T ⊆ V is such that the residue classes of the elements of T form a K-basis of V/U and if S is a K-basis of U, then it is easy to see that S ∪ T is a K-basis of V. In particular,

Equation 2.2

dimK V = dimK U + dimK (V/U)
For f ∈ HomK(V, W), the set {x ∈ V | f(x) = 0} is called the kernel Ker f of f, and the set {f(x) | x ∈ V} is called the image Im f of f. We have the isomorphism theorem for vector spaces:

Theorem 2.28. Isomorphism theorem

Let f ∈ HomK(V, W). Then Ker f is a subspace of V, Im f is a subspace of W, and V/Ker f ≅ Im f.

Proof

Similar to Theorem 2.3 and Theorem 2.9.

Definition 2.49.

For f ∈ HomK(V, W), the dimension of Im f is called the rank of f and is denoted by Rank f, whereas the dimension of Ker f is called the nullity of f and is denoted by Null f. An immediate consequence of the isomorphism theorem and of Equation (2.2) is the following important result.

Theorem 2.29.

Rank f + Null f = dimK V for any f ∈ HomK(V, W).
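
Theorem 2.29 can be checked mechanically for maps between the spaces Qn of Corollary 2.9. The following sketch (ours, not the book's) row-reduces a matrix over ℚ, using exact rational arithmetic, and verifies that rank and nullity add up to the dimension of the domain:

```python
from fractions import Fraction

def rank_nullity(rows, n):
    """Rank and nullity of the linear map f : Q^n -> Q^m represented
    by an m x n matrix (a list of m rows of length n)."""
    A = [[Fraction(x) for x in row] for row in rows]
    rank, col = 0, 0
    while rank < len(A) and col < n:
        pivot = next((r for r in range(rank, len(A)) if A[r][col] != 0), None)
        if pivot is None:          # no pivot: this column is dependent
            col += 1
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, len(A)):   # clear entries below the pivot
            factor = A[r][col] / A[rank][col]
            A[r] = [x - factor * y for x, y in zip(A[r], A[rank])]
        rank, col = rank + 1, col + 1
    return rank, n - rank

rank, null = rank_nullity([[1, 2, 3], [2, 4, 6]], 3)
assert (rank, null) == (1, 2)
assert rank + null == 3            # Rank f + Null f = dimK V with V = Q^3
```

Here Im f is spanned by the single independent row, and Ker f is the 2-dimensional solution space of the homogeneous system.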

*2.7.2. Modules

If we remove the restriction that K is a field and allow K to be an arbitrary ring, then the resulting analogue of a vector space over K is called a K-module. More specifically, we have:

Definition 2.50.

Let R be a ring. A module over R (or an R-module) is an (additively written) Abelian group M together with a multiplication map · : R × M → M, called the scalar multiplication map, such that for every a, b ∈ R and x, y ∈ M we have a · (x + y) = a · x + a · y, (a + b) · x = a · x + b · x, 1 · x = x, and a · (b · x) = (ab) · x, where ab denotes the product of a and b in the ring R. When no confusions are likely, we omit the scalar multiplication sign · and write a · x as ax.

Example 2.15.
  1. Vector spaces are special cases of modules, when the underlying ring is a field.

  2. Ideals of R are modules over R with the ring multiplication map taken as the scalar multiplication.

  3. Every Abelian group G is a ℤ-module under the scalar multiplication n · x := x + · · · + x (n times) for n ≥ 0 and n · x := –((–n) · x) for n < 0.

  4. The polynomial rings R[X] and R[X1, . . . , Xn] are modules over R.

  5. Let Mi, i ∈ I, be a family of R-modules. The direct product of the Mi is defined as the set of all tuples (ai)i∈I with ai ∈ Mi, indexed by I. The direct sum is the subset of the Cartesian product consisting only of the tuples for which ai = 0 except for a finite number of i ∈ I. Both the direct product and the direct sum are R-modules under component-wise addition and scalar multiplication. When I is finite, they are naturally the same.

Modules are a powerful generalization of vector spaces. Any result we prove for modules is equally valid for vector spaces, ideals and Abelian groups. On the other hand, since we do not demand that the ring R be necessarily a field, certain results for vector spaces are not applicable for all modules.

It is easy to see that Corollary 2.8 continues to hold for modules. An R-submodule of an R-module M is a subgroup of M that is closed under the scalar multiplication of M. For a subset S ⊆ M, the set of all finite linear combinations of the form a1x1 + · · · + anxn, n ≥ 0, ai ∈ R, xi ∈ S, is an R-submodule N of M, denoted by RS or 〈S〉. We say that N is generated by S (or by the elements of S). If S is finite, then N is said to be finitely generated. A (sub)module generated by a single element is called cyclic. It is important to note that unlike vector spaces the cardinality of a minimal generating set of a module is not necessarily unique. (See Exercise 2.68 for an example.) It is also true that given a minimal generating set S of M, there may be more than one way of writing an element of M as a finite linear combination of elements of S. For example, if M = R = ℤ and S = {2, 3}, then 1 = (–1)·2 + 1·3 = 2·2 + (–1)·3. The nice theory of dimensions developed in connection with vector spaces does not apply to modules.
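
The non-uniqueness just described can be spot-checked directly (a trivial Python illustration of the example M = R = ℤ with S = {2, 3}):

```python
from math import gcd

# S = {2, 3} generates Z since gcd(2, 3) = 1, but neither element
# alone does (2Z and 3Z are proper subgroups), so S is minimal.
assert gcd(2, 3) == 1

# Two distinct representations of 1 as a Z-linear combination of S:
assert (-1) * 2 + 1 * 3 == 1
assert 2 * 2 + (-1) * 3 == 1
```

So both {1} and {2, 3} are minimal generating sets of ℤ, of different cardinalities, unlike bases of a vector space.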

For an R-submodule N of M, the Abelian group M/N is given an R-module structure by the scalar multiplication map a(x + N) := ax + N. This module is called the quotient module of M by N.

For R-modules M and N, an R-linear map or an R-module homomorphism (from M to N) is defined as a map f : M → N with f(ax + by) = af(x) + bf(y) for all a, b ∈ R and x, y ∈ M (or equivalently with f(x + y) = f(x) + f(y) and f(ax) = af(x) for all a ∈ R and x, y ∈ M). An isomorphism, an endomorphism and an automorphism are defined in ways analogous to the case of vector spaces. The set of all (R-module) homomorphisms M → N is denoted by HomR(M, N) and the set of all (R-module) endomorphisms of M is denoted by EndR M. These sets are again R-modules under the definitions: (f + g)(x) := f(x) + g(x) and (af)(x) := af(x) for all a ∈ R and x ∈ M (and f, g in HomR(M, N) or EndR M).

The kernel and image of an R-linear map f : M → N are defined as the sets Ker f := {x ∈ M | f(x) = 0} and Im f := {f(x) | x ∈ M}. With these notations we have the isomorphism theorem for modules:

Theorem 2.30. Isomorphism theorem

Ker f and Im f are submodules of M and N respectively and M / Ker f ≅ Im f.

For an R-module M and an ideal 𝔞 of R, the set 𝔞M consisting of all finite linear combinations a1x1 + · · · + anxn with ai ∈ 𝔞 and xi ∈ M is a submodule of M. On the other hand, for a submodule N of M the set (M : N) := {a ∈ R | ax ∈ N for every x ∈ M} is an ideal of R. In particular, the ideal (M : 0) is called the annihilator of M and is denoted as AnnR M (or as Ann M). For any ideal 𝔞 ⊆ Ann M, one can view M as an R/𝔞-module under the map ā · x := ax. One can easily check that this map is well-defined, that is, the product ā · x is independent of the choice of the representative a of the equivalence class ā.
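
For a concrete annihilator, take M = ℤ/6ℤ as a ℤ-module. The following small Python check (our illustration; the helper name is ours) confirms that the positive integers annihilating M are exactly the multiples of 6, so Ann M = 6ℤ:

```python
def annihilates(a, n):
    """True iff a * x = 0 in Z/nZ for every x, i.e. a lies in Ann(Z/nZ)."""
    return all(a * x % n == 0 for x in range(n))

n = 6
positive_ann = [a for a in range(1, 3 * n + 1) if annihilates(a, n)]
assert positive_ann == [6, 12, 18]   # Ann_Z(Z/6Z) = 6Z
# Since 6Z = Ann M, the quotient Z/6Z acts on M via a-bar * x := ax mod 6,
# and this is independent of the chosen representative a.
```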

Definition 2.51.

A free module M over a ring R is defined to be a direct sum of R-modules Mi, i ∈ I, with each Mi ≅ R as an R-module. If I is of finite cardinality n, then M is isomorphic to Rn.

Any vector space is a free module (Theorem 2.27 and Corollary 2.9). The Abelian groups ℤn, n ≥ 2, are not free ℤ-modules.

Theorem 2.31. Structure theorem for finitely generated modules

M is a finitely generated R-module if and only if M is a quotient of a free module Rn for some n ∈ ℕ.

Proof

[if] The free module Rn has a canonical generating set ei, 1 ≤ i ≤ n, where

ei = (0, . . . , 0, 1, 0, . . . , 0) (1 in the i-th position).

If M = Rn/N, then the equivalence classes ei + N, i = 1, ..., n, constitute a finite set of generators of M.

[only if] If x1, ..., xn generate M, then the R-linear map f : Rn → M defined by (a1, ..., an) ↦ a1x1 + · · · + anxn is surjective. Hence by the isomorphism theorem M ≅ Rn / Ker f.

*2.7.3. Algebras

Let φ : R → A be a homomorphism of rings. The ring A can be given an R-module structure with the multiplication map a · x := φ(a)x for a ∈ R and x ∈ A. This R-module structure of A is compatible with the ring structure of A in the sense that for every a, b ∈ R and x, y ∈ A one has (ax)(by) = (ab)(xy).

Conversely, if a ring A has an R-module structure with (ax)(by) = (ab)(xy) for every a, b ∈ R and x, y ∈ A, then there is a unique ring homomorphism R → A taking a ↦ a · 1 (where 1 denotes the identity of A). This motivates us to define the following.

Definition 2.52.

Let R be a ring. An algebra over R or an R-algebra is a ring A together with a ring homomorphism φ : R → A. The homomorphism φ is called the structure homomorphism of the R-algebra A. If A and B are R-algebras with structure homomorphisms φ : R → A and ψ : R → B, then an R-algebra homomorphism (from A to B) is a ring homomorphism η : A → B such that η ∘ φ = ψ.

Example 2.16.

Let R be a ring.

  1. The polynomial ring R[X1, . . . , Xn] is an R-algebra with the canonical inclusion R ⊆ R[X1, . . . , Xn] as the structure homomorphism and is called a polynomial algebra over R.

  2. For an ideal 𝔞 of R, the canonical surjection R → R/𝔞 makes R/𝔞 an R-algebra.

  3. If A is an R-algebra with structure homomorphism φ : R → A and if B is an A-algebra with structure homomorphism ψ : A → B, then B is an R-algebra with structure homomorphism ψ ∘ φ : R → B.

  4. Combining (2) and (3) implies that if A is an R-algebra and 𝔞 an ideal of A, then the ring A/𝔞 is again an R-algebra, called the quotient algebra of A by 𝔞.

An R-algebra A is an R-module with the added property that multiplication of elements of A is now legal. Exploiting this new feature leads to the following concept of algebra generators.

Definition 2.53.

Let A be an R-algebra with the structure homomorphism φ : R → A. A subset S of A is said to generate A as an R-algebra, if every element of A can be written as a polynomial expression in (finitely many) elements of S with coefficients from R (that is, from φ(R)). We write this as A = R[S]. If S = {x1, . . . , xn} is finite, we also write R[x1, . . . , xn] in place of R[S] and say that A is finitely generated as an R-algebra or that the homomorphism φ is of finite type.

Example 2.17.
  1. The polynomial algebra R[X1, . . . , Xn], n ≥ 1, over R is not finitely generated as an R-module, but is finitely generated as an R-algebra.

  2. For an ideal 𝔞 of R[X1, . . . , Xn], the ring A := R[X1, . . . , Xn]/𝔞 is generated as an R-algebra by the equivalence classes xi of Xi, 1 ≤ i ≤ n, that is, A = R[x1, . . . , xn]. If 𝔞 is not the zero ideal, then A is not a polynomial algebra, because x1, . . . , xn are not indeterminates in the sense that they satisfy (non-zero) polynomial equations f(x1, . . . , xn) = 0 for every non-zero f ∈ 𝔞. (In this case, we also say that x1, . . . , xn are algebraically dependent.) The notation R[. . .] is a generalization of the notation for polynomial algebras. In what follows, we usually denote polynomial algebras by R[X1, . . . , Xn] with upper-case algebra generators, whereas for an arbitrary finitely generated R-algebra we use lower-case symbols for the algebra generators as in R[x1, . . . , xn].

One may proceed to define kernels and images of R-algebra homomorphisms and frame and prove the isomorphism theorem for R-algebras. We leave the details to the reader. We only note that algebra homomorphisms are essentially ring homomorphisms with the added condition of commutativity with the structure homomorphisms.

Theorem 2.32.

A ring A is a finitely generated R-algebra if and only if A is a quotient of a polynomial algebra (over R).

Proof

[if] Immediate from Example 2.17.

[only if] Let A := R[x1, . . . , xn]. The map η : R[X1, . . . , Xn] → A that takes f(X1, . . . , Xn) ↦ f(x1, . . . , xn) is a surjective R-algebra homomorphism. By the isomorphism theorem, one has the isomorphism AR[X1, . . . , Xn]/Ker η of R-algebras.

This theorem suggests that for the study of finitely generated algebras it suffices to investigate only the polynomial algebras and their quotients.

Exercise Set 2.7

2.63 Let V be a K-vector space, U a subspace of V, and T an arbitrary K-basis of U. Show that there is a K-basis of V that contains T. [H]
2.64
  1. Let V be a K-vector space, and U1, U2 subspaces of V. Show that the set U := {x1 + x2 | x1 ∈ U1, x2 ∈ U2} is a K-subspace of V. If U1 ∩ U2 = {0}, we say that U is the direct sum of U1 and U2 and write U = U1 ⊕ U2.

  2. Let V be a K-vector space and W a subspace of V. Show that there exists a subspace W′ of V such that V = W ⊕ W′. This space W′ is called the complement subspace of W in V. [H]

2.65 Let V and W be K-vector spaces and f : V → W a K-linear map. Show that f is uniquely determined by the images f(x), x ∈ S, where S is a basis of V.
2.66 Let V and W be K-vector spaces. Check that HomK(V, W) is a vector space over K. Show that dimK(HomK(V, W)) = (dimK V)(dimK W). In particular, if W = K, then HomK(V, K) is isomorphic to V. The space HomK(V, K) is called the dual space of V.
2.67 Let V and W be m- and n-dimensional K-vector spaces, S = {x1, . . . , xm} a K-basis of V, T = {y1, . . . , yn} a K-basis of W, and f : V → W a K-linear map. For each i = 1, . . . , m, write f(xi) = ai1y1 + · · · + ainyn, aij ∈ K. The m × n matrix Mf := (aij) is called the transformation matrix of f (with respect to the bases S and T). We have:

Let V1, V2, V3 be K-vector spaces, f, f1, f2 ∈ HomK(V1, V2), and g ∈ HomK(V2, V3). Prove the following assertions:

  1. Mf1+f2 = Mf1 + Mf2.

  2. Mg∘f = Mf Mg.

  3. f is invertible (as a map) if and only if Mf is invertible (as a matrix).

(Remark: This exercise shows that linear transformations of finite-dimensional vector spaces can be described in terms of matrices.)

2.68 Show that for every n ≥ 1 there are integers a1, . . . , an that constitute a minimal set of generators for the unit ideal in ℤ. [H]
2.69 Let M be an R-module. A subset S of M is called a basis of M, if S generates M and is linearly independent over R in the sense that a1x1 + · · · + anxn = 0, n ≥ 1, ai ∈ R, distinct xi ∈ S, implies a1 = · · · = an = 0. Show that M has a basis if and only if M is a free R-module.
2.70 We define the rank of a finitely generated R-module M as

RankR M := min{#S | M is generated by S}.

If N is a submodule of M, show that RankR M ≤ RankR N + RankR(M/N). Give an example where the strict inequality holds.

2.71 Let M be an R-module. An element x ∈ M is called a torsion element of M, if AnnR x ≠ 0, that is, if there is a ∈ R, a ≠ 0, with ax = 0. The set of all torsion elements of M is denoted by Tors M. M is called torsion-free if Tors M = {0}, and a torsion module if Tors M = M.
  1. Show that Tors M is a submodule of M.

  2. Show that Tors M is a torsion module (called the torsion submodule of M) and that the module M/Tors M is torsion-free.

  3. If R is an integral domain, show that every free module over R is torsion-free. In particular, every vector space is torsion-free.

2.72 Show that:
  1. ℚ is not finitely generated as a ℤ-module. [H]

  2. ℚ is not a free ℤ-module. [H]

  3. ℚ is a torsion-free ℤ-module.

This shows that the converse of Exercise 2.71(c) is not true in general.

2.8. Fields

In this section, we study some important properties of field extensions. We also give an introduction to Galois theory. Unless otherwise stated, the letters F, K and L stand for fields in this section.

2.8.1. Properties of Field Extensions

We have seen that if F ⊆ K is a field extension, then K is a vector space over F. This observation leads to the following very useful definitions.

Definition 2.54.

For a field extension F ⊆ K, the cardinality of any F-basis of K is called the degree of the extension F ⊆ K and is denoted by [K : F]. If [K : F] is finite, K is called a finite extension of F. Otherwise, K is called an infinite extension of F.

Proposition 2.30.

Let F ⊆ K ⊆ L be a tower of field extensions. Then [L : F] = [L : K] [K : F]. In particular, the extension F ⊆ L is finite if and only if the extensions F ⊆ K and K ⊆ L are finite. In that case, [L : K] | [L : F] and [K : F] | [L : F].

Proof

One can easily check that if S is an F-basis of K and S′ a K-basis of L, then the set {ss′ | s ∈ S, s′ ∈ S′} is an F-basis of L.
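
As a concrete instance of Proposition 2.30 (a standard example, added here for illustration), take F = ℚ, K = ℚ(√2) and L = ℚ(√2, √3):

```latex
[\mathbb{Q}(\sqrt{2},\sqrt{3}) : \mathbb{Q}]
  = [\mathbb{Q}(\sqrt{2},\sqrt{3}) : \mathbb{Q}(\sqrt{2})]\,
    [\mathbb{Q}(\sqrt{2}) : \mathbb{Q}]
  = 2 \cdot 2 = 4.
```

A ℚ-basis of L arises exactly as in the proof, as the products of the bases {1, √2} of K over ℚ and {1, √3} of L over K, namely {1, √2, √3, √6}.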

Recall the definitions of the rings F[X] of polynomials and F(X) of rational functions in one indeterminate X. These notations are now generalized. For a field extension F ⊆ K and for a ∈ K, we define F[a] := {f(a) | f(X) ∈ F[X]} and

Equation 2.3

F(a) := {f(a)/g(a) | f(X), g(X) ∈ F[X], g(a) ≠ 0}.

It is easy to see that F[a] is the smallest (with respect to inclusion) of the integral domains that contain F and a. Similarly F(a) is the smallest of the fields that contain F and a. We also have F[a] ⊆ F(a). Now we state the following important characterization of algebraic elements.

Theorem 2.33.

For a field extension F ⊆ K and an element a ∈ K, the following conditions are equivalent:

  1. The element a is algebraic over F.

  2. The extension F(a) is finite over F.

  3. F(a) = F[a].

Proof

[(a)⇒(b)] Let h ∈ F[X] be the minimal polynomial of a over F, of degree d. Consider the ring homomorphism Φ : F[X] → K that takes f(X) ↦ f(a). From Proposition 2.28, Ker Φ = 〈h〉, and by the isomorphism theorem F[X]/〈h〉 ≅ Im Φ. Since h is irreducible over F, F[X]/〈h〉 and so Im Φ are fields. Since Im Φ contains F and a (note that Φ(X) = a), we have F(a) ⊆ Im Φ and hence F(a) = Im Φ. Finally, notice that [F[X]/〈h〉 : F] = d, so that [F(a) : F] = d is finite.

[(b)⇒(c)] Let d := [F(a) : F]. Since the d + 1 elements 1, a, a2, . . . , ad are linearly dependent over F, there exist α0, . . . , αd ∈ F, not all 0, such that α0 + α1a + · · · + αdad = 0. This, in turn, implies that there is an irreducible polynomial h ∈ F[X] with h(a) = 0. Now consider any g ∈ F[X] with g(a) ≠ 0. Clearly, h ∤ g (because otherwise g(a) = 0). Since h is irreducible, gcd(g, h) = 1, that is, there exist polynomials u(X), v(X) ∈ F[X] with u(X)g(X) + v(X)h(X) = 1, that is, with u(a)g(a) = 1. But then 1/g(a) = u(a) ∈ F[a], so that F(a) = F[a].

[(c)⇒(a)] Clearly, the element 0 is algebraic over F. So assume a ≠ 0. Since 1/a ∈ F(a) = F[a], by hypothesis there is a polynomial f ∈ F[X] such that 1/a = f(a). But then a is a root of the non-constant polynomial Xf(X) – 1.
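
Condition (c) can be seen concretely in K = ℚ(√2): the inverse of a non-zero element is again a polynomial (of degree < 2) in √2. A sketch in Python (our representation, not from the text; a pair (p, q) of rationals stands for p + q√2):

```python
from fractions import Fraction as Fr

def mul(u, v):
    # (p + q*sqrt2)(r + s*sqrt2) = (pr + 2qs) + (ps + qr)*sqrt2
    (p, q), (r, s) = u, v
    return (p * r + 2 * q * s, p * s + q * r)

def inv(u):
    # 1/(p + q*sqrt2) = (p - q*sqrt2) / (p^2 - 2 q^2); the denominator
    # is a non-zero rational, so the inverse is again of the form
    # p' + q'*sqrt2 -- an instance of F(a) = F[a].
    p, q = u
    d = p * p - 2 * q * q
    return (p / d, -q / d)

a = (Fr(1), Fr(1))                 # the element 1 + sqrt(2)
assert inv(a) == (Fr(-1), Fr(1))   # 1/(1 + sqrt2) = -1 + sqrt2
assert mul(a, inv(a)) == (Fr(1), Fr(0))
```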

Corollary 2.10.

For a field extension F ⊆ K, the set of elements in K that are algebraic over F is a field.

Proof

It is sufficient to show that if a, b ∈ K are algebraic over F, then the elements a ± b, ab and a/b (if b ≠ 0) are also algebraic over F. By Theorem 2.33, [F(a) : F] is finite. Since b is algebraic over F, it is also algebraic over F(a). In particular, [F(a)(b) : F(a)] is finite. But then the extension F(a)(b) is also finite over F and contains a ± b, ab and a/b (if b ≠ 0).

The field F(a)(b) in the proof of the last corollary is also denoted as F(a, b). It is the smallest subfield of K that contains F, a and b, and it follows that F(a, b) = F(b, a). More generally, for a field extension F ⊆ K and for a1, . . . , an ∈ K, each algebraic over F, the field F(a1, . . . , an) is defined as F(a1)(a2) . . . (an) and is independent of the order in which the ai are adjoined.

Corollary 2.11.

Let F ⊆ K be a finite extension. Then K is algebraic over F.

Proof

For any a ∈ K, we have F ⊆ F(a) ⊆ K. By Proposition 2.30, [F(a) : F] is finite, and so a is algebraic over F by Theorem 2.33.

The converse of the last corollary is not true, that is, it is possible that an algebraic extension has infinite extension degree. Exercise 2.59 gives an example.

Corollary 2.12.

If F ⊆ K and K ⊆ L are algebraic field extensions, then F ⊆ L is also algebraic.

Proof

Take an arbitrary a ∈ L. Since K ⊆ L is algebraic, there is a non-zero polynomial f(X) = α0 + α1X + · · · + αnXn ∈ K[X] such that f(a) = 0. It then follows that a is algebraic over F(α0, . . . , αn). Since each αi is algebraic over F, the degree [F(α0, . . . , αn) : F] is finite. Therefore, [F(α0, . . . , αn)(a) : F] = [F(α0, . . . , αn)(a) : F(α0, . . . , αn)] [F(α0, . . . , αn) : F] is also finite and hence F(α0, . . . , αn)(a) and, in particular, a are algebraic over F.

Definition 2.55.

A field extension F ⊆ K is called simple, if K = F(a) for some a ∈ K.

Proposition 2.31.

Let F be a field of characteristic 0 and let a, b (belonging to some extension of F) be algebraic over F. Then the extension F(a, b) of F is simple.

Proof

Let p(X) and q(X) be the minimal polynomials (over F) of a and b respectively. Let d := deg p and d′ := deg q. The polynomials p and q are irreducible over F and hence by Exercise 2.61 have no multiple roots. Let a1, . . . , ad be the roots of p and b1, . . . , bd′ the roots of q (in a splitting field), with a = a1 and b = b1. For each i, j with j ≠ 1, the equation ai + λbj = a + λb has a unique solution for λ (not necessarily in F). Since F is infinite, we can choose μ ∈ F which is not a solution of any of the equations just mentioned. Define c := a + μb, so that c ≠ ai + μbj for all i, j with j ≠ 1. Clearly, F(c) ⊆ F(a, b). To prove the reverse inclusion, note that by hypothesis q(b) = 0. Also if we define f(X) := p(c – μX) ∈ F(c)[X], we see that f(b) = p(a) = 0. By the choice of c, we have f(bj) ≠ 0 for j ≠ 1. Finally since q is square-free, we have gcd(f, q) = X – b. Since this gcd can be computed in F(c)[X], we get b ∈ F(c), and so a = c – μb ∈ F(c) too.
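
For F = ℚ, a = √2, b = √3, the proof's element c = a + μb with μ = 1 indeed generates ℚ(√2, √3). A quick floating-point sanity check (our illustration; the minimal polynomial X4 – 10X2 + 1 of c and the expressions for √2 and √3 in terms of c are standard facts, not taken from the text):

```python
from math import sqrt

a, b = sqrt(2.0), sqrt(3.0)
c = a + b          # the primitive element of the proof, with mu = 1

# c is a root of X^4 - 10 X^2 + 1, an irreducible polynomial of degree
# 4 = [Q(sqrt2, sqrt3) : Q], so Q(c) = Q(sqrt2, sqrt3).
assert abs(c**4 - 10 * c**2 + 1) < 1e-9

# Both generators are recovered as polynomials in c:
assert abs((c**3 - 9 * c) / 2 - a) < 1e-9    # sqrt(2) = (c^3 - 9c)/2
assert abs((11 * c - c**3) / 2 - b) < 1e-9   # sqrt(3) = (11c - c^3)/2
```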

Corollary 2.13.

A finite extension F ⊆ K of fields of characteristic 0 is simple.

Proof

We proceed by induction on d := [K : F]. The result vacuously holds for d = 1. So let us assume that d > 1 and that the result holds for all smaller values of d. Choose an element a ∈ K \ F. Then [F(a) : F] > 1 and divides d. If [F(a) : F] = d, we are done. So assume [F(a) : F] < d. Since [K : F(a)] < d, by the induction hypothesis the extension F(a) ⊆ K is simple, say K = F(a)(b) = F(a, b). The result now follows immediately from the previous proposition.

2.8.2. Splitting Fields and Algebraic Closure

Let f(X) be a non-constant polynomial of degree d in F[X]. Assume that f does not split over F. Consider an irreducible (in F[X]) factor f′ of f of degree d′ > 1. F′ := F[X]/〈f′〉 is a field extension of F. Furthermore, if α1 denotes the residue class of X in F′, then f′(α1) = 0, and the elements 1, α1, α12, . . . , α1d′–1 constitute a basis of F′ over F. In particular, [F′ : F] = d′ ≤ d. Now, one can write f(X) = (X – α1)g(X) for some g(X) ∈ F′[X]. If g splits over F′, so does f too. Otherwise, choose any irreducible (in F′[X]) factor g′ of g with deg g′ > 1 and consider the field extension F″ := F′[X]/〈g′〉. Then [F″ : F′] = deg g′ ≤ deg g = d – 1, so that [F″ : F] ≤ d(d – 1). Moreover, if α2 denotes the residue class of X in F″, then f(X) = (X – α1)(X – α2)h(X) for some h(X) ∈ F″[X]. Proceeding in this way we get:

Proposition 2.32.

For a polynomial f ∈ F[X] of degree d ≥ 1, there is a field extension K of F with [K : F] ≤ d!, such that f splits over K.
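
The construction above can be carried out explicitly for F = GF(2) and f(X) = X2 + X + 1, which is irreducible over F: already F′ = F[X]/〈f〉, the field with four elements, splits f. A Python sketch (our encoding of GF(4) as bit pairs, not from the text):

```python
# Arithmetic in GF(4) = GF(2)[X]/<X^2 + X + 1>.  An element hi*t + lo
# (t = residue class of X) is encoded as the bit pair (hi, lo).
def add(u, v):
    return (u[0] ^ v[0], u[1] ^ v[1])

def mul(u, v):
    # (a t + b)(c t + d) = ac t^2 + (ad + bc) t + bd, then reduce t^2 = t + 1
    a, b = u
    c, d = v
    return ((a & c) ^ (a & d) ^ (b & c), (a & c) ^ (b & d))

def f(x):
    # evaluate f(X) = X^2 + X + 1 at x in GF(4)
    return add(add(mul(x, x), x), (0, 1))

t, t_plus_1 = (1, 0), (1, 1)
assert f(t) == (0, 0) and f(t_plus_1) == (0, 0)     # f splits over GF(4)
assert f((0, 0)) == (0, 1) and f((0, 1)) == (0, 1)  # but has no root in GF(2)
```

Here [K : F] = 2 ≤ 2! = d!, in accordance with the proposition.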

We now establish the uniqueness of the splitting field of a polynomial f ∈ F[X]. To start with, we set up certain notations. An isomorphism μ : F → F′ of fields induces an isomorphism μ* : F[X] → F′[Y] of polynomial rings, defined by adXd + ad–1Xd–1 + · · · + a0 ↦ μ(ad)Yd + μ(ad–1)Yd–1 + · · · + μ(a0). We have μ*(a) = μ(a) for all a ∈ F. Note also that f ∈ F[X] is irreducible over F if and only if μ*(f) ∈ F′[Y] is irreducible over F′. With these notations we state the following important lemma.

Lemma 2.5.

Let the non-constant polynomial f ∈ F[X] be irreducible over F. Let α and β be roots of f and μ*(f) respectively (in extensions of F and F′). Then there is an isomorphism τ : F(α) → F′(β) of fields such that τ(a) = μ(a) for all a ∈ F and τ(α) = β.

Proof

Since F(α) = F[α] and F′(β) = F′[β], we can define the map τ : F[α] → F′[β] by g(α) ↦ (μ*(g))(β) for each g ∈ F[X]. It is now an easy check that τ is a well-defined isomorphism of fields with the desired properties.

Roots of an irreducible polynomial are called conjugates (of each other). If α and β are two roots of an irreducible polynomial f ∈ F[X], the last lemma guarantees the existence of an isomorphism τ : F(α) → F(β) that fixes all the elements of F and that maps α ↦ β.

Proposition 2.33.

We use the maps μ : F → F′ and μ* : F[X] → F′[Y] as defined above. Let f ∈ F[X] be a non-constant polynomial and let K and K′ be splitting fields of f and μ*(f) over F and F′ respectively. Then there is an isomorphism τ : K → K′ of fields, such that τ(a) = μ(a) for all a ∈ F.

Proof

We proceed by induction on n := [K : F]. (By Proposition 2.32 n is finite.) If n = 1, then K = F, that is, the polynomial f splits over F itself and so does μ*(f) over F′, that is, K′ = F′. Thus τ = μ is the desired isomorphism.

Now assume that n > 1 and that the result holds for all fields L and for all polynomials in L[X] with splitting fields (over L) of extension degrees less than n. Consider an irreducible factor g of f with 1 < deg g ≤ deg f. Note that g also splits over K. We take any root α ∈ K of g and consider the tower of field extensions F ⊆ F(α) ⊆ K. Similarly, let β ∈ K′ be a root of μ*(g) and consider F′ ⊆ F′(β) ⊆ K′. By Lemma 2.5 there is an isomorphism ν : F(α) → F′(β) with ν(a) = μ(a) for all a ∈ F and ν(α) = β. Now [K : F(α)] = [K : F]/[F(α) : F] = [K : F]/deg g < n. It is evident that K and K′ are splitting fields of f and μ*(f) over F(α) and F′(β) respectively. Hence by the induction hypothesis there is an isomorphism τ : K → K′ with τ(a) = ν(a) for all a ∈ F(α). In particular, τ(a) = μ(a) for all a ∈ F.

The results pertaining to the splitting field of a polynomial can be generalized in the following way. Let S be a non-empty subset of F[X]. A splitting field of S over F is a minimal field K containing F such that each polynomial f ∈ S splits in K. If S = {f1, . . . , fr} is a finite set, the splitting field of S is the same as the splitting field of f = f1 · · · fr (Exercise 2.57). But the situation is different, if S is infinite. Of particular interest is the set S consisting of all irreducible polynomials in F[X]. In this case, the splitting field of S is an algebraic closure of F.

We give a sketch of the proof that even when S is infinite, a splitting field for S can be constructed. This, in particular, establishes the existence of an algebraic closure of any field. We may assume that S comprises non-constant polynomials only. For each f ∈ S, we define an indeterminate Xf and consider the polynomial ring A := F[Xf | f ∈ S] and the ideal 𝔞 of A generated by f(Xf) for all f ∈ S. We have 𝔞 ≠ A and, therefore, there is a maximal ideal 𝔪 of A containing 𝔞 (Exercise 2.23). Consider the field F1 := A/𝔪 containing F. Every polynomial f ∈ S has at least one root in F1. Now we replace F by F1 and as above get another field F2 containing F1 (and hence F), such that every polynomial in S (of degree ≥ 2) has at least two roots in F2. We continue this procedure (infinitely often, if necessary) and obtain a sequence of fields F ⊆ F1 ⊆ F2 ⊆ F3 ⊆ · · ·. Define K to be the field consisting of all elements of the union of the Fi that are algebraic over F. Each polynomial in S splits in K, but in no proper subfield of K, that is, K is a splitting field of S.

It turns out that the splitting field of S is unique up to isomorphisms that fix elements of F. In particular, the algebraic closure of F is unique up to isomorphisms that fix elements of F, and is denoted by F̄.

*2.8.3. Elements of Galois Theory

For a field K, the set Aut K of all automorphisms of K is a group under (functional) composition. We extend this concept now. Let F ⊆ K be an extension of fields.

Definition 2.56.

An automorphism σ ∈ Aut K is called an F-automorphism of K, if σ fixes all the elements of F (which means that σ(a) = a for all a ∈ F). The set of all F-automorphisms of K is denoted by AutF K or by Gal(K|F) and is a subgroup of Aut K. The Galois group of a polynomial f ∈ F[X] is defined to be the group AutF K, where K is the splitting field of f over F.

Conversely, for a subgroup H of AutF K the set of elements of K that are fixed by all the automorphisms of H, that is, the set of all x ∈ K with σ(x) = x for every σ ∈ H, is a subfield of K, called the fixed field of H (over F) and denoted as FixF H. Clearly, F ⊆ FixF H ⊆ K.

For every intermediate field L (that is, a field L with F ⊆ L ⊆ K), we have a subgroup AutL K of AutF K. Conversely, given a subgroup H of AutF K we have the intermediate fixed field FixF H. It is a relevant question to ask if there is any relationship between the subgroups of AutF K and the intermediate fields. A nice correspondence exists for a particular type of extensions that we define now.

Definition 2.57.

A field extension F ⊆ K is said to be a Galois extension (or K is said to be a Galois extension over F), if FixF(AutF K) = F. Thus K is Galois over F if and only if for every x ∈ K \ F there is a σ ∈ AutF K with σ(x) ≠ x.

Example 2.18.

Let K be the splitting field of a non-constant polynomial f ∈ F[X]. By Exercise 2.77, the extension F ⊆ K is normal. Assume that F ⊆ K is a separable extension (Exercise 2.75). Consider an element α ∈ K \ F and let g be the minimal polynomial of α over F. Then deg g > 1 and g splits in K[X]. By assumption (of separability), there is a root β ∈ K of g with β ≠ α. Lemma 2.5 shows that there is a τ ∈ AutF K such that τ(α) = β. Thus, K is Galois over F. In particular, if char F = 0 or if F is a finite field, then F ⊆ K is separable and so Galois. For example, the splitting field over ℚ of any non-constant polynomial in ℚ[X] is a Galois extension of ℚ.
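
For a tiny explicit Galois group, take F = GF(2) and K = GF(4), the splitting field of X2 + X + 1 over F. The Frobenius map σ(x) = x2 is an F-automorphism exchanging the two conjugate roots. A Python check (our encoding, not from the text: GF(4) elements are bit pairs (hi, lo) standing for hi·t + lo with t2 = t + 1):

```python
def mul(u, v):
    # multiplication in GF(4) = GF(2)[X]/<X^2 + X + 1>
    a, b = u
    c, d = v
    return ((a & c) ^ (a & d) ^ (b & c), (a & c) ^ (b & d))

def frob(x):
    return mul(x, x)   # the Frobenius map sigma(x) = x^2

GF4 = [(0, 0), (0, 1), (1, 0), (1, 1)]

# sigma fixes exactly the prime field GF(2) = {0, 1} ...
assert [x for x in GF4 if frob(x) == x] == [(0, 0), (0, 1)]
# ... and swaps the conjugate roots t and t + 1 of X^2 + X + 1:
assert frob((1, 0)) == (1, 1) and frob((1, 1)) == (1, 0)
# sigma has order 2, so {id, sigma} is a group of F-automorphisms of
# order 2 = [GF(4) : GF(2)], and GF(4) is Galois over GF(2).
assert all(frob(frob(x)) == x for x in GF4)
```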

The following theorem establishes the correspondence we are looking for.

Theorem 2.34. Fundamental theorem of Galois theory

For a finite Galois extension F ⊆ K, there is a bijective correspondence between the set of all intermediate fields and the set of all subgroups of AutF K (given by L ↦ AutL K and H ↦ FixF H) such that the following assertions hold:

  1. AutFixF H K = H for every subgroup H of AutF K.

  2. FixF (AutL K) = L for every field L with F ⊆ L ⊆ K.

  3. For field extensions F ⊆ L ⊆ L′ ⊆ K, the extension degree [L′ : L] is the same as the index [AutL K : AutL′ K]. In particular, the order of AutF K is [K : F].

  4. For every intermediate field L, one has:

  1. K is Galois over L.

  2. L is Galois over F if and only if AutL K is a normal subgroup of AutF K. In this case, AutF L ≅ AutF K/AutL K.

A proof of this theorem is rather long and uses many auxiliary results which we would not need otherwise. We, therefore, choose to omit the proof here.

Exercise Set 2.8

2.73 Let α be transcendental over F. Show that the domain F[α] and the field F(α) are respectively isomorphic to the polynomial ring F[X] and the field F(X) of rational functions in one indeterminate X. Generalize the result to an arbitrary family αi, i ∈ I, of elements, each of which is transcendental over F.
2.74 Let F ⊆ K be a field extension and let σ be an endomorphism of K with σ(a) = a for every a ∈ F.
  1. If a non-constant polynomial f ∈ F[X] has a root α ∈ K, show that σ(α) is also a root of f. For example, if F = ℝ, K = ℂ, and σ is the automorphism mapping each z ∈ ℂ to its (complex) conjugate z̄, then we conclude that if a complex number z is a root of f ∈ ℝ[X], then z̄ is also a root of f. A similar result holds for the extension ℚ ⊆ ℚ(√m), where m is a non-square rational number.

  2. If K is algebraic over F, show that σ is an automorphism. [H]

2.75 Let F ⊆ K be a field extension.
  1. An irreducible polynomial f ∈ F[X] is said to be separable over F, if f has no multiple roots. An algebraic element α ∈ K is said to be separable over F, if the minimal polynomial of α over F is separable. K is called a separable extension of F, if every element of K is (algebraic and) separable over F. Show that if char F = 0 or if F is a finite field, and if K is an algebraic extension of F, then K is separable over F. [H]

  2. An algebraic element α ∈ K is called purely inseparable over F, if the minimal polynomial of α over F factors in K[X] as (X – α)^n for some n ∈ ℕ. If every element of K is (algebraic and) purely inseparable over F, then K is called a purely inseparable extension of F. Show that α ∈ K is both separable and purely inseparable over F if and only if α ∈ F. Thus, if char F = 0 or if F is a finite field, then F has no purely inseparable extension other than itself.

  3. If p := char F > 0, show that an element α ∈ K is purely inseparable over F if and only if minpoly_{α,F}(X) = X^{p^r} + a for some r ≥ 0 and a ∈ F. In particular, show that if K is a finite purely inseparable extension of F, then [K : F] = p^s for some s ≥ 0.

2.76 F is called a perfect field, if every irreducible polynomial in F[X] is separable over F.
  1. Show that F is a perfect field if and only if every algebraic extension of F is separable over F. In particular, fields of characteristic 0 and finite fields are perfect.

  2. Let p := char F > 0. Show that F is perfect if and only if every element of F has a p-th root in F. [H]

2.77 A field extension F ⊆ K is called normal, if every irreducible polynomial in F[X] that has a root in K splits in K[X].
  1. If K is the splitting field of a polynomial f ∈ F[X] over F, show that K is a normal extension of F. [H]

  2. If [K : F] = 2, show that F ⊆ K is a normal extension.

  3. Consider the tower of field extensions ℚ ⊆ ℚ(√2) ⊆ ℚ(2^{1/4}) to conclude that if F ⊆ K and K ⊆ L are normal extensions, then F ⊆ L need not be normal.

2.78Prove the following assertions:
  1. is an infinite extension of . [H]

  2. . [H]

2.79 Let F ⊆ K be a field extension and let L be the fixed field of AutF K over F. Show that K is a Galois extension of L.

2.9. Finite Fields

Finite fields are arguably the most important fields used in cryptography. They enjoy certain nice properties that infinite fields (in particular, the well-known fields ℚ, ℝ and ℂ) do not. We concentrate on some properties of finite fields in this section. As we see later, arithmetic over a finite field K is fast when char K = 2 or when #K is a prime. As a result, these two classes of fields are the most common ones employed in cryptography. However, in this section, we do not restrict ourselves to these specific fields, but provide a general treatment valid for all finite fields. As in the previous section, we continue to use the letters F, K, L to denote fields. In addition, we use the letter p to denote a prime number and q a power of p, that is, q = p^n for some n ∈ ℕ.

2.9.1. Existence and Uniqueness of Finite Fields

Let K be a finite field of cardinality q. Then p := char K > 0. By Proposition 2.7, p is a prime, and K contains an isomorphic copy F of the field 𝔽_p. If n := [K : F], then q = p^n. Therefore, we have proved the first statement of the following important result.

Theorem 2.35.

The cardinality of a finite field is a power p^n, n ∈ ℕ, of a prime number p. Conversely, given a prime p and n ∈ ℕ, there exists a finite field of cardinality p^n.

Proof

In order to construct a finite field of cardinality q := p^n, we start with F := 𝔽_p and consider the splitting field K of the polynomial f(X) := X^q – X ∈ F[X]. Since f′(X) = –1 ≠ 0, the roots of f are distinct (Exercise 2.61). Therefore, the set E := {a ∈ K | a^q = a} of roots of f has cardinality q. By Exercise 2.80, E is a field. Since F ⊆ E ⊆ K and f splits over E, by the definition of splitting fields we have K = E, that is, #K = #E = q.

Theorem 2.36. Fermat’s little theorem for finite fields

Let K be a finite field of cardinality q. Then every a ∈ K satisfies a^q = a.

Proof

Clearly, 0^q = 0. Take a ≠ 0. K* being a group of order q – 1, by Proposition 2.4 ord_{K*}(a) divides q – 1. In particular, a^{q–1} = 1, that is, a^q = a.
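For a prime field, Theorem 2.36 can be checked by direct computation; the following small Python check is ours, not the book's (the modulus 7 is an arbitrary choice of prime):

```python
# Fermat's little theorem in the prime field F_q, here with q = 7:
# every a satisfies a^q = a, and every a != 0 satisfies a^(q-1) = 1.
q = 7
assert all(pow(a, q, q) == a for a in range(q))
assert all(pow(a, q - 1, q) == 1 for a in range(1, q))
```

The built-in three-argument `pow` performs modular exponentiation by repeated squaring, so the same check remains fast even for cryptographically large prime moduli.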

Theorem 2.37.

Let K be a finite field of cardinality q = p^n and let F be the subfield of K isomorphic to 𝔽_p. Then K is the splitting field of the polynomial X^q – X over F. In particular, K is unique up to F-isomorphisms (that is, isomorphisms fixing the elements of F).

Proof

By Theorem 2.36, each of the q elements of K is a root of f(X) := X^q – X, and consequently K is the splitting field of f. The last assertion in the theorem follows from the uniqueness of splitting fields (Proposition 2.33).

This uniqueness allows us to talk about the finite field of cardinality q (rather than a finite field of cardinality q). We denote this (unique) field by 𝔽_q.

The results proved so far can be generalized to arbitrary extensions 𝔽_q ⊆ 𝔽_{q^m}, where q = p^n and n, m ∈ ℕ. We leave the details to the reader (Exercise 2.82). It is important to point out here that since 𝔽_{q^m} is the splitting field of X^{q^m} – X over 𝔽_q, by Exercise 2.77 we have:

Corollary 2.14.

Every finite extension of finite fields is normal.

This implies that an irreducible polynomial f ∈ 𝔽_q[X] has either none or all of its roots in 𝔽_{q^m}. Also, if α ∈ 𝔽_q with q = p^n, then α^q = α^{p^n} = α. Therefore, α^{p^{n–1}} is a p-th root of α. By Exercise 2.76(b), we then conclude:

Corollary 2.15.

Every finite field is perfect.

Proposition 2.34.

Consider the extension 𝔽_q ⊆ 𝔽_{q^m}, m ∈ ℕ. For d ∈ ℕ, there is a unique intermediate field with q^d elements if and only if d | m. Furthermore, if d | m, then α ∈ 𝔽_{q^m} belongs to the (unique) intermediate field with q^d elements if and only if α^{q^d} = α.

Proof

For d | m, we have (X^{q^d} – X) | (X^{q^m} – X). The q^d roots of X^{q^d} – X in 𝔽_{q^m} constitute an intermediate field L. If L′ ≠ L were another intermediate field with q^d elements, then by Theorem 2.36 more than q^d elements of 𝔽_{q^m} would be roots of X^{q^d} – X, a contradiction. Conversely, an intermediate field L contains q^d elements, where d := [L : 𝔽_q]. Since m = [𝔽_{q^m} : 𝔽_q] = [𝔽_{q^m} : L] [L : 𝔽_q], we have d | m. The last assertion in the proposition follows immediately from the above argument.

Corollary 2.16.

Let f ∈ 𝔽_q[X] be irreducible and let α ∈ 𝔽_{q^m} be a root of f. Then deg f divides m.

Proof

Consider the intermediate field 𝔽_q(α) ≅ 𝔽_{q^d} of the extension 𝔽_q ⊆ 𝔽_{q^m}, where d := deg f, and use Proposition 2.34 together with the fact that 𝔽_q ⊆ 𝔽_{q^m} is a normal extension.

Now we will prove a very important result concerning the multiplicative group 𝔽_q* = 𝔽_q \ {0}.

Theorem 2.38.

𝔽_q* is a cyclic group for every finite field 𝔽_q.

Proof

Modify the proof of Proposition 2.19 or use the following more general result.

Theorem 2.39.

Let K be a field (not necessarily finite). Then any finite subgroup G of the multiplicative group K* is cyclic.

Proof

Since K is a field, for any n ∈ ℕ the polynomial X^n – 1 has at most n roots in K and hence in G. The theorem then follows immediately from Exercise 2.18.
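For small prime fields, the generators of the cyclic group 𝔽_p* can be found by brute force; the following Python sketch (the helper name `order` is ours) illustrates Theorem 2.38 for p = 13:

```python
def order(a, p):
    """Multiplicative order of a in F_p*, for a prime p and a not divisible by p."""
    k, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        k += 1
    return k

p = 13
generators = [a for a in range(1, p) if order(a, p) == p - 1]
# F_13* is cyclic of order 12; it has phi(12) = 4 generators.
assert generators == [2, 6, 7, 11]
```

Note that the order of every element divides p – 1 (Proposition 2.4), which is why a single generator forces the whole group to be cyclic.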

Corollary 2.17.

Every finite extension 𝔽_q ⊆ 𝔽_{q^m} is simple. In particular, 𝔽_q[X] contains an irreducible polynomial of degree m (for every q and every m).

Proof

Let α be a generator of the cyclic group 𝔽_{q^m}*. Then m is the smallest positive integer s for which α^{q^s} = α. Let f := minpoly_{α,𝔽_q} with d := deg f, so that 𝔽_q(α) ≅ 𝔽_{q^d}. If d < m, then α^{q^d} = α, a contradiction. Thus d = m, that is, 𝔽_{q^m} = 𝔽_q(α), and f is an irreducible polynomial of degree m in 𝔽_q[X].

2.9.2. Polynomials over Finite Fields

In this section, we study some useful properties of polynomials over finite fields. We concentrate on polynomials in 𝔽_q[X] for an arbitrary q = p^n, p prime, n ∈ ℕ. We have seen how the polynomials X^{q^m} – X prove to be important for understanding the structure of finite fields. But that is not all; these polynomials have further roles to play, as we now describe.

Let 𝔽_q ⊆ 𝔽_{q^m} be a finite extension of finite fields and let α ∈ 𝔽_{q^m} be a root of the polynomial f(X) = a_d X^d + · · · + a_1 X + a_0 ∈ 𝔽_q[X]. Since each a_i satisfies a_i^q = a_i, we have f(α^q) = a_d α^{qd} + · · · + a_1 α^q + a_0 = (a_d α^d + · · · + a_1 α + a_0)^q = f(α)^q = 0. Therefore, α^q is also a root of f. More generally, for each r = 0, 1, 2, . . . the element α^{q^r} is a root of f(X). This gives us a nice procedure for computing the minimal polynomial of α, as the following corollary suggests.

Corollary 2.18.

The minimal polynomial of α ∈ 𝔽_{q^m} over 𝔽_q is (X – α)(X – α^q) · · · (X – α^{q^{d–1}}), where d is the smallest positive integer s for which α^{q^s} = α.

Proof

Let f_α := minpoly_{α,𝔽_q} have degree δ. Then 𝔽_{q^δ} = 𝔽_q(α) is the smallest field containing 𝔽_q and α and hence all the roots of f_α, that is, α^{q^s} = α for s = δ and for no smaller positive integer value of s. Therefore, δ = d, and the conjugates of α are precisely α, α^q, . . . , α^{q^{d–1}}.

We now prove a theorem which has important consequences.

Theorem 2.40.

X^{q^m} – X is the product of all monic irreducible polynomials in 𝔽_q[X] whose degrees divide m.

Proof

We have X^{q^m} – X = ∏_{α ∈ 𝔽_{q^m}} (X – α). By Corollary 2.18, the minimal polynomial f_α(X) of α ∈ 𝔽_{q^m} over 𝔽_q divides X^{q^m} – X. By Corollary 2.16, deg f_α divides m. Finally, since f_α(X) = f_β(X) or gcd(f_α(X), f_β(X)) = 1 depending on whether α and β are conjugates or not, X^{q^m} – X is a product of monic irreducible polynomials of 𝔽_q[X] whose degrees divide m. In order to show that X^{q^m} – X is the product of all such polynomials, consider an arbitrary polynomial g ∈ 𝔽_q[X] which is monic and irreducible over 𝔽_q and has degree d | m. The polynomial g splits over 𝔽_{q^d} (with no multiple roots, finite fields being perfect). Since d | m, by Proposition 2.34 𝔽_{q^d} ⊆ 𝔽_{q^m}. Thus g splits over 𝔽_{q^m} as well and, in particular, divides X^{q^m} – X.

The first consequence of Theorem 2.40 is that it leads to a procedure for checking the irreducibility of a polynomial f ∈ 𝔽_q[X]. Let d := deg f. If f(X) is reducible, it admits an irreducible factor of degree ≤ ⌊d/2⌋. Since gcd(f(X), X^{q^m} – X) is the product of all distinct irreducible factors of f with degrees dividing m, we compute the gcds g_m := gcd(f(X), X^{q^m} – X) for m = 1, . . . , ⌊d/2⌋. If all these gcds are 1, we conclude that f is irreducible; otherwise f is reducible. We will see an optimized implementation of this procedure in Chapter 3. Besides irreducibility testing, the above theorem also leads to algorithms for finding random irreducible polynomials and for factoring polynomials, as we will also discuss in Chapter 3.
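The gcd procedure above can be implemented compactly for q = 2. The Python sketch below is our own illustration (not the optimized implementation of Chapter 3); it encodes a polynomial over 𝔽₂ as an integer whose bit i is the coefficient of X^i, computes X^{2^i} mod f by repeated squaring, and takes gcds:

```python
# Irreducibility test for f in F_2[X] via gcd(f, X^(2^i) - X), i = 1..deg(f)//2.
# Polynomials over F_2 are encoded as ints (bit i = coefficient of X^i).

def pmulmod(a, b, f):
    """Product a*b mod f in F_2[X]; a and b must already be reduced mod f."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a.bit_length() == f.bit_length():
            a ^= f                       # reduce: subtraction is XOR over F_2
    return r

def pgcd(a, b):
    """gcd in F_2[X] (Euclidean algorithm with XOR-based remainders)."""
    while b:
        while a.bit_length() >= b.bit_length():
            a ^= b << (a.bit_length() - b.bit_length())
        a, b = b, a
    return a

def is_irreducible(f):
    d = f.bit_length() - 1               # deg f
    x = 0b10                             # the polynomial X
    xq = x
    for _ in range(d // 2):
        xq = pmulmod(xq, xq, f)          # now xq = X^(2^i) mod f
        if pgcd(f, xq ^ x) != 1:         # gcd(f, X^(2^i) - X) nontrivial
            return False
    return True

assert is_irreducible(0b111)             # X^2 + X + 1
assert is_irreducible(0b1011)            # X^3 + X + 1
assert is_irreducible(0b1101)            # X^3 + X^2 + 1
assert not is_irreducible(0b101)         # X^2 + 1 = (X + 1)^2
assert not is_irreducible(0b10101)       # X^4 + X^2 + 1 = (X^2 + X + 1)^2
```

Squaring i times gives X^{2^i} mod f without ever forming the huge polynomial X^{2^i} itself; this is exactly what makes the test feasible for large degrees.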

The second consequence of Theorem 2.40 is that it gives us a formula for the number of monic irreducible polynomials of a given degree over a given finite field. First we need to define a function on the set ℕ of natural numbers.

Definition 2.58.

The Möbius function μ : ℕ → {–1, 0, 1} is defined as

μ(n) := 1 if n = 1, μ(n) := (–1)^r if n is a product of r pairwise distinct primes, and μ(n) := 0 if n is divisible by the square of a prime.

It follows that μ(n) ≠ 0 if and only if n is square-free.

Lemma 2.6.

For n ∈ ℕ, we have

Σ_{d|n} μ(d) = 1 if n = 1, and Σ_{d|n} μ(d) = 0 if n > 1,

where Σ_{d|n} denotes summation over all positive divisors d of n.

Proof

The result follows immediately for n = 1. For n > 1, write n = p_1^{e_1} · · · p_r^{e_r}, where p_1, . . . , p_r are r ≥ 1 distinct primes and each e_i ≥ 1. The only non-zero terms in the sum are those corresponding to d = 1 and to d = p_{i_1} · · · p_{i_s} for pairwise distinct choices of i_1, . . . , i_s ∈ {1, . . . , r}. From the definition of μ, it then follows that Σ_{d|n} μ(d) = Σ_{s=0}^{r} (–1)^s C(r, s) = (1 – 1)^r = 0, where C(r, s) denotes the binomial coefficient.
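Both the definition of μ and Lemma 2.6 are easy to check computationally; the following Python sketch (the helper name `mobius` is ours) verifies the lemma for the first few hundred integers:

```python
def mobius(n):
    """Mobius function: 0 if n is not square-free, else (-1)^(number of prime factors)."""
    r, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:       # d^2 divided the original n
                return 0
            r += 1
        else:
            d += 1
    if n > 1:                    # one remaining prime factor
        r += 1
    return (-1) ** r

# Lemma 2.6: the sum of mu(d) over the divisors d of n is 1 for n = 1, else 0.
for n in range(1, 200):
    s = sum(mobius(d) for d in range(1, n + 1) if n % d == 0)
    assert s == (1 if n == 1 else 0)
```

Trial division suffices here because μ is only ever evaluated at divisors of the (small) extension degree m in the applications that follow.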

Lemma 2.7. Möbius inversion formula

Let f and g be maps from ℕ to an Abelian group G.

  1. If G is additive and g(n) = Σ_{d|n} f(d) for all n ∈ ℕ, then f(n) = Σ_{d|n} μ(d) g(n/d) = Σ_{d|n} μ(n/d) g(d) for all n ∈ ℕ.

  2. If G is multiplicative and g(n) = ∏_{d|n} f(d) for all n ∈ ℕ, then f(n) = ∏_{d|n} g(n/d)^{μ(d)} = ∏_{d|n} g(d)^{μ(n/d)} for all n ∈ ℕ.

Proof

To prove the additive formula we note that

Σ_{d|n} μ(d) g(n/d) = Σ_{d|n} μ(d) Σ_{e|(n/d)} f(e) = Σ_{e|n} f(e) (Σ_{d|(n/e)} μ(d)) = f(n),

where the last equality follows from Lemma 2.6. The multiplicative formula can be proved similarly.

Let us denote by ν_{q,m} the number of monic irreducible polynomials in 𝔽_q[X] of degree m, and by I_{q,m}(X) the product of all monic irreducible polynomials in 𝔽_q[X] of degree m. By Theorem 2.40, we have q^m = Σ_{d|m} d ν_{q,d} (on comparing degrees) and X^{q^m} – X = ∏_{d|m} I_{q,d}(X). Applications of the Möbius inversion formula then yield the following formulas:

Equation 2.4

ν_{q,m} = (1/m) Σ_{d|m} μ(d) q^{m/d} and I_{q,m}(X) = ∏_{d|m} (X^{q^d} – X)^{μ(m/d)}
Since μ(n) ≥ –1 for all n ∈ ℕ, and since m has at most m – 1 divisors d > 1, each contributing a term of absolute value at most q^{m/2}, Equation (2.4) implies that m ν_{q,m} ≥ q^m – (m – 1)q^{m/2} > 0, and hence ν_{q,m} ≥ 1. We, therefore, have an independent proof of the second statement in Corollary 2.17. Moreover, for practical values of q and m we have the good approximation:

Equation 2.5

ν_{q,m} ≈ q^m / m
Since the total number of monic polynomials of degree m in 𝔽_q[X] is q^m, a randomly chosen monic polynomial in 𝔽_q[X] of degree m is irreducible with probability approximately 1/m, that is, one expects to find an irreducible polynomial of degree m after picking O(m) random monic polynomials from 𝔽_q[X]. These observations have an important bearing on devising efficient algorithms for finding irreducible polynomials over finite fields. (See Chapter 3.)
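Equation (2.4) is straightforward to evaluate; the Python sketch below (function names are ours) computes ν_{q,m} and cross-checks it against the degree identity q^m = Σ_{d|m} d ν_{q,d}:

```python
# nu(q, m) = (1/m) * sum over d | m of mu(d) * q^(m/d)   (Equation 2.4)

def mobius(n):
    r, d = 0, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            r += 1
        else:
            d += 1
    return (-1) ** (r + (1 if n > 1 else 0))

def nu(q, m):
    total = sum(mobius(d) * q ** (m // d) for d in range(1, m + 1) if m % d == 0)
    assert total % m == 0            # the count is always an integer
    return total // m

assert nu(2, 1) == 2                 # X and X + 1
assert nu(2, 2) == 1                 # X^2 + X + 1
assert nu(2, 3) == 2                 # X^3 + X + 1 and X^3 + X^2 + 1
assert nu(2, 4) == 3
# Degrees add up: sum over d | 4 of d * nu(2, d) equals 2^4.
assert sum(d * nu(2, d) for d in (1, 2, 4)) == 2 ** 4
```

For m = 4 and q = 2 the count 3 also illustrates the approximation ν_{q,m} ≈ q^m/m = 4: roughly one monic polynomial in m is irreducible.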

The conjugates of α ∈ 𝔽_{q^m} over 𝔽_q are α^{q^i}, i = 0, 1, . . . , d – 1. It is interesting to look at the sum and the product of the conjugates of α. By Corollary 2.18, minpoly_{α,𝔽_q}(X) = (X – α)(X – α^q) · · · (X – α^{q^{d–1}}) for some d | m. Since minpoly_{α,𝔽_q}(X) ∈ 𝔽_q[X], the sum α + α^q + · · · + α^{q^{d–1}} and the product α · α^q · · · α^{q^{d–1}} (being, up to sign, coefficients of this polynomial) belong to 𝔽_q. Since α^{q^d} = α, for any (positive) integral multiple δ of d, the sum α + α^q + · · · + α^{q^{δ–1}} and the product α · α^q · · · α^{q^{δ–1}} are elements of 𝔽_q too.

Definition 2.59.

Let 𝔽_q ⊆ 𝔽_{q^m}, q = p^n, be a finite extension of finite fields and let α ∈ 𝔽_{q^m}. The trace of α over 𝔽_q is defined as the sum

Tr_{𝔽_{q^m}|𝔽_q}(α) := α + α^q + α^{q^2} + · · · + α^{q^{m–1}},

and the norm of α over 𝔽_q is defined as

N_{𝔽_{q^m}|𝔽_q}(α) := α · α^q · α^{q^2} · · · α^{q^{m–1}} = α^{(q^m – 1)/(q – 1)}.

In view of the preceding discussion, the trace and norm of α are elements of 𝔽_q. For q = p, the trace and the norm of α are also called the absolute trace and the absolute norm of α. We often drop the suffixes in these notations, when no ambiguities are likely.

The trace and norm functions play an important role in the theory of finite fields. See Exercise 2.86 for some elementary properties of these functions.

2.9.3. Representation of Finite Fields

𝔽_{q^m} is a vector space of dimension m over 𝔽_q. Let β_0, . . . , β_{m–1} be an 𝔽_q-basis of 𝔽_{q^m}. Each element a ∈ 𝔽_{q^m} has a unique representation a = a_0β_0 + · · · + a_{m–1}β_{m–1} with each a_i ∈ 𝔽_q. Therefore, if we have a representation of the elements of 𝔽_q, we can also represent the elements of 𝔽_{q^m}. Thus the elements of any finite field can be represented, provided that we have representations of the elements of prime fields. But the set {0, 1, . . . , p – 1} under modulo p arithmetic represents 𝔽_p.

So our problem reduces to selecting suitable bases β_0, . . . , β_{m–1} of 𝔽_{q^m} over 𝔽_q. In order to illustrate how we can do that, let us choose a priori a fixed monic irreducible polynomial f ∈ 𝔽_q[X] with deg f = m. We then represent 𝔽_{q^m} = 𝔽_q[X]/⟨f(X)⟩ = 𝔽_q(α), where α (the residue class of X) is a root of f in 𝔽_{q^m}. The elements 1, α, . . . , α^{m–1} are linearly independent over 𝔽_q, since otherwise α would be a root of a non-zero polynomial of degree less than m. The 𝔽_q-basis 1, α, . . . , α^{m–1} of 𝔽_{q^m} is called a polynomial basis (with respect to the defining polynomial f). The elements of 𝔽_{q^m} are then represented by polynomials in α of degree < m. The arithmetic in 𝔽_{q^m} is carried out as polynomial arithmetic modulo the irreducible polynomial f.

Example 2.19.
  1. The elements of 𝔽₂ are 0 and 1 with 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0, 0 · 0 = 1 · 0 = 0 · 1 = 0 and 1 · 1 = 1. In order to represent 𝔽₈, we choose the irreducible polynomial f(X) := X³ + X² + 1 ∈ 𝔽₂[X]. Elements of 𝔽₈ are a₂α² + a₁α + a₀, where a₀, a₁, a₂ ∈ 𝔽₂. In order to demonstrate the arithmetic in 𝔽₈, we take a := α² + 1 and b := α² + α. Their sum in 𝔽₈ is a + b = α + 1. On the other hand, ab = α⁴ + α³ + α² + α = α(α³ + α² + 1) + α² = α · 0 + α² = α². The complete multiplication table for this representation is given in Table 2.2.

    Table 2.2. Multiplication table for 𝔽₈ = 𝔽₂(α), α³ + α² + 1 = 0

    · | 0 | 1 | α | α+1 | α² | α²+1 | α²+α | α²+α+1
    0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
    1 | 0 | 1 | α | α+1 | α² | α²+1 | α²+α | α²+α+1
    α | 0 | α | α² | α²+α | α²+1 | α²+α+1 | 1 | α+1
    α+1 | 0 | α+1 | α²+α | α²+1 | 1 | α | α²+α+1 | α²
    α² | 0 | α² | α²+1 | 1 | α²+α+1 | α+1 | α | α²+α
    α²+1 | 0 | α²+1 | α²+α+1 | α | α+1 | α²+α | α² | 1
    α²+α | 0 | α²+α | 1 | α²+α+1 | α | α² | α+1 | α²+1
    α²+α+1 | 0 | α²+α+1 | α+1 | α² | α²+α | 1 | α²+1 | α

  2. 𝔽₃ is represented by the set {0, 1, 2} with arithmetic operations modulo 3. Since –1 is a quadratic non-residue modulo 3, the polynomial X² + 1 is irreducible over 𝔽₃. Therefore, the quotient field 𝔽₃[X]/⟨X² + 1⟩ = 𝔽₃(β) can be used to represent 𝔽₉, β being a root of this polynomial. The multiplication table of 𝔽₉ under this representation is then as shown in Table 2.3.

    Table 2.3. Multiplication table for 𝔽₉ = 𝔽₃(β), β² + 1 = 0

    · | 0 | 1 | 2 | β | β+1 | β+2 | 2β | 2β+1 | 2β+2
    0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
    1 | 0 | 1 | 2 | β | β+1 | β+2 | 2β | 2β+1 | 2β+2
    2 | 0 | 2 | 1 | 2β | 2β+2 | 2β+1 | β | β+2 | β+1
    β | 0 | β | 2β | 2 | β+2 | 2β+2 | 1 | β+1 | 2β+1
    β+1 | 0 | β+1 | 2β+2 | β+2 | 2β | 1 | 2β+1 | 2 | β
    β+2 | 0 | β+2 | 2β+1 | 2β+2 | 1 | β | β+1 | 2β | 2
    2β | 0 | 2β | β | 1 | 2β+1 | β+1 | 2 | 2β+2 | β+2
    2β+1 | 0 | 2β+1 | β+2 | β+1 | 2 | 2β | 2β+2 | β | 1
    2β+2 | 0 | 2β+2 | β+1 | 2β+1 | β | 2 | β+2 | 1 | 2β

Polynomial bases are most common in finite field implementations. Some other types of bases also deserve specific mention in this context.

Definition 2.60.

An element α ∈ 𝔽_{q^m} is called a normal element over 𝔽_q, if the conjugates α, α^q, . . . , α^{q^{m–1}} are (distinct and) linearly independent over 𝔽_q. For a normal element α of 𝔽_{q^m} over 𝔽_q, the 𝔽_q-basis α, α^q, . . . , α^{q^{m–1}} is called a normal basis (of 𝔽_{q^m} over 𝔽_q). If, in addition, α is a primitive element (that is, a generator) of 𝔽_{q^m}*, then α and the corresponding normal basis are called a primitive normal element and a primitive normal basis respectively.

It can be shown that normal bases exist for all finite extensions 𝔽_q ⊆ 𝔽_{q^m}. It can even be shown that primitive normal bases exist for all such extensions.

Example 2.20.

Consider the representation of 𝔽₈ in Example 2.19. The elements α, α² and α⁴ = α² + α + 1 satisfy

(α  α²  α⁴) = (1  α  α²) M,   M = | 0 0 1 |
                                   | 1 0 1 |
                                   | 0 1 1 |

with the 3×3 transformation matrix M having determinant 1 modulo 2. Thus α is a normal element of 𝔽₈ over 𝔽₂ and (α, α², α⁴) is a normal basis of 𝔽₈ over 𝔽₂. Since #𝔽₈* = 7 is prime, α is a generator of 𝔽₈*, that is, α is also a primitive normal element of 𝔽₈ over 𝔽₂.

On the other hand, α + 1 is not a normal element of 𝔽₈ over 𝔽₂. Table 2.2 gives (α + 1)² = α² + 1 and (α + 1)⁴ = α² + α, so that

(α+1  (α+1)²  (α+1)⁴) = (1  α  α²) M′,   M′ = | 1 1 0 |
                                               | 1 0 1 |
                                               | 0 1 1 |

with the transformation matrix M′ having determinant zero modulo 2.

Computations over finite fields often call for exponentiations of elements a = a_0β_0 + · · · + a_{m–1}β_{m–1}. If the β_i = α^{q^i}, i = 0, . . . , m – 1, constitute a normal basis, then a^q = a_{m–1}β_0 + a_0β_1 + · · · + a_{m–2}β_{m–1}, since α^{q^m} = α and a_i^q = a_i for each i. Thus the coefficients of a^q (in the representation under the given normal basis) are obtained simply by cyclically shifting the coefficients a_0, . . . , a_{m–1} in the representation of a. This leads to a considerable saving of time. In particular, this trick becomes most meaningful for q = 2 (a case of high importance in cryptography).

Now that exponentiations become cheaper with normal bases, one should not let the common operations (addition and multiplication) turn significantly slower. The sum of a = a_0β_0 + · · · + a_{m–1}β_{m–1} and b = b_0β_0 + · · · + b_{m–1}β_{m–1} remains as easy as in the case of a polynomial basis, namely, a + b = (a_0 + b_0)β_0 + · · · + (a_{m–1} + b_{m–1})β_{m–1}, where each a_i + b_i is calculated in 𝔽_q. However, computing the product ab introduces difficulty. In particular, it requires the representations β_iβ_j = Σ_{k=0}^{m–1} t_{ij}^{(k)} β_k, 0 ≤ i, j ≤ m – 1, of the products of basis elements. For i ≤ j, we have β_iβ_j = (β_0β_{j–i})^{q^i}, so that all the coefficients t_{ij}^{(k)} are determined by the t_{0j}^{(k)}. It is thus sufficient to look only at the coefficients t_{0j}^{(k)}, 0 ≤ j, k ≤ m – 1. We denote by C_α the number of non-zero t_{0j}^{(k)}. From practical considerations (for example, for hardware implementations), C_α should be as small as possible. For q = 2, one can show that 2m – 1 ≤ C_α ≤ m². If, for this special case, C_α = 2m – 1, the normal basis α, α^q, . . . , α^{q^{m–1}} is called an optimal normal basis. Unlike normal (or primitive normal) bases, optimal normal bases do not exist for all values of m.
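The cyclic-shift property of squaring can be verified directly for the normal basis (α, α², α⁴) of Example 2.20. The Python sketch below is our own check, assuming the bit-mask representation of 𝔽₈ with defining polynomial X³ + X² + 1 from Example 2.19:

```python
from itertools import product

F = 0b1101   # X^3 + X^2 + 1

def mul(a, b):
    """Product in F_8 = F_2[X]/(X^3 + X^2 + 1); elements are 3-bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= F
    return r

B = [0b010, 0b100, 0b111]    # normal basis: alpha, alpha^2, alpha^4 = alpha^2+alpha+1

def from_coords(c):
    """Element with coordinates c = (c0, c1, c2) in the normal basis B."""
    e = 0
    for ci, bi in zip(c, B):
        if ci:
            e ^= bi
    return e

# Squaring cyclically shifts the normal-basis coordinates (c0,c1,c2) -> (c2,c0,c1).
for c in product((0, 1), repeat=3):
    e = from_coords(c)
    assert mul(e, e) == from_coords((c[2], c[0], c[1]))
```

In hardware, this shift is a single-cycle rotation of a register, which is precisely why normal bases over 𝔽₂ are attractive for repeated squaring.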

We finally mention another representation of the elements of a finite field 𝔽_q that does not depend on the vector space representation discussed so far, but is based on the fact that the group 𝔽_q* is cyclic. If we are given a primitive element (that is, a generator) γ of 𝔽_q*, then the elements of 𝔽_q are 0, 1 = γ⁰, γ, . . . , γ^{q–2}. Multiplication and exponentiation become easy with this representation, since 0 · a = 0 for all a ∈ 𝔽_q, whereas γ^i · γ^j = γ^k with k ≡ i + j (mod q – 1). Unfortunately, this representation provides no clue on how to compute γ^i + γ^j. One possibility is to store a table consisting of the values z_k satisfying 1 + γ^k = γ^{z_k} for all k = 0, . . . , q – 2 (with γ^k ≠ –1), so that for i ≤ j one can compute γ^i + γ^j = γ^i(1 + γ^{j–i}) = γ^i γ^{z_{j–i}} = γ^l, where l ≡ i + z_{j–i} (mod q – 1). Such a table, called Zech’s logarithm table, can be maintained for small values of q and may facilitate computations in extensions of 𝔽_q. But if q is large (or, more correctly, if p is large, where q = p^n), this representation of the elements of 𝔽_q is often neither practical nor feasible. Another difficulty with this representation is that it calls for a primitive element γ. If q is large and the integer factorization of q – 1 is not available, no efficient methods are known for finding such an element, or even for checking whether a given element is primitive.

Example 2.21.

Consider the representation of 𝔽₉ in Example 2.19. By Table 2.3, γ := β + 1 is a generator of 𝔽₉*. Table 2.4 lists the powers of γ and the Zech logarithms.

Table 2.4. Zech’s logarithm table for 𝔽₉ with respect to γ = β + 1

k | γ^k | 1 + γ^k | z_k
0 | 1 | 2 | 4
1 | β+1 | β+2 | 7
2 | 2β | 2β+1 | 3
3 | 2β+1 | 2β+2 | 5
4 | 2 | 0 | —
5 | 2β+2 | 2β | 2
6 | β | β+1 | 1
7 | β+2 | β | 6
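Table 2.4 can be rebuilt programmatically from the representation 𝔽₉ = 𝔽₃[X]/⟨X² + 1⟩ of Example 2.19; the Python sketch below (our own encoding: an element c₀ + c₁β is the pair (c₀, c₁)) recomputes the powers of γ = β + 1 and the Zech logarithms:

```python
P = 3   # F_9 = F_3[X]/(X^2 + 1); an element c0 + c1*beta is the pair (c0, c1)

def mul(a, b):
    """(a0 + a1 beta)(b0 + b1 beta) with beta^2 = -1 (mod 3)."""
    c0 = (a[0] * b[0] - a[1] * b[1]) % P
    c1 = (a[0] * b[1] + a[1] * b[0]) % P
    return (c0, c1)

gamma = (1, 1)                       # beta + 1
powers = [(1, 0)]                    # gamma^0 = 1
for _ in range(7):
    powers.append(mul(powers[-1], gamma))
assert len(set(powers)) == 8         # gamma really generates F_9*

# Zech logarithms: z[k] with 1 + gamma^k = gamma^(z[k]); undefined when the sum is 0.
log = {g: k for k, g in enumerate(powers)}
zech = {}
for k, g in enumerate(powers):
    s = ((1 + g[0]) % P, g[1])
    if s != (0, 0):
        zech[k] = log[s]

assert zech == {0: 4, 1: 7, 2: 3, 3: 5, 5: 2, 6: 1, 7: 6}   # matches Table 2.4
```

Note that k = 4 is missing from the dictionary, exactly as the table leaves z₄ undefined: γ⁴ = 2 = –1, so 1 + γ⁴ = 0 has no discrete logarithm.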

Exercise Set 2.9

2.80 Let F be a field (not necessarily finite) of characteristic p > 0 and let a, b ∈ F. Prove that (a + b)^p = a^p + b^p, or, more generally, (a + b)^{p^n} = a^{p^n} + b^{p^n} for all n ∈ ℕ. [H]
2.81 Let p be a prime, n ∈ ℕ and q := p^n. Prove that:
  1. If f ∈ 𝔽_p[X], then f(X^p) = f(X)^p.

  2. If f ∈ 𝔽_q[X], then f(X^p) = g(X)^p for some g ∈ 𝔽_q[X].

2.82 Let p be a prime, n, m ∈ ℕ and q := p^n. Let F ⊆ K be an extension of finite fields with #F = q and #K = q^m. Show that K is the splitting field of X^{q^m} – X over F. [H]
2.83Write the addition and multiplication tables of (some representations of) the fields and . Use these tables to find a primitive element in each of these fields and a normal element in (over ).
2.84 Let K be a field (not necessarily finite or of positive characteristic).
  1. Let f ∈ K[X] be of degree 2 or 3. Prove that f is reducible in K[X] if and only if f has a root in K. Deduce that X² + X + 1 and X³ + X + 1 are irreducible in 𝔽₂[X].

  2. Let f ∈ K[X] be of degree d ≥ 0 with f(0) ≠ 0. The opposite of f is the polynomial f^op(X) := X^d f(1/X). Show that f(X) is irreducible in K[X] if and only if f^op(X) is irreducible in K[X]. Deduce that X³ + X² + 1 is irreducible in 𝔽₂[X].

2.85 In this exercise, one studies the arithmetic in the finite field 𝔽₁₂₅.
  1. Show that the polynomial f(X) := X³ + X + 1 is irreducible in 𝔽₅[X].

  2. Let us represent 𝔽₁₂₅ as 𝔽₅[X]/⟨f(X)⟩. Call α the residue class of X and consider the elements a := 3α² + 2α + 1 and b := 2α² + 3 in 𝔽₁₂₅. Compute ab⁻¹ in this representation of 𝔽₁₂₅. You should compute the canonical representative of ab⁻¹ in 𝔽₁₂₅, that is, a polynomial in α of degree < 3 with coefficients reduced modulo 5.

2.86 Let F ⊆ K ⊆ L be finite extensions of finite fields with [L : K] = s. Let α, β ∈ K and γ ∈ L. Prove the following assertions:
  1. TrK|F(α + β) = TrK|F(α) + TrK|F (β) and NK|F (αβ) = NK|F (α) NK|F (β).

  2. TrL|F (α) = s TrK|F (α) and NL|F (α) = NK|F (α)^s.

  3. Transitivity of trace and norm

    TrL|F (γ) = TrK|F (TrL|K(γ)) and NL|F (γ) = NK|F (NL|K (γ)).

2.87 Let K ⊆ L be a finite extension of finite fields. In this exercise, we treat both K and L as vector spaces over K. Show that:
  1. TrL|K is a surjective linear transformation L → K.

  2. All the linear transformations L → K are given by T_α : L → K, β ↦ TrL|K(αβ), where α ∈ L. (In this notation, TrL|K = T₁.) Moreover, for distinct elements α, α′ ∈ L, the linear transformations T_α and T_{α′} are distinct.

2.88 Let K and L be as in Exercise 2.87 with q := #K, and let β ∈ L. Show that TrL|K(β) = 0 if and only if β = γ^q – γ for some γ ∈ L.
2.89 Let K and L be as in Exercise 2.87 with m := [L : K]. Two K-bases (β_0, . . . , β_{m–1}) and (γ_0, . . . , γ_{m–1}) of L are called dual or complementary, if TrL|K(β_iγ_j) = δ_{ij}.[10] Show that every K-basis of L has a unique dual basis.

[10] The Kronecker delta δ on an index set I (finite or infinite) is defined for i, j ∈ I as: δ_{ij} := 1 if i = j, and δ_{ij} := 0 if i ≠ j.

2.90 Prove that every finite extension of finite fields is Galois. [H]
2.91 For the extension 𝔽_q ⊆ 𝔽_{q^m}, consider the map Φ : 𝔽_{q^m} → 𝔽_{q^m}, α ↦ α^q.
  1. Show that Φ is an 𝔽_q-automorphism of 𝔽_{q^m}. Φ is called the Frobenius automorphism of 𝔽_{q^m} over 𝔽_q.

  2. Show that Aut_{𝔽_q} 𝔽_{q^m} is cyclic of order m with Φ as a generator. [H]

2.92 Let f ∈ 𝔽_q[X] be irreducible with deg f = d. Consider the extension 𝔽_q ⊆ 𝔽_{q^m} and let r := gcd(d, m).
  1. Show that f is irreducible in 𝔽_{q^m}[X] if and only if r = 1. [H]

  2. More generally, show that f factors in 𝔽_{q^m}[X] into a product of r irreducible polynomials each of degree d/r.

2.93 Consider the representation of 𝔽₈ in Example 2.19. Construct the minimal polynomials over 𝔽₂ of the elements of 𝔽₈. [H]
2.94 Show that the number of (ordered) 𝔽_q-bases of 𝔽_{q^m} is

(q^m – 1)(q^m – q)(q^m – q²) · · · (q^m – q^{m–1}).

*2.10. Affine and Projective Curves

In this section, we introduce some elementary concepts from algebraic geometry that facilitate the treatment of elliptic and hyperelliptic curves in the next two sections. We concentrate only on plane curves, because these are the only curves we need in this book. Throughout this section, K denotes a field (finite or infinite) and K̄ the algebraic closure of K.

2.10.1. Plane Curves

The solution set of a polynomial equation f(X, Y) = 0 is one of the central objects of study in algebraic geometry. For example, we know that in ℝ² the equation X² + Y² – 1 = 0 represents a circle with centre (0, 0) and radius 1. When we pass to an arbitrary field, it is often not possible to visualize such plots, but it still makes sense to talk about the set of solutions of such an equation. For example, the solutions of the above circle equation in 𝔽₃² are the four discrete points (0, 1), (0, 2), (1, 0) and (2, 0). (This solution set does not really look round.)
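The four points can be found by direct enumeration; a quick Python check (our own, for illustration):

```python
# F_3-rational points on the 'circle' X^2 + Y^2 - 1 = 0.
p = 3
points = [(x, y) for x in range(p) for y in range(p)
          if (x * x + y * y - 1) % p == 0]
assert points == [(0, 1), (0, 2), (1, 0), (2, 0)]
```

Exhaustive enumeration is feasible only for tiny fields, but it makes the notion of a K-rational point concrete before the formal definitions below.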

One can generalize this study by considering polynomials in n indeterminates and by investigating the simultaneous solutions of m polynomials. We, however, do not intend to be so general here and concentrate only on curves defined by a single polynomial equation in two indeterminates.

Definition 2.61.

For n ∈ ℕ, the n-dimensional affine space 𝔸ⁿ(K) over K is defined to be the set consisting of all n-tuples (x_1, . . . , x_n) with each x_i ∈ K. For n = 2, the affine space 𝔸²(K) is also called the affine plane over K. For a point P = (x_1, . . . , x_n) ∈ 𝔸ⁿ(K), the elements x_1, . . . , x_n are called the affine coordinates of P. The affine space over the closure K̄ is often abbreviated as 𝔸ⁿ, when the field K is understood from the context.

𝔸ⁿ(K) is an n-dimensional vector space over K. For example, the affine plane 𝔸²(ℝ) can be identified with the conventional X-Y plane.

Definition 2.62.

An affine plane (algebraic) curve C over K is defined by a polynomial f ∈ K[X, Y] and is written as C : f(X, Y) = 0. The set C(K) of K-rational points on an affine plane curve C : f(X, Y) = 0 is the set of all points (x, y) ∈ 𝔸²(K) satisfying f(x, y) = 0.

K-rational points on a plane curve are precisely the solutions of the defining polynomial equation. Standard examples of affine plane curves include the straight lines given by aX + bY + c = 0, a, b, c ∈ K, a and b not both 0, and the conic sections (circles, ellipses, parabolas and hyperbolas) given by aX² + bXY + cY² + dX + eY + f = 0, a, b, c, d, e, f ∈ K, with at least one of a, b, c non-zero. For K = ℝ, the set of K-rational points can be drawn as the graph of the polynomial equation, whereas for an arbitrary field K (in particular, for finite fields) such drawings make little or no sense. However, it is often helpful to visualize curves as curves over ℝ (also called real curves) and then generalize the situation to an arbitrary field K.

The number ∞ is not treated as a real number (or an integer or a natural number). But it is often helpful to extend ℝ by including two points that are infinitely far away from the origin, one in each direction. This gives us the so-called extended real line ℝ ∪ {–∞, +∞}. An immediate advantage of such a completion of ℝ is that every monotone sequence converges in it. But for studying the roots of polynomial equations it is helpful to add only a single point at infinity to ℝ in order to get what is called the projective line ℙ¹(ℝ) over ℝ. Similarly, if we start with the affine plane ℝ² and add a point at infinity for each slope of the straight lines Y = aX + b and one more for the vertical lines X = c, we get the so-called projective plane ℙ²(ℝ) over ℝ. We call the line passing through all the points at infinity in ℙ²(ℝ) the line at infinity. An immediate benefit of passing from ℝ² to ℙ²(ℝ) is that in ℙ²(ℝ) any two distinct lines (parallel or not in ℝ²) meet at exactly one point, and through any two distinct points of ℙ²(ℝ) passes a unique line.

Now it is time to replace ℝ by an arbitrary field K and to rephrase our definitions in such a way that it continues to make sense to talk about the points and the line at infinity, even when K itself contains only finitely many elements.

Definition 2.63.

Let n ∈ ℕ. Define the relation ~ on the ‘punctured’ (n + 1)-dimensional affine space 𝔸ⁿ⁺¹(K) \ {(0, . . . , 0)} over K by (x_0, . . . , x_n) ~ (y_0, . . . , y_n) if and only if there exists a λ ∈ K* such that y_i = λx_i for all i = 0, . . . , n. It is easy to see that ~ is an equivalence relation on 𝔸ⁿ⁺¹(K) \ {(0, . . . , 0)}. The set ℙⁿ(K) of all equivalence classes of ~ is called the n-dimensional projective space over K. In particular, ℙ²(K) is called the projective plane over K. A point P = [x_0, . . . , x_n] ∈ ℙⁿ(K) is the equivalence class of a point (x_0, . . . , x_n) ∈ 𝔸ⁿ⁺¹(K) \ {(0, . . . , 0)}. The elements x_0, . . . , x_n constitute a set of homogeneous coordinates for P.

It is evident that ℙⁿ(K) can be identified with the set of all 1-dimensional vector subspaces (that is, lines through the origin) of the affine space 𝔸ⁿ⁺¹(K). To argue that this formal definition tallies with the intuitive notion for n = 2 and K = ℝ, consider the affine 3-space 𝔸³(ℝ) referred to the coordinates X, Y, Z. Look at the family of planes ε_λ : Z = λ, parallel to the X-Y plane. (ε_0 is the X-Y plane itself.) First take a non-zero value of λ, say λ = 1. Every line in 𝔸³(ℝ) passing through the origin and not parallel to the X-Y plane meets ε_1 at exactly one point. Conversely, a unique such line passes through each point on ε_1 and the origin. In this way, we associate points of ℙ²(ℝ) with points on ε_1. These are all the finite points of ℙ²(ℝ). On the other hand, the lines passing through the origin and lying in the X-Y plane (ε_0 : Z = 0) do not meet ε_1 and correspond to the points at infinity of ℙ²(ℝ).

In the last paragraph, we obtained the canonical embedding of the affine plane ℝ² in ℙ²(ℝ) by setting Z = 1. By definition, ℙ²(ℝ) is symmetric in X, Y and Z. This means that we can as well set X = 1 or Y = 1 and see that there are other embeddings of ℝ² in ℙ²(ℝ). This observation often proves to be useful (for example, see Definition 2.66).

Now that we have passed from the affine plane to the projective plane, we should be able to carry (affine) plane curves to the projective plane. For this, we need some definitions.

Definition 2.64.

Let R denote the polynomial ring K[X_0, X_1, . . . , X_n] over a field K. A monomial of R is an element of R of the form X_0^{α_0} X_1^{α_1} · · · X_n^{α_n}, α_i ≥ 0. A term in R is a monomial multiplied by an element c ∈ K*. Any polynomial f ∈ R is a sum of finitely many non-zero terms. The degree of a monomial X_0^{α_0} X_1^{α_1} · · · X_n^{α_n} (or of a term c X_0^{α_0} X_1^{α_1} · · · X_n^{α_n}) is defined as α_0 + α_1 + · · · + α_n. The degree of a non-zero polynomial f ∈ R, denoted deg f, is defined to be the maximum of the degrees of its non-zero terms. The degree of the zero polynomial is taken to be –∞. A non-zero polynomial is said to be homogeneous of degree d ≥ 0, if all of its non-zero terms have degree d. The zero polynomial is said to be homogeneous of any degree.

Let C : f(X, Y) = 0 be an affine plane curve over a field K defined by a non-zero polynomial f ∈ K[X, Y] and let d := deg f. Then f^(h)(X, Y, Z) := Z^d f(X/Z, Y/Z) is a homogeneous polynomial of degree d in the polynomial ring K[X, Y, Z]. The polynomial f^(h) is called the homogenization of f. Putting Z = 1 in f^(h)(X, Y, Z) gives back the original polynomial f(X, Y), that is, f^(h)(X, Y, 1) = f(X, Y). Therefore, f is called the dehomogenization of the homogeneous polynomial f^(h). The homogenization (and dehomogenization) of the zero polynomial is taken to be the zero polynomial.
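Homogenization is purely mechanical; the Python sketch below (our own dict-of-monomials representation, not the book's) homogenizes the circle polynomial and checks that setting Z = 1 recovers it:

```python
# Homogenize f(X, Y) of degree d into Z^d * f(X/Z, Y/Z).
# A polynomial is stored as a dict mapping exponent tuples to coefficients.

def homogenize(f):
    d = max(i + j for (i, j) in f)                     # total degree of f
    return {(i, j, d - i - j): c for (i, j), c in f.items()}

def dehomogenize(fh):
    """Set Z = 1 (no exponent collisions occur for a homogenization)."""
    return {(i, j): c for (i, j, k), c in fh.items()}

f = {(2, 0): 1, (0, 2): 1, (0, 0): -1}                 # X^2 + Y^2 - 1
fh = homogenize(f)
assert fh == {(2, 0, 0): 1, (0, 2, 0): 1, (0, 0, 2): -1}   # X^2 + Y^2 - Z^2
assert dehomogenize(fh) == f
```

Each monomial of degree e < d simply picks up the factor Z^{d–e}, which is all the formula Z^d f(X/Z, Y/Z) does after clearing denominators.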

Take [x, y, z] ∈ ℙ²(K) and λ ∈ K*. By definition, [x, y, z] = [λx, λy, λz]. Since f^(h)(λx, λy, λz) = λ^d f^(h)(x, y, z), we have f^(h)(λx, λy, λz) = 0 if and only if f^(h)(x, y, z) = 0, so it makes sense to talk about the zeros of the homogeneous polynomial f^(h) in the projective plane ℙ²(K). This motivates us to define projective plane curves:

Definition 2.65.

A projective plane curve C over K is defined by a homogeneous polynomial h ∈ K[X, Y, Z] and is written as C : h(X, Y, Z) = 0. The set C(K) of K-rational points on a projective plane curve C : h(X, Y, Z) = 0 is the set of all points [x, y, z] ∈ ℙ²(K) such that h(x, y, z) = 0.

Let C : f(X, Y) = 0 be an affine plane curve. The projective plane curve defined by f^(h)(X, Y, Z) is, by an abuse of notation, denoted also by C. The zeros of the affine curve C : f(X, Y) = 0 in 𝔸²(K) are in one-to-one correspondence with the finite zeros of C : f^(h)(X, Y, Z) = 0 in ℙ²(K) (that is, zeros with z ≠ 0, normalized to z = 1). The projective curve contains some more point(s), namely those at infinity, which can be obtained by putting Z = 0 in f^(h)(X, Y, Z). Passage from the affine plane to the projective plane is just that: a systematic inclusion of the points at infinity.

It is often customary to write an affine plane curve as C : f(X, Y) = g(X, Y) and a projective plane curve as C : f(h)(X, Y, Z) = g(h)(X, Y, Z) with f(h) and g(h) of the same degree. The former is the same as the curve C : fg = 0, and the latter the same as C : f(h)g(h) = 0.

A homogeneous polynomial f(X, Y, Z) ∈ K[X, Y, Z] can be viewed as the homogenization of any of the polynomials

fZ(X, Y) = f(X, Y, 1), fY (X, Z) = f(X, 1, Z) and fX(Y, Z) = f(1, Y, Z).

Consider a point P = [a, b, c] on the projective curve C : f(X, Y, Z) = 0. Since a, b and c are not all 0, P corresponds to a finite point on at least one of the affine curves fX = 0, fY = 0 and fZ = 0.

2.10.2. Polynomial and Rational Functions on Plane Curves

Throughout the rest of Section 2.10 we make the following assumption:

Assumption 2.1.

K is an algebraically closed field, that is, K equals its algebraic closure.

Although many of the results we state now are valid for fields that are not algebraically closed, it is convenient to make this assumption in order to avoid unnecessary complications.

Let C : f(X, Y) = 0 be a curve defined over K. Henceforth we assume that the polynomial f(X, Y) is irreducible over K. Though we write the affine equation for the curve for notational simplicity, we usually work with the set C(K) of the K-rational points on the corresponding projective curve. We refer to the solutions of C in the affine plane as the finite points on the curve.

Definition 2.66.

Let P = [a, b, c] be a point on a curve C defined over K. We call P a smooth or regular or non-singular point of C, if P satisfies the following conditions.

  1. If P is a finite point (that is, if c ≠ 0), then P is called a smooth point on C, if the partial derivatives ∂f/∂X and ∂f/∂Y do not vanish simultaneously at (a/c, b/c).

  2. If P is a point at infinity (that is, if c = 0), then we must have a ≠ 0 or b ≠ 0. Assume a ≠ 0. (The other case can be treated similarly.) Consider the polynomial g(Y, Z) := f(h)(1, Y, Z), where f(h) is the homogenization of f. Then P = [1, b/a, 0] corresponds to the finite point (b/a, 0) on the curve D : g(Y, Z) = 0. P is called a smooth point on C, if (b/a, 0) is a smooth point on D, that is, if ∂g/∂Y and ∂g/∂Z do not vanish simultaneously at (b/a, 0).

A non-smooth point on C is also called non-regular or singular. C is called smooth or regular or non-singular, if all points (finite and infinite) on C are smooth.
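Over a finite field, condition 1 can be brute-forced. The sketch below (the helper name is ours, and it examines only the F7-rational finite points, whereas the definition is stated over an algebraically closed field) finds the singular point of Y2 = X3 at the origin and detects no finite singular point on Y2 = X3 – X:

```python
p = 7
def singular_finite_points(f, fx, fy):
    """Finite points of f(X, Y) = 0 over F_p where both partials vanish."""
    return [(x, y) for x in range(p) for y in range(p)
            if f(x, y) % p == 0 and fx(x, y) % p == 0 and fy(x, y) % p == 0]

# C1 : Y^2 - X^3 = 0 has a singular point (a cusp) at the origin.
c1 = singular_finite_points(lambda x, y: y*y - x**3,
                            lambda x, y: -3*x*x,
                            lambda x, y: 2*y)
# C2 : Y^2 - X^3 + X = 0 is smooth at all finite points.
c2 = singular_finite_points(lambda x, y: y*y - x**3 + x,
                            lambda x, y: -3*x*x + 1,
                            lambda x, y: 2*y)
assert c1 == [(0, 0)] and c2 == []
```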

Now we define polynomial functions on C. For a moment, we concentrate on the affine curve, that is, only on the finite points of C. Let g, h ∈ K[X, Y] with g ≡ h (mod f) (that is, f | (g – h)). Since for any point P on C we have f(P) = 0, it follows that g(P) = h(P). This motivates us to define the following.

Definition 2.67.

The ring K[X, Y]/〈f〉 is called the affine coordinate ring of C and is denoted by K[C]. Elements of K[C] are called polynomial functions on C. If we denote by x and y the residue classes of X and Y respectively in K[C], then a polynomial function on C is given by a polynomial g(x, y) ∈ K[x, y].[11] By our assumption, f is an irreducible polynomial; so 〈f〉 is a prime ideal of K[X, Y], that is, the coordinate ring K[C] is an integral domain.

[11] Recall from Section 2.7 that K[x, y] is the K-algebra generated by x and y. It is not a polynomial algebra (in general).

The quotient field (Exercise 2.34) of K[C] is called the function field of C and is denoted by K(C). An element of K(C) is of the form g(x, y)/h(x, y) with g(x, y), h(x, y) ∈ K[C] and h(x, y) ≠ 0 (that is, h(X, Y) ∉ 〈f〉), and is called a rational function on C.

By definition, two rational functions g1(x, y)/h1(x, y) and g2(x, y)/h2(x, y) are equal if and only if g1(x, y)h2(x, y) – g2(x, y)h1(x, y) = 0 in K[C] or, equivalently, if and only if f | (g1(X, Y)h2(X, Y) – g2(X, Y)h1(X, Y)) in K[X, Y]. We define addition and multiplication of rational functions by the usual rules (Exercise 2.34).

Definition 2.68.

Let P = (a, b) be a finite point on the curve C. Given a polynomial function g(x, y) ∈ K[C], the value of g at P is defined to be g(a, b). If r ∈ K(C) is a rational function, then r is said to be defined at P, if r has a representation r = g/h, g, h ∈ K[C], with h(P) ≠ 0. In that case, we define the value of r at P to be r(P) := g(P)/h(P). If r is not defined at P, it is customary to write r(P) = ∞.

By definition, K[C] and K(C) are collections of equivalence classes. However, the value of a polynomial or a rational function on C is independent of the representatives of the equivalence classes and is, therefore, a well-defined concept.

The above definitions can be extended to the corresponding projective curve C : f(h)(X, Y, Z) = 0. By Exercise 2.96(e), the polynomial f(h) is irreducible, since we assumed f to be so.

Definition 2.69.

The function field (denoted again by K(C)) of the projective curve C is the set of quotients (called rational functions) of the form g(X, Y, Z)/h(X, Y, Z), where g, h ∈ K[X, Y, Z] are homogeneous of the same degree and h ∉ 〈f(h)〉. Two rational functions g1/h1 and g2/h2 are equal if and only if g1h2 – g2h1 ∈ 〈f(h)〉.

A rational function r ∈ K(C) is said to be defined at a point P = [a, b, c] on C, if r has a representation g/h with h(a, b, c) ≠ 0. In that case, we define r(P) := g(a, b, c)/h(a, b, c). Since g and h are homogeneous and of the same degree, the value r(P) is independent of the choice of the projective coordinates of P (Exercise 2.95). If r is not defined at P, we write r(P) = ∞.

One can define polynomial functions on a projective curve (as we did for affine curves), but it makes no sense to talk about the value of such a polynomial function at a point P on the curve, because this value depends on the choice of the homogeneous coordinates of P (Exercise 2.95). This problem is eliminated for a rational function g/h by assuming g and h to be of the same degree.

Definition 2.70.

Let C be a projective plane curve, r be a non-zero rational function and P a point on C. P is called a zero of r if r(P) = 0, and a pole of r if r(P) = ∞.

Now we define the multiplicities of zeros and poles of a rational function or, more generally, the order of any point on a projective plane curve. This is based on the following result, the proof of which is long and difficult, and is omitted.

Theorem 2.41.

Let C be a projective plane curve defined by an irreducible polynomial over K and P a smooth point on C. Then there exists a rational function uP ∈ K(C) (depending on P) with the following properties:

  1. uP (P) = 0.

  2. For any non-zero rational function r ∈ K(C), there exist an integer d and a rational function s ∈ K(C) having neither a zero nor a pole at P such that r = uP^d s. The integer d does not depend on the choice of uP.

Definition 2.71.

The function uP of the last theorem is called a uniformizing variable or a uniformizing parameter or simply a uniformizer of C at P. For any non-zero rational function r ∈ K(C), the integer d with r = uP^d s (s having neither a zero nor a pole at P) is called the order of r at P and is denoted by ordP(r).

The connection of poles and zeros with orders is established by the following theorem, which we again state without proof.

Theorem 2.42.

P is neither a pole nor a zero of r if and only if ordP(r) = 0. P is a zero of r if and only if ordP(r) > 0. P is a pole of r if and only if ordP(r) < 0.

If P is a zero (resp. a pole) of r, the integer ordP(r) (resp. – ordP(r)) is called the multiplicity of the zero (resp. pole) P.

Theorem 2.43.

Let r be a non-zero rational function on the projective plane curve C defined over K. Then r has only finitely many poles and zeros. Furthermore, ΣP ordP(r) = 0, the sum being taken over all points P on C.

This is one of the theorems that demand K to be algebraically closed. More explicitly, if K is not algebraically closed, any rational function continues to have only finitely many zeros and poles, but the sum of the orders of r at these points is not necessarily equal to 0. Also note that this sum, if taken over only the finite points of C, need not be 0, even when K is algebraically closed.

2.10.3. Maps Between Plane Curves

Now that we know how to define and evaluate rational functions on a curve, we are in a position to define rational maps between two curves. Let C1 : f1(X, Y, Z) = 0 and C2 : f2(X, Y, Z) = 0 be two projective plane curves defined over K by irreducible homogeneous polynomials f1, f2 ∈ K[X, Y, Z].

Definition 2.72.

A rational map φ : C1 → C2 (defined over K) is given by rational functions r1, r2, r3 in K(C1) such that for each point P ∈ C1(K) at which all of r1, r2 and r3 are defined, the point [r1(P), r2(P), r3(P)] lies on C2(K). One often uses the notation φ = [r1, r2, r3].

This, however, is not the complete story. A more precise characterization of a rational map is as follows:

A rational map φ = [r1, r2, r3] : C1 → C2 is said to be defined at P ∈ C1(K), if there exists a rational function s ∈ K(C1) (depending on P) such that sr1, sr2 and sr3 are all defined at P, the values (sr1)(P), (sr2)(P) and (sr3)(P) are not all zero, and φ(P) = [(sr1)(P), (sr2)(P), (sr3)(P)]. A rational map which is defined at every point of C1(K) is called a morphism.

The curves C1 and C2 are said to be isomorphic (denoted C1 ≅ C2), if there exist morphisms φ : C1 → C2 and ψ : C2 → C1 such that ψ ∘ φ and φ ∘ ψ are the identity maps on C1(K) and C2(K) respectively.

Isomorphism is an equivalence relation on the set of all projective plane curves defined over K. Since two isomorphic curves share many common algebraic and geometric properties, it is of interest in algebraic geometry to study the equivalence classes (rather than the individual curves). If C1 ≅ C2 and C2 has a simpler representation than C1, then studying the properties of C2 makes our job simpler and at the same time reveals all the common properties of C1. (See Section 2.11 for an example.)

**2.10.4. Divisors on Plane Curves

Let a be a symbol and n a positive integer. We represent by na the formal sum a + · · · + a (n times). We also define 0a := 0 and –na := n(–a), where the symbol –a satisfies a + (–a) = (–a) + a = 0. For n1, n2 ∈ Z, we define n1a + n2a := (n1 + n2)a. The set {na | n ∈ Z} under these definitions becomes an Abelian group. If we are given two symbols a, b, we can analogously define formal sums na + mb, n, m ∈ Z, and the sum of formal sums as (n1a + m1b) + (n2a + m2b) := (n1 + n2)a + (m1 + m2)b. With these definitions the set {na + mb | n, m ∈ Z} becomes an Abelian group. These constructions can be generalized as follows:

Definition 2.73.

Given a set (not necessarily finite) of symbols ai, i ∈ I, the set of formal sums of the form Σi niai with ni ∈ Z, where ni = 0 except for finitely many i ∈ I, is an Abelian group with the addition formula Σi niai + Σi miai = Σi (ni + mi)ai. This group is called the free Abelian group generated by the ai, i ∈ I.

Now let the ai be the K-rational points on a projective plane curve C defined over K. For notational convenience, we represent by [P] the symbol corresponding to the point P on C. This removes confusion in connection with elliptic curves C (see Section 2.11), for which we intend to make a distinction between P + Q and [P] + [Q] for two points P, Q ∈ C(K). The former sum is again a point on C, whereas the latter is never (the symbol corresponding to) a point on C.

Definition 2.74.

A formal sum D = ΣP nP[P], nP ∈ Z, where nP = 0 except for finitely many P ∈ C(K), is called a divisor on C. The free Abelian group generated by the symbols [P] for all the points P ∈ C(K) is called the group of divisors of C and is denoted by DivK(C) or simply by Div(C), when K is implicit in the context.

Let D = ΣP nP[P] be a divisor. The support of D is defined to be the set {P ∈ C(K) | nP ≠ 0} and is denoted by Supp D.

The degree of D is defined as the integer ΣP nP and is denoted as deg D. The subset {D ∈ Div(C) | deg D = 0} of Div(C) is clearly a subgroup of Div(C). We denote this subgroup by Div0(C).
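Divisor arithmetic is pure bookkeeping on the coefficients nP. A sketch (the points are hypothetical labels rather than actual curve points, and the helper names are ours):

```python
from collections import Counter

# A divisor is stored as a Counter mapping a point to its coefficient n_P.
def add_div(d1, d2):
    s = Counter(d1)
    s.update(d2)                       # coefficients add pointwise
    return Counter({pt: n for pt, n in s.items() if n != 0})

def deg(d):
    return sum(d.values())             # deg D = sum of the n_P

def support(d):
    return {pt for pt, n in d.items() if n != 0}

D1 = Counter({'P': 2, 'Q': -1})        # 2[P] - [Q]
D2 = Counter({'Q': 1, 'R': -2})        # [Q] - 2[R]
D = add_div(D1, D2)                    # 2[P] - 2[R]
assert deg(D) == 0                     # so D lies in Div0
assert support(D) == {'P', 'R'}
```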

Now we define divisors of rational functions on C. Henceforth we assume that C is smooth (that is, smooth at all K-rational points on C).

Definition 2.75.

The divisor of a non-zero rational function r ∈ K(C) is defined to be the formal sum Div(r) := ΣP ordP(r)[P], where ordP(r) is the order of r at P (Definition 2.71). By Theorem 2.43, Div(r) is indeed a divisor and, in fact, Div(r) ∈ Div0(C).

A divisor D ∈ Div(C) is called principal, if D = Div(r) for some non-zero rational function r ∈ K(C). We have Div(rr′) = Div(r) + Div(r′) for any non-zero rational functions r, r′ ∈ K(C). It follows that the set of all principal divisors on C is a subgroup of Div(C) (and of Div0(C) as well). We denote this subgroup by PrinK(C) or simply by Prin(C). The quotient group Div(C)/Prin(C) is called the divisor class group or the Picard group of C and is denoted by PicK(C) or in short by Pic(C). On the other hand, the quotient Div0(C)/Prin(C) is denoted by Pic0K(C) or Pic0(C) and is called the Jacobian of C. Instead of Pic0(C), we also use the notation JK(C) or J(C) for the Jacobian.

Though the Jacobian is defined for an arbitrary smooth curve C (defined by an irreducible polynomial), it is for a special class of curves called hyperelliptic curves that it is particularly easy to represent and do arithmetic in the Jacobian. This gives us yet another family of groups on which cryptographic protocols can be built.

If K is not algebraically closed, the divisor Div(r) of a non-zero rational function r ∈ K(C) need not have degree 0. This means that in that case the Jacobian cannot be defined in the above manner. However, since C is also a curve defined over the algebraic closure of K, we can define the Jacobian over the algebraic closure as above and call a particular subgroup of it the Jacobian of C over K. We defer this discussion until Section 2.12.

Exercise Set 2.10

In this exercise set, we do not assume (unless otherwise stated) that K is necessarily algebraically closed.

2.95
  1. For homogeneous polynomials f1, f2 ∈ K[X1, . . . , Xn] of respective degrees d1 and d2, prove the following assertions:

    1. If d1 = d2, then f1 ± f2 are homogeneous polynomials of degree d1.

    2. The polynomial f1f2 is homogeneous of degree d1 + d2. Conversely, if f1f2 is homogeneous, then f1 and f2 are also homogeneous.

  2. A polynomial f ∈ K[X1, . . . , Xn] is homogeneous of degree d if and only if it satisfies f(λX1, . . . , λXn) = λ^d f(X1, . . . , Xn) for every non-zero λ in the algebraic closure of K.

2.96 In this exercise, we generalize the notion of homogenization and dehomogenization of polynomials. Let K[X1, . . . , Xn] denote the polynomial ring in n indeterminates. Introducing another indeterminate X0, we define the homogenization of a non-zero polynomial f ∈ K[X1, . . . , Xn] of degree d as

f(h)(X0, X1, . . . , Xn) := X0^d f(X1/X0, . . . , Xn/X0).
Prove the following assertions.

  1. f(h) is an element of K[X0, X1, . . . , Xn] and is homogeneous of degree d.

  2. f(h)(1, X1, . . . , Xn) = f(X1, . . . , Xn).

  3. If deg f = d ≥ 0 and fd is the sum of all non-zero terms of degree d in f, then we have f(h)(0, X1, . . . , Xn) = fd(X1, . . . , Xn).

  4. For f, g ∈ K[X1, . . . , Xn], (fg)(h) = f(h)g(h). Moreover, if g|f, then g(h)|f(h) and (f/g)(h) = f(h)/g(h). Under what condition(s) is (f + g)(h) = f(h) + g(h)?

  5. f is irreducible if and only if f(h) is irreducible.

2.97 Let C : f(X, Y) = 0 be an affine plane curve defined by a non-zero polynomial f ∈ K[X, Y] and C : f(h)(X, Y, Z) = 0 the corresponding projective plane curve. Let d := deg f = deg f(h) and let fd be the sum of the non-zero terms of f of degree d. Show that:
  1. f(h)(X, Y, 1) = f(X, Y) and f(h)(X, Y, 0) = fd(X, Y).

  2. (x, y) is a K-rational point of the affine curve if and only if [x, y, 1] is a K-rational point of the projective curve. More generally, let λ ∈ K be non-zero. The point (x/λ, y/λ) is a K-rational solution of f if and only if [x, y, λ] is a K-rational solution of f(h).

  3. The solutions of f at infinity are obtained by solving f(h)(X, Y, 0) = fd(X, Y) = 0. Conclude that the curve C can have at most d points at infinity.

  4. For a, b ∈ K, each of the curves Y – aX = b and X – aY = b (straight lines), and Y – X2 = 0 and X – Y2 = 0 (parabolas) contains only one point at infinity. The hyperbola XY – 1 = 0 contains two points at infinity. How many points at infinity does the hyperbola X2 – Y2 – 1 = 0 contain? The circle X2 + Y2 – 1 = 0?

  5. For a1, a2, a3, a4, a6 ∈ K, the elliptic curve Y2 + a1XY + a3Y = X3 + a2X2 + a4X + a6 contains only one point at infinity.

  6. Let g be a positive integer and u(X), v(X) ∈ K[X] with deg u ≤ g, deg v = 2g + 1 and v monic. Show that the hyperelliptic curve Y2 + u(X)Y = v(X) has only one point at infinity.

2.98Show that the defining polynomial of the elliptic curve in Exercise 2.97(e) is irreducible. Prove the same for the hyperelliptic curve of Exercise 2.97(f). [H]
2.99 Show that for an ideal a of K[X1, . . . , Xn] the following two conditions are equivalent:
  1. a is generated by a set of homogeneous polynomials.

  2. If f = f0 + f1 + · · · + fd ∈ a, where fi is the sum of the non-zero terms of degree i in f, then fi ∈ a for all i = 0, . . . , d. (The polynomials fi are called the homogeneous components of f.)

An ideal satisfying the above equivalent conditions is called a homogeneous ideal. Construct an example to demonstrate that not all ideals of K[X1, . . . , Xn] are homogeneous.

*2.11. Elliptic Curves

The mathematics of elliptic curves is vast and complicated. A reasonably complete understanding of elliptic curves would require a book as large as this one. So we plan to be rather informal while talking about elliptic curves and about their generalizations called hyperelliptic curves. Interested readers can go through the books suggested at the end of this chapter to learn more about these curves. In this section, K stands for a field (finite or infinite) and K̄ for the algebraic closure of K.

2.11.1. The Weierstrass Equation

An elliptic curve E over K is a plane curve defined by the polynomial equation

Equation 2.6

E : Y2 + a1XY + a3Y = X3 + a2X2 + a4X + a6
or by the corresponding homogeneous equation

E : Y2Z + a1XYZ + a3YZ2 = X3 + a2X2Z + a4XZ2 + a6Z3.

These equations are called the Weierstrass equations for E. In order that E qualifies as an elliptic curve, we additionally require that it is smooth at all K̄-rational points (Definition 2.66).[12] Two elliptic curves defined over the field of real numbers are shown in Figure 2.1.

[12] Ellipses are not elliptic curves.

Figure 2.1. Elliptic curves over the reals: (a) Y2 = X3 – X + 1; (b) Y2 = X3 – X


E contains a single point at infinity, namely O = [0, 1, 0] (Exercise 2.97(e)). The set of K-rational points on E in the projective plane is denoted by E(K) and is the central object of study in the theory of elliptic curves. We shortly endow E(K) with a group structure, and this group is used extensively in cryptography.

Let us first see how we can simplify the equation for E. The simplification depends on the characteristic of K. Because fields of characteristic 3 are only rarely used in cryptography, we do not deal with such fields. Simplification of the Weierstrass equation is effected by suitable changes of coordinates. Only a special kind of transformation is allowed, in order to preserve the geometric and algebraic properties of an elliptic curve.

Theorem 2.44.

Two elliptic curves

E1:Y2 + a1XY + a3Y = X3 + a2X2 + a4X + a6
E2:Y2 + b1XY + b3Y = X3 + b2X2 + b4X + b6

defined over K are isomorphic (Definition 2.72) if and only if there exist u ∈ K, u ≠ 0, and r, s, t ∈ K such that the substitution of u2X + r for X and u3Y + u2sX + t for Y transforms the equation of E1 to the equation of E2. For this transformation, the coefficients bi are related to the coefficients ai as follows:

Equation 2.7


The theorem is not proved here. Formulas (2.7) can be checked by tedious calculations. A change of variables as in Theorem 2.44 is referred to as an admissible change of variables. We denote this by

(X, Y) ← (u2X + r, u3Y + u2sX + t).

The inverse transformation is also admissible and is given by

(X, Y) ← ((X – r)/u2, (Y – sX + sr – t)/u3).
Isomorphism is an equivalence relation on the set of all elliptic curves over K.

Consider the elliptic curve E over K given by Equation (2.6). If char K ≠ 2, the admissible change (X, Y) ← (X, Y – (a1/2)X – a3/2) transforms E to the form

E1 : Y2 = X3 + b2X2 + b4X + b6.

If, in addition, char K ≠ 3, the admissible change (X, Y) ← (X – b2/3, Y) transforms E1 to E2 : Y2 = X3 + aX + b. We henceforth assume that an elliptic curve over a field of characteristic ≠ 2, 3 is defined by

Equation 2.8

E : Y2 = X3 + aX + b
(instead of by the original Weierstrass Equation (2.6)).

If char K = 2, the Weierstrass equation cannot be simplified as in Equation (2.8). In this case, we consider two cases separately, namely a1 ≠ 0 and a1 = 0. In the former case, a suitable admissible change allows us to write Equation (2.6) in the simplified form

Equation 2.9

E : Y2 + XY = X3 + aX2 + b
On the other hand, if a1 = 0, then the admissible change (X, Y) ← (X + a2, Y) shows that E can be written in the form

Equation 2.10

E : Y2 + aY = X3 + bX + c
A curve defined by Equation (2.9) is called non-supersingular, whereas one defined by Equation (2.10) is called supersingular.

Now we associate two quantities with an elliptic curve. The importance of these quantities follows from the subsequent theorem. We start with the generic Weierstrass equation and later specialize to the simplified formulas.

Definition 2.76.

For the curve given by Equation (2.6), we define the following quantities:

Equation 2.11


Δ(E) is called the discriminant of the curve E, and j(E) the j-invariant of E.

For the special cases given by the simplified equations above, these quantities have more compact formulas as given in Table 2.5.

Theorem 2.45.

For the curve E defined by Equation (2.6), the following properties hold:

  1. An admissible change of variables does not alter Δ(E) and j(E).

    Table 2.5. Discriminant and j-invariant for elliptic curves
    Special case                                    Δ(E)                j(E)
    char K ≠ 2, 3 (Equation 2.8)                    –16(4a3 + 27b2)     1728(4a)3/Δ(E)
    char K = 2, non-supersingular (Equation 2.9)    b                   1/b
    char K = 2, supersingular (Equation 2.10)       a4                  0

  2. E is an elliptic curve, that is, E is smooth, if and only if Δ(E) ≠ 0. In particular, the j-invariant is defined for all elliptic curves.

  3. Let E1 and E2 be two elliptic curves defined over the field K. If E1 and E2 are isomorphic over K, then j(E1) = j(E2). Conversely, if j(E1) = j(E2), then E1 and E2 are isomorphic over K̄.

Proof

  1. Tedious calculations using Formulas (2.7) establish this claim.

  2. The polynomial f(X, Y, Z) = Y2Z + a1XYZ + a3YZ2 – X3 – a2X2Z – a4XZ2 – a6Z3 defines the curve E. One checks directly (using Definition 2.66) that E is smooth at the point at infinity O = [0, 1, 0]. Suppose that E is not smooth at the finite point (x0, y0). The admissible change (X, Y) ← (X + x0, Y + y0) does not alter the value of Δ(E) by (1). So we can assume, without loss of generality, that (x0, y0) = (0, 0). But then, with f now denoting the affine polynomial of Equation (2.6), we have f(0, 0) = –a6 = 0, ∂f/∂x(0, 0) = –a4 = 0 and ∂f/∂y(0, 0) = a3 = 0. Now it is easy to check from Equation (2.11) that Δ(E) = 0.

    Conversely, let Δ(E) = 0. For simplicity, we assume that char K ≠ 2, 3 and E is given by Equation (2.8). By Exercise 2.62, the polynomial X3 + aX + b then has multiple roots; let α ∈ K̄ be such a multiple root. But then E is not smooth at the point (α, 0).

  3. By Part (1) and Theorem 2.44, two isomorphic elliptic curves have the same j-invariant. For proving the converse, we once again assume that char K ≠ 2, 3 and E1 : Y2 = X3 + a1X + b1 and E2 : Y2 = X3 + a2X + b2 have the same j-invariant. Then we have a1^3 b2^2 = a2^3 b1^2. Now we provide an admissible change of variables of the form (X, Y) ← (u2X, u3Y), u ≠ 0, that transforms E1 to E2. Since Δ(E1) ≠ 0 and Δ(E2) ≠ 0, we take u = (b1/b2)1/6 if a1 = 0, u = (a1/a2)1/4 if b1 = 0, and u = (a1/a2)1/4 = (b1/b2)1/6 if a1b1 ≠ 0. Note that since K̄ is algebraically closed, u is defined in all the above cases.
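Part 1 can be checked numerically. The sketch below (curve parameters are hypothetical) uses the j-invariant in the convention of Table 2.5 and Example 2.22, j = 1728(4a)3/Δ(E), and verifies over F13 that the admissible change (X, Y) ← (u2X, u3Y) leaves j unchanged:

```python
p = 13
def inv(x):
    return pow(x, p - 2, p)            # inverse mod the prime p (Fermat)

def j_inv(a, b):
    # j = 1728 (4a)^3 / Delta, with Delta = -16(4a^3 + 27b^2), as in Table 2.5
    disc = (-16 * (4 * a**3 + 27 * b * b)) % p
    return (1728 * pow(4 * a % p, 3, p) * inv(disc)) % p

a1, b1, u = 2, 5, 3    # hypothetical curve Y^2 = X^3 + 2X + 5 over F_13, u = 3
# (X, Y) <- (u^2 X, u^3 Y) sends Y^2 = X^3 + aX + b to Y^2 = X^3 + (a/u^4)X + (b/u^6)
a2 = a1 * inv(pow(u, 4, p)) % p
b2 = b1 * inv(pow(u, 6, p)) % p
assert j_inv(a1, b1) == j_inv(a2, b2)  # j is invariant under the change
```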

2.11.2. The Elliptic Curve Group

Consider an elliptic curve E over a field K. We now define an operation (which is conventionally denoted by +) on the set E(K) of K-rational points on E in the projective plane. This operation provides a group structure on E(K). It is important to point out that this group is not the same as the group DivK(E) of divisors on E(K) (Definition 2.74), since the sum of points we are going to define is not formal. However, there is a connection between these two groups (see Exercise 2.125).

Definition 2.77.

Let E be the elliptic curve defined by Equation (2.6) and O = [0, 1, 0] the point at infinity on E. A binary operation + on E(K) is defined as follows:

  1. For any P ∈ E(K), we define P + O := O + P := P, that is, O serves as the additive identity.

  2. The opposite (additive inverse) of a point P ∈ E(K) is now defined: if P = O, then –P := P, and if P = (h, k) is a finite point, then –P := (h, –k – a1h – a3).

  3. For P, Q ∈ E(K), the sum P + Q is defined by the chord-and-tangent rule, which goes as follows.

    1. If Q = –P, then P + Q := O.

    2. If Q ≠ –P, we consider the line passing through P and Q (we take the tangent line at P if P = Q). Since the degree of the defining equation for E is three, this line meets the curve at exactly one other point R (counted with multiplicity). We define P + Q := –R. Figure 2.1 illustrates this case for curves over the reals.

Theorem 2.46.

The set E(K) under the operation + is an Abelian group.

No simple proof of this theorem is known. Indeed, the only group axiom that is difficult to check is associativity, that is, that (P + Q) + R = P + (Q + R) for all P, Q, R ∈ E(K). An elementary strategy would be to write explicit formulas for (P + Q) + R and P + (Q + R) (using the formulas for P + Q given below) and show that they are equal, but this process involves a lot of tedious calculations and consideration of many cases.

There are other proofs that are more elegant, but not as elementary. One possibility is to use the theory of divisors and is outlined now. It turns out that the Jacobian of E has a bijective correspondence with the set E(K) via the map which takes P ∈ E(K) to [P] – [O] (more correctly, to the equivalence class of the divisor [P] – [O] in the Jacobian). Furthermore, the point P + Q corresponds to the sum of the classes of [P] – [O] and [Q] – [O], where the addition on the left is the addition on E(K) as defined above and the addition on the right is that in the Jacobian. By definition, the Jacobian is naturally an additive Abelian group. It immediately follows that E(K) is an additive Abelian group too. (See Exercise 2.125.)

We now give the formulas for the coordinates of the points –P and P + Q on E(K). The derivation of these formulas for the general case is left to the reader (Exercise 2.102). We concentrate on the important special cases. We assume that P = (h1, k1) and Q = (h2, k2) are finite points on E(K) with Q ≠ –P, so that P + Q = (h3, k3) is also a finite point.

If char K ≠ 2, 3 and E is defined by Equation (2.8), we have –P = (h1, –k1) and

λ = (k2 – k1)/(h2 – h1) if P ≠ Q,    λ = (3h1^2 + a)/(2k1) if P = Q,
h3 = λ^2 – h1 – h2,    k3 = λ(h1 – h3) – k1.
Next, we consider char K = 2 and non-supersingular curves (Equation (2.9)). The formulas in this case are –P = (h1, h1 + k1) and

λ = (k1 + k2)/(h1 + h2), h3 = λ^2 + λ + h1 + h2 + a, k3 = λ(h1 + h3) + h3 + k1, if P ≠ Q,
λ = h1 + k1/h1, h3 = λ^2 + λ + a, k3 = h1^2 + (λ + 1)h3, if P = Q.
Finally, for supersingular curves (Equation (2.10)) with char K = 2, we have –P = (h1, k1 + a) and

λ = (k1 + k2)/(h1 + h2) if P ≠ Q,    λ = (h1^2 + b)/a if P = Q,
h3 = λ^2 + h1 + h2,    k3 = λ(h1 + h3) + k1 + a.
We denote by mP the sum P + · · · + P (m times) for a point P ∈ E(K) and for a positive integer m. We also define 0P := O and (–m)P := –(mP).
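The chord-and-tangent rule for char K ≠ 2, 3 translates directly into code. The following sketch (helper names are ours) implements the group law for E : Y2 = X3 + aX + b over Fp, p > 3, and reproduces the multiples of the point (4, 1) on the curve Y2 = X3 + X + 3 over F7 appearing in Table 2.6:

```python
O = None                                    # the point at infinity

def neg(P, p):
    return None if P is O else (P[0], (-P[1]) % p)

def add(P, Q, a, p):
    if P is O: return Q
    if Q is O: return P
    (h1, k1), (h2, k2) = P, Q
    if h1 == h2 and (k1 + k2) % p == 0:     # Q = -P
        return O
    if P == Q:                              # tangent: lambda = (3h1^2 + a)/(2k1)
        lam = (3*h1*h1 + a) * pow(2*k1, p - 2, p) % p
    else:                                   # chord:   lambda = (k2 - k1)/(h2 - h1)
        lam = (k2 - k1) * pow(h2 - h1, p - 2, p) % p
    h3 = (lam*lam - h1 - h2) % p
    return (h3, (lam*(h1 - h3) - k1) % p)

def mul(m, P, a, p):                        # mP by repeated addition
    R = O
    for _ in range(m):
        R = add(R, P, a, p)
    return R

a, p, P1 = 1, 7, (4, 1)                     # E1 : Y^2 = X^3 + X + 3 over F_7
assert mul(2, P1, a, p) == (6, 6)
assert mul(3, P1, a, p) == (5, 0)
assert mul(6, P1, a, p) is O                # P1 has order 6
```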

Example 2.22.
  1. Consider the elliptic curve

    E1 : Y2 = X3 + X + 3

    over F7. We have Δ(E1) ≡ –16(4 × 1^3 + 27 × 3^2) ≡ 3 (mod 7). Also j(E1) ≡ 1728 × 4^3 × 3^(–1) ≡ 2 (mod 7), that is, j(E1) = 2. It is easy to check that E1(F7) contains the six points P0 = O, P1 = (4, 1), P2 = (4, 6), P3 = (5, 0), P4 = (6, 1) and P5 = (6, 6). The multiples of these points are summarized in Table 2.6. It follows that the group E1(F7) is cyclic with P1 as a generator.

    Table 2.6. Multiples of points on the elliptic curve Y2 = X3 + X + 3 over F7
    P            2P      3P      4P      5P      6P    ord P
    P0 = O                                              1
    P1 = (4, 1)  (6, 6)  (5, 0)  (6, 1)  (4, 6)  O      6
    P2 = (4, 6)  (6, 1)  (5, 0)  (6, 6)  (4, 1)  O      6
    P3 = (5, 0)  O                                      2
    P4 = (6, 1)  (6, 6)  O                              3
    P5 = (6, 6)  (6, 1)  O                              3

  2. Now, consider the non-supersingular elliptic curve

    E2 : Y2 + XY = X3 + X2 + ξ

    defined over F8 := F2[T]/〈T3 + T + 1〉, where ξ := T + 〈T3 + T + 1〉. We have Δ(E2) = ξ and j(E2) = ξ–1 = ξ2 + 1. The finite points on E2 are:

    P1=(0, ξ2 + ξ),
    P2=(1, ξ2),
    P3=(1, ξ2 + 1),
    P4=(ξ, ξ2),
    P5=(ξ, ξ2 + ξ),
    P6=(ξ + 1, ξ2 + 1),
    P7=(ξ + 1, ξ2 + ξ),
    P8=(ξ2 + ξ, 1),
    P9=(ξ2 + ξ, ξ2 + ξ + 1).

    So E2(F8) contains 10 points (including O). The multiples of the points are listed in Table 2.7, which implies that E2(F8) is again cyclic.[13] The φ(10) = 4 generators of this group are P4, P5, P8 and P9.

    [13] Both 6 and 10 are square-free integers, and so the groups E1(F7) and E2(F8) must be cyclic (Exercise 2.115(a)).

    Table 2.7. Multiples of points on the elliptic curve Y2 + XY = X3 + X2 + ξ over F8
    P       2P   3P   4P   5P   6P   7P   8P   9P   10P   ord P
    P0 = O                                                  1
    P1      O                                               2
    P2      P7   P6   P3   O                                5
    P3      P6   P7   P2   O                                5
    P4      P3   P9   P6   P1   P7   P8   P2   P5   O      10
    P5      P2   P8   P7   P1   P6   P9   P3   P4   O      10
    P6      P2   P3   P7   O                                5
    P7      P3   P2   P6   O                                5
    P8      P6   P4   P2   P1   P3   P5   P7   P9   O      10
    P9      P7   P5   P3   P1   P2   P4   P6   P8   O      10
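The count of 10 points in part (2) can be confirmed by brute force. The sketch below (helper names are ours) represents F8 = F2[T]/〈T3 + T + 1〉 by 3-bit integers, with carry-less multiplication followed by reduction modulo T3 + T + 1:

```python
def f8_mul(a, b):
    """Multiply two elements of F_8, encoded as 3-bit integers."""
    r = 0
    for i in range(3):                 # carry-less (XOR) schoolbook product
        if (b >> i) & 1:
            r ^= a << i
    for i in (4, 3):                   # reduce high bits: T^3 = T + 1 (0b1011)
        if (r >> i) & 1:
            r ^= 0b1011 << (i - 3)
    return r

xi = 0b010                             # xi = T mod <T^3 + T + 1>
def on_curve(x, y):                    # y^2 + xy == x^3 + x^2 + xi (char 2)
    lhs = f8_mul(y, y) ^ f8_mul(x, y)
    rhs = f8_mul(f8_mul(x, x), x) ^ f8_mul(x, x) ^ xi
    return lhs == rhs

count = 1 + sum(on_curve(x, y) for x in range(8) for y in range(8))
assert count == 10                     # the 9 finite points plus O
```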

  3. Let us continue to represent F8 as in (2). The supersingular curve

    E3 : Y2 + Y = X3 + ξX + ξ2

    has Δ(E3) = 1 and j(E3) = 0. E3(F8) is a cyclic group with 9 points, as Table 2.8 illustrates.

Table 2.8. Multiples of points on the elliptic curve Y2 + Y = X3 + ξX + ξ2 over F8
P                           2P   3P   4P   5P   6P   7P   8P   9P   ord P
P0 = O                                                               1
P1 = (0, ξ2 + ξ)            P5   P4   P7   P8   P3   P6   P2   O     9
P2 = (0, ξ2 + ξ + 1)        P6   P3   P8   P7   P4   P5   P1   O     9
P3 = (ξ + 1, ξ)             P4   O                                   3
P4 = (ξ + 1, ξ + 1)         P3   O                                   3
P5 = (ξ2, ξ2)               P7   P3   P2   P1   P4   P8   P6   O     9
P6 = (ξ2, ξ2 + 1)           P8   P4   P1   P2   P3   P7   P5   O     9
P7 = (ξ2 + ξ, ξ2 + ξ)       P2   P4   P6   P5   P3   P1   P8   O     9
P8 = (ξ2 + ξ, ξ2 + ξ + 1)   P1   P3   P5   P6   P4   P2   P7   O     9

Definition 2.78.

Let m be a positive integer. The set of points P ∈ E(K) such that mP = O is evidently a subgroup of E(K) and is denoted by EK[m] or by E[m], if K is understood from the context. The elements of EK[m], called the m-torsion points of E, are those points of E(K) whose (additive) orders are finite and divide m.
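For char K ≠ 2, 3, a point of order 2 satisfies P = –P = (h, –k), so k = 0 and h is a root of X3 + aX + b (cf. Exercise 2.101). A sketch locating the 2-torsion of the curve E1 of Example 2.22(1) over F7:

```python
p, a, b = 7, 1, 3
# The roots of X^3 + aX + b lying in F_7; the remaining 2-torsion points,
# if any, live only over an extension field.
two_torsion = [(x, 0) for x in range(p) if (x**3 + a*x + b) % p == 0]
assert two_torsion == [(5, 0)]   # so E1(F_7)[2] = {O, (5, 0)}
```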

Multiples mP of a point P ∈ E(K) can be expressed using nice formulas.

Definition 2.79.

For an elliptic curve defined over K by the equation E : f(X, Y) = 0 and for a positive integer m, there exist polynomials θm, ωm, ψm ∈ K[X, Y] such that for any point P = (h, k) ∈ E(K) with mP ≠ O we have

mP = (θm(h, k)/ψm(h, k)2, ωm(h, k)/ψm(h, k)3).

The polynomial ψm is called the m-th division polynomial of E.

Using the addition formula one can verify the following recursive description for ψm and the expressions for θm and ωm in terms of ψm.

Lemma 2.8.

For an elliptic curve E defined by the general Weierstrass Equation (2.6) over a field K, the division polynomials ψm, m a positive integer, are recursively described as:

where di are as in Definition 2.76. The polynomials θm satisfy

for all positive integers m,

and for char K ≠ 2, one has

It follows by induction on m that these formulas really give polynomial expressions for ψm, θm and ωm for all positive integers m. For even m, the polynomial ψm is divisible by ψ2. Furthermore, the polynomials defined as

can be expressed as polynomials in x only. These univariate polynomials are easier to handle than the bivariate polynomials ψm and, by an abuse of notation, are also called division polynomials. Their degrees satisfy the inequality:

Points of E[m] can be characterized in terms of the division polynomials:

Theorem 2.47.

Let m be a positive integer and let P = (h, k) be a finite point on E. Then P ∈ E[m] if and only if ψm(h, k) = 0. Furthermore, if m > 2 and P ∉ E[2], then P ∈ E[m] if and only if the univariate division polynomial corresponding to ψm vanishes at h.

We finally define polynomials fm as follows. If char K ≠ 2, then fm is the univariate division polynomial introduced above, for all positive integers m. On the other hand, for char K = 2 and for non-supersingular curves over K, the polynomials ψm already involve x alone (Exercise 2.107), and it is customary to define fm(x) := ψm(x, y) for all positive integers m. By further abuse of notation, we also call fm the m-th division polynomial of E.

2.11.3. Elliptic Curves over Finite Fields

In this section, we take K = Fq, a finite field of cardinality q and characteristic p. We do not deal with the case p = 3. Let E be an elliptic curve defined over Fq. If p > 3, we assume that E is defined by Equation (2.8), whereas for p = 2, we assume that E is defined by Equation (2.10) or Equation (2.9), depending on whether E is supersingular or not.

Since E(Fq) is a subset of the projective plane over Fq, the cardinality #E(Fq) is finite. The next theorem shows that #E(Fq) is quite close to q.

Theorem 2.48. Hasse’s theorem

#E(Fq) = q + 1 – t for some integer t with |t| ≤ 2√q. (The integer t is called the trace of Frobenius at q.)

The implication of this theorem is that the possible cardinalities of E(Fq) lie in the rather narrow interval [q + 1 – 2√q, q + 1 + 2√q]. If q = p is a prime, then for every integer n in this interval, there is at least one curve E with #E(Fp) = n. Moreover, the values of #E(Fp) are distributed almost uniformly in this interval. However, if q is not a prime, these nice results do not continue to hold.
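Hasse's bound is easy to confirm for a small curve. A sketch (the helper name is ours) for E1 : Y2 = X3 + X + 3 over F7 from Example 2.22:

```python
def count_points(a, b, p):
    # all finite points (x, y) with y^2 = x^3 + ax + b over F_p,
    # plus the single point at infinity
    return 1 + sum(1 for x in range(p) for y in range(p)
                   if (y*y - x**3 - a*x - b) % p == 0)

p, a, b = 7, 1, 3
n = count_points(a, b, p)       # n = #E1(F_7)
t = p + 1 - n                   # trace of Frobenius
assert n == 6                   # matches Example 2.22(1)
assert t * t <= 4 * p           # Hasse: |t| <= 2 sqrt(q)
```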

Definition 2.80.

If t = 1 (that is, if #E(Fq) = q), the curve E is called anomalous. If p|t, the curve E is called supersingular, and if p∤t, then E is called non-supersingular.

Anomalous and supersingular curves are cryptographically weak, because algorithms with running time better than exponential are known for solving the so-called elliptic curve discrete logarithm problem on these curves. Determination of the order #E(Fq) gives t, from which one can easily check whether E is anomalous or supersingular. If p = 2, we have an easier check for supersingularity.

Proposition 2.35.

An elliptic curve E over a finite field of characteristic 2 is supersingular if and only if j(E) = 0 or, equivalently, if and only if a1 = 0 in Equation (2.6).

For arbitrary characteristic p, we have the following characterization.

Proposition 2.36.

An elliptic curve E over Fq is supersingular if and only if t^2 = 0, q, 2q, 3q or 4q. In particular, if char Fq ≠ 2, 3, then E is supersingular if and only if t = 0.

By Theorem 2.38, the multiplicative group of Fq is always cyclic. However, the group E(Fq) is not always cyclic, but is of a special kind. We need a few definitions to explain the structure of E(Fq). The notion of internal direct product for multiplicative groups (Exercise 2.19) can be readily applied to additive groups as follows.

Definition 2.81.

Let G be an additive group and let H1, . . . , Hr be subgroups of G. If every element of G can be written uniquely as h1 + · · · + hr with hi ∈ Hi, i = 1, . . . , r, we say that G is the (internal) direct sum of the subgroups H1, . . . , Hr and denote this as G = H1 ⊕ · · · ⊕ Hr.

Theorem 2.49. Structure theorem for finite Abelian groups

Let G be a finite additive Abelian group of cardinality #G = n. Then there exist an integer r ≥ 0 and integers ni ≥ 2 for 1 ≤ i ≤ r, such that G is the direct sum of (subgroups isomorphic to the) cyclic groups Zn1, . . . , Znr, that is, G ≅ Zn1 ⊕ · · · ⊕ Znr, where ni+1|ni for all i = 1, . . . , r – 1. Furthermore, such a decomposition is unique in the sense that if G ≅ Zm1 ⊕ · · · ⊕ Zms with integers mi ≥ 2 and mi+1|mi for i = 1, . . . , s – 1, then r = s and ni = mi for all i = 1, . . . , r. In this case, we say that G has rank r and is of type (n1, . . . , nr). By Lagrange’s theorem, each ni|n. Moreover, n = n1n2 · · · nr. G is cyclic if and only if the rank of G is 1.

Theorem 2.50. Structure theorem for E(F_q)

The elliptic curve group E(F_q) is of rank 1 or 2. If the rank is 1, then E(F_q) is cyclic; otherwise E(F_q) ≅ Z_{n1} ⊕ Z_{n2}, where n1, n2 ≥ 2 and n2|n1. In the second case, we have n2|(q – 1).

Once we know the order of the group E(F_q), it is easy to compute the order of E(F_{q^n}), as the following theorem suggests.

Theorem 2.51.

Let α, β ∈ C satisfy 1 – tX + qX2 = (1 – αX)(1 – βX). Then for any n ≥ 1 the order #E(F_{q^n}) = q^n + 1 – (α^n + β^n).
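Since α + β = t and αβ = q, the power sums s_n = α^n + β^n satisfy the integer recurrence s_n = t·s_{n–1} – q·s_{n–2} with s_0 = 2 and s_1 = t, so Theorem 2.51 needs no complex arithmetic at all. A sketch (names ours); the sample values below use Y2 = X3 + 3 with 13 points, as in Exercise 2.108 (where we take the base field to be F_7), so that t = 8 – 13 = –5:

```python
# #E(F_{q^n}) = q^n + 1 - (alpha^n + beta^n), computed via the integer
# recurrence s_n = t*s_{n-1} - q*s_{n-2}, s_0 = 2, s_1 = t (Theorem 2.51).

def order_over_extension(q, t, n):
    s_prev, s_cur = 2, t                      # s_0, s_1
    for _ in range(n - 1):
        s_prev, s_cur = s_cur, t * s_cur - q * s_prev
    return q ** n + 1 - s_cur
```

As a consistency check, #E(F_7) = 13 must divide #E(F_49), since E(F_7) is a subgroup of E(F_49).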

Exercise Set 2.11

2.100Show that the following curves over K are not smooth (and hence not elliptic curves):
  1. Y2 = X3, K arbitrary.

  2. Y2 = X3 + X2, K arbitrary.

  3. Y2 = X3 + aX + b, if char K = 2.

2.101
  1. Show that for an elliptic curve E over K and a finite point P = (h, k), the only points in E(K) (or E(K̄)) having X-coordinate equal to h are P and –P.

  2. Let char K ≠ 2, 3 and let E be defined by Equation (2.8). If α1, α2, α3 are the roots (distinct by Theorem 2.45) of X3 + aX + b, then (α1, 0), (α2, 0) and (α3, 0) are the only points on E(K̄) with Y-coordinate equal to 0. Show that these are the only points of order 2 in E(K̄).

2.102Let P = (h1, k1) and Q = (h2, k2) be two points (different from the point at infinity) in E(K) defined by the Weierstrass Equation (2.6). Assume that Q ≠ –P. Determine R = (h3, k3) = P + Q as follows:
  1. Show that the line passing through P and Q (the tangent, if P = Q) has the equation Y = λX + μ, where
     λ = (k2 – k1)/(h2 – h1) and μ = (k1h2 – k2h1)/(h2 – h1) if P ≠ Q, whereas
     λ = (3h1^2 + 2a2h1 + a4 – a1k1)/(2k1 + a1h1 + a3) and μ = (–h1^3 + a4h1 + 2a6 – a3k1)/(2k1 + a1h1 + a3) if P = Q.
  2. Substituting λX + μ for Y in Equation (2.6) gives a cubic equation in X of which h1 and h2 are two roots. Show that the third root (the X-coordinate of R) is

    h3 = λ2 + a1λ – a2 – h1 – h2.

    Hence deduce that the Y-coordinate of R is

    k3 = –(λ + a1)h3 – μ – a3.
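The formulas of this exercise translate directly into an addition algorithm for the group of a general Weierstrass curve over a prime field; in the sketch below (function names ours) the point at infinity is represented by None, and μ is recovered as k1 – λh1 from the line through P.

```python
# Addition on Y^2 + a1*XY + a3*Y = X^3 + a2*X^2 + a4*X + a6 over F_p,
# following Exercise 2.102; c = (a1, a2, a3, a4, a6), infinity = None.

def ec_neg(P, c, p):
    """-P = (h, -k - a1*h - a3) for a finite point P = (h, k)."""
    a1, a2, a3, a4, a6 = c
    if P is None:
        return None
    h, k = P
    return (h, (-k - a1 * h - a3) % p)

def ec_add(P, Q, c, p):
    a1, a2, a3, a4, a6 = c
    if P is None:
        return Q
    if Q is None:
        return P
    h1, k1 = P
    h2, k2 = Q
    if h1 == h2 and (k1 + k2 + a1 * h1 + a3) % p == 0:
        return None                          # Q = -P, so P + Q is infinity
    if P == Q:                               # tangent line at P
        num = (3 * h1 * h1 + 2 * a2 * h1 + a4 - a1 * k1) % p
        den = (2 * k1 + a1 * h1 + a3) % p
    else:                                    # chord through P and Q
        num = (k2 - k1) % p
        den = (h2 - h1) % p
    lam = num * pow(den, -1, p) % p          # slope of the line Y = lam*X + mu
    mu = (k1 - lam * h1) % p
    h3 = (lam * lam + a1 * lam - a2 - h1 - h2) % p
    k3 = (-(lam + a1) * h3 - mu - a3) % p
    return (h3, k3)

def ec_mul(n, P, c, p):
    """n*P by double-and-add."""
    R = None
    while n:
        if n & 1:
            R = ec_add(R, P, c, p)
        P = ec_add(P, P, c, p)
        n >>= 1
    return R

# Example: P = (1, 2) lies on Y^2 = X^3 + 3 over F_7, a group of order 13.
C7, P7, PT = (0, 0, 0, 0, 3), 7, (1, 2)
```

Because the sample group has prime order 13, multiplying any point by 13 must yield the point at infinity.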

2.103Let . Show that there exists an elliptic curve E over K such that . [H]
2.104Assume that char K ≠ 2, 3 and consider the elliptic curve E given by Equation (2.8). Let K[E] be the affine coordinate ring and K(E) the field of rational functions on E.
  1. Show that every element in K[E] can be uniquely represented as u(x) + yv(x) for polynomials u(x), v(x) ∈ K[x].

  2. The conjugate of f = u(x) + yv(x) ∈ K[E] is defined as f̄ := u(x) – yv(x). The norm of f is defined as N(f) := f f̄. Show that N(f) = u(x)2 – (x3 + ax + b)v(x)2 ∈ K[x].

  3. The degree of f = u(x) + yv(x) ∈ K[E] is defined as deg f := max(2 degx u, 3 + 2 degx v), where degx denotes the degree in x. Show that deg f = degx N(f).

  4. Show that for f, g ∈ K[E], one has N(fg) = N(f) N(g). Hence conclude that deg(fg) = deg f + deg g.

  5. Show that every rational function in K(E) can be represented as a(x) + yb(x), where a(x), b(x) ∈ K(x).

2.105Show that the division polynomials for the general Weierstrass equation can be recursively defined as

where F = 4x3 + d2x2 + 2d4x + d6.

2.106Write the recursive formulas for the division polynomials ψm(x, y) and for the elliptic curve E defined by Equation 2.8 over a field K of characteristic ≠ 2, 3. Show that for m ≥ 2 and for we have

2.107Write the recursive formulas for the division polynomials ψm(x, y) and for the elliptic curve E defined by Equation 2.9 over a field K of characteristic 2. Conclude that ψm are polynomials in only x for all . With fm := ψm for all show that for m ≥ 2 and for we have

2.108Consider the elliptic curve defined over the field F_7:

Ea,b : Y2 = X3 + aX + b.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b has order between 3 and 13.

  2. The curve E0,3 : Y2 = X3 + 3 has the maximum possible order 13.

  3. The curve E0,4 : Y2 = X3 + 4 has the minimum possible order 3.

  4. The curve E0,5 : Y2 = X3 + 5 is anomalous.

  5. The group is not cyclic.
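The suggested computer program for this exercise can be as simple as the brute-force sketch below (names ours); we take the base field to be F_7, an assumption consistent with the stated Hasse bounds 3 and 13.

```python
# Verify Exercise 2.108 by naive point counting over F_7.

p = 7

def ec_order(a, b):
    n = 1                                    # the point at infinity
    for x in range(p):
        rhs = (x**3 + a * x + b) % p
        n += sum(1 for y in range(p) if (y * y) % p == rhs)
    return n

# Orders of all smooth curves E_{a,b} (discriminant 4a^3 + 27b^2 != 0 mod p).
orders = {(a, b): ec_order(a, b)
          for a in range(p) for b in range(p)
          if (4 * a**3 + 27 * b**2) % p != 0}
```

The dictionary confirms assertions 1–4: the orders range over [3, 13], E_{0,3} attains 13, E_{0,4} attains 3, and E_{0,5} has exactly p = 7 points, hence is anomalous.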

2.109Consider the representation of F_8 as F_2(ξ), where ξ is a root of T3 + T + 1 in F_8. Identify an element a2ξ2 + a1ξ + a0 (where ai ∈ F_2) with the integer (a2a1a0)2 = a2·2^2 + a1·2 + a0. For integers a, b ∈ {0, 1, . . . , 7}, b ≠ 0, define the non-supersingular elliptic curve:

Ea,b : Y2 + XY = X3 + aX2 + b.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b has order between 4 and 14.

  2. The curve E1,1 : Y2 + XY = X3 + X2 + 1 has the maximum possible order 14.

  3. The curve E2,1 : Y2 + XY = X3 + ξX2 + 1 has the minimum possible order 4.

  4. The curve E2,2 : Y2 + XY = X3 + ξX2 + ξ is anomalous.

  5. The orders of Ea,b for all choices of a, b lie in the set {4, 6, 8, 10, 12, 14}.

  6. Each group Ea,b(F_8) is cyclic.

  7. Theorem 2.45(3) requires the phrase over K̄, that is, two curves over an algebraically non-closed field having the same j-invariant may be non-isomorphic.
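A brute-force sketch for this exercise (names ours), with the field elements of F_8 encoded as 3-bit integers exactly as in the identification above (so 2 stands for ξ):

```python
# Exercise 2.109 over F_8 = F_2[T]/(T^3 + T + 1); xi^3 = xi + 1.

MOD = 0b1011                                  # T^3 + T + 1

def gf8_mul(a, b):
    r = 0
    for i in range(3):                        # carry-less multiplication
        if (b >> i) & 1:
            r ^= a << i
    for i in range(4, 2, -1):                 # reduce modulo T^3 + T + 1
        if (r >> i) & 1:
            r ^= MOD << (i - 3)
    return r

def ec_order_gf8(a, b):
    """#E_{a,b}(F_8) for E : Y^2 + XY = X^3 + aX^2 + b, counting infinity."""
    n = 1
    for x in range(8):
        x2 = gf8_mul(x, x)
        rhs = gf8_mul(x2, x) ^ gf8_mul(a, x2) ^ b
        n += sum(1 for y in range(8) if gf8_mul(y, y) ^ gf8_mul(x, y) == rhs)
    return n
```

The same routine, with the right-hand side adapted, also settles Exercise 2.110.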

2.110Consider the representation of F_8 and the identification of elements of F_8 with integers as in Exercise 2.109. For a, b, c ∈ {0, 1, . . . , 7}, a ≠ 0, define the supersingular elliptic curve:

Ea,b,c : Y2 + aY = X3 + bX + c.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b,c has order between 5 and 13.

  2. The curve E1,1,1 : Y2 + Y = X3 + X + 1 has the maximum possible order 13.

  3. The curve E1,1,2 : Y2 + Y = X3 + X + ξ has the minimum possible order 5.

  4. The orders of Ea,b,c for all choices of a, b, c lie in the set {5, 9, 13}.

  5. No Ea,b,c is anomalous.

  6. Each group Ea,b,c(F_8) is cyclic.

2.111Consider the elliptic curve E : Y2 + XY = X3 + X2 + 1 defined over F_{2^n} for all n ≥ 1. Show that

where r = ⌊n/2⌋. [H] Conclude that E is anomalous over F_2, but not so over F_{2^n} for n ≥ 2.

2.112Let K be a finite field of characteristic ≠ 2, 3 and E : Y2 = X3 + aX + b an elliptic curve defined over K. Prove that:
  1. #E(K) is odd if and only if X3 + aX + b is irreducible in K[X]. [H]

  2. E(K) is not cyclic if X3 + aX + b splits in K[X].

  3. The converse of Part (b) does not hold. [H]

2.113Let E : Y2 + XY = X3 + aX2 + b be a non-supersingular elliptic curve defined over F_{2^n}. Prove that:
  1. E(F_{2^n}) has exactly one point of order 2. [H]

  2. #E(F_{2^n}) is even.

2.114Let E : Y2 + aY = X3 + bX + c be a supersingular elliptic curve over F_{2^n}. Prove that:
  1. E(F_{2^n}) has no points of order 2.

  2. #E(F_{2^n}) is odd.

2.115
  1. Let G be a finite Abelian group of cardinality n. Show that if n is square-free, then G is cyclic. [H]

  2. Prove that if E is an anomalous elliptic curve over F_p (p prime), then E(F_p) is cyclic. [H]

  3. If E is a supersingular elliptic curve over the field F_q of characteristic ≠ 2, 3, prove that E(F_q) is either cyclic or isomorphic to Z_2 ⊕ Z_{(q+1)/2}. [H]

2.116Let p be a prime, p ≡ 3 (mod 4), and a ∈ F_p, a ≠ 0. Consider the elliptic curve E : Y2 = X3 – a^2X over F_p (or over F_{p^2}). Prove that:
  1. contains at most three points of order three.

  2. The points of order three in are precisely the points of order three in .

2.117A Weierstrass equation of an elliptic curve defined over a field K is said to be in the Legendre form, if it can be written as

Equation 2.12

Y2 = X(X – 1)(X – k)
for some k ∈ K, k ≠ 0, 1. Show that if char K ≠ 2, then every Weierstrass equation over K can be written in the Legendre form. Show that the j-invariant of the curve E defined by Equation (2.12) is 2^8(k^2 – k + 1)^3/(k^2(k – 1)^2).
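The j-invariant of the Legendre curve Y2 = X(X – 1)(X – k) can be checked numerically: depress the cubic to short Weierstrass form and compare with the classical identity j = 2^8(k^2 – k + 1)^3/(k^2(k – 1)^2), which we assume as the target value. A sketch in exact rational arithmetic, characteristic 0 (names ours):

```python
# Numerical check of the Legendre j-invariant formula.
from fractions import Fraction

def legendre_short_form(k):
    """Depress X^3 - (1+k)X^2 + kX via X -> X + (1+k)/3 to X^3 + aX + b."""
    k = Fraction(k)
    p, q = -(1 + k), k
    a = q - p * p / 3
    b = 2 * p**3 / 27 - p * q / 3
    return a, b

def j_from_short(a, b):
    """j-invariant of Y^2 = X^3 + aX + b."""
    return 1728 * 4 * a**3 / (4 * a**3 + 27 * b**2)

def j_legendre(k):
    k = Fraction(k)
    return 2**8 * (k * k - k + 1)**3 / (k * k * (k - 1)**2)
```

For k = 2 (and k = –1) both computations give the familiar value j = 1728.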

2.12. Hyperelliptic Curves

Hyperelliptic curves are generalizations of elliptic curves. We cannot define a group structure on a general hyperelliptic curve in the same way as we did for elliptic curves. We instead work in the Jacobian of a hyperelliptic curve. For an elliptic curve E over an algebraically closed field K, the Jacobian is canonically isomorphic to the group E(K). Thus one can as well use the techniques for hyperelliptic curves for describing and working in elliptic curve groups. However, the exposition of the previous section turns out to be more intuitive and computationally oriented.

2.12.1. The Defining Equations

A hyperelliptic curve C of genus g over a field K is defined by a polynomial equation of the form

Equation 2.13


In order that C qualifies as a hyperelliptic curve, we additionally require that C (as a projective curve) be smooth over K̄. The set of K-rational points on C is denoted as usual by C(K). For g = 1, Equation (2.13) is the same as the Weierstrass Equation (2.6) on page 98, that is, elliptic curves are hyperelliptic curves of genus one. A hyperelliptic curve of genus 2 over R is shown in Figure 2.2.

Figure 2.2. A hyperelliptic curve of genus 2 over R: Y2 = X(X2 – 1)(X2 – 2)


A hyperelliptic curve has only one point at infinity (Exercise 2.97(f)) and is smooth at ∞. If char K ≠ 2, the substitution Y ↦ Y – u(X)/2 simplifies Equation (2.13) to Y2 = v(X) + u(X)2/4. Since v(X) + u(X)2/4 is a monic polynomial in K[X] of degree 2g + 1, we may assume that if char K ≠ 2, the equation for C is of the form:

Equation 2.14


Proposition 2.37.

If char K ≠ 2, then the hyperelliptic curve C defined by Equation (2.14) is smooth if and only if v has no multiple roots (in ). If char K = 2, then the curve defined by Equation (2.14) is never smooth.

Proof

First, consider char K ≠ 2. If v has a multiple root, say α ∈ K̄, then v′(α) = 0 and, therefore, C is not smooth at the finite point (α, 0). Conversely, if (h, k) is a singular point on C, then we have 2k = 0 and v′(h) = 0. Since (h, k) = (h, 0) is a point on C, we have v(h) = 0, that is, h is a multiple root of v.

For char K = 2 and (h, k) ∈ C, we have (∂(Y2 – v(X))/∂X)(h, k) = v′(h) and (∂(Y2 – v(X))/∂Y)(h, k) = 2k = 0. Now, v′(X) is a monic polynomial of degree 2g > 0 and, therefore, has at least one root, say h ∈ K̄. Choosing k ∈ K̄ with k2 = v(h), we conclude that C is not smooth at (h, k).

Definition 2.82.

Let P = (h, k) be a finite point on the hyperelliptic curve C defined by Equation (2.13). The point P̃ := (h, –k – u(h)) is called the opposite of P.[14] P and P̃ are the only points on C with X-coordinate equal to h. If P̃ = P, then P is called a special point on C, otherwise it is called an ordinary point on C. The set of all finite (resp. ordinary, resp. special) points on C is denoted by Cfin(K) (resp. Cord(K), resp. Cspl(K)). These notations are also abbreviated as Cfin, Cord and Cspl, if the field K is understood from the context.

[14] It is customary to define the opposite of ∞ to be ∞ itself.

2.12.2. Polynomial and Rational Functions

All the general theory we described in Section 2.10 continues to be valid for hyperelliptic curves. However, since we are now given an explicit equation describing the curves, we can give more explicit expressions for polynomial and rational functions on hyperelliptic curves. For simplicity, we consider the affine equation and extend our definitions separately for the point at infinity.

Consider the hyperelliptic curve C defined by Equation (2.13). By Exercise 2.98, the defining polynomial f(X, Y) := Y2 + u(X)Y – v(X) (or its homogenization) is irreducible over K̄, so that the affine (or projective) coordinate ring of C is an integral domain and the corresponding function field is simply the field of fractions of the coordinate ring.

Let G(x, y) ∈ K[C]. Since y2 + u(x)y – v(x) = 0 in K[C], we can repeatedly substitute y2 by –u(x)y + v(x) in G(x, y) until the y-degree of G(x, y) becomes less than 2. This proves part of the following:

Proposition 2.38.

Every polynomial function G(x, y) ∈ K[C] can be written uniquely as G(x, y) = a(x) + yb(x) for some a(X), b(X) ∈ K[X].

Proof

In order to establish the uniqueness, note that if G(x, y) = a1(x) + yb1(x) = a2(x) + yb2(x), then f(X, Y) divides [a1(X) + Y b1(X)] – [a2(X) + Y b2(X)] in K[X, Y]. Since the Y-degree of f is 2, this implies [a1(X) + Y b1(X)] – [a2(X) + Y b2(X)] = 0, that is, [a1(X) – a2(X)] + [b1(X) – b2(X)]Y = 0, that is, a1(X) = a2(X) and b1(X) = b2(X).

Definition 2.83.

Let G(x, y) = a(x) + yb(x) ∈ K[C]. The conjugate of G is defined to be the polynomial function Ḡ(x, y) := a(x) – b(x)u(x) – yb(x). The norm of G is defined as N(G) := GḠ.

Some useful properties of the norm function are listed in the following lemma, the proof of which is left to the reader as an easy exercise.

Lemma 2.9.

For G, H ∈ K[C], we have:

  1. .

  2. If G(x, y) = a(x) + yb(x), then N(G) = a(x)2 – a(x)b(x)u(x) – v(x)b(x)2. In particular, N(G) ∈ K[x].

  3. .

  4. N(GH) = N(G) N(H).
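Lemmas 2.9 and 2.10 can be checked experimentally on a concrete curve. The sketch below (names ours) uses the genus-2 curve y2 = x(x2 – 1)(x2 – 2) of Figure 2.2, for which u = 0, together with naive coefficient-list arithmetic (index = degree):

```python
# Norm and degree of polynomial functions a(x) + y*b(x) on the genus-2
# curve y^2 = v(x) = x^5 - 3x^3 + 2x (so u = 0).

GENUS = 2
V = [0, 2, 0, -3, 0, 1]                  # v(x) = x^5 - 3x^3 + 2x

def pmul(f, g):
    r = [0] * (len(f) + len(g) - 1)
    for i, c in enumerate(f):
        for j, d in enumerate(g):
            r[i + j] += c * d
    return r

def padd(f, g):
    return [(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
            for i in range(max(len(f), len(g)))]

def ptrim(f):
    while f and f[-1] == 0:
        f = f[:-1]
    return f

def pdeg(f):
    f = ptrim(f)
    return len(f) - 1 if f else -1

def norm(G):
    a, b = G                             # G = a(x) + y*b(x)
    # Lemma 2.9(2) with u = 0:  N(G) = a^2 - v*b^2
    return padd(pmul(a, a), pmul([-c for c in V], pmul(b, b)))

def degree(G):
    a, b = G                             # Definition 2.84
    cands = []
    if pdeg(a) >= 0:
        cands.append(2 * pdeg(a))
    if pdeg(b) >= 0:
        cands.append(2 * GENUS + 1 + 2 * pdeg(b))
    return max(cands)

def fmul(G, H):
    # (a1 + y*b1)(a2 + y*b2) with y^2 = v (since u = 0)
    a1, b1 = G
    a2, b2 = H
    return (padd(pmul(a1, a2), pmul(V, pmul(b1, b2))),
            padd(pmul(a1, b2), pmul(a2, b1)))

G = ([0, 1], [1])                        # the function x + y
H = ([1], [0, 1])                        # the function 1 + y*x
```

The test values confirm deg G = degx N(G), deg(GH) = deg G + deg H and N(GH) = N(G)N(H) on this sample.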

We also have an easy description of the rational functions on C.

Proposition 2.39.

Every rational function r(x, y) ∈ K(C) can be written in the form s(x) + yt(x) for some s(X), t(X) ∈ K(X).

Proof

We can write r(x, y) = G(x, y)/H(x, y) for G, H ∈ K[C], H ≠ 0. Multiplying both the numerator and the denominator by the conjugate H̄ and using Lemma 2.9(2) and Proposition 2.38 completes the proof.

The value of a rational function on C at a finite point on C can be defined as in the case of general curves (See Definition 2.68). In order to define the value of a rational function at the point , we need some other concepts.

For a moment, let us assume that K = R. From the equation of C, we see that k^2 ≈ h^(2g+1) (neglecting lower-degree terms) for sufficiently large coordinates h, k of a point (h, k) on C. This means that k grows roughly like h^((2g+1)/2), that is, (2g + 1)/2 times as fast as h on a logarithmic scale. So it is customary to give Y a weight (2g + 1)/2 times the weight we give to X. The smallest integral weights of X and Y that satisfy this are 2 and 2g + 1 respectively. This motivates us to provide Definition 2.84 (generalized for any K).

Definition 2.84.

Let G(x, y) = a(x) + yb(x) ∈ K[C]. The degree of G is defined to be deg G := max(2 degx a, 2g + 1 + 2 degx b), where degx denotes the usual x-degree of a polynomial in K[x]. Since a and b are uniquely determined by G, deg G is well-defined. If G = 0, we set deg G := –∞.

If 0 ≠ G = a(x)+yb(x), d1 = degx a and d2 = degx b, then the leading coefficient of G is taken to be the coefficient of xd1 in a(x) if deg G = 2d1, or to be the coefficient of xd2 in b(x) if deg G = 2g + 1 + 2d2. (We cannot have 2d1 = 2g + 1 + 2d2, since the left side is even and the right side is odd.)

Some basic properties of the degree function follow.

Lemma 2.10.

For G, H ∈ K[C], we have:

  1. deg G = degx(N(G)).

  2. deg(GH) = deg G + deg H.

  3. .

Proof

Easy exercise.

Now we are in a position to give an explicit definition of the value of a rational function at .

Definition 2.85.

For r = G/H ∈ K(C) with G, H ∈ K[C], we define the value r(∞) as:

If deg(G) < deg(H), then r(∞) := 0.

If deg(G) > deg(H), then r(∞) is undefined (that is, r is not defined at ∞).

If deg(G) = deg(H), then r(∞) is defined as the ratio of the leading coefficients of G and H.

Now that we have a complete description of the value of a rational function at any point on C, poles and zeros of rational functions on C can be defined as in Definition 2.70. In order to define the order of a polynomial or rational function at a point P on C, we should find a uniformizing parameter uP at P. Tedious calculations help one deduce the following explicit expressions for uP.

Proposition 2.40.

Let P = (h, k) ∈ C be a finite point. Then we can take

uP := x – h if P is an ordinary point, and uP := y – k if P is a special point,

as a uniformizing parameter at P. Finally, u∞ := x^g/y is a uniformizing parameter at the point at infinity (where g is the genus of C).

We give an alternative definition of the order (independent of uP), which is computationally useful and which is equivalent to Definition 2.71 for a hyperelliptic curve.

Definition 2.86.

Let G(x, y) = a(x) + yb(x) ∈ K[C] and P ∈ C. The order of G at P is defined as follows. First, let P = (h, k) be a finite point on C. Let e be the largest exponent such that (x – h)^e divides both a(x) and b(x). We write G = (x – h)^e G1(x, y). If G1(h, k) ≠ 0 we set l := 0, otherwise we set l to be the highest exponent such that (x – h)^l divides N(G1). We then define

ordP(G) := e + l if P is an ordinary point, and ordP(G) := 2e + l if P is a special point.

Finally, we define ord∞(G) := –deg(G).

Now, let r(x, y) = G(x, y)/H(x, y) be a rational function on C and . We define the order of r at P as ordP(r) := ordP(G) – ordP(H). The value ordP(r) can be shown to be independent of the choice of G and H.

Example 2.23.

Let P = (h, k) be a finite point on C. Consider the rational function r := (x – h)^m, m > 0. The only points on C with X-coordinate equal to h are P and its opposite. Therefore, if P is an ordinary point, ordP(r) = m (and likewise at its opposite), whereas if P is a special point, ordP(r) = 2m. Moreover, ord∞(r) = –2m. For any other point Q on C, we have ordQ(r) = 0.

Now consider r = (x – h)^m for some m < 0. Write r = G/H with G = 1 and H = (x – h)^(–m). Since ordQ(r) = ordQ(G) – ordQ(H), we continue to have

If m ≥ 0, then r is a polynomial function with zeros P and its opposite, and no finite poles. In this case, the sum of the orders of its zeros is 2m = 2 degx r = deg r. Theorem 2.52 generalizes this observation.

Theorem 2.52.

A non-constant polynomial function G ∈ K[C] has only finitely many zeros and a single pole, at ∞. Furthermore, if K is algebraically closed, then the sum of the orders of the zeros of G equals deg G = –ord∞(G).

2.12.3. The Jacobian

We continue to work with the hyperelliptic curve C of Equation (2.13). We first impose the restriction that K is algebraically closed and use the theory of Section 2.10 to define the set Div(C) of divisors on C, the degree zero part Div0(C) of Div(C), the divisor Div(r) of a rational function r ∈ K(C), the set Prin(C) of principal divisors on C, the Picard group Pic(C) = Div(C)/Prin(C) and the Jacobian Div0(C)/Prin(C).

Example 2.24.

For the rational function r := (xh)m of Example 2.23, we have:

The Jacobian is the set of all cosets of Prin(C) in Div0(C). It is not a good idea to work with cosets (which are equivalence classes). Recall that in the case of Z_n, we represented a coset a + nZ by the remainder of Euclidean division of a by n. In case of the representation F_q = F_p[X]/〈f(X)〉, we took polynomials of smallest degrees as canonical representatives of the cosets of 〈f(X)〉. In case of the Jacobian too, we intend to find such good representatives, one from each coset. We now introduce the concept of reduced divisors for that purpose.

Definition 2.87.

Two divisors D1, D2 ∈ Div0(C) (resp. in Div(C)) are said to be equivalent, denoted D1 ~ D2, if D1 – D2 ∈ Prin(C), or equivalently if D1 = D2 + Div(r) for some rational function r on C.

Our goal is to associate to every divisor some unique reduced divisor with D ~ Dred, that is, Dred plays the role of the canonical representative of . We start with the following definition.

Definition 2.88.

A divisor D = ΣP mP(P) – (ΣP mP)(∞) (the sums over finite points P) is called semi-reduced, if each mP ≥ 0 and if for mP > 0 we have: the opposite of P occurs with coefficient 0 if P is an ordinary point, and mP = 1 if P is a special point.

Proposition 2.41.

Every divisor D ∈ Div0(C) is equivalent to some semi-reduced divisor D1.

Proof

Let , with and with Cord being the disjoint union of C1 and C2, where an ordinary point if and only if its opposite and . Now we can write D = D1 + D2, where

and

with m1 and m2 so chosen that D1, . By definition, D1 is semi-reduced, whereas by Example 2.24 , where

Now, we explain how we can represent a semi-reduced divisor by a pair of polynomials a(x), . For that, we need a definition.

Definition 2.89.

Let D1 and D2 be two divisors on C (not necessarily in Div0(C)). The greatest common divisor (gcd) of D1 and D2 is defined as the divisor

Theorem 2.53.

Let D = ΣP mP(P) – (ΣP mP)(∞) be a semi-reduced divisor on C. Let Pi = (hi, ki), i = 1, . . . , n, be the only finite points P on C such that mP > 0. Let mi := mPi, m := m1 + · · · + mn, and a(x) := (x – h1)^m1 · · · (x – hn)^mn (so that degx(a) = m). Then there exists a unique polynomial b(x) ∈ K[x] with the following properties:

  1. degx b < m,

  2. b(hi) = ki for i = 1, . . . , n,

  3. a(x) divides b(x)2 + b(x)u(x) – v(x), and

  4. D = gcd(Div(a(x)), Div(b(x) – y)).

Conversely, if a(x), b(x) ∈ K[x] with degx b < degx a and with a dividing b2 + bu – v, then the divisor gcd(Div(a(x)), Div(b(x) – y)) is semi-reduced.

We denote the divisor gcd(Div(a(x)), Div(b(x) – y)) by Div(a, b). The zero divisor has the representation Div(1, 0).
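The divisibility condition of Theorem 2.53 can be tested concretely. In the sketch below (names ours) we use the curve C2 : Y2 = X5 + X + 2 of Exercise 2.118, taking the base field to be F_7 (our assumption there), and the semi-reduced divisor (P1) + (P2) – 2(∞) with P1 = (0, 3) and P2 = (1, 2):

```python
# Check that a(x) divides b(x)^2 + b(x)u(x) - v(x) for a Div(a, b)
# representation on C2 : y^2 = x^5 + x + 2 over F_7 (u = 0).

P = 7
V = [2, 1, 0, 0, 0, 1]                       # v(x) = x^5 + x + 2

def pmul(f, g):
    r = [0] * (len(f) + len(g) - 1)
    for i, c in enumerate(f):
        for j, d in enumerate(g):
            r[i + j] = (r[i + j] + c * d) % P
    return r

def pmod(f, a):
    """Remainder of f modulo the monic polynomial a (coefficients mod P)."""
    f = [c % P for c in f]
    da = len(a) - 1
    for i in range(len(f) - 1, da - 1, -1):
        c = f[i]
        if c:
            for j in range(da + 1):
                f[i - da + j] = (f[i - da + j] - c * a[j]) % P
    return f[:da]

# Divisor (P1) + (P2) - 2(infinity) with P1 = (0, 3), P2 = (1, 2) on C2:
pts = [(0, 3), (1, 2)]
a = [1]
for h, k in pts:
    a = pmul(a, [(-h) % P, 1])               # a(x) = (x - 0)(x - 1)
b = [3, 6]                                   # interpolation: b(0) = 3, b(1) = 2

# Theorem 2.53(3): a(x) must divide b(x)^2 + b(x)u(x) - v(x)  (u = 0 here)
bb = pmul(b, b)
diff = [((bb[i] if i < len(bb) else 0) - (V[i] if i < len(V) else 0)) % P
        for i in range(max(len(bb), len(V)))]
remainder = pmod(diff, a)
```

A zero remainder confirms that Div(a, b) is a valid semi-reduced representation of the divisor.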

A representation of the elements of by semi-reduced divisors (that is, by pairs of polynomials in K[x]) suffers from two disadvantages. First, the representation is not unique, and second, the degrees of the representing polynomials may be quite large. These difficulties are removed if we consider semi-reduced divisors of a special kind.

Definition 2.90.

A semi-reduced divisor D = ΣP mP(P) – (ΣP mP)(∞) is called a reduced divisor, if ΣP mP ≤ g, where g is the genus of C.

The following theorem establishes the desirable properties of a reduced divisor.

Theorem 2.54.

For every D ∈ Div0(C), there exists a unique reduced divisor D1 equivalent to D.

Proof

We only prove the existence of reduced divisors. For the proof of the uniqueness, one may, for example, see Koblitz [154]. The norm of a semi-reduced divisor D = ΣP mP(P) – (ΣP mP)(∞) is defined as the integer |D| := ΣP mP.

Let D ∈ Div0(C). By Proposition 2.41 there exists a semi-reduced divisor D′ ~ D. One can easily verify that |D′| ≤ |D|. If we already have |D′| ≤ g, then D′ is a desired reduced divisor. So assume otherwise, that is, |D′| ≥ g + 1. We can then choose finite points P1, . . . , Pg+1 on C (not necessarily all distinct) such that (P1) + · · · + (Pg+1) is a subsum of the formal sum D′. Let the semi-reduced divisor (P1) + · · · + (Pg+1) – (g + 1)(∞) be represented as Div(a, b) with degx a = g + 1 and degx b ≤ g. But then deg(b(x) – y) = 2g + 1 and b(x) – y has zeros at P1, . . . , Pg+1 by Theorem 2.53. So by Theorem 2.52 we can write Div(b(x) – y) = (P1) + · · · + (Pg+1) + (Q1) + · · · + (Qg) – (2g + 1)(∞) for some finite points Q1, . . . , Qg on C. Now D″ := D′ – Div(b(x) – y) satisfies D″ ~ D′ and |D″| < |D′|. We apply Proposition 2.41 again to get a semi-reduced divisor D‴ ~ D″ with |D‴| ≤ |D″|. Thus starting from the semi-reduced divisor D′ we produce another semi-reduced divisor D‴ such that D‴ ~ D′ ~ D and |D‴| < |D′|. We continue the process a finite number of times, until we get an equivalent semi-reduced divisor D1 of norm ≤ g. This is a desired reduced divisor.

From the viewpoint of cryptography, the field K should be a finite field, which is never algebraically closed. So we must remove the restriction that K be algebraically closed. Since C is naturally defined over K̄ as well, we start with the Jacobian of C over K̄ and define a particular subgroup of it to be the Jacobian of C over K.

Definition 2.91.

Let σ be a K-automorphism of K̄. For a point P = (h, k) ∈ C(K̄), the point σ(P) := (σ(h), σ(k)) is also in C(K̄). For a divisor D = ΣP mP(P), we define σ(D) := ΣP mP(σ(P)). D is said to be defined over K if σ(D) = D for all K-automorphisms σ of K̄. The subset of the Jacobian over K̄ consisting of divisor classes that have representative divisors defined over K is a subgroup and is called the Jacobian of C over K.

Every element of the Jacobian of C over K can be represented uniquely as a reduced divisor Div(a, b) for polynomials a(x), b(x) ∈ K[x] with degx a ≤ g and degx b < degx a. The Jacobian of C over K is, therefore, a finite Abelian group. For suitably chosen hyperelliptic curves, these groups can be used to build cryptographic protocols.

Exercise Set 2.12

In this exercise set, we let C denote a hyperelliptic curve of genus g defined by Equation (2.13) over a field K (not necessarily algebraically closed).

2.118
  1. Show that the curve

    C1 : Y2 = X5 + X + 1

    defined over F_7 is not smooth and so not a hyperelliptic curve. Find a point where C1 is not smooth.

  2. Show that the curve

    C2 : Y2 = X5 + X + 2

    defined over F_7 is smooth, that is, a hyperelliptic curve of genus 2. Find out all the F_7-rational points on C2. (There are ten of them.)
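Both parts can be verified by a short program; in the sketch below (names ours) we take the base field to be F_7, an assumption consistent with the stated count of ten rational points on C2.

```python
# Exercise 2.118 over F_7: singularity of C1 and point count of C2.

p = 7

def v1(x): return (x**5 + x + 1) % p          # C1 : Y^2 = v1(X)
def v2(x): return (x**5 + x + 2) % p          # C2 : Y^2 = v2(X)
def dv(x): return (5 * x**4 + 1) % p          # common derivative v'(x)

# Part 1: by Proposition 2.37, C1 is singular where v1 and v1' vanish together.
singular_x = [x for x in range(p) if v1(x) == 0 and dv(x) == 0]

# Part 2: affine points of C2, plus the single point at infinity.
affine = [(x, y) for x in range(p) for y in range(p) if (y * y) % p == v2(x)]
num_points = len(affine) + 1
```

The program locates the singular point of C1 at (4, 0) and finds exactly ten rational points on C2.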

2.119Represent F_8 as F_2(ξ), where ξ is a root of the irreducible polynomial T3 + T + 1 ∈ F_2[T].
  1. Show that the curve

    C3 : Y2 + XY = X5 + X + 1

    defined over F_8 is not smooth and so not a hyperelliptic curve. Find a point where C3 is not smooth.

  2. Show that the curve

    C4 : Y2 + XY = X5 + X + ξ

    defined over F_8 is smooth, that is, a hyperelliptic curve of genus 2. Find out all the F_8-rational points on C4. (There are eight of them.)

2.120Let P = (h, k) be a finite point on C. Prove the following assertions:
  1. The only points on C with X-coordinate equal to h are P and its opposite.

  2. .

  3. P is a special point if and only if u2(h) + 4v(h) = 0.

  4. If char K ≠ 2, then C has at most 2g + 1 special points, whereas if char K = 2, then C has at most g special points.

2.121Prove Lemmas 2.9 and 2.10.
2.122Let G(x, y) = a(x) + yb(x) ∈ K[C] and P = (h, k) ∈ C.
  1. Show that G(P) = 0 if and only if .

  2. Let . Show that either P is a special point of C or h is a common root of u and v.

  3. Show that and that .

2.123Prove Theorem 2.52. [H]
2.124A line on C is a polynomial function of the form ax + by + c with a, b, c ∈ K, a and b not both 0.
  1. Let D = Div(l) be the divisor of a line l. Show that the norm |D| is either 2 or 2g + 1.

  2. Let P = (h, k) ∈ C be a finite point. Determine Div(x – h).

  3. Determine Div(y).

2.125Let E be an elliptic curve (that is, a hyperelliptic curve of genus 1) defined over K.
  1. Show that any divisor can be written as for some unique point and for some rational function . This rational function r is unique up to multiplication by elements of .

  2. Show that the map that maps the residue class of to the point satisfying for some , is a bijection.

  3. Let P, , not both . Show that there is a line l with , where R = –(P + Q).

  4. Let , where σ is defined in Part (b). Show that for P, one has . (This, in particular, proves Theorem 2.46 and that σ is a group isomorphism.)

  5. Let . Show that D is a principal divisor if and only if (integer sum) and (sum in ).

2.13. Number Fields

In this section, we develop the theory of number fields and rings. Our aim is to make accessible to the readers the working of the cryptanalytic algorithms based on number field sieves.

2.13.1. Some Commutative Algebra

Commutative algebra is the study of commutative rings with identity (rings by our definition). Modern number theory and geometry are based on results from this area of mathematics. Here we give a brief sketch of some commutative algebra tools that we need for developing the theory of number fields.

Ideal arithmetic

We start with some basic operations on ideals (cf. Example 2.7, Definition 2.23).

Definition 2.92.

Let A be a ring and let 𝔞i, i ∈ I, be a family (not necessarily finite) of ideals in A.

The set-theoretic intersection ∩i∈I 𝔞i is evidently an ideal in A.

The sum Σi∈I 𝔞i of the family is the ideal consisting of all finite sums ai1 + · · · + air with aij ∈ 𝔞ij.

Two ideals 𝔞 and 𝔟 of A are said to be relatively prime or coprime, if 𝔞 + 𝔟 = A, or equivalently if there exist a ∈ 𝔞 and b ∈ 𝔟 with a + b = 1.

If I = {1, 2, . . . , n} is finite, the product 𝔞1𝔞2 · · · 𝔞n is the ideal generated by all elements of the form x1x2 . . . xn with xi ∈ 𝔞i for all i = 1, . . . , n. We have:

If 𝔞1 = 𝔞2 = · · · = 𝔞n = 𝔞, the product is denoted as 𝔞^n. The empty product of ideals is conventionally taken to be the unit ideal A. If 𝔞 is the principal ideal 〈a〉, then 𝔞^n = 〈a^n〉.

One can readily check that the operations intersection, sum and product on ideals in a ring are associative and commutative.

Commutative algebra extensively uses the theory of prime and maximal ideals (Definition 2.19, Proposition 2.9, Corollary 2.2 and Exercise 2.23). The set of all prime ideals in A is called the (prime) spectrum of A and is denoted by Spec A. The set of all maximal ideals of A is called the maximal spectrum of A and denoted by Spm A. We have Spm A ⊆ Spec A. These two sets play an extremely useful role for the study of the ring A. If A is non-zero, both these sets are non-empty.

Localization

The concept of formation of fractions of integers to give the rationals can be applied in a more general setting. Instead of having any non-zero element in the denominator of a fraction we may allow only elements from a specific subset. All we require to make the collection of fractions a ring is that the allowed denominators should be closed under multiplication.

Definition 2.93.

Let A be a ring. A non-empty subset S of A is called multiplicatively closed or simply multiplicative, if 1 ∈ S and for any s, t ∈ S we have st ∈ S.

Example 2.25.
  1. For a non-zero ring A, the subset A \ {0} is multiplicatively closed, if and only if A is an integral domain. For a general non-zero ring A, the set of all elements a ∈ A such that a is not a zero-divisor is a multiplicative subset of A.

  2. Let A be a ring and 𝔞 a proper ideal of A. The set A \ 𝔞 is multiplicatively closed, if and only if 𝔞 is a prime ideal of A.

  3. For a ring A and an element f ∈ A, the set {1, f, f2, f3, . . .} ⊆ A is multiplicatively closed.

Let A be a ring and S a multiplicative subset of A. We define a relation ~ on A × S as: (a, s) ~ (b, t) if and only if u(at – bs) = 0 for some u ∈ S. (If A is an integral domain, one may take u = 1 in the definition of ~.) It is easy to check that ~ is an equivalence relation on A × S. The set of equivalence classes of A × S under ~ is denoted by S–1A, whereas the equivalence class of (a, s) is denoted as a/s. For a/s, b/t ∈ S–1A, define (a/s) + (b/t) := (at + bs)/(st) and (a/s)(b/t) := (ab)/(st). It is easy to check that these operations are well-defined and make S–1A a ring with identity 1/1, in which each s/1, s ∈ S, is invertible. There is a canonical ring homomorphism A → S–1A taking a to a/1. In general, this homomorphism is not injective. However, if A is an integral domain and 0 ∉ S, then the injectivity can be proved easily and we say that the ring A is canonically embedded in the ring S–1A.

Definition 2.94.

Let A be a ring and S a multiplicative subset of A. The ring S–1A constructed as above is called the localization of A away from S or the ring of fractions of A with respect to S.

Example 2.26.
  1. Let A be an integral domain and let S = A \ {0}. Then S–1A is called the quotient field or the field of fractions of A and is denoted as Q(A). If A is already a field, then Q(A) ≅ A. Other examples include Q(Z) = Q and Q(K[X]) = K(X), K a field, where K(X) denotes the field of rational functions over K in one indeterminate X.

    More generally, if A is any ring and S is the set of all non-zero-divisors of A, then S–1A is called the total quotient ring of A and is again denoted by Q(A). It is, in general, not a field. If A is an integral domain, then S = A \ {0} and the usage of Q(A) remains consistent.

  2. Let A be a ring, 𝔭 a prime ideal of A and S = A \ 𝔭. Then S–1A is called the localization of A at 𝔭 and is usually denoted by A𝔭.

  3. Let A be a ring, f ∈ A, and S = {1, f, f2, f3, . . . }. In this case, S–1A is conventionally denoted by Af.
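Part 2 with A = Z and the prime ideal 5Z gives the localization Z_(5): the rationals whose reduced denominator is prime to 5. That this set is closed under addition and multiplication can be illustrated directly (function names ours):

```python
# Z localized at the prime p consists of fractions with denominator
# not divisible by p; Fraction reduces automatically, so the test is exact.
from fractions import Fraction

def in_local_ring(x, p):
    """Is the rational x in Z localized at the prime p?"""
    return x.denominator % p != 0

a, b = Fraction(3, 4), Fraction(7, 2)     # both lie in Z_(5)
```

By contrast, 1/5 does not lie in Z_(5), since its denominator is the excluded prime.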

Integral dependence

The concept of integral dependence generalizes the notion of integers. Recall that for a field extension KL, an element is called algebraic over K, if α is a root of a non-zero polynomial . Since K is a field, the polynomial f can be divided by its leading coefficient, giving a monic polynomial in K[X] of which α is a root. However, if K is not a field, division by the leading coefficient is not always permissible. So we require the minimal polynomial to be monic in order to define a special class of objects.

Definition 2.95.

Let A ⊆ B be an extension of rings. An element α ∈ B is said to be integral over A, if α satisfies[15] (that is, is a root of) a monic (and hence non-zero) polynomial f(X) ∈ A[X]. An equation of the form f(α) = 0, f(X) ∈ A[X] monic, is called an equation of integral dependence of α over A.

[15] Strictly speaking, α being a root of f(X) is equivalent to α satisfying the polynomial equation f(α) = 0. Often the term equation is dropped in this context—a harmless colloquial contraction.

Example 2.27.
  1. If both A and B are fields, the concepts of integral and algebraic elements are the same. (See the argument preceding Definition 2.95.)

  2. Take A = Z and B = Q and let a/b ∈ Q, gcd(a, b) = 1, be integral over Z. Let (a/b)^n + αn–1(a/b)^(n–1) + · · · + α1(a/b) + α0 = 0, αi ∈ Z, be an equation of integral dependence of a/b over Z. Multiplication by b^n gives a^n = –b(αn–1a^(n–1) + · · · + α1ab^(n–2) + α0b^(n–1)), that is, b|a^n. Since gcd(a, b) = 1, this forces b = ±1, that is, a/b ∈ Z. This is, in general, true for any UFD A and its field of fractions B = Q(A) (See Exercise 2.131).

  3. Every element a ∈ A is integral over A, since it satisfies the monic polynomial X – a ∈ A[X].
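Part 2 has a classical computational consequence: every rational root of a monic integer polynomial is an integer dividing the constant term (the rational root test for monic polynomials). A sketch (names ours; a nonzero constant term is assumed):

```python
# Rational roots of x^n + c[n-1]x^(n-1) + ... + c[0]; by integrality
# (Example 2.27(2)) every rational root is an integer dividing c[0].
from fractions import Fraction

def is_root(r, coeffs):
    """Evaluate the monic polynomial at r by Horner's rule;
    coeffs lists c[0], ..., c[n-1] (low degree first)."""
    val = Fraction(1)                        # leading (monic) coefficient
    for c in reversed(coeffs):
        val = val * r + c
    return val == 0

def rational_roots_monic(coeffs):
    c0 = coeffs[0]                           # assumed nonzero
    candidates = set()
    for d in range(1, abs(c0) + 1):
        if c0 % d == 0:
            candidates.update({d, -d})
    return sorted(r for r in candidates if is_root(Fraction(r), coeffs))
```

For instance, x^2 – 3x + 2 has the integer roots 1 and 2, while x^2 – 2 has no rational root at all, reflecting that √2 is integral over Z but not rational.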

Now let A ⊆ B be an extension of rings and let C consist of all the elements of B that are integral over A. Clearly, A ⊆ C ⊆ B. It turns out that C is again a ring. This result is not at all immediate from the definition of integral elements. We prove it by using the following lemma, which generalizes Theorem 2.33.

Lemma 2.11.

For a ring extension A ⊆ B and for α ∈ B, the following conditions are equivalent:

  1. α is integral over A.

  2. A[α] is a finitely generated A-module.

  3. A[α] ⊆ C for some subring C of B with C being a finitely generated A-module.

Proof

[(a)⇒(b)] Let α^n + an–1α^(n–1) + · · · + a1α + a0 = 0, ai ∈ A, be an equation of integral dependence of α over A. A[α] is generated as an A-module by 1, α, α2, . . . . In order to show that the elements 1, α, . . . , α^(n–1) alone generate A[α] as an A-module, it is sufficient to show that each α^k, k ≥ n, is an A-linear combination of 1, α, . . . , α^(n–1). We proceed by induction on k. The assertion certainly holds for k = 0, . . . , n – 1, whereas for k ≥ n we write α^k = –(an–1α^(k–1) + · · · + a1α^(k–n+1) + a0α^(k–n)), whence induction completes the proof.

[(b)⇒(c)] Take C := A[α].

[(c)⇒(a)] Let generate C as an A-module. Since A[α] ⊆ C and, in particular, , for all i = 1, . . . , n we can write for some . Let denote the matrix (αδijaij)1≤i,jn, where δij is the Kronecker delta. Then . Multiplication (on the left) by the adjoint of shows that for all i = 1, . . . , n. Since , we have for some , so that (det ) · 1 = 0, that is, det . But det is a monic polynomial in α of degree n and with coefficients from A.

Proposition 2.42.

For an extension A ⊆ B of rings, the set

C := {α ∈ B | α is integral over A}

is a subring of B containing A.

Proof

Clearly, A ⊆ C ⊆ B as sets. To show that C is a ring, let α, β ∈ C. By Condition (b) of Lemma 2.11, A[α] is a finitely generated A-module. Now β, being integral over A, is also integral over A[α]; so again by Lemma 2.11(b), A[α][β] is a finitely generated A[α]-module. It is then easy to check that A[α, β] = A[α][β] is a finitely generated A-module. Since α ± β and αβ are in A[α, β], by Lemma 2.11(c), these elements are integral over A, that is, belong to C. Thus C is a ring.

Definition 2.96.

The ring C of Proposition 2.42 is called the integral closure of A in B. A is called integrally closed in B, if C = A. On the other hand, if C = B, we say that B is an integral extension of A or that B is integral over A.

An integral domain A is called integrally closed (without specific mention of the ring in which it is so), if A is integrally closed in its quotient field Q(A). An integrally closed integral domain is called a normal domain (ND).

Example 2.28.
  1. ℤ (or more generally any UFD) is a normal domain.

  2. ℤ is not integrally closed in ℝ or ℂ, since, for example, (1 + √5)/2 is integral over ℤ (being a root of X2 – X – 1). The elements of the integral closure of ℤ in ℂ are called algebraic integers (See Exercise 2.60).
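As a quick illustrative check that the golden ratio (1 + √5)/2 is an algebraic integer although it lies outside ℤ:

```python
# (1 + sqrt(5))/2 is a root of the monic integer polynomial X^2 - X - 1,
# hence integral over Z -- yet it is not itself a rational integer.
phi = (1 + 5 ** 0.5) / 2
assert abs(phi ** 2 - phi - 1) < 1e-9   # satisfies a monic relation over Z
assert phi != int(phi)                  # ... but phi is not in Z
print("phi =", phi)
```
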

Noetherian rings

Recall that a PID is a ring (integral domain) in which every ideal is principal, that is, generated by a single element. We now want to be a bit more general and demand only that every ideal be finitely generated. If a ring meets our demand, we call it a Noetherian ring. These rings are named after Emmy Noether (1882–1935), one of the most celebrated mathematicians of all time, whose work on such rings was fundamental and deep for the development of algebra. Emmy’s father Max Noether (1844–1921) was also an eminent mathematician.

Definition 2.97.

Let A be a ring and let 𝔞1 ⊆ 𝔞2 ⊆ 𝔞3 ⊆ · · · be an ascending chain of ideals of A. This chain is called stationary, if there is an m ∈ ℕ such that 𝔞n = 𝔞m for all n ≥ m. The ring A is said to satisfy the ascending chain condition or the ACC, if every ascending chain of ideals in A is stationary, or in other words, if there does not exist any infinite strictly ascending chain of ideals in A.

Proposition 2.43.

For a ring A, the following conditions are equivalent:

  1. Every ideal of A is finitely generated.

  2. A satisfies the ascending chain condition.

  3. Every non-empty set of ideals of A contains a maximal element.

Proof

[(a)⇒(b)] Let 𝔞1 ⊆ 𝔞2 ⊆ · · · be an ascending chain of ideals of A. Consider the ideal 𝔞 := ∪n 𝔞n, which is finitely generated by hypothesis. Let a1, . . . , ar be a set of generators of 𝔞. Each ai ∈ 𝔞, that is, there exists mi with ai ∈ 𝔞mi and hence ai ∈ 𝔞n for every n ≥ mi. Take m := max(m1, . . . , mr). For every n ≥ m, we have 𝔞 ⊆ 𝔞n ⊆ 𝔞, that is, 𝔞n = 𝔞.

[(b)⇒(c)] Let S be a non-empty set of ideals of A. Order S by inclusion. The ACC implies that every chain in S has an upper bound in S. By Zorn’s lemma, S has a maximal element.

[(c)⇒(a)] Let 𝔞 be an ideal of A. Consider the set S of all finitely generated ideals of A contained in 𝔞. S is non-empty, since it contains the zero ideal. By Condition (c), S has a maximal element, say, 𝔟. If 𝔟 ≠ 𝔞, take a ∈ 𝔞 \ 𝔟. Then 𝔟 + 〈a〉 is finitely generated (since 𝔟 is so), properly contains 𝔟 and is contained in 𝔞. This contradicts the maximality of 𝔟 in S. Thus we must have 𝔟 = 𝔞, that is, 𝔞 is finitely generated.

Definition 2.98.

A ring A is called Noetherian, if A satisfies (one and hence all of) the equivalent conditions of Proposition 2.43.

Example 2.29.
  1. All PIDs are Noetherian, since principal ideals are obviously finitely generated. In particular, ℤ and K[X] (K a field) are Noetherian.

  2. If A is Noetherian and 𝔞 an ideal of A, then A/𝔞 is Noetherian, since the ideals of A/𝔞 are in one-to-one inclusion-preserving correspondence with the ideals of A containing 𝔞 and hence satisfy the ACC.

  3. Let A be a Noetherian ring and S a multiplicative subset of A. Then the localization B := S–1A is also Noetherian. To prove this fact let 𝔟 be an ideal in B. One can show that 𝔟 = S–1𝔞 for some ideal 𝔞 of A. Since A is Noetherian, 𝔞 is finitely generated, say, 𝔞 = 〈a1, . . . , ar〉. It is now (almost) obvious that 𝔟 is generated by a1/1, . . . , ar/1. A particular case: If A is Noetherian and 𝔭 a prime ideal of A, then the localization A𝔭 is also Noetherian.

  4. The ring of polynomials with infinitely many indeterminates X1, X2, X3, . . . is not Noetherian. This is because the ideal

    〈X1, X2, X3, . . .〉 = AX1 + AX2 + AX3 + · · ·

    is not finitely generated, or alternatively because we have the infinite strictly ascending chain of ideals 〈X1〉 ⊊ 〈X1, X2〉 ⊊ 〈X1, X2, X3〉 ⊊ · · ·, or because the set S := {〈X1〉, 〈X1, X2〉, 〈X1, X2, X3〉, . . .} of ideals in A does not contain a maximal element.

We have seen that if A is a PID, the polynomial ring A[X] need not be a PID. However, the property of being Noetherian is preserved during the passage from A to A[X] (Theorem 2.8).

Dedekind domains

A class of rings proves to be vital in the study of number fields:

Definition 2.99.

An integral domain A is called a Dedekind domain, if it satisfies all of the following three conditions:

  1. A is Noetherian.

  2. Every non-zero prime ideal of A is maximal.

  3. A is integrally closed (in its quotient field K := Q(A)).

2.13.2. Number Fields and Rings

After much ado we are finally in a position to define the basic objects of study in this section.

Definition 2.100.

A number field K is defined to be a finite (and hence algebraic) extension of the field ℚ of rational numbers. Clearly, ℚ ⊆ K. The extension degree d := [K : ℚ] is called the degree of the number field K and is finite by definition.

Note that there is considerable disagreement among mathematicians over this definition of number fields. Some insist that any field K satisfying ℚ ⊆ K ⊆ ℂ should be called a number field. Some others restrict the definition by demanding that K must be algebraic over ℚ; however, fields K with infinite extension degree are allowed. We restrict the definition further by imposing the condition that [K : ℚ] has to be finite. Our restricted definition is seemingly the most widely accepted one. In this book, we study only the number fields of Definition 2.100, and accepting this definition at the minimum saves us from writing huge expressions like “(algebraic) number fields of finite extension degree over ℚ” to denote number fields.

For number fields, the notion of integral closure leads to the following definition.

Definition 2.101.

A number field K contains ℚ and hence ℤ. The integral closure of ℤ in K is called the ring of integers of K and is denoted by 𝔒K. (𝔒 is the Gothic O.) Clearly, ℤ ⊆ 𝔒K, and 𝔒K is an integral domain. The elements of 𝔒K are precisely the algebraic integers lying in K. A number ring is a ring which is (isomorphic to) the ring of integers of a number field.

By Example 2.27(2), the ring of integers of the number field ℚ is ℤ, that is, 𝔒ℚ = ℤ. It is, therefore, customary to call the elements of ℤ rational integers. Since ℤ is naturally embedded in 𝔒K for any number field K, it is important to notice the distinction between the integers of K (that is, the elements of 𝔒K) and the rational integers in K (that is, the images of the canonical inclusion ℤ ↪ 𝔒K).

Some simple properties of number rings are listed below.

Proposition 2.44.

For a number field K, we have:

  1. 𝔒K ∩ ℚ = ℤ.

  2. For α ∈ K, there exists a non-zero rational integer r such that rα ∈ 𝔒K. In particular, the quotient field of 𝔒K is K.

  3. 𝔒K is integrally closed in K, that is, 𝔒K is a normal domain.

Proof

(1) follows immediately from Example 2.27(2), (2) follows from Exercise 2.60, and (3) follows from Exercise 2.126(b).

Let K be a number field of degree d. By Corollary 2.13, K is a simple extension of ℚ, that is, there exists an element α ∈ K with a minimal polynomial f(X) over ℚ such that deg f = d and K = ℚ(α). The field K is a ℚ-vector space of dimension d with basis 1, α, . . . , αd–1. There exists a non-zero integer a such that aα is an algebraic integer, and we continue to have K = ℚ(aα). Thus, without loss of generality, we may take α to be an algebraic integer. In this case, the ℚ-basis 1, α, . . . , αd–1 of K consists only of algebraic integers.

Conversely, let f(X) ∈ ℚ[X] be an irreducible polynomial of degree d ≥ 1. The field K := ℚ[X]/〈f(X)〉 is a number field of degree d, and the elements of K can be represented by polynomials with rational coefficients and of degrees < d. Arithmetic in K is carried out as the polynomial arithmetic of ℚ[X] followed by reduction modulo the defining irreducible polynomial f(X). This gives us an algebraic representation of K independent of any element of K. Now, K can also be viewed as a subfield of ℂ and the elements of K can be represented as complex numbers.[16] A representation of K by a subfield K′ of ℂ together with a field isomorphism σ : K → K′ is called a complex embedding of K in ℂ.[17] Such a representation is not unique, as Proposition 2.45 demonstrates.

[16] A complex number a + ib has a representation by a pair (a, b) of real numbers. Here, i plays the role of X + 〈X2 + 1〉 in ℝ[X]/〈X2 + 1〉. Finally, every real number has a decimal (or binary or hexadecimal or . . .) representation.

[17] The field ℚ is canonically embedded in K. It is evident that the embedding σ : K → K′ fixes ℚ element-wise.

Proposition 2.45.

A number field K of degree d ≥ 1 has exactly d distinct complex embeddings.

Proof

As above we take K := ℚ[X]/〈f(X)〉 for some irreducible polynomial f(X) ∈ ℚ[X] of degree d. Since ℚ is a perfect field (See Exercise 2.76), the d roots α1, . . . , αd ∈ ℂ of f(X) are all distinct. For each i = 1, . . . , d, the map sending X + 〈f(X)〉 ↦ αi clearly extends to a field isomorphism σi : K → ℚ(αi). Thus we get d distinct complex embeddings of K in ℂ. Now let K′ be a subfield of ℂ, such that σ : K → K′ is a ℚ-isomorphism. Let α := σ(X + 〈f(X)〉). Then 0 = σ(0) = σ(f(X + 〈f(X)〉)) = f(σ(X + 〈f(X)〉)) = f(α). Thus α is a root of f, that is, α = αi for some i. Since K′ is a field containing ℚ and αi and having [K′ : ℚ] = d, it follows that K′ = ℚ(αi) and σ = σi.
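The embeddings of Proposition 2.45 can be exhibited numerically. A small sketch, using numpy’s root finder on the sample polynomial X3 – 2 (an illustrative choice, not a polynomial fixed by the text):

```python
import numpy as np

# The complex embeddings of K = Q[X]/<f(X)> send x := X + <f(X)> to the
# d distinct complex roots of f.  Illustration with f(X) = X^3 - 2:
f = [1, 0, 0, -2]                     # coefficients of X^3 - 2, degree d = 3
roots = np.roots(f)                   # images of x under the three embeddings
assert len(roots) == 3

# f is separable over Q, so the three embeddings are pairwise distinct:
for i in range(3):
    for j in range(i + 1, 3):
        assert abs(roots[i] - roots[j]) > 1e-6

# The i-th embedding maps a polynomial expression g(x) to g(root_i);
# e.g. the element x^2 + 1 has the three images below.
images = [r ** 2 + 1 for r in roots]
print(np.round(images, 6))
```
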

This proposition says that the conjugates α1, . . . , αd are algebraically indistinguishable. For example, X2 + 1 has the two roots ±i, where i := √–1. But it makes little sense to talk about the positive and the negative square roots of –1. They are algebraically indistinguishable, and if one calls one of these i, the other one becomes –i.[18] However, if a representation of ℂ is given, we can distinguish between √–5 and –√–5 by associating these quantities with the elements i√5 and –i√5 respectively, where √5 is the positive real square root of 5 and where i is the imaginary unit available from the given representation of ℂ.

[18] In a number theory seminar in 1996, Hendrik W. Lenstra, Jr. commented:

Suppose the Martians defined the complex numbers by adjoining a root of –1 they called j. And when the Earth and Martians start talking, they have to translate i to be either j or –j. So we take i to j, because I think that’s what the scientists will decide. ··· But it was later discovered that most Martians are left handed, so the philosophers decide it’s better to send i to –j instead.

It is also quite customary to start with K := ℚ(α) for some algebraic α ∈ ℂ and seek the complex embeddings of K in ℂ. One then considers the minimal polynomial f(X) of α (over ℚ) and proceeds as in the proof of Proposition 2.45, but now defining the map σi : ℚ(α) → ℚ(αi) as the unique field isomorphism that fixes ℚ and takes α ↦ αi. If we take α = α1, then σ1 is the identity map, whereas σ2, . . . , σd are non-identity field isomorphisms.

The moral of this story is that whether one wants to view the number field K as ℚ[X]/〈f(X)〉 or as ℚ(α) for any root α of f(X) is one’s personal choice. In any case, one will be dealing with the same mathematical object, and as long as representation issues are not brought into the picture, all these definitions of a number field are absolutely equivalent.

The embeddings need not be all distinct as sets. For example, the two images ℚ(i) and ℚ(–i) of ℚ[X]/〈X2 + 1〉 are identical as sets. But the maps x ↦ i and x ↦ –i are distinct (where x := X + 〈X2 + 1〉). Thus while specifying a complex embedding of a number field K, it is necessary to mention not only the subfield K′ of ℂ isomorphic to K, but also the explicit field isomorphism K → K′.

Definition 2.102.

Let K be a number field of degree d defined by an irreducible polynomial f(X) ∈ ℚ[X] or by any root of f(X). Let r1 be the number of real roots and 2r2 the number of non-real roots of f. (Note that the non-real roots of a real polynomial occur in (complex) conjugate pairs.) By the fundamental theorem of algebra, we have d = r1 + 2r2. For any real root α of f, the complex embedding ℚ(α) of K is completely contained in ℝ and hence is often called a real embedding of K. On the other hand, for a non-real root β of f the complex embedding ℚ(β) of K is called a non-real or a properly complex embedding of K. The pair (r1, r2) is called the signature of the number field K. K has r1 real embeddings and 2r2 properly complex embeddings. If r2 = 0, that is, if all embeddings of K are real, one calls K a totally real number field. On the other hand, if r1 = 0, that is, if all embeddings of K are properly complex, then K is called a totally complex number field.

Example 2.30.
  1. The number field ℚ(√2) is totally real and has the signature (2, 0). (The roots of X2 – 2 are ±√2.)

  2. The number field ℚ(√–2) is totally complex and has the signature (0, 1). (The roots of X2 + 2 are ±i√2.)

  3. The number field K := ℚ(∛2) is neither totally real nor totally complex. The roots of X3 – 2 are ∛2, ω∛2 and ω2∛2, where ω is a primitive complex cube root of unity. The signature of K is (1, 1), that is, K has one real embedding and two properly complex embeddings.
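The signatures in Example 2.30 can be computed by counting real and non-real roots. A minimal sketch, assuming numerically well-separated roots:

```python
import numpy as np

def signature(coeffs, tol=1e-8):
    """Signature (r1, r2) of the number field defined by the (assumed
    irreducible) rational polynomial with the given coefficient list."""
    roots = np.roots(coeffs)
    r1 = sum(1 for z in roots if abs(z.imag) < tol)   # real roots
    r2 = (len(roots) - r1) // 2                       # conjugate pairs
    return (r1, r2)

print(signature([1, 0, -2]))     # X^2 - 2  -> (2, 0): totally real
print(signature([1, 0, 2]))      # X^2 + 2  -> (0, 1): totally complex
print(signature([1, 0, 0, -2]))  # X^3 - 2  -> (1, 1)
```
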

The simplest examples of number fields are the quadratic number fields, that is, number fields of degree 2. Some special properties of quadratic number fields are covered in the exercises. It follows from Exercise 2.136 that every quadratic number field is of the form ℚ(√D) for some non-zero square-free integer D ≠ 1.

Now we investigate the ℤ-module structure of 𝔒K for a number field K of degree d. Let σ1, . . . , σd be the complex embeddings of K.

Definition 2.103.

For an element α ∈ K, we define the trace of α (over ℚ) as

Equation 2.15

Tr(α) := σ1(α) + σ2(α) + · · · + σd(α)

and the norm of α (over ℚ) as

N(α) := σ1(α)σ2(α) · · · σd(α).

If g(X) is the minimal polynomial of α over ℚ and r := deg g, then r|d. Moreover, each root of g(X) occurs exactly d/r times among σ1(α), . . . , σd(α). So Tr(α) and N(α), being symmetric functions of these roots, belong to ℚ. If α is an algebraic integer, then g(X) ∈ ℤ[X], that is, Tr(α), N(α) ∈ ℤ.

The following properties of the norm and trace functions can be readily verified. Here α, β ∈ K and c ∈ ℚ.

Tr(α + β) = Tr(α) + Tr(β),
N(αβ) = N(α)N(β),
Tr(cα) = c Tr(α),
N(cα) = cdN(α),
Tr(c) = dc,
N(c) = cd.
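These identities are easy to verify numerically in a concrete field. A sketch in K = ℚ(√2) (a sample choice), where the two embeddings send √2 to ±√2:

```python
# Trace and norm over Q: sum and product of the images of an element
# under the d complex embeddings.  In Q(sqrt(2)), d = 2 and the element
# a + b*sqrt(2) has images a + b*sqrt(2) and a - b*sqrt(2).
s = 2 ** 0.5

def embeddings(a, b):
    return (a + b * s, a - b * s)

def trace(a, b):
    return sum(embeddings(a, b))

def norm(a, b):
    x, y = embeddings(a, b)
    return x * y

# Tr(3 + 5*sqrt(2)) = 2*3 = 6 and N(3 + 5*sqrt(2)) = 3^2 - 2*5^2 = -41.
print(round(trace(3, 5)), round(norm(3, 5)))
```
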

Definition 2.104.

Let β1, . . . , βd ∈ K. We call the determinant of the matrix (Tr(βiβj))1≤i,j≤d, whose ij-th entry is equal to Tr(βiβj), the discriminant Δ(β1, . . . , βd) of β1, . . . , βd. Since each Tr(βiβj) ∈ ℚ, it follows that Δ(β1, . . . , βd) ∈ ℚ. Moreover, if β1, . . . , βd are all algebraic integers, then Δ(β1, . . . , βd) ∈ ℤ.

Proposition 2.46.

Δ(β1, . . . , βd) = (det(σj(βi)))2.

Proof

Consider the matrices D := (Tr(βiβj)) and E := (σj(βi)). By definition, we have Δ(β1, . . . , βd) = det D. We show that D = EEt, which implies that det D = (det E)2. The ij-th entry of EEt is

σ1(βi)σ1(βj) + · · · + σd(βi)σd(βj) = σ1(βiβj) + · · · + σd(βiβj) = Tr(βiβj),

where the last equality follows from Equation (2.15).

Let K = ℚ(α) for some α ∈ ℂ algebraic of degree d over ℚ and let f(X) be the minimal polynomial of α over ℚ. We define the discriminant of f as

Δ(f) := Δ(1, α, α2, ..., αd–1).

We have to show that the quantity Δ(f) is well-defined, that is, independent of the choice of the root α of f(X). Let α = α1, α2, . . . , αd be all the roots of f(X) and let the complex embedding σj of K map α to αj. By Proposition 2.46, we have Δ(f) = (det E)2, where E = (αji–1)1≤i,j≤d is a Vandermonde matrix. Computing the determinant of E gives Δ(f) = ∏1≤i<j≤d (αj – αi)2, which implies that Δ(f) is independent of the permutations of the conjugates α1, . . . , αd of α. Notice that since α1, . . . , αd are all distinct, Δ(f) ≠ 0.

Let us deduce a useful formula for Δ(f). Write f(X) = (X – α1)(X – α2) · · · (X – αd) and take the formal derivative to get f′(X) = Σj=1,...,d ∏i≠j (X – αi), that is, f′(αj) = ∏i≠j (αj – αi). Therefore, ∏j=1,...,d f′(αj) = ∏j ∏i≠j (αj – αi) = (–1)d(d–1)/2 ∏i<j (αj – αi)2, that is,

Equation 2.16

Δ(f) = (–1)d(d–1)/2 N(f′(α)),

since N(f′(α)) = σ1(f′(α)) · · · σd(f′(α)) = f′(α1) · · · f′(αd).
For arbitrary β1, . . . , βd ∈ K, the discriminant Δ(β1, . . . , βd) discriminates between the cases that β1, . . . , βd form a ℚ-basis of K and that they do not.
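Equation (2.16) can be checked numerically. A sketch for the sample polynomial X3 – 2, comparing the product of squared root differences with the norm-of-derivative formula:

```python
import numpy as np

# Check Delta(f) = prod_{i<j} (alpha_j - alpha_i)^2
#                = (-1)^{d(d-1)/2} * prod_j f'(alpha_j)
# for f(X) = X^3 - 2 (d = 3); the discriminant of X^3 - 2 is -108.
f = np.poly1d([1, 0, 0, -2])
roots = f.roots
d = 3

vandermonde = np.prod([(roots[j] - roots[i]) ** 2
                       for i in range(d) for j in range(i + 1, d)])
via_derivative = (-1) ** (d * (d - 1) // 2) * np.prod(f.deriv()(roots))

assert abs(vandermonde - via_derivative) < 1e-6
print(round(vandermonde.real))   # -> -108
```
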

Lemma 2.12.

Let β1, . . . , βd ∈ K and γ1, . . . , γd ∈ K satisfy γi = Σj=1,...,d tijβj for i = 1, . . . , d and for some tij ∈ ℚ. Then Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd), where T = (tij).

Proof

Let E1 := (σj(βi)) and E2 := (σj(γi)). Now

σj(γi) = σj(Σk tikβk) = Σk tikσj(βk)

is the ij-th entry of the matrix T E1, that is, E2 = T E1. Hence

Δ(γ1, . . . , γd) = (det E2)2 = (det T)2(det E1)2 = (det T)2Δ(β1, . . . , βd).

Corollary 2.19.

Let (β1, . . . , βd) and (γ1, . . . , γd) be two ℚ-bases of K. Then Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd), where T is the change-of-basis matrix from (β1, . . . , βd) to (γ1, . . . , γd).

Corollary 2.20.

β1, . . . , βd ∈ K form a ℚ-basis of K, if and only if Δ(β1, . . . , βd) ≠ 0.

Proof

Let K = ℚ(α) and let f(X) be the minimal polynomial of α over ℚ. Since 1, α, . . . , αd–1 is a ℚ-basis of K, each βi can be written (uniquely) as βi = Σj=1,...,d tijαj–1 with tij ∈ ℚ. By Lemma 2.12, Δ(β1, . . . , βd) = (det T)2Δ(1, α, . . . , αd–1) = (det T)2Δ(f), where T = (tij). We have seen that Δ(f) ≠ 0. Therefore, Δ(β1, . . . , βd) ≠ 0 if and only if det T ≠ 0, that is, if and only if β1, . . . , βd form a ℚ-basis of K.

Finally comes the desired characterization of 𝔒K.

Theorem 2.55.

For a number field K of degree d, the ring 𝔒K is a free ℤ-module of rank d.

Proof

Let β1, . . . , βd form a ℚ-basis of K. We know that for some non-zero r1, . . . , rd ∈ ℤ the elements r1β1, . . . , rdβd are in 𝔒K and continue to constitute a ℚ-basis of K. So we may assume that the elements β1, . . . , βd are already in 𝔒K. Consider the set S of all ℚ-bases (β1, . . . , βd) of K consisting of elements from 𝔒K only. By Definition 2.104 and Corollary 2.20, Δ(β1, . . . , βd) is a non-zero integer for every (β1, . . . , βd) ∈ S. Choose (β1, . . . , βd) ∈ S such that |Δ(β1, . . . , βd)| is minimal in S.

Claim: (β1, . . . , βd) is linearly independent over ℤ.

(β1, . . . , βd) is a ℚ-basis of K, that is, linearly independent over ℚ and so trivially over ℤ too.

Claim: (β1, . . . , βd) generates 𝔒K as a ℤ-module.

Assume not, that is, there exists α ∈ 𝔒K such that α = a1β1 + · · · + adβd with some ai ∉ ℤ. Without loss of generality, we may assume that a1 ∉ ℤ and write a1 = a + r with a ∈ ℤ and 0 < r < 1. Define γ1 := α – aβ1 = rβ1 + a2β2 + · · · + adβd, γ2 := β2, . . . , γd := βd. Clearly, γ1, . . . , γd ∈ 𝔒K. Furthermore, the change-of-basis matrix T expressing γ1, . . . , γd in terms of β1, . . . , βd is upper triangular with diagonal entries r, 1, . . . , 1, so that det T = r. By Lemma 2.12, we have

Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd) = r2Δ(β1, . . . , βd).

Since r ≠ 0, Δ(γ1, . . . , γd) ≠ 0, that is, (γ1, . . . , γd) is again a ℚ-basis of K (Corollary 2.20), that is, (γ1, . . . , γd) ∈ S. Finally since r < 1, we have |Δ(γ1, . . . , γd)| < |Δ(β1, . . . , βd)|, a contradiction to the choice of (β1, . . . , βd). Thus every α ∈ 𝔒K has to be a ℤ-linear combination of β1, . . . , βd. This completes the proof of the second claim and also of the theorem.

Definition 2.105.

Any ℤ-basis of 𝔒K is called an integral basis of K (or of 𝔒K).

Corollary 2.21.

Every integral basis of K has the same discriminant (for a given K).

Proof

Let (β1, . . . , βd) and (γ1, . . . , γd) be two integral bases of K. Let T be the (β1, . . . , βd)-to-(γ1, . . . , γd) change-of-basis matrix. (β1, . . . , βd) being an integral basis of K, all the entries of T are integers. Also from Corollary 2.19 we have Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd), and hence Δ(β1, . . . , βd) divides and has the same sign as Δ(γ1, . . . , γd). One can analogously show that Δ(γ1, . . . , γd) divides Δ(β1, . . . , βd). Therefore, Δ(β1, . . . , βd) = Δ(γ1, . . . , γd).

Definition 2.106.

Let (β1, . . . , βd) be an integral basis of a number field K. The discriminant of K is defined to be the integer ΔK := Δ(β1, . . . , βd). By Corollary 2.21, ΔK is well-defined, that is, independent of the choice of the integral basis of K.

Recall that K, as a vector space over ℚ, always possesses a ℚ-basis of the form 1, α, . . . , αd–1. 𝔒K, as a ℤ-module, is free of rank d, but a number field K need not possess an integral basis of the form 1, α, . . . , αd–1. Whenever it does, 𝔒K is called monogenic, and an integral basis 1, α, . . . , αd–1 of K is called a power integral basis. Clearly, if K has a power integral basis 1, α, . . . , αd–1, then 𝔒K = ℤ[α]. But the converse is not true, that is, for α ∈ 𝔒K with K = ℚ(α), the elements 1, α, . . . , αd–1 need not form an integral basis of K, even when 𝔒K is monogenic.

Example 2.31.

Consider the quadratic number field K := ℚ(√D) for some square-free integer D ≠ 0, 1. We consider the two cases (See Exercise 2.136):

Case 1: D ≡ 2, 3 (mod 4)

Here 𝔒K = ℤ[√D], that is, 1, √D is a power integral basis of K. The minimal polynomial of √D is X2 – D, and the conjugates of √D are ±√D. Therefore, by Equation (2.16), we have

ΔK = –N(2√D) = –(2√D)(–2√D) = 4D.
Case 2: D ≡ 1 (mod 4)

In this case, 𝔒K = ℤ[(1 + √D)/2], that is, 1, (1 + √D)/2 is a power integral basis of K. The minimal polynomial of (1 + √D)/2 is X2 – X + (1 – D)/4, and the conjugates of (1 + √D)/2 are (1 ± √D)/2. Therefore, Equation (2.16) gives

ΔK = –N(2 · (1 + √D)/2 – 1) = –N(√D) = –(√D)(–√D) = D.
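The case distinction of Example 2.31 can be packaged as a small helper (illustrative; the function name and its string output are our own):

```python
def quadratic_field_data(D):
    """Power-integral-basis generator and discriminant of Q(sqrt(D)),
    for a square-free integer D != 0, 1 (classical case distinction)."""
    assert D % 4 != 0            # square-free D is never 0 mod 4
    if D % 4 == 1:
        return ("(1 + sqrt(D))/2", D)      # O_K = Z[(1+sqrt(D))/2], Delta_K = D
    else:                                   # D = 2, 3 (mod 4)
        return ("sqrt(D)", 4 * D)           # O_K = Z[sqrt(D)], Delta_K = 4D

print(quadratic_field_data(5))    # -> ('(1 + sqrt(D))/2', 5)
print(quadratic_field_data(-1))   # Gaussian integers: ('sqrt(D)', -4)
```
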

2.13.3. Unique Factorization of Ideals

Ideals in a number ring possess a very rich structure. We prove that number rings are Dedekind domains (Definition 2.99). A Dedekind domain (henceforth abbreviated as DD) need not be a UFD (or a PID). However, it is a ring in which ideals admit unique factorizations into products of prime ideals.

Let K be a number field of degree d and 𝔒K its ring of integers. If f : A → B is a homomorphism of rings and if 𝔮 is a prime ideal of B, then the contraction 𝔭 := f–1(𝔮) is a prime ideal of A. We say that 𝔮 lies above or over 𝔭. If A ⊆ B and f is the inclusion homomorphism, then 𝔭 = 𝔮 ∩ A. For a number field K, we consider the natural inclusion ℤ ↪ 𝔒K.

Lemma 2.13.

Let 𝔮 be a non-zero prime ideal of 𝔒K. Then 𝔮 lies above a unique non-zero prime ideal 𝔭 = 𝔮 ∩ ℤ of ℤ. In particular, 𝔮 contains a (unique) rational prime.

Proof

Let 𝔭 := 𝔮 ∩ ℤ. If 𝔭 = 0, then both 𝔮 and 0 are prime ideals of 𝔒K that lie over the zero ideal of ℤ. Since 0 ⊊ 𝔮, we get 𝔮 = 0 by Exercise 2.128(c), a contradiction.

Proposition 2.47.

𝔒K is Noetherian.

Proof

Let α1, . . . , αd constitute an integral basis of K, that is, 𝔒K = ℤα1 + · · · + ℤαd, that is, the ring homomorphism ℤ[X1, . . . , Xd] → 𝔒K mapping f(X1, . . . , Xd) ↦ f(α1, . . . , αd) is surjective. By Hilbert’s basis theorem (Theorem 2.8), the polynomial ring ℤ[X1, . . . , Xd] is Noetherian, and so 𝔒K, being the quotient of a Noetherian ring (by the isomorphism theorem), is Noetherian too (Example 2.29).

Theorem 2.56.

The ring of integers 𝔒K of a number field K is a Dedekind domain.

Proof

We have proved that 𝔒K is Noetherian (Proposition 2.47) and integrally closed (Proposition 2.44). It then suffices to show that each non-zero prime ideal 𝔮 of 𝔒K is maximal. By Lemma 2.13, 𝔮 lies over a non-zero prime ideal pℤ of ℤ. But pℤ is maximal in ℤ. Exercise 2.128(b) now completes the proof.

Now we derive the unique factorization theorem for ideals in a DD. It is going to be a long story. We refer the reader to Definition 2.92 to recall how the product of two ideals is defined.

Lemma 2.14.

Let A be a ring, 𝔞1, . . . , 𝔞r ideals of A, and 𝔭 a prime ideal of A such that 𝔞1 · · · 𝔞r ⊆ 𝔭. Then 𝔞i ⊆ 𝔭 for some i. In particular, if A is a DD and 𝔭, 𝔭1, . . . , 𝔭r are non-zero prime ideals with 𝔭1 · · · 𝔭r ⊆ 𝔭, then 𝔭 = 𝔭i for some i.

Proof

The proof is obvious for r = 1. So assume that r > 1. If 𝔞i ⊄ 𝔭 for all i = 1, . . . , r, then for each i we can choose ai ∈ 𝔞i \ 𝔭 and see that a1 · · · ar ∈ 𝔞1 · · · 𝔞r ⊆ 𝔭, a contradiction to the fact that 𝔭 is prime. The last statement of the lemma follows from the fact that in a DD every non-zero prime ideal is maximal.

We now generalize the concept of ideals.

Definition 2.107.

Let A be an integral domain and K := Q(A). An A-submodule 𝔞 of K is called a fractional ideal of A, if b𝔞 ⊆ A for some non-zero b ∈ A.

Every ideal of A is evidently a fractional ideal of A and hence is often called an integral ideal of A. Conversely, every fractional ideal of A contained in A is an integral ideal of A. The principal fractional ideal Ax is the A-submodule of K generated by x ∈ K. If A is a Noetherian domain, we have the following equivalent characterization of fractional ideals.

Lemma 2.15.

Let A be a Noetherian integral domain, K := Q(A) and 𝔞 an A-submodule of K. Then 𝔞 is a fractional ideal of A, if and only if 𝔞 is a finitely generated A-submodule of K.

Proof

[if] Let 𝔞 = Ax1 + · · · + Axn, where xi = ai/bi, ai, bi ∈ A, bi ≠ 0. Then b𝔞 ⊆ A for b := b1 · · · bn.

[only if] Let b ∈ A, b ≠ 0, be such that b𝔞 ⊆ A. Now b𝔞 is an (integral) ideal of A (easy check) and is finitely generated, since A is Noetherian. Let b𝔞 = 〈c1, . . . , cn〉, ci ∈ A. Then 𝔞 = Ax1 + · · · + Axn, where xi := ci/b.

We define the product of two fractional ideals 𝔞, 𝔟 of an integral domain A as we did for integral ideals:

𝔞𝔟 := {a1b1 + · · · + anbn | n ∈ ℕ, ai ∈ 𝔞, bi ∈ 𝔟}.

It is easy to check that 𝔞𝔟 is again a fractional ideal of A. Let ℐA denote the set of non-zero fractional ideals of A. The product of fractional ideals defines a commutative and associative binary operation on ℐA. The ideal A acts as a (multiplicative) identity in ℐA. A fractional ideal 𝔞 of A is called invertible, if 𝔞𝔟 = A for some fractional ideal 𝔟 of A. We deduce shortly that if A is a DD, then every non-zero fractional ideal of A is invertible and, therefore, ℐA is a group under multiplication of fractional ideals.

Lemma 2.16.

Let A be a Noetherian domain and 𝔞 a non-zero (integral) ideal of A. For some r ∈ ℕ, there exist prime ideals 𝔭1, . . . , 𝔭r of A each containing 𝔞 such that 𝔭1 · · · 𝔭r ⊆ 𝔞.

Proof

Let S be the set of non-zero ideals of A for which the lemma does not hold. Assume that S ≠ ∅. Since A is Noetherian, S contains a maximal element, say 𝔞. Clearly, 𝔞 is a proper non-prime ideal of A, that is, for some a, b ∈ A \ 𝔞 we have ab ∈ 𝔞. The ideals 𝔞 + 〈a〉 and 𝔞 + 〈b〉 strictly contain 𝔞 and, therefore, by the maximality of 𝔞 are not in S, that is, there exist prime ideals 𝔭1, . . . , 𝔭s each containing 𝔞 + 〈a〉 (and hence 𝔞) such that 𝔭1 · · · 𝔭s ⊆ 𝔞 + 〈a〉 and prime ideals 𝔮1, . . . , 𝔮t each containing 𝔞 + 〈b〉 (and hence 𝔞) such that 𝔮1 · · · 𝔮t ⊆ 𝔞 + 〈b〉. Moreover, (𝔞 + 〈a〉)(𝔞 + 〈b〉) ⊆ 𝔞, since ab ∈ 𝔞, so that 𝔭1 · · · 𝔭s𝔮1 · · · 𝔮t ⊆ 𝔞, a contradiction. Thus S must be empty.

Note that the condition “each containing 𝔞” was necessary in Lemma 2.16 in order to rule out the trivial possibility that 𝔭i = 0 for some i.

Lemma 2.17.

Let A be a DD, K := Q(A) and 𝔭 a non-zero prime ideal of A. Define the set

𝔭–1 := {x ∈ K | x𝔭 ⊆ A}.

Then we have:

  1. 𝔭–1 is a fractional ideal of A.

  2. A ⊊ 𝔭–1.

  3. 𝔭𝔭–1 = A. In particular, every non-zero prime ideal in a DD is invertible.

Proof

  1. Clearly, 𝔭–1 is an A-submodule of K, and for any non-zero b ∈ 𝔭, we have b𝔭–1 ⊆ A.

  2. Since A𝔭 ⊆ 𝔭 ⊆ A, we have A ⊆ 𝔭–1. In order to prove the strict inclusion, we take any non-zero a ∈ 𝔭 and consider the ideal 〈a〉. By Lemma 2.16, there exist prime ideals 𝔭1, . . . , 𝔭r each containing 〈a〉 (and hence non-zero) such that 𝔭1 · · · 𝔭r ⊆ 〈a〉. We choose r to be minimal, so that 〈a〉 does not contain the product of any r – 1 of 𝔭1, . . . , 𝔭r. Now 𝔭1 · · · 𝔭r ⊆ 〈a〉 ⊆ 𝔭 and hence by Lemma 2.14 𝔭 = 𝔭i for some i, say, i = r. By the minimality of r, we can choose b ∈ 𝔭1 · · · 𝔭r–1 \ 〈a〉. Since b ∉ 〈a〉, we have b/a ∉ A. On the other hand, b𝔭 ⊆ 𝔭1 · · · 𝔭r ⊆ 〈a〉 and so (b/a)𝔭 ⊆ A, so that b/a ∈ 𝔭–1, that is, A ⊊ 𝔭–1.

  3. By the definition of 𝔭–1, it follows that 𝔭𝔭–1 is contained in A and hence is an integral ideal of A. Since 1 ∈ 𝔭–1, it follows that 𝔭 ⊆ 𝔭𝔭–1. Since 𝔭 is a maximal ideal, we then have 𝔭𝔭–1 = 𝔭 or 𝔭𝔭–1 = A. Assume that 𝔭𝔭–1 = 𝔭. We claim that this assumption implies that 𝔭–1 = A, a contradiction to Part (2). So we must have 𝔭𝔭–1 = A. For proving the claim, let b ∈ 𝔭–1 and choose a non-zero a ∈ 𝔭. Then we have ab ∈ 𝔭–1𝔭 = 𝔭 and, therefore, ab2 = (ab)b ∈ 𝔭–1𝔭 = 𝔭, and so on. For each n ∈ ℕ, define the ideal 𝔞n := 〈a, ab, ab2, . . . , abn〉. Then 𝔞0 ⊆ 𝔞1 ⊆ 𝔞2 ⊆ · · · is an ascending chain of ideals in A. Since A is Noetherian, the chain must be stationary, that is, for some n we have 𝔞n+1 = 𝔞n, that is, abn+1 ∈ 𝔞n, that is, abn+1 = c0a + c1ab + · · · + cnabn with ci ∈ A. Since A is an integral domain and a ≠ 0, we see that bn+1 – cnbn – · · · – c1b – c0 = 0, that is, b is integral over A. Since A is integrally closed, b ∈ A. Therefore, 𝔭–1 = A, as claimed.

Theorem 2.57.

Every non-zero ideal 𝔞 in a DD A can be represented as a product of prime ideals of A. Moreover, such a factorization of 𝔞 is unique up to permutations of the factors.

Proof

If 𝔞 = A, there is nothing to prove. So let 𝔞 be a proper non-zero ideal of A. We first show that if 𝔞 contains a product of non-zero prime ideals, then 𝔞 is a product of prime ideals. By Lemma 2.16, we have prime ideals 𝔭1, . . . , 𝔭r, each containing 𝔞, such that 𝔭1 · · · 𝔭r ⊆ 𝔞. Let us choose r to be minimal and proceed by induction on r. If r = 1, then 𝔭1 ⊆ 𝔞 ⊆ 𝔭1, that is, 𝔞 = 𝔭1 is already prime. So take r > 1 and assume that if an ideal 𝔟 of A contains a product of r – 1 or fewer non-zero prime ideals of A, then 𝔟 is a product of prime ideals. Let 𝔭 be a maximal ideal containing 𝔞. We then have 𝔭1 · · · 𝔭r ⊆ 𝔞 ⊆ 𝔭 and by Lemma 2.14 𝔭 = 𝔭i for some i, say, i = r. Now, consider the fractional ideal 𝔞𝔭–1. Then 𝔞𝔭–1 ⊆ 𝔭𝔭–1 = A, and so 𝔞𝔭–1 is an integral ideal of A. Furthermore 𝔭1 · · · 𝔭r–1 = 𝔭1 · · · 𝔭r𝔭–1 ⊆ 𝔞𝔭–1, that is, 𝔞𝔭–1 contains a product of r – 1 non-zero prime ideals. By the induction hypothesis, 𝔞𝔭–1 is a product of prime ideals, say, 𝔞𝔭–1 = 𝔮1 · · · 𝔮s. But then 𝔞 = 𝔮1 · · · 𝔮s𝔭 is also a product of prime ideals.

In order to prove the uniqueness of this product, let 𝔭1 · · · 𝔭r = 𝔮1 · · · 𝔮s with prime ideals 𝔭i and 𝔮j. Now 𝔮1 · · · 𝔮s ⊆ 𝔭1 and by Lemma 2.14 𝔭1 = 𝔮j for some j, say, j = 1. Then 𝔭2 · · · 𝔭r = 𝔮2 · · · 𝔮s, since 𝔭1 = 𝔮1 is invertible. Proceeding in this way shows the desired uniqueness.

In the factorization of a non-zero ideal of a DD, we do not rule out the possibility of repeated occurrences of factors. Taking this into account shows that every non-zero ideal 𝔞 in a DD A admits a unique factorization

𝔞 = 𝔭1e1 · · · 𝔭rer

with pairwise distinct non-zero prime ideals 𝔭i and with exponents ei ∈ ℕ. Here uniqueness is up to permutations of the indexes 1, . . . , r. This factorization can be extended to fractional ideals, but this time we have to allow non-positive exponents. First note that for integers e1, . . . , er and non-zero prime ideals 𝔭1, . . . , 𝔭r of A the product 𝔭1e1 · · · 𝔭rer is well-defined and is a fractional ideal of A. The converse is proved in the following corollary.

Corollary 2.22.

Every non-zero fractional ideal 𝔞 of a DD A admits a unique factorization of the form 𝔞 = 𝔭1e1 · · · 𝔭rer with pairwise distinct non-zero prime ideals 𝔭i of A and with exponents ei ∈ ℤ. Moreover, for such a fractional ideal we have 𝔞–1 = 𝔭1–e1 · · · 𝔭r–er.

Proof

By definition, there exists a non-zero b ∈ A such that b𝔞 ⊆ A. But then 𝔟 := b𝔞 = 〈b〉𝔞 is an integral ideal of A. We write 𝔟 = 𝔭1f1 · · · 𝔭rfr and 〈b〉 = 𝔭1g1 · · · 𝔭rgr with fi, gi ≥ 0. Since each non-zero prime ideal is invertible (Lemma 2.17(3)), it follows that 𝔞 = 𝔭1f1–g1 · · · 𝔭rfr–gr. This proves the existence of a factorization of 𝔞. The proof for the uniqueness is left to the reader as an easy exercise. The last assertion follows from a repeated use of Lemma 2.17(3).

For e ∈ ℕ, the fractional ideal (𝔭–1)e appearing in Corollary 2.22 is denoted by 𝔭–e. We have 𝔭e𝔭–e = A. One can easily verify that 𝔭–e defined as above is equal to the set

{x ∈ K | x𝔭e ⊆ A}.

In fact, one can use the last equality as the definition for 𝔭–e.

To sum up, every non-zero fractional ideal of a DD A is invertible and the set ℐA of all non-zero fractional ideals of A is a group. The unit ideal A acts as the identity in ℐA.

As in every group, we have the cancellation law(s) in ℐA.

Corollary 2.23.

Let A be a DD and 𝔞, 𝔟, 𝔠 fractional ideals of A with 𝔠 ≠ 0. If 𝔞𝔠 = 𝔟𝔠, then 𝔞 = 𝔟.

In view of the unique factorization of ideals in A, we can speak of the divisibility of integral ideals in A. Let 𝔞 and 𝔟 be two integral ideals of A. We say that 𝔟 divides 𝔞 and write 𝔟|𝔞, if 𝔞 = 𝔟𝔠 for some integral ideal 𝔠 of A. We now show that the condition 𝔟|𝔞 is equivalent to the condition 𝔞 ⊆ 𝔟. Thus for ideals in a DD the term divides is synonymous with contains.

Corollary 2.24.

Let 𝔞 and 𝔟 be integral ideals of a DD A. Then 𝔟|𝔞 if and only if 𝔞 ⊆ 𝔟.

Proof

[if] If 𝔞 ⊆ 𝔟, we have 𝔞𝔟–1 ⊆ 𝔟𝔟–1 = A, that is, 𝔠 := 𝔞𝔟–1 is an integral ideal of A.

Also 𝔞 = 𝔟𝔠, that is, 𝔟|𝔞.

[only if] If 𝔞 = 𝔟𝔠 for some integral ideal 𝔠, we have 𝔞 = 𝔟𝔠 ⊆ 𝔟A = 𝔟.

Corollary 2.25.

Let 𝔞 = 𝔭1e1 · · · 𝔭rer and 𝔟 = 𝔭1f1 · · · 𝔭rfr with ei, fi ≥ 0 be the prime decompositions of two non-zero integral ideals of a DD A. Then 𝔞|𝔟 if and only if ei ≤ fi for all i = 1, . . . , r.

Proof

[if] We have 𝔟 = 𝔞𝔠, where 𝔠 := 𝔭1f1–e1 · · · 𝔭rfr–er is an integral ideal of A.

[only if] Let 𝔟 = 𝔞𝔠 for some integral ideal 𝔠 of A. Clearly, 𝔠 ≠ 0, and we can write the prime decomposition 𝔠 = 𝔭1l1 · · · 𝔭rlr𝔭r+1lr+1 · · · 𝔭r+slr+s with li ≥ 0. We have 𝔭1f1 · · · 𝔭rfr = 𝔭1e1+l1 · · · 𝔭rer+lr𝔭r+1lr+1 · · · 𝔭r+slr+s. By unique factorization, we have f1 = e1 + l1, . . . , fr = er + lr and lr+1 = · · · = lr+s = 0, that is, ei ≤ fi for all i.
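With ideals encoded by their exponent vectors over a fixed list of primes, divides-is-contains turns divisibility into a componentwise comparison; likewise the sum 𝔞 + 𝔟 and the intersection 𝔞 ∩ 𝔟 become componentwise min and max (the ideal gcd and lcm, a standard consequence not spelled out in the text). A sketch:

```python
# Ideals in a DD, given as exponent vectors over a common list of
# non-zero prime ideals p1, ..., pr (Corollary 2.25).
def divides(e, f):
    """Does the ideal with exponents e divide the one with exponents f?"""
    return all(ei <= fi for ei, fi in zip(e, f))

def ideal_sum(e, f):        # a + b: the "gcd" of the two ideals
    return [min(ei, fi) for ei, fi in zip(e, f)]

def ideal_intersection(e, f):   # a ∩ b: the "lcm" of the two ideals
    return [max(ei, fi) for ei, fi in zip(e, f)]

print(divides([1, 0, 2], [1, 1, 2]))                  # -> True
print(ideal_sum([1, 0, 2], [1, 1, 2]))                # -> [1, 0, 2]
print(ideal_intersection([1, 0, 2], [1, 1, 2]))       # -> [1, 1, 2]
```
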

As we pass from ℤ to 𝔒K, the notion of unique factorization passes from the element level to the ideal level. If a DD is already a PID, these two concepts are equivalent. (Non-zero prime ideals in a PID are generated by prime elements.) Though a UFD need not be a PID in general, we have the following result for a DD.

Proposition 2.48.

A Dedekind domain A is a UFD, if and only if A is a PID.

Proof

[if] Every PID is a UFD (Theorem 2.11).

[only if] Let A be a UFD. In order to show that A is a PID, it suffices (in view of Theorem 2.57) to show that every non-zero prime ideal 𝔭 of A is a principal ideal. Choose any non-zero a ∈ 𝔭. Then 〈a〉 ⊆ 𝔭. Now a is a non-unit in A (since otherwise we would have 𝔭 = A) and A is assumed to be a UFD. Thus we can write a = uq1 · · · qr for a unit u and for prime elements qi in A. Clearly, each 〈qi〉 is a non-zero prime ideal of A and 〈a〉 = 〈q1〉 · · · 〈qr〉. Therefore, 〈q1〉 · · · 〈qr〉 ⊆ 𝔭 and hence by Lemma 2.14 𝔭 = 〈qi〉 for some i.

In the rest of this section, we abbreviate 𝔒K as 𝔒, if K is implicit in the context.

2.13.4. Norms of Ideals

We have seen that the ring 𝔒 is a free ℤ-module of rank d. The same result holds for every non-zero ideal 𝔞 of 𝔒. Let β1, . . . , βd constitute an integral basis of K.

One can choose rational integers aij with each aii positive such that

Equation 2.17

γ1 = a11β1,
γ2 = a21β1 + a22β2,
. . .
γd = ad1β1 + ad2β2 + · · · + addβd

constitute a ℤ-basis of 𝔞. Moreover, the discriminant Δ(γ1, . . . , γd) is independent of the choice of an integral basis γ1, . . . , γd of 𝔞 and is called the discriminant of 𝔞, denoted Δ(𝔞). It follows that 𝔞 can be generated as an ideal (that is, as an 𝔒-module) by at most d elements. We omit the proof of the following tighter result.

Proposition 2.49.

Every (integral) ideal in a DD A is generated by (at most) two elements. More precisely, for a proper non-zero ideal 𝔞 of A and for any non-zero a ∈ 𝔞 there exists b ∈ 𝔞 with 𝔞 = 〈a, b〉.

Definition 2.108.

The norm of a non-zero ideal 𝔞 of 𝔒 is defined as N(𝔞) := |𝔒/𝔞|, the cardinality of the quotient ring 𝔒/𝔞. It is customary to define the norm of the zero ideal as zero.

Using the integers aij of Equation (2.17), we can write

Equation 2.18

N(𝔞) = a11a22 · · · add = √|Δ(𝔞)/ΔK|.
Corollary 2.26.

For every non-zero ideal 𝔞 of 𝔒, the quotient ring 𝔒/𝔞 is a finite ring. In particular, if 𝔭 is a non-zero prime (hence maximal) ideal of 𝔒, then 𝔒/𝔭 is a finite field.

It is tempting to define the norm of an element α ∈ 𝔒 to be the norm of the principal ideal 〈α〉. It turns out that this new definition is (almost) the same as the old definition of N(α). More precisely:

Proposition 2.50.

For any element α ∈ 𝔒, we have N(〈α〉) = |N(α)|.

Proof

The result is obvious for α = 0. So assume that α ≠ 0 and call 𝔞 := 〈α〉. Let β1, . . . , βd be an integral basis of 𝔒. It is an easy check that αβ1, . . . , αβd is an integral basis of 𝔞. Let σ1, . . . , σd be the complex embeddings of K. Then Δ(𝔞) = Δ(αβ1, . . . , αβd) is the square of the determinant of the matrix

(σj(αβi)) = (σj(α)σj(βi)).

It follows that Δ(𝔞) = (σ1(α) · · · σd(α))2Δ(β1, . . . , βd) = N(α)2ΔK. Equation (2.18) now completes the proof.

Corollary 2.27.

For any non-zero a ∈ ℤ, we have N(〈a〉) = |a|d.

Like the norm of elements, the norm of ideals is also multiplicative. We omit the (not-so-difficult) proof here.

Proposition 2.51.

Let 𝔞 and 𝔟 be ideals in 𝔒. Then, N(𝔞𝔟) = N(𝔞)N(𝔟).

The following immediate corollary often comes in handy.

Corollary 2.28.

Let 𝔞 and 𝔟 be non-zero ideals of 𝔒. If 𝔞 = 𝔭1e1 · · · 𝔭rer is the factorization of 𝔞, then N(𝔞) = N(𝔭1)e1 · · · N(𝔭r)er. In particular, if 𝔟|𝔞, then N(𝔟)|N(𝔞) (in ℤ).

2.13.5. Rational Primes in Number Rings

The behaviour of rational primes in number rings is an interesting topic of study in algebraic number theory. Let K be a number field of degree d and 𝔒 := 𝔒K. Consider a rational prime p and denote by 〈p〉 the ideal generated by p in 𝔒. We use the symbol pℤ to denote the (prime) ideal of ℤ generated by p. Further let

Equation 2.19

〈p〉 = 𝔭1e1 · · · 𝔭rer

be the prime factorization of 〈p〉 with r ≥ 1, with pairwise distinct non-zero prime ideals 𝔭i of 𝔒 and with ei ≥ 1. For each i, we have 〈p〉 ⊆ 𝔭i, that is, p ∈ 𝔭i, that is, 𝔭i ∩ ℤ = pℤ (Lemma 2.13), that is, 𝔭i lies over pℤ. Conversely, if 𝔮 is a prime ideal of 𝔒 lying over pℤ, then p ∈ 𝔮, that is, 〈p〉 ⊆ 𝔮, that is, 𝔮|〈p〉, that is, 𝔮 = 𝔭i for some i. Thus, 𝔭1, . . . , 𝔭r are precisely all the prime ideals of 𝔒 that lie over pℤ.

By Corollary 2.27, N(〈p〉) = p^d. By Corollary 2.28, each N(𝔭i) divides p^d and is again a power p^di of p.

Definition 2.109.

We define the ramification index of 𝔭i over p (or pℤ) as ei. This is the largest e such that 𝔭i^e divides (that is, contains) 〈p〉. The integer di (where N(𝔭i) = p^di) is called the inertial degree of 𝔭i over p.

By the multiplicative property of norms, we have p^d = N(〈p〉) = N(𝔭1)^e1 · · · N(𝔭r)^er = p^(e1d1 + · · · + erdr), that is, e1d1 + · · · + erdr = d.

Definition 2.110.

If r = d, so that each ei = di = 1, we say that the prime p (or pℤ) splits completely in 𝒪K. On the other extreme, if r = 1, e1 = 1, d1 = d, then 〈p〉 is prime in 𝒪K and we say that p is inert in 𝒪K. Finally, if ei > 1 for some i, we say that the prime p ramifies in 𝒪K. If r = 1 and e1 = d (so that d1 = 1), then the prime p is said to be totally ramified in 𝒪K.

The following important result is due to Dedekind. Its proof is long and complicated and is omitted here.

Theorem 2.58.

A rational prime p ramifies in 𝒪K if and only if p divides the discriminant ΔK. In particular, there are only finitely many rational primes that ramify in 𝒪K.

Though this is not the case in general, let us assume that the ring 𝒪K is monogenic (that is, 𝒪K = ℤ[α] for some α ∈ 𝒪K) and try to compute the explicit factorization (Equation (2.19)) of 〈p〉 in 𝒪K. Let f(X) ∈ ℤ[X] be the minimal polynomial of α. We then have 𝒪K = ℤ[α] ≅ ℤ[X]/〈f(X)〉.

Let us agree to write the canonical image of any polynomial g(X) ∈ ℤ[X] in 𝔽p[X] as ḡ(X). We write the factorization of f̄(X) as

f̄(X) = f̄1(X)^e1 · · · f̄r(X)^er

with ei ≥ 1 and with pairwise distinct irreducible polynomials f̄i(X) ∈ 𝔽p[X]. If di := deg f̄i(X), then e1d1 + · · · + erdr = deg f̄(X) = d. For each i = 1, . . . , r choose fi(X) ∈ ℤ[X] whose reduction modulo p is f̄i(X). Define the ideals

𝔭i := 〈p, fi(α)〉, i = 1, . . . , r,

of 𝒪K. Since 𝒪K ≅ ℤ[X]/〈f(X)〉, we have

𝒪K/𝔭i ≅ ℤ[X]/〈p, f(X), fi(X)〉 ≅ 𝔽p[X]/〈f̄i(X)〉

and

N(𝔭i) = p^di.

Therefore, 𝔭1, . . . , 𝔭r are non-zero prime ideals of 𝒪K with N(𝔭1^e1 · · · 𝔭r^er) = p^(e1d1 + · · · + erdr) = p^d = N(〈p〉). On the other hand, 𝔭1^e1 · · · 𝔭r^er ⊆ 〈p, f1(α)^e1 · · · fr(α)^er〉 ⊆ 〈p〉, since f(α) = 0 and f1(X)^e1 · · · fr(X)^er ≡ f(X) (mod p). Thus we must have 〈p〉 = 𝔭1^e1 · · · 𝔭r^er, that is, we have obtained the desired factorization of 〈p〉.

Let us now concentrate on an example of this explicit factorization.

Example 2.32.

Let D ≠ 0, 1 be a square-free integer congruent to 2 or 3 modulo 4. If K = ℚ(√D), then 𝒪K = ℤ[√D] is monogenic. We take an odd rational prime p and compute the factorization of 〈p〉 in 𝒪K. We have to factorize modulo p the minimal polynomial f(X) := X² − D. We consider three cases separately based on the value of the Legendre symbol (D/p).

Case 1: (D/p) = 0

In this case, p | D, that is, X² − D ≡ X² (mod p). Then 〈p〉 = 𝔭², where 𝔭 = 〈p, √D〉. Thus p (totally) ramifies in 𝒪K.

Case 2: (D/p) = 1

Since p is assumed to be an odd prime, the two square roots of D modulo p are distinct. Let δ be an integer with δ² ≡ D (mod p). Then X² − D ≡ (X − δ)(X + δ) (mod p). In this case, 〈p〉 = 𝔭1𝔭2, where 𝔭1 = 〈p, √D − δ〉 and 𝔭2 = 〈p, √D + δ〉. Thus p splits (completely) in 𝒪K.

Case 3: (D/p) = −1

The polynomial X² − D is irreducible in 𝔽p[X] and hence 〈p〉 remains prime in 𝒪K, that is, p is inert in 𝒪K.

Thus the quadratic residuosity of D modulo p dictates the behaviour of p in 𝒪K.
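The three cases above can be checked numerically. The following Python sketch (the helper names are ours) computes the Legendre symbol by Euler's criterion and reports the behaviour of an odd prime p in ℤ[√D]:

```python
def legendre(D, p):
    """Legendre symbol (D/p) for an odd prime p, via Euler's criterion."""
    t = pow(D % p, (p - 1) // 2, p)
    return -1 if t == p - 1 else t

def splitting_type(D, p):
    """Behaviour of an odd prime p in Z[sqrt(D)] (D square-free, D = 2, 3 mod 4)."""
    s = legendre(D, p)
    if s == 0:
        return "ramified"   # <p> is the square of a prime ideal
    if s == 1:
        return "split"      # <p> is a product of two distinct prime ideals
    return "inert"          # <p> stays prime

# D = -1 (Gaussian integers): 5 = (2+i)(2-i) splits, while 7 stays inert.
print(splitting_type(-1, 5), splitting_type(-1, 7))
```

For instance, 2 is a quadratic residue modulo 7 (3² ≡ 2), so `splitting_type(2, 7)` reports that 7 splits in ℤ[√2].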

Let us finally look at the fate of the even prime 2 in 𝒪K. If D is even, then X² − D ≡ X² (mod 2), and if D is odd, then X² − D ≡ (X + 1)² (mod 2). In each case, 2 ramifies in 𝒪K.

Recall from Example 2.31 that ΔK = 4D. Thus we have a confirmation of the fact that a rational prime p ramifies in 𝒪K if and only if p | ΔK.

One can similarly study the behaviour of rational primes in

𝒪K = ℤ[(1 + √D)/2],

where D ≡ 1 (mod 4) is a square-free integer ≠ 0, 1.

2.13.6. Units in a Number Ring

There are just two units in ℤ, namely ±1. In a general number ring, there may be many more units. For example, all the units in the ring ℤ[i] of Gaussian integers are ±1, ±i. There may even be an infinite number of units in a number ring. It can be shown that ±(1 + √2)^n, n ∈ ℤ, are all the units of ℤ[√2]. (Note that for all n ≠ 0 the absolute values of (1 + √2)^n are different from 1.) ℤ[√2] is a PID. So we can think of factorizations in ℤ[√2] as element-wise factorizations. To start with, we fix a set of pairwise non-associate prime elements of ℤ[√2]. Every non-zero element of ℤ[√2] admits a factorization u p1^e1 · · · pk^ek for prime “representatives” pi and for a unit u of the form ±(1 + √2)^n. Thus, in order to complete the picture of factorization, we need machinery to handle the units in a number ring.

Let K be a number field of degree d and signature (r1, r2). We have d = r1 + 2r2. The set of units in is denoted by . We know that is an (Abelian) group under (complex) multiplication. Our basic aim now is to reveal the structure of the group .

Every Abelian group is a -module and, if finitely generated and not free, contains torsion elements, that is, (non-identity) elements of finite order > 1.[19] always contains the element –1 of order 2. The torsion subgroup of is denoted by . We have , where is a torsion-free group. It turns out that ℜ is a finite group (and hence cyclic) and that is finitely generated and hence free, that is, for some . From Dirichlet’s unit theorem (which we do not prove), it follows that ρ = r1 + r2 – 1. Thus, has a -basis consisting of ρ elements, say ξ1, . . . , ξρ, and every unit of can be uniquely expressed as , where ω is a root of unity and . A set of generators of is called a set of fundamental units.

[19] Every finitely generated torsion-free module over a PID is free.

Example 2.33.

Let D ≠ 0, 1 be a square-free integer and K := ℚ(√D). If D < 0, the signature of K is (0, 1) and the value of ρ for K is 0 + 1 − 1 = 0, that is, the unit group equals its torsion subgroup, that is, the unit group of 𝒪K is finite in this case.

Now, suppose D > 0. K is a real field in this case, so that the only roots of unity in K are ±1. Also the signature of K is (2, 0), that is, ρ = 2 + 0 − 1 = 1. This means that 𝒪K contains an infinite number of units. Let ξ be a fundamental unit of 𝒪K. Then, every unit of 𝒪K is of the form ±ξ^n, n ∈ ℤ.

Exercise Set 2.13

2.126
  1. If A ⊆ B and B ⊆ C are integral extensions of rings, show that A ⊆ C is also an integral extension.

  2. Let A ⊆ B be an extension of rings. Show that the integral closure of A in B is integrally closed in B.

  3. Let A ⊆ B be an integral extension of rings, 𝔟 an ideal of B and 𝔞 := 𝔟 ∩ A. (Note that 𝔞 is an ideal of A. If 𝔟 is prime in B, then 𝔞 is prime in A. See Proposition 2.10.) Show that B/𝔟 is integral over A/𝔞.

2.127Let A ⊆ B be an extension of integral domains, 𝔞 a finitely generated non-zero ideal of A and γ ∈ B. If γ𝔞 ⊆ 𝔞, show that γ is integral over A. [H]
2.128
  1. Let A ⊆ B be an integral extension of integral domains. Show that A is a field if and only if B is a field.

  2. Let A ⊆ B be an integral extension of rings, 𝔮 a prime ideal of B and 𝔭 := 𝔮 ∩ A. Show that 𝔮 is maximal if and only if 𝔭 is maximal. [H]

  3. Let A, B, 𝔮 and 𝔭 be as in (b). Further let 𝔮′ be another prime ideal of B with 𝔮 ⊆ 𝔮′. Show that if 𝔮′ ∩ A = 𝔭, then 𝔮′ = 𝔮. [H]

2.129Let A be a ring and S a multiplicatively closed subset of A. Show that:
  1. If 0 ∈ S, then S–1A is the zero ring.

  2. If S′ := S \ {1} is non-empty and closed under multiplication, then S′–1A ≅ S–1A.

  3. If A is Noetherian, then S–1A is also Noetherian.

2.130Let A ⊆ B be a ring extension and C the integral closure of A in B. Show that for any multiplicative subset S of A (and hence of B and C) the integral closure of S–1A in S–1B is S–1C. In particular, if A is integrally closed in B, then so is S–1A in S–1B.
2.131Recall that an integrally closed integral domain is called a normal domain (ND).
  1. Show that every UFD is a normal domain.

  2. Let D be a square-free integer ≠ 0, 1. Show that ℤ[√D] is normal if and only if D ≡ 2, 3 (mod 4).

(Remark: The reader should note the following important implications:

Euclidean domain ⇒ PID ⇒ UFD ⇒ normal domain.

That is, a Euclidean domain is a PID, a PID is a UFD and a UFD is a normal domain. None of the reverse implications is true. For example, the ring of integers of ℚ(√−19) is known to be a PID but not a Euclidean domain. The ring K[X1, . . . , Xn], n ≥ 2, of multivariate polynomials over a field K is a UFD, but not a PID, since the ideal 〈X1, . . . , Xn〉 is not principal. Finally, ℤ[√−5] is a normal domain (by Exercise 2.136 below), but not a UFD, since 2 × 3 and (1 + √−5)(1 − √−5) are two different factorizations of 6 into irreducible elements.)
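The classical failure of unique factorization mentioned in the remark, 6 = 2 · 3 = (1 + √−5)(1 − √−5) in ℤ[√−5], can be checked with norms. A short Python sketch (using the standard norm N(a + b√−5) = a² + 5b²):

```python
def norm(a, b):
    """Norm of a + b*sqrt(-5) in Z[sqrt(-5)]."""
    return a * a + 5 * b * b

# 6 = 2 * 3 = (1 + sqrt(-5)) * (1 - sqrt(-5)); compare the norms:
assert norm(2, 0) == 4 and norm(3, 0) == 9
assert norm(1, 1) == norm(1, -1) == 6

# No element of Z[sqrt(-5)] has norm 2 or 3 (a^2 + 5b^2 never equals 2 or 3),
# so all four factors are irreducible and the factorizations are distinct.
assert all(norm(a, b) not in (2, 3)
           for a in range(-3, 4) for b in range(-2, 3))
```

Since norms multiply, a proper factorization of 2 or 3 would need an element of norm 2 or 3, which the last check rules out (norms grow with |a| and |b|, so the small search range suffices).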

2.132A (non-zero) ring A with a unique maximal ideal m is called a local ring. In that case, the field A/m is called the residue field of A.

Let A be a ring and 𝔭 a prime ideal of A. Show that the localization A𝔭 := (A \ 𝔭)–1A is a local ring with the unique maximal ideal 𝔭A𝔭 generated by the elements of 𝔭, and that the residue field is canonically isomorphic to the quotient field of the integral domain A/𝔭 under the map a/s ↦ (a + 𝔭)/(s + 𝔭).

2.133A ring A is called a discrete valuation ring (DVR) or a discrete valuation domain (DVD), if A is a local principal ideal domain. Let A be a DVR with maximal ideal m = 〈p〉. Prove the following assertions:
  1. A is a UFD.

  2. The only primes in A are the associates of p. [H]

  3. Every non-zero element of A can be written as up^α, where u is a unit of A and α is an integer ≥ 0.

  4. Every non-zero ideal of A is of the form 〈p^α〉 for some integer α ≥ 0.

  5. A has only one non-zero prime ideal (namely, m).

(Remark: The prime p of A is called a uniformizing parameter or a uniformizer for A and is unique up to multiplication by units.

The map ν taking up^α ↦ α is called a discrete valuation of A and can be naturally extended to a group homomorphism ν : K* → ℤ by defining ν(a/b) := ν(a) − ν(b), where a, b ∈ A, b ≠ 0 and K = Q(A) is the quotient field of A. It is often convenient to define ν(0) := +∞. It follows that ν(xy) = ν(x) + ν(y) and ν(x + y) ≥ min(ν(x), ν(y)).)

2.134
  1. Let A be a local Noetherian integral domain which is not a field. Assume further that the maximal ideal m ≠ 0 of A is the only non-zero prime ideal of A. Show that A is a DVR (that is, a PID) if and only if A is integrally closed.

  2. Let A be a Noetherian integral domain which is not a field. Prove that A is a Dedekind domain if and only if the localization A𝔭 is a DVR for every non-zero prime ideal 𝔭 of A.

2.135
  1. Show that the only units of ℤ[i] are ±1 and ±i.

  2. Show that the primes of ℤ[i] are associates of the following:

    1. a prime integer ≡ 3 (mod 4),

    2. a + ib, a, b ∈ ℤ, with a² + b² equal to 2 or a prime integer ≡ 1 (mod 4).

2.136
  1. Show that every quadratic number field K can be represented as K = ℚ(√D) for a square-free integer D ≠ 0, 1.

  2. Let K = ℚ(√D) for some square-free integer D ≠ 0, 1. Show that:

𝒪K = ℤ[√D] if D ≡ 2, 3 (mod 4), and 𝒪K = ℤ[(1 + √D)/2] if D ≡ 1 (mod 4).

(In particular, the ring of integers of ℚ(i) is the ring ℤ[i] of Gaussian integers.)

2.137Let A be a Dedekind domain.
  1. Let 𝔮1 and 𝔮2 be two distinct non-zero prime ideals of A. Show that for any e1, e2 ≥ 1, we have 𝔮1^e1 + 𝔮2^e2 = A. [H]

  2. Let 𝔞 = 𝔮1^e1 · · · 𝔮r^er be the prime factorization of a non-zero ideal 𝔞 of A with pairwise distinct primes 𝔮i and ei ≥ 1. Show that A/𝔞 ≅ A/𝔮1^e1 × · · · × A/𝔮r^er. [H]

2.138Let A be a Dedekind domain and 𝔞 a non-zero (integral) ideal of A. Show that:
  1. There exists a non-zero (integral) ideal 𝔟 of A such that 𝔞𝔟 is a principal ideal. [H]

  2. The number of ideals of A containing 𝔞 is finite.

  3. Every ideal of A/𝔞 is principal.

2.139Let 𝔞 = 𝔮1^e1 · · · 𝔮r^er and 𝔟 = 𝔮1^f1 · · · 𝔮r^fr, ei, fi ≥ 0, be the prime decompositions of two non-zero ideals 𝔞, 𝔟 of a DD A. Define the gcd and lcm of 𝔞 and 𝔟 as

gcd(𝔞, 𝔟) := 𝔮1^min(e1,f1) · · · 𝔮r^min(er,fr),  lcm(𝔞, 𝔟) := 𝔮1^max(e1,f1) · · · 𝔮r^max(er,fr).

Show that gcd(𝔞, 𝔟) = 𝔞 + 𝔟 and lcm(𝔞, 𝔟) = 𝔞 ∩ 𝔟. Conclude that 𝔞𝔟 = (𝔞 + 𝔟)(𝔞 ∩ 𝔟). (Note that if A is a general ring, we only have (𝔞 + 𝔟)(𝔞 ∩ 𝔟) ⊆ 𝔞𝔟.)

2.140Let K be a number field and .
  1. Let 𝔞 ≠ 0 be an ideal of 𝒪K. Show that 𝔞 ∩ ℤ ≠ 0. In particular, every non-zero ideal of 𝒪K contains a non-zero integer. [H]

  2. Let 𝔭 be a non-zero prime ideal of 𝒪K. Prove that N(𝔭) = p^f for some f ≥ 1, where p is the unique rational prime contained in 𝔭 (Lemma 2.13).

2.141Let K be a number field and α ∈ 𝒪K. Show that:
  1. α is a unit of 𝒪K, if and only if N(α) = ±1.

  2. α is a unit of 𝒪K, if and only if f(0) = ±1, where f(X) ∈ ℤ[X] is the minimal polynomial of α over ℚ.

  3. α is a root of unity, if and only if |σ(α)| = 1 for every complex embedding σ of K.

2.142Let K be a number field. We say that K is norm-Euclidean, if for every α, β ∈ 𝒪K, β ≠ 0, there exist q, r ∈ 𝒪K such that α = qβ + r and |N(r)| < |N(β)|.
  1. Conclude that if K is norm-Euclidean, then 𝒪K is a Euclidean domain with the Euclidean degree function ν(α) := |N(α)|. (The converse of this is not true. For example, it is known that ℚ(√69) is not norm-Euclidean, but its ring of integers is a Euclidean domain.)

  2. Prove the following equivalent characterization of a norm-Euclidean number field: K is norm-Euclidean if and only if for every α ∈ K there exists β ∈ 𝒪K such that |N(α − β)| < 1.

  3. Show that the following number fields are norm-Euclidean:

    , , , and .

  4. Show that is not norm-Euclidean. [H]

2.143In this exercise, one derives that the only (rational) integer solutions of Bachet’s equation

Equation 2.20


are x = 3, y = ±5.

  1. Show that Equation (2.20) has no solutions with x or y even. [H]

    Let (x, y) be a solution of Equation (2.20) with both x and y odd. Then x³ admits a factorization in ℤ[√−2] as x³ = y² + 2 = (y + √−2)(y − √−2).

  2. Let A := ℤ[√−2]. Show that A is the ring of integers of ℚ(√−2) and that A is a UFD. Also the only units of A are ±1.

  3. Show that gcd(y + √−2, y − √−2) = 1 in A. [H]

  4. Because of unique factorization one can write y + √−2 = (c + d√−2)³ for c, d ∈ ℤ. Expand the cube and equate the real and imaginary parts to conclude that we must have y = ±5, so that x = 3.

**2.14. p-adic Numbers

Let us now study a different area of algebraic number theory, introduced by Kurt Hensel in an attempt to apply power-series expansions to numbers. While trying to explain the properties of the (rational) integers, mathematicians kept embedding ℤ in bigger and bigger structures, richer and richer in properties. ℚ came in a natural attempt to form quotients, and for some time people believed that the rationals told the whole story. Pythagoras was seemingly the first to locate and prove the irrationality of a number, namely √2. It took humankind centuries to complete the picture of the real line. One possibility is to view ℝ as the completion of ℚ. A sequence an, n ∈ ℕ, of rational numbers is called a Cauchy sequence if for every real ε > 0, there exists N ∈ ℕ such that |am − an| ≤ ε for all m, n ≥ N. Every Cauchy sequence should converge to a limit, and it is in ℝ (and not ℚ) where this happens. Even with the convergence of Cauchy sequences, people were not wholeheartedly happy, because the real polynomial X² + 1 did not have—it continues not to have—roots in ℝ. So the next question that arose was that of algebraic closure. ℂ was invented and turned out to be a nice field which is both algebraically closed and complete.

Throughout the above business, we were led by the conventional notion of distance between points (that is, between numbers)—the so-called Archimedean distance or the absolute value. For every rational prime p, there exists a p-adic distance which leads to a ring strictly bigger than ℤ. This is the ring ℤ̂p of p-adic integers. The quotient field of ℤ̂p is the field ℚ̂p of p-adic numbers. ℚ̂p is complete in the sense of convergence of Cauchy sequences (under the p-adic distance), but is not algebraically closed. We know anyway that a (unique) algebraic closure of ℚ̂p exists. We have ℂ = ℝ(i), that is, it was necessary and sufficient to add the imaginary quantity i to ℝ to get an algebraically closed field. Unfortunately in the case of the p-adic distance the closure is of infinite extension degree over ℚ̂p. In addition, this closure is not complete. An attempt to make it complete gives an even bigger field Ωp and the story stops here, Ωp being both algebraically closed and complete. But Ωp is already a pretty huge field and very little is known about it.

In the rest of this section, we, without specific mention, denote by p an arbitrary rational prime.

2.14.1. The Arithmetic of p-adic Numbers

There are various ways in which p-adic integers can be defined. A simple way is to use infinite sequences.

Definition 2.111.

A p-adic integer is defined as an infinite sequence (an), n ∈ ℕ, of elements an ∈ ℤ/p^nℤ with the property that an+1 ≡ an (mod p^n) for every n ∈ ℕ. Each an, being an element of ℤ/p^nℤ, can be represented as a (rational) integer unique modulo p^n. Thus, if bn, n ∈ ℕ, define another sequence of integers with bn ≡ an (mod p^n) for every n, the p-adic integers (an) and (bn) are treated the same. In particular, if 0 ≤ bn < p^n for every n, then (bn) is called the canonical representation of (an). The set of all p-adic integers is denoted by ℤ̂p.[20] A sequence (an) of integers with an+1 ≡ an (mod p^n) for every n is called a p-coherent sequence.

[20] Well! We are now in a mess of notations. We have ℤn = ℤ/nℤ for every n ∈ ℕ. In particular, for n = p we have ℤp = ℤ/pℤ, which is a field that we planned to denote also by 𝔽p. It is superfluous to have two notations for the same thing. Many authors, therefore, prefer to avoid the hat and call ℤ̂p as ℤp. For them, our ℤp is 𝔽p and/or ℤ/pℤ written explicitly. Let us stick to our old conventions and use hats to remove ambiguities.

See Exercise 2.144 for another way of defining p-adic integers. We now show that ℤ̂p is a ring. Before doing that, we mention that the ring ℤ is canonically embedded in ℤ̂p by the injective map a ↦ (a, a, a, . . .).

Definition 2.112.

Let (an) and (bn) be two p-adic integers. Define:

(an) + (bn):=(an + bn).
(an) · (bn):=(an · bn).

One can easily check that these operations are well-defined, that is, independent of the choice of the representatives an and bn. It also follows easily that these operations make ℤ̂p a ring with additive identity 0 = (0, 0, . . .) and with multiplicative identity 1 = (1, 1, . . .). The additive inverse of (an) is −(an) = (−an). Moreover a ↦ (a, a, . . .) is an injective ring homomorphism ℤ → ℤ̂p. In view of this, one often identifies the rational integer a with the p-adic integer (a, a, . . .). We will also do so, provided that we do not expect to face a danger of confusion. Also note that for l ∈ ℕ the l-fold sum l(an) is the same as (l)(an) = (lan). Thus in this context the two interpretations of l remain perfectly consistent.
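The componentwise arithmetic of Definition 2.112 can be tried out on truncated coherent sequences. A Python sketch (the helper names are ours; we work with the first five components only):

```python
P = 7

def coherent(seq, p=P):
    """Check the p-coherence condition a_{n+1} = a_n (mod p^n)."""
    return all(b % p**n == a % p**n
               for n, (a, b) in enumerate(zip(seq, seq[1:]), start=1))

def add(x, y, p=P):
    """Componentwise sum modulo p^n."""
    return [(a + b) % p**n for n, (a, b) in enumerate(zip(x, y), start=1)]

# The rational integers 10 and -1 as (truncated) 7-adic integers:
ten  = [10 % 7**n for n in range(1, 6)]
mone = [-1 % 7**n for n in range(1, 6)]
s = add(ten, mone)
assert coherent(s)                          # the sum is again p-coherent
assert s == [9 % 7**n for n in range(1, 6)] # and represents 10 + (-1) = 9
```

Multiplication works the same way, with `(a * b) % p**n` in place of the sum.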

It turns out that ℤ̂p is an integral domain. In order to see why, let us focus our attention on the units of ℤ̂p. Let us plan to denote the multiplicative group of units of ℤ̂p by Up. The next result characterizes the elements of Up.

Proposition 2.52.

For x = (an) ∈ ℤ̂p, the following conditions are equivalent:

  1. x ∈ Up (that is, x is a unit of ℤ̂p).

  2. p ∤ an for all n ∈ ℕ.

  3. p ∤ a1.

Proof

[(a)⇒(b)] Let (an)(bn) = (anbn) = 1 = (1) for some (bn) ∈ ℤ̂p. Then for every n ∈ ℕ we have anbn ≡ 1 (mod p^n), that is, an is invertible modulo p^n and hence modulo p as well, that is, p ∤ an.

[(b)⇒(c)] Obvious.

[(c)⇒(a)] Let us construct a p-coherent sequence bn, n ∈ ℕ, of (rational) integers with anbn ≡ 1 (mod p^n). This (bn) would be the desired inverse of (an) in ℤ̂p. Since p ∤ a1 and an ≡ a1 (mod p), it follows that p ∤ an as well and, therefore, the congruence anx ≡ 1 (mod p^n) has a unique solution modulo p^n, namely bn :≡ an^(−1) (mod p^n).

We also have an+1bn+1 ≡ 1 (mod p^n), that is, anbn+1 ≡ 1 (mod p^n), that is, bn+1 ≡ bn (mod p^n).
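The inverse constructed in the proof can be computed directly: each component is a modular inverse, and the resulting sequence is automatically p-coherent. A minimal Python sketch (using the built-in `pow(a, -1, m)` for modular inverses):

```python
P = 7

def inverse(x, p=P):
    """Componentwise inverse b_n = a_n^{-1} (mod p^n) of a unit (a_n)."""
    return [pow(a, -1, p**n) for n, a in enumerate(x, start=1)]

x = [10 % 7**n for n in range(1, 6)]   # 10 is a unit of Z_7-hat: 7 does not divide 10
y = inverse(x)

# b_{n+1} = b_n (mod p^n): the inverses form a p-coherent sequence,
# because the inverse modulo p^n is unique.
assert all(y[n] % 7**n == y[n - 1] % 7**n for n in range(1, 5))
# and a_n * b_n = 1 (mod p^n) in every component:
assert all((a * b) % 7**(n + 1) == 1 for n, (a, b) in enumerate(zip(x, y)))
```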

Proposition 2.53.

Every non-zero x = (an) ∈ ℤ̂p can be written uniquely as x = p^r y for some integer r ≥ 0 and for some y ∈ Up.

Proof

If p ∤ a1, take r := 0 and y := x. So assume that p | a1. Choose r ≥ 1 such that [an]pn = [0]pn for 1 ≤ n ≤ r, whereas [ar+1]pr+1 ≠ [0]pr+1. Such an r exists, since x ≠ 0 by hypothesis. For n ∈ ℕ, we have ar+n ≡ ar ≡ 0 (mod p^r), that is, p^r | ar+n, whereas ar+n ≡ ar+1 ≢ 0 (mod p^(r+1)), that is, p^(r+1) ∤ ar+n, that is, vp(ar+n) = r. Define bn := ar+n/p^r. Since ar+n+1 ≡ ar+n (mod p^(r+n)), division by p^r gives bn+1 ≡ bn (mod p^n), that is, y := (bn) ∈ ℤ̂p. Moreover, p^r bn = ar+n ≡ an (mod p^n), that is, x = p^r y. Finally, since p ∤ b1, we have y ∈ Up. This establishes the existence of a factorization x = p^r y. The uniqueness of this factorization is left to the reader as an easy exercise.

Proposition 2.54.

ℤ̂p is an integral domain.

Proof

Let x1 and x2 be non-zero elements of ℤ̂p. By Proposition 2.53, we can write x1 = p^r1 y1 and x2 = p^r2 y2 with r1, r2 ≥ 0 and y1, y2 ∈ Up. Then (an) := x1x2 = p^(r1+r2) y1y2. Now y1y2 =: (bn) ∈ Up and hence no bn is divisible by p. Therefore, ar1+r2+1 = p^(r1+r2) br1+r2+1 ≢ 0 (mod p^(r1+r2+1)), that is, (an) = x1x2 ≠ 0.

Definition 2.113.

The quotient field ℚ̂p of ℤ̂p is called the field of p-adic numbers.

Proposition 2.55.

Every non-zero x ∈ ℚ̂p can be expressed uniquely as x = p^r y with r ∈ ℤ and y ∈ Up.

Proof

One can write x = a/b for some a, b ∈ ℤ̂p, b ≠ 0. Then a = p^s c and b = p^t d for some s, t ≥ 0 and c, d ∈ Up, and so x = p^(s−t)(c/d) with c/d ∈ Up. The proof for the uniqueness is left to the reader.

The canonical inclusion ℤ ↪ ℤ̂p naturally extends to the canonical inclusion ℚ ↪ ℚ̂p. We can identify the image of a/b with the rational a/b and say that ℚ is contained in ℚ̂p. Being a field of characteristic 0, ℚ̂p contains an isomorphic copy of ℚ. The map a/b ↦ a · b^(−1) gives this isomorphism explicitly. Note that the ring ℤ̂p is strictly bigger than ℤ and the field ℚ̂p is strictly bigger than the field ℚ (Exercise 2.147).

2.14.2. The p-adic Valuation

Proposition 2.55 leads to the notion of p-adic distance between pairs of points in . Let us start with some formal definitions.

Definition 2.114.

A metric on a set S is a map d : S × S → ℝ such that for every x, y, z ∈ S, we have:

  1. Non-negativity: d(x, y) ≥ 0.

  2. Non-degeneracy: d(x, y) = 0 if and only if x = y.

  3. Symmetry: d(x, y) = d(y, x).

  4. Triangle inequality: d(x, z) ≤ d(x, y) + d(y, z).

A set S together with a metric d is called a metric space (with metric d).

Definition 2.115.

A norm on a field K is a map ‖ ‖ : K → ℝ such that for all x, y ∈ K, we have:

  1. Non-negativex‖ ≥ 0.

  2. Non-degeneracyx‖ = 0 if and only if x = 0.

  3. Multiplicativityxy‖ = ‖x‖ ‖y‖.

  4. Triangle inequalityx + y‖ ≤ ‖x‖ + ‖y‖.

It is an easy check that for a norm ‖ ‖ on K the function , d(x, y) := ‖xy‖, defines a metric on K.

A norm ‖ ‖ on a field K is called non-Archimedean (or a finite valuation), if ‖x + y‖ ≤ max(‖x‖, ‖y‖) for all x, y ∈ K (a condition stronger than the triangle inequality). A norm which is not non-Archimedean is called Archimedean (or an infinite valuation).

Example 2.34.
  1. Setting ‖x‖ := 1 for all x ≠ 0 (and ‖0‖ := 0) defines a norm on any field K. This norm is called the trivial norm on K.

  2. The absolute value | | is an Archimedean norm on ℚ (or ℝ). It is customary to denote this norm as | |∞. This norm induces the usual metric topology on ℚ (or ℝ) which is at the heart of real analysis. In p-adic analysis, one investigates ℚ̂p under the p-adic norms that we define now.

Definition 2.116.

The p-adic norm on ℚ̂p is defined as |x|p := p^(−r) for x ≠ 0 written as x = p^r y with r ∈ ℤ and y ∈ Up (Proposition 2.55), and |0|p := 0.

Theorem 2.59.

The p-adic norm | |p is a non-Archimedean norm on ℚ̂p.

Proof

Non-negativity, non-degeneracy and multiplicativity of | |p are immediate. For proving the triangle inequality, it is sufficient to prove the non-Archimedean condition. Take x, y ∈ ℚ̂p. If x = 0 or y = 0 or x + y = 0, we clearly have |x + y|p ≤ max(|x|p, |y|p). So assume that each of x, y and x + y is non-zero. Write x = p^r u and y = p^s v with r, s ∈ ℤ and u, v ∈ Up. Without loss of generality, we may assume that r ≥ s. Then, x + y = p^s z, where z = p^(r−s) u + v ∈ ℤ̂p. Since x + y ≠ 0, we have z ≠ 0; so we can write z = p^t w for some t ≥ 0 and w ∈ Up. But then |x + y|p = p^(−(s+t)) ≤ p^(−s) = max(p^(−r), p^(−s)) = max(|x|p, |y|p).
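The non-Archimedean inequality is easy to experiment with on rationals. A Python sketch computing the p-adic valuation and norm (exact arithmetic via `fractions`; helper names are ours):

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a non-zero rational x."""
    x = Fraction(x)
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def norm_p(x, p):
    """p-adic norm |x|_p = p^(-vp(x)), with |0|_p = 0."""
    return Fraction(0) if x == 0 else Fraction(p) ** (-vp(x, p))

x, y, p = Fraction(7, 3), Fraction(49, 5), 7
assert norm_p(x + y, p) <= max(norm_p(x, p), norm_p(y, p))
# when |x|_p != |y|_p, the ultrametric inequality is an equality:
assert norm_p(x + y, p) == max(norm_p(x, p), norm_p(y, p))
```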

Definition 2.117.

Two metrics d1 and d2 on a metric space S are called equivalent if a sequence (xn) from S is Cauchy with respect to d1 if and only if it is Cauchy with respect to d2. Two norms on a field are called equivalent if they induce equivalent metrics.

For every prime p, the field ℚ is canonically embedded in ℚ̂p and thus we have a notion of a p-adic distance on ℚ. We also have the usual Archimedean distance | |∞ on ℚ. We now state an interesting result without a proof, which asserts that any distance on ℚ must be essentially the same as either the usual Archimedean distance or one of the p-adic distances.

Theorem 2.60. Ostrowski’s theorem

Every non-trivial norm on ℚ is equivalent to | |p for some p ∈ ℙ ∪ {∞}.

The notions of sequences and series and their convergences can be readily extended to ℚ̂p under the norm | |p. Since the p-adic distance assumes only the discrete values p^(−r), r ∈ ℤ, it is often customary to restrict ourselves only to these values while talking about the convergence criteria of sequences and series, that is, instead of an infinitesimally small real ε > 0 one can talk about an arbitrarily large M ∈ ℕ with p^(−M) ≤ ε.

Definition 2.118.

Let x1, x2, . . . be a sequence of elements of ℚ̂p. We say that this sequence converges to a limit x ∈ ℚ̂p, if given M ∈ ℕ there exists N ∈ ℕ such that |xn − x|p ≤ p^(−M) for all n ≥ N. We write this as x = lim xn or as xn → x.

Consider the partial sums sn := x1 + x2 + · · · + xn for each n ∈ ℕ. If there exists s ∈ ℚ̂p with sn → s, we say that the sum Σn≥1 xn converges to s and write s = Σn≥1 xn.

A sequence x1, x2, . . . of elements of ℚ̂p is said to be a Cauchy sequence if for every M ∈ ℕ, there exists an N ∈ ℕ such that |xm − xn|p ≤ p^(−M) for all m, n ≥ N.

Definition 2.119.

A field K is called complete under a norm ‖ ‖ if every sequence of elements of K, which is Cauchy under ‖ ‖, converges to an element in K.

For example, ℝ is complete under | |. We shortly demonstrate that ℚ̂p is complete under | |p.

Consider a field K not (necessarily) complete under a norm ‖ ‖. Let C denote the set of all Cauchy sequences from K. Define addition and multiplication in C as (an) + (bn) := (an + bn) and (an)(bn) := (anbn). Under these operations C becomes a commutative ring with identity, having a maximal ideal 𝔪 consisting of the sequences converging to 0. The field L := C/𝔪 is called the completion of K with respect to the norm ‖ ‖. K is canonically embedded in L via the map a ↦ (a, a, a, . . .) + 𝔪. The norm ‖ ‖ on K extends to elements (an) + 𝔪 of L as limn→∞ ‖an‖. L is a complete field under this extended norm. In fact, it is the smallest field containing K and complete under ‖ ‖.

ℝ is the completion of ℚ with respect to the Archimedean norm | |. On the other hand, ℚ̂p turns out to be the completion of ℚ with respect to the p-adic norm | |p. Before proving this let us first prove that ℚ̂p itself is a complete field under the p-adic norm. Let us start with a lemma.

Lemma 2.18.

A sequence (an) of p-adic numbers is a Cauchy sequence if and only if the sequence (an+1an) converges to 0.

Proof

[if] Take any M ∈ ℕ. Since an+1 − an → 0 by hypothesis, there exists N ∈ ℕ such that |an+1 − an|p ≤ p^(−M) for all n ≥ N. But then for all m, n ≥ N with m = n + k, k ≥ 1, we have |am − an|p = |(am − am−1) + · · · + (an+1 − an)|p ≤ max(|am − am−1|p, . . . , |an+1 − an|p) ≤ p^(−M).

Thus (an) is a Cauchy sequence.

[only if] Take any M ∈ ℕ. Since (an) is a Cauchy sequence by hypothesis, there exists N ∈ ℕ such that |am − an|p ≤ p^(−M) for all m, n ≥ N. In particular, |an+1 − an|p ≤ p^(−M) for all n ≥ N, that is, an+1 − an → 0.

Theorem 2.61.

The field ℚ̂p is complete with respect to | |p.

Proof

Let (an) be a Cauchy sequence in ℚ̂p. By Lemma 2.18, an+1 − an → 0. Therefore, there exists N ∈ ℕ such that |an+1 − an|p ≤ 1 for all n ≥ N. For n = N + k, k ≥ 1, we have

|an|p = |aN+k|p
     = |(aN+k − aN+k−1) + · · · + (aN+1 − aN) + aN|p
     ≤ max(|aN+k − aN+k−1|p, . . . , |aN+1 − aN|p, |aN|p)
     ≤ max(1, |aN|p).

It then follows that |an|p ≤ p^m for all n ∈ ℕ, where m ≥ 0 satisfies p^m = max(1, |a1|p, . . . , |aN|p). If m = 0, then each an ∈ ℤ̂p (Exercise 2.148). Otherwise consider the sequence (p^m an) which is clearly Cauchy and in which each p^m an ∈ ℤ̂p, since |p^m an|p ≤ p^(−m) p^m = 1. Thus, without loss of generality, we may assume that the given sequence (an) itself is one of p-adic integers.

Let an = an,0 + an,1p + an,2p² + · · · be the p-adic expansion of an (Exercise 2.145). Since (an) is Cauchy, for every M ∈ ℕ there exists NM ∈ ℕ such that |am − an|p ≤ p^(−(M+1)) for all m, n ≥ NM: that is, an,i = am,i for 0 ≤ i ≤ M and m, n ≥ NM. Define xM := an,M for any n ≥ NM and x := x0 + x1p + x2p² + · · ·. It then follows that an → x.

Theorem 2.62.

ℚ̂p is the completion of ℚ with respect to the norm | |p.

Proof

Let C denote the ring of Cauchy sequences from ℚ (under the p-adic norm), 𝔪 the maximal ideal of C consisting of sequences that converge to 0, and L := C/𝔪. We now show that L ≅ ℚ̂p.

If a ∈ ℚ̂p has the p-adic expansion a = a−r p^(−r) + · · · + a−1 p^(−1) + a0 + a1p + a2p² + · · · (Exercise 2.145), then αn := a−r p^(−r) + · · · + a−1 p^(−1) + a0 + a1p + · · · + an p^n, n ∈ ℕ, define a sequence of elements of ℚ. We have |αn − a|p ≤ p^(−(n+1)), that is, αn → a. Moreover, the sequence (αn) of rational numbers is Cauchy with respect to | |p, since for every M ∈ ℕ we have |αm − αn|p ≤ p^(−(M+1)) for all m, n ≥ M. Thus Φ : ℚ̂p → L, a ↦ (αn) + 𝔪, is a well-defined field homomorphism. Being a field homomorphism, Φ is injective.

What remains is to show that the map Φ is surjective. Take any (βn) + 𝔪 ∈ L. Since (βn) is a Cauchy sequence, by Theorem 2.61 it converges to a point a ∈ ℚ̂p. We construct the sequence (αn) corresponding to a as described in the last paragraph. Then αn → a as well and hence using the triangle inequality (or the non-Archimedean condition) we have αn − βn = (αn − a) − (βn − a) → 0, that is, (αn) − (βn) ∈ 𝔪, that is, Φ(a) = (βn) + 𝔪.

Corollary 2.29.

The p-adic series Σn≥1 an (with an ∈ ℚ̂p) converges if and only if |an|p → 0.

Proof

The only if part is obvious. For the if part, take a sequence (an) of p-adic numbers with |an|p → 0. Define sn := a1 + a2 + · · · + an. Since an+1 = sn+1 − sn → 0 by hypothesis, Lemma 2.18 guarantees that (sn) is a Cauchy sequence, that is, (sn) converges in ℚ̂p.

This is quite unlike the Archimedean norm | |. For example, with respect to this norm we have 1/n → 0, whereas the series Σn≥1 1/n diverges.
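A standard p-adic example of Corollary 2.29 is the geometric series Σn≥0 pⁿ: its terms have |pⁿ|p = p^(−n) → 0, so it converges, and its sum is 1/(1 − p). This can be checked on truncations, since the partial sum s satisfies s(1 − p) ≡ 1 modulo a high power of p. A minimal Python sketch:

```python
p, N = 7, 10
s = sum(p**k for k in range(N))   # partial sum 1 + p + ... + p^(N-1)

# In Q_p the series converges to 1/(1 - p):
# s * (1 - p) = 1 - p^N, which is congruent to 1 modulo p^N,
# i.e. |s - 1/(1 - p)|_p <= p^(-N).
assert (s * (1 - p)) % p**N == 1
```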

2.14.3. Hensel’s Lemma

Let us conclude our short study of p-adic methods by proving an important theorem due to Hensel. This theorem talks about the solvability of polynomial equations f(X) = 0 for f(X) ∈ ℤ̂p[X]. Before proceeding further, let us introduce a notation. Recall that every a ∈ ℤ̂p has a unique p-adic expansion of the form a = a0 + a1p + a2p² + · · · with 0 ≤ an < p (Exercises 2.144 and 2.145). If a0 = a1 = · · · = an−1 = 0, then a = an p^n + an+1 p^(n+1) + an+2 p^(n+2) + · · · = p^n b, where b ∈ ℤ̂p. Thus p^n | a in ℤ̂p. We denote this by saying that a ≡ 0 (mod p^n). Notice that a ≡ 0 (mod p^n) if and only if |a|p ≤ p^(−n). We write a ≡ b (mod p^n) for a, b ∈ ℤ̂p, if a − b ≡ 0 (mod p^n). Since p^n can be viewed as an element of ℤ̂p, this congruence notation conforms to that for a general PID. (ℤ̂p is a PID by Exercise 2.148.)

Since by our assumption any ring A comes with identity (that we denote by 1 = 1A), it makes sense to talk for every n ∈ ℕ about an element n = nA in A, namely the n-fold sum 1A + 1A + · · · + 1A.

Given any f(X) = c0 + c1X + · · · + cdX^d ∈ A[X], one can define the formal derivative of f as f′(X) := c1 + 2c2X + · · · + dcdX^(d−1). Properties of formal derivatives of polynomials are covered in Exercise 2.61.

Theorem 2.63. Hensel’s lemma

Let f(X) ∈ ℤ̂p[X]. Suppose that there exist α0 ∈ ℤ̂p and an integer M ≥ 0 satisfying:

  1. |f(α0)|p ≤ p^(−(2M+1)) (that is, α0 is a solution of f(x) ≡ 0 (mod p^(2M+1))), and

  2. |f′(α0)|p = p^(−M) (that is, f′(α0) ≡ 0 (mod p^M) but f′(α0) ≢ 0 (mod p^(M+1))).

Then there exists a unique α ∈ ℤ̂p such that f(α) = 0 and |α − α0|p ≤ p^(−(M+1)) (that is, α ≡ α0 (mod p^(M+1))).

Proof

Let us inductively construct a sequence α0, α1, α2, . . . of p-adic integers with the properties that |f(αn)|p ≤ p^(−(2M+n+1)) and |f′(αn)|p = p^(−M) for every n ∈ ℕ. The given α0 provides the starting point (induction basis). For the inductive step, assume that n ≥ 1 and that α0, α1, . . . , αn−1 have been constructed with the desired properties. We now explain how to construct αn from αn−1. Put

αn := αn−1 + kn p^(M+n) for some kn ∈ ℤ̂p.

We want to find a suitable kn for which |f(αn)|p ≤ p^(−(2M+n+1)). Taylor expansion gives f(αn) = f(αn−1) + kn p^(M+n) f′(αn−1) + cn p^(2(M+n)) for some cn ∈ ℤ̂p. Since by the induction hypothesis p^(2M+n) | f(αn−1) and p^M | f′(αn−1), we can write

Since p^(M+1) ∤ f′(αn−1), the element f′(αn−1)/p^M is a unit of ℤ̂p and, therefore, there is a unique solution for kn of the congruence

This value of kn yields

f(αn) = p^(2M+n)(bnp + cnp^n) ≡ 0 (mod p^(2M+n+1))

for some bn ∈ ℤ̂p. The Taylor expansion of f′ gives f′(αn) = f′(αn−1) + dn p^(M+n) (for some dn ∈ ℤ̂p), which implies that f′(αn) ≡ f′(αn−1) (mod p^(M+1)), that is, |f′(αn)|p = p^(−M).

Since |αn − αn−1|p ≤ p^(−(M+n)), it follows that αn − αn−1 → 0, that is, (αn) is a Cauchy sequence (under | |p). By the completeness of ℚ̂p, we then have an α ∈ ℤ̂p such that αn → α. Similarly f(αn) − f(αn−1) → 0, that is, the sequence (f(αn)) is Cauchy and hence converges to f(α). Also |f(αn)|p ≤ p^(−(2M+n+1)), that is, f(αn) → 0, that is, f(α) = 0. Finally, each αn ≡ α0 (mod p^(M+1)), so that α ≡ α0 (mod p^(M+1)). This establishes the existence of a desired α.

For proving the uniqueness of α, let β ∈ ℤ̂p satisfy f(β) = 0 and |β − α0|p ≤ p^(−(M+1)). By Taylor expansion, f(β) = f(α) + (β − α)f′(α) + (β − α)²c for some c ∈ ℤ̂p, that is, (β − α)(f′(α) + (β − α)c) = 0. Now β − α = (β − α0) − (α − α0) and so |β − α|p ≤ max(|β − α0|p, |α − α0|p) ≤ p^(−(M+1)), whereas f′(αn) → f′(α), so that |f′(α)|p = p^(−M). Therefore, f′(α) + (β − α)c ≢ 0 (mod p^(M+1)) and, in particular, f′(α) + (β − α)c ≠ 0. Thus we must have β − α = 0.

Note that αn in the last proof satisfies the congruence

f(αn) ≡ 0 (mod p^(2M+n+1))

for each n ∈ ℕ. We are given the solution α0 corresponding to n = 0. From this, we inductively construct the solutions α1, α2, . . . corresponding to n = 1, 2, . . . respectively. The process for computing αn from αn−1 as described in the proof of Hensel’s lemma is referred to as Hensel lifting. The given conditions ensure that this lifting is possible (and uniquely doable) for every n ∈ ℕ, and in the limit n → ∞ we get a root of f. Since each kn is required modulo p, we can take kn ∈ {0, 1, . . . , p − 1}. So α admits a p-adic expansion of the form α = α0 + k1 p^(M+1) + k2 p^(M+2) + k3 p^(M+3) + · · ·.

The special case M = 0 for Hensel’s lemma is now singled out:

Corollary 2.30.

Let f(X) ∈ ℤ̂p[X]. Suppose that there exists an α0 ∈ ℤ̂p satisfying:

  1. |f(α0)|p < 1 (that is, α0 is a solution of f(x) ≡ 0 (mod p)), and

  2. |f′(α0)|p = 1 (that is, f′(α0) ≢ 0 (mod p), that is, α0 is a simple root of f modulo p).

Then there exists a unique α ∈ ℤ̂p such that f(α) = 0 and |α − α0|p < 1 (that is, α ≡ α0 (mod p)).

For this special case, we compute solutions αn of f(x) ≡ 0 (mod p^(n+1)) inductively for n = 1, 2, 3, . . . , given a suitable solution α0 of this congruence for n = 0. The lifting formula is now:

Equation 2.21

αn := αn−1 + kn p^n, where kn ∈ {0, 1, . . . , p − 1} is the unique solution of (f(αn−1)/p^n) + kn f′(αn−1) ≡ 0 (mod p).
Example 2.35.

ℤ is canonically embedded in ℤ̂p and so is ℤ[X] in ℤ̂p[X]. Thus it makes sense to carry out the lifting process for a polynomial f(X) ∈ ℤ[X] and for some solution α0 ∈ ℤ of f(X) ≡ 0 (mod p). One solves Formula (2.21) in ℤ and obtains each αn ∈ ℤ. The limit α belongs to ℤ̂p and is a solution of f(X) = 0 in ℤ̂p.

For example, let p be an odd prime and a ∈ ℤ a quadratic residue modulo p with p ∤ a. Let α0 ∈ ℤ be a solution of X² ≡ a (mod p). Here f(X) = X² − a, so that f′(X) = 2X, that is, f′(α0) = 2α0 ≢ 0 (mod p). Thus the conditions of Corollary 2.30 are satisfied and we get a unique square root α of a in ℤ̂p with α ≡ α0 (mod p). This α has a p-adic expansion of the form α = α0 + k1p + k2p² + k3p³ + · · ·.

As a specific numerical example, take p = 7, a = 2 and α0 = 3. Using Formula (2.21), we compute k1 = 1, α1 = 10, k2 = 2, α2 = 108, k3 = 6, α3 = 2166, and so on. Thus a square root of 2 in ℤ7 is 3 + 1 × 7 + 2 × 7^2 + 6 × 7^3 + · · ·. The other square root of 2 in ℤ7 can be obtained by starting with α0 = 4.
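The lifting just performed is easy to reproduce mechanically. The following is a minimal sketch, not the book's own code: the helper name hensel_lift_sqrt is ours, the Newton-style step x ← x – f(x)/f′(x) is equivalent to Formula (2.21) in this setting, and the modular inverse pow(b, -1, m) needs Python 3.8 or later.

```python
def hensel_lift_sqrt(a, p, alpha0, steps):
    """Lift a solution alpha0 of x^2 ≡ a (mod p) to solutions modulo
    p^2, p^3, ..., returning the successive approximations alpha_n."""
    alphas = [alpha0]
    x, mod = alpha0, p
    for _ in range(steps):
        mod *= p
        # Hensel/Newton step for f(X) = X^2 - a, f'(X) = 2X:
        # x <- x - f(x) * f'(x)^(-1)  (mod p^(n+2)).
        x = (x - (x * x - a) * pow(2 * x, -1, mod)) % mod
        alphas.append(x)
    return alphas

print(hensel_lift_sqrt(2, 7, 3, 3))  # [3, 10, 108, 2166]
```

Each printed value αn satisfies αn^2 ≡ 2 (mod 7^(n+1)), matching the hand computation above.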

Exercise Set 2.14

2.144
  1. Establish that any p-adic integer (an) can be uniquely described as a sequence of integers xn satisfying 0 ≤ xn < p for every n ≥ 0 and an ≡ x0 + x1 p + · · · + xn–1 p^(n–1) (mod p^n) for every n ≥ 1. In this case, the p-adic integer (an) is written as the infinite series

    (an) = x0 + x1p + x2p2 + · · ·.

    One calls the above series the p-adic expansion of (an). Note that the sum in the above series is not to be treated as a sum of integers. However, for a non-negative integer a, the expansion of a to the base p is the same as the p-adic expansion of a (more correctly, of the p-adic integer corresponding to a). In other words, if the p-adic expansion of (an) is terminating, that is, xN = xN+1 = xN+2 = · · · = 0 for some N, then (an) can be identified with the rational integer x0 + x1 p + · · · + xN–1 p^(N–1). A non-terminating p-adic series, on the other hand, diverges under the Archimedean norm, but converges under the p-adic norm and corresponds to an element of ℤp that is not a non-negative integer. The rational integer –1, for example, has the infinite p-adic expansion (p – 1) + (p – 1)p + (p – 1)p^2 + · · ·. The partial sums of this series telescope to p^n – 1, which converges (under the p-adic norm) to limn→∞ (p^n – 1) = –1.
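The expansion of –1 can be checked numerically: the first n digits of the p-adic expansion of a rational integer a are simply the base-p digits of the canonical residue a mod p^n. A small sketch (the function name padic_digits is our own):

```python
def padic_digits(a, p, n):
    """First n digits x0, x1, ..., x_{n-1} of the p-adic expansion of the
    rational integer a, i.e. the base-p digits of a mod p^n."""
    r = a % p**n          # canonical residue in [0, p^n)
    digits = []
    for _ in range(n):
        digits.append(r % p)
        r //= p
    return digits

print(padic_digits(-1, 7, 5))  # [6, 6, 6, 6, 6]: every digit is p - 1
print(padic_digits(10, 3, 4))  # [1, 0, 1, 0]: ordinary base-3 digits of 10
```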

  2. Let a ∈ ℕ. Write the p-adic expansion for –a. [H]

  3. Given p-adic integers a := x0 + x1p + x2p2 + · · · and b := y0 + y1p + y2p2 + · · · , find the p-adic integers c := z0 + z1p + z2p2 + · · · and d := w0 + w1p + w2p2 + · · · , such that c = a + b and d = ab. (Express each zn and wn explicitly in terms of xn’s and yn’s.)

2.145In view of Exercise 2.144, every x ∈ ℤp admits a unique expansion of the form x = x0 + x1 p + x2 p^2 + · · · , where each xi ∈ {0, 1, . . . , p – 1}. This notion of p-adic expansion can be extended to the elements of ℚp.
  1. Show that for every non-zero x ∈ ℚp, there exist a unique r ∈ ℤ and unique integers xr, xr+1, . . . , x–1, x0, x1, . . . , each in {0, 1, . . . , p – 1}, such that x can be written as:

    x = xr p^r + xr+1 p^(r+1) + · · · + x–1 p^–1 + x0 + x1 p + x2 p^2 + · · ·.

  2. Describe how to compute the p-adic expansions of x + y and xy given those for x, y ∈ ℚp. Also of x/y, provided that y ≠ 0.

  3. What is |x|p for x ∈ ℤp?

  4. What is |x|p for x ∈ ℚp with xr ≠ 0?

2.146Let p be an odd prime and a ∈ ℤ a quadratic residue modulo p with p ∤ a. From elementary number theory we know that the congruence x^2 ≡ a (mod p^n) has two solutions for every n ≥ 1. Let x1 be a solution of x^2 ≡ a (mod p). We know that a solution xn of x^2 ≡ a (mod p^n) lifts uniquely to a solution xn+1 of x^2 ≡ a (mod p^(n+1)). Thus we can inductively compute a sequence x1, x2, x3, · · · of integers. Show that (xn) is a p-adic integer and that (xn)^2 = (a).
2.147
  1. Show that the ring ℤp contains all rationals of the form a/b with a, b ∈ ℤ and p ∤ b. This implies that ℤ is properly contained in ℤp.

  2. Take a := 17 for p = 2, a := 7 for p = 3 and a := p + 1 for p > 3. Show that there exists x ∈ ℤp with x^2 = a. Show also that such an x does not belong to ℚ. Thus ℤp contains elements that are not rational.

  3. Show that . Thus .

2.148Prove the following assertions:
  1. .

  2. .

  3. Every non-zero ideal of ℤp is of the form p^r ℤp for some integer r ≥ 0.

  4. The ideals of Part (c) satisfy the infinite strictly descending chain ℤp ⊋ pℤp ⊋ p^2 ℤp ⊋ p^3 ℤp ⊋ · · ·.

  5. ℤp is a local domain with the maximal ideal pℤp.

  6. The ideal p^r ℤp of Part (c) is the principal ideal of ℤp generated by p^r. In particular, ℤp is a local PID, that is, a discrete valuation domain (Exercise 2.133), with the residue field ℤp/pℤp ≅ 𝔽p.

2.149Compute the p-adic expansion of 1/3 in and of –2/5 in .
2.150Show that ℤ is dense in ℤp under the p-adic norm | |p, that is, show that given any a ∈ ℤp and real ε > 0, there exists x ∈ ℤ with |x – a|p < ε. Show also that ℚ is dense in ℚp.
2.151Prove the following assertions that establish that ℤp is the closure of ℤ in ℚp under | |p.
  1. Every sequence (an) of rational integers, Cauchy under | |p, converges in ℤp.

  2. If a sequence (an) of rational numbers, Cauchy under | |p, converges to x ∈ ℤp, then there exists a sequence (bn) of rational integers, Cauchy under | |p, that converges to x.

2.152Show that:
  1. The series converges in .

  2. The series converges in .

  3. in . [H]

  4. The series does not converge in .

  5. If and |a|p < 1, then .

2.153Prove that for any non-zero . [H]
2.154Prove that for any a ∈ ℤp the sequence (a^(p^n)) converges in ℤp. [H]
2.155Let p, q be primes with p ≠ q. Show that the fields ℚp and ℚq are not isomorphic.
2.156Let a be an integer congruent to 1 modulo 8. Show that there exists an α ∈ ℤ2 such that α^2 = a.
2.157Compute α ∈ ℤ3 with α^2 + α + 223 = 0 and α ≡ 4 (mod 243).
2.158Let p be an odd prime and . Show that the polynomial X2a has exactly root in .
2.159Show that the polynomial X^2 – p is irreducible in ℚp[X].
2.160

Teichmüller representative Let a ∈ ℤp. Show that there exists a unique α ∈ ℤp such that α^p = α and α ≡ a (mod p).

2.161Show that the algebraic closure of ℚp is of infinite extension degree over ℚp. [H]

2.15. Statistical Methods

Many attacks on cryptosystems involve statistical analysis of ciphertexts and also of data collected from the victim’s machine during one or more private-key operations. For a proper understanding of these analysis techniques, one requires some knowledge of statistics and random variables. In this section, we provide a quick overview of some statistical gadgets. We make the assumption that the reader is already familiar with the elementary notion of probability. We denote the probability of an event E by Pr(E).

2.15.1. Random Variables and Their Probability Distributions

An experiment whose outcome is random is referred to as a random experiment. The set of all possible outcomes of a random experiment is called the sample space of the experiment. For example, the outcomes of tossing a coin can be mapped to the set {H, T} with H and T standing respectively for head and tail. It is convenient to assign numerical values to the outcomes of a random experiment. Identifying head with 0 and tail with 1, one can view coin tossing as a random experiment with sample space {0, 1}. Some other random experiments include throwing a die (with sample space {1, 2, 3, 4, 5, 6}), the life of an electric bulb (with sample space [0, ∞), the set of all non-negative real numbers), and so on. Unless otherwise specified, we henceforth assume that sample spaces are subsets of ℝ.

A random variable is a variable which can assume (all and only) the values from a (given) sample space.

A discrete random variable can assume only countably many values, that is, the sample space SX of a discrete random variable X either is finite or has a bijection with ℕ, that is, we can enumerate the elements of SX as x1, x2, x3, . . ..

The probability distribution function or the probability mass function

fX : SX → [0, 1]

of a discrete random variable X assigns to each x in the sample space SX of X the probability of the occurrence of the value x in a random experiment.[21] We have fX(x) = Pr(X = x) for each x ∈ SX, and Σx∈SX fX(x) = 1.

[21] [a, b] is the closed interval consisting of all real numbers u satisfying aub. Similarly, the open interval (a, b) is the set of all real values u satisfying a < u < b. In order to make a distinction between the open interval (a, b) and the ordered pair (a, b), many—mostly Europeans—use the notation ]a, b[ for denoting open intervals.

A continuous random variable assumes an uncountable number of values, that is, the sample space SX of a continuous random variable X cannot be put in bijective correspondence with a subset of ℕ. Typically SX is an interval [a, b] or (a, b) with –∞ ≤ a < b ≤ +∞.

One does not assign individual probabilities Pr(X = x) to a value assumed by a continuous random variable X.[22] The probabilistic behaviour of X is in this case described by the probability density function fX : SX → [0, ∞),

[22] More correctly, Pr(X = x) = 0 for each x ∈ SX.

with the implication that the probability that X occurs in the interval [c, d] (or (c, d)) is given by the integral

Pr(c ≤ X ≤ d) = ∫cd fX(x) dx,

that is, by the area between the x-axis, the curve fX(x) and the vertical lines x = c and x = d. We have ∫SX fX(x) dx = 1.

It is sometimes useful to set fX(x) := 0 for x ∈ ℝ \ SX, so that fX is defined on the entire real line ℝ.

The cumulative probability distribution of a random variable X (discrete or continuous) is the function FX(x) := Pr(X ≤ x) for all x ∈ ℝ. If X is continuous, we have

FX(x) = ∫–∞x fX(u) du,

which implies that fX(x) = F′X(x) wherever fX is continuous.

2.15.2. Operations on Random Variables

Let X and Y be discrete random variables. The joint probability distribution of X, Y refers to a random variable Z with SZ = SX × SY. For z = (x, y), the probability of Z = z is denoted by fZ(z) = Pr(Z = z) = Pr(X = x, Y = y). The probability Pr(X = x, Y = y) stands for the probability that X = x and Y = y. The random variables X and Y are called independent, if

Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y)

for all x, y.

Example 2.36.

Suppose that we have an urn containing three identical balls with labels 1, 2, 3. We draw two balls randomly from the urn. Let us denote the outcome of the first drawing by X and that of the second drawing by Y. We consider the joint distribution of X, Y in the following two cases:

  1. The balls are drawn with replacement, that is, after the first ball is drawn, it is returned to the urn (and the urn is shaken well) before the next ball is drawn. The joint probability distribution is now uniform: Pr(X = x, Y = y) = 1/9 for each of the nine pairs (x, y) with x, y ∈ {1, 2, 3}.

    In this case, the outcome of the second drawing is not influenced by the outcome of the first drawing; that is, X and Y are independent, and we have Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y) = (1/3) × (1/3) = 1/9, as expected.

  2. The balls are drawn without replacement, that is, the ball obtained by the first drawing is not returned to the urn, before the second ball is drawn. In this case, the outcome of the second drawing is influenced by that of the first drawing in the sense that the same ball cannot be drawn on both occasions. Thus, X and Y are now dependent. This is revealed by the following joint probability distribution:

    x  y  Pr(X = x, Y = y)
    1  1  0
    1  2  1/6
    1  3  1/6
    2  1  1/6
    2  2  0
    2  3  1/6
    3  1  1/6
    3  2  1/6
    3  3  0
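The two joint distributions of this example can be encoded and tested for independence mechanically. A sketch in exact rational arithmetic (the helper names joint and independent are ours):

```python
from fractions import Fraction
from itertools import product

def joint(replacement):
    """Joint distribution Pr(X = x, Y = y) for two draws from {1, 2, 3}."""
    dist = {}
    for x, y in product((1, 2, 3), repeat=2):
        if replacement:
            dist[x, y] = Fraction(1, 9)          # uniform over the 9 pairs
        else:
            dist[x, y] = Fraction(0) if x == y else Fraction(1, 6)
    return dist

def independent(dist):
    """Check Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y) for all pairs."""
    px = {x: sum(p for (a, _), p in dist.items() if a == x) for x in (1, 2, 3)}
    py = {y: sum(p for (_, b), p in dist.items() if b == y) for y in (1, 2, 3)}
    return all(dist[x, y] == px[x] * py[y] for x, y in dist)

print(independent(joint(True)), independent(joint(False)))  # True False
```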

For continuous random variables X and Y, the joint distribution is defined by the probability density function fX,Y (x, y), and the cumulative distribution is obtained by the double integral

FX,Y (c, d) = Pr(X ≤ c, Y ≤ d) = ∫–∞c ∫–∞d fX,Y (x, y) dy dx.

X and Y are independent, if fX,Y (x, y) = fX(x)fY (y) for all x, y. In this case, we also have FX,Y (c, d) = FX(c)FY (d) for all c, d.

Now, we define arithmetic operations on random variables. First, let X and Y be discrete random variables. The sum X + Y is defined to be a random variable U which assumes the values u = x + y for x ∈ SX and y ∈ SY with probability

fU(u) = Pr(U = u) = Σ Pr(X = x, Y = y), the sum being over all pairs (x, y) with x + y = u.

The product XY of X and Y is defined to be a random variable V which assumes the values v = xy for x ∈ SX and y ∈ SY with probability

fV(v) = Pr(V = v) = Σ Pr(X = x, Y = y), the sum being over all pairs (x, y) with xy = v.

For a non-zero α ∈ ℝ, the random variable W = αX assumes the values w = αx for x ∈ SX with probability

fW(w) = Pr(W = w) = Pr(X = x) = fX(x), where w = αx.

Example 2.37.

Let us consider the random variables X and Y of Example 2.36. For the sake of brevity, we denote Pr(X = x, Y = y) by Pxy. The distributions of U = X + Y in the two cases are as follows:

  1. Drawing with replacement:

    Pr(U = 2) = P11 = 1/9
    Pr(U = 3) = P12 + P21 = 2/9
    Pr(U = 4) = P13 + P22 + P31 = 1/3
    Pr(U = 5) = P23 + P32 = 2/9
    Pr(U = 6) = P33 = 1/9

  2. Drawing without replacement:

    Pr(U = 3) = P12 + P21 = 1/3
    Pr(U = 4) = P13 + P31 = 1/3
    Pr(U = 5) = P23 + P32 = 1/3
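Either table can be derived mechanically from the joint distribution by summing Pr(X = x, Y = y) over all pairs with x + y = u. A sketch for the without-replacement case:

```python
from fractions import Fraction

# Joint distribution for drawing without replacement (Example 2.36(2)).
P = {(x, y): Fraction(0) if x == y else Fraction(1, 6)
     for x in (1, 2, 3) for y in (1, 2, 3)}

# f_U(u) = sum of Pr(X = x, Y = y) over all pairs with x + y = u.
f_U = {}
for (x, y), pr in P.items():
    f_U[x + y] = f_U.get(x + y, Fraction(0)) + pr

# The mass sits at u = 3, 4, 5, each with probability 1/3.
print({u: p for u, p in sorted(f_U.items()) if p})
```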

Now, let us consider continuous random variables X and Y. In this case, it is easier to define first the cumulative distribution functions of U = X + Y, V = XY and W = αX, and then the probability density functions by taking derivatives. For example, FU(u) = Pr(X + Y ≤ u) is the integral of fX,Y (x, y) over the region x + y ≤ u, and fU(u) = F′U(u).

One can easily generalize sums and products to an arbitrary finite number of random variables. More generally, if X1, . . . , Xn are random variables and g : ℝn → ℝ, one can talk about the probability distribution or density function of the random variable g(X1, . . . , Xn). (See Exercise 2.163.)

Now, we introduce the important concept of conditional probability. Let X and Y be two random variables. To start with, suppose that they are discrete. We denote by f(x, y) = Pr(X = x, Y = y) the joint probability distribution function of X, Y. For y ∈ SY with Pr(Y = y) > 0, we define the conditional probability of X = x given Y = y as:

fX|y(x) := Pr(X = x | Y = y) = f(x, y)/fY (y).

For a fixed y ∈ SY, the probabilities fX|y(x), x ∈ SX, constitute the probability distribution function of the random variable X|y (X given Y = y). If X and Y are independent, f(x, y) = fX(x)fY (y) and so fX|y(x) = fX(x) for all x ∈ SX, that is, the random variables X and X|y have the same probability distribution. This is expected, because in this case the probability of X = x does not depend on whatever value y the variable Y takes.

If X and Y are continuous random variables with joint density f(x, y), and y is such that fY (y) > 0, the conditional probability density function of X|y (X given Y = y) is defined by

fX|y(x) := f(x, y)/fY (y).

Again if X and Y are independent, we have fX|y(x) = fX(x) for all x, y.

For a fixed x with fX(x) > 0, one can likewise define the conditional probabilities fY|x(y) := f(x, y)/fX(x) for all y ∈ SY.

Let X and Y be discrete random variables with joint distribution f(x, y). Also let Γ ⊆ SX and Δ ⊆ SY. One defines the probability fX(Γ) as:

fX(Γ) := Σx∈Γ fX(x).

The joint probability f(Γ, Δ) is defined as:

f(Γ, Δ) := Σx∈Γ Σy∈Δ f(x, y).

If Γ = {x} is a singleton, we prefer to write f(x, Δ) instead of f({x}, Δ). Similarly, f(Γ, y) stands for f(Γ, {y}). We also define the conditional distributions:

fX|Δ(Γ) := f(Γ, Δ)/fY (Δ) and fY|Γ(Δ) := f(Γ, Δ)/fX(Γ).

We abbreviate fX|Δ(Γ) as Pr(Γ|Δ) and fY|Γ(Δ) as Pr(Δ|Γ).

Theorem 2.64. Bayes rule

Let X, Y be discrete random variables and Δ ⊆ SY with fY (Δ) > 0. Also let Γ1, . . . , Γn form a partition of SX with fX(Γi) > 0 for all i = 1, . . . , n. Then we have:

fX|Δ(Γi) = fY|Γi(Δ) fX(Γi) / (Σj=1n fY|Γj(Δ) fX(Γj)),

that is, in terms of probability:

Pr(Γi|Δ) = Pr(Δ|Γi) Pr(Γi) / (Σj=1n Pr(Δ|Γj) Pr(Γj)).
Proof

Pr(Γi, Δ) = Pr(Δ|Γi) Pr(Γi) = Pr(Γi|Δ) Pr(Δ). So it is sufficient to show that Pr(Δ) equals the sum in the denominator. The event Δ is the union of the pairwise disjoint events (Γj, Δ), j = 1, . . . , n, and so Pr(Δ) = Σj=1n Pr(Γj, Δ) = Σj=1n Pr(Δ|Γj) Pr(Γj).

The Bayes rule relates the a priori probabilities Pr(Γj) and Pr(Δ|Γj) to the a posteriori probabilities Pr(Γi|Δ). The following example demonstrates this terminology.

Example 2.38.

Consider the random experiment of Example 2.36(2). Take Γj := {j} for j = 1, 2, 3 and Δ := {2, 3}. We have the following a priori probabilities:

Pr(Γj) = Probability of getting ball j in the first draw = 1/3,
Pr(Δ|Γ1) = Probability of getting the second or the third ball in the second draw, given that the first ball is obtained in the first draw = 1,
Pr(Δ|Γ2) = Probability of getting the second or the third ball in the second draw, given that the second ball is obtained in the first draw = 1/2,
Pr(Δ|Γ3) = Probability of getting the second or the third ball in the second draw, given that the third ball is obtained in the first draw = 1/2.

The a posteriori probability Pr(Γ1|Δ) that the first ball was obtained in the first draw, given that the ball obtained in the second draw is the second or the third one, is calculated using the Bayes rule as:

Pr(Γ1|Δ) = (1 × (1/3)) / (1 × (1/3) + (1/2) × (1/3) + (1/2) × (1/3)) = 1/2.

One can similarly calculate Pr(Γ2|Δ) = Pr(Γ3|Δ) = 1/4. This is expected, since the only events (x, y) consistent with Δ are the four equiprobable possibilities (1, 2), (1, 3), (2, 3) and (3, 2).
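The posterior probabilities just computed follow mechanically from the Bayes rule; a sketch in exact arithmetic, with the priors Pr(Γj) and likelihoods Pr(Δ|Γj) of this example:

```python
from fractions import Fraction

prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}  # Pr(Gamma_j)
lik = {1: Fraction(1), 2: Fraction(1, 2), 3: Fraction(1, 2)}       # Pr(Delta | Gamma_j)

def posterior(i):
    """Bayes rule: Pr(Gamma_i | Delta)."""
    total = sum(lik[j] * prior[j] for j in prior)   # = Pr(Delta)
    return lik[i] * prior[i] / total

print(posterior(1), posterior(2), posterior(3))  # 1/2 1/4 1/4
```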

2.15.3. Expectation, Variance and Correlation

Let X be a random variable. The expectation E(X) of X is defined as E(X) := Σx∈SX x fX(x) if X is discrete, and E(X) := ∫SX x fX(x) dx if X is continuous.

E(X) is also called the (arithmetic) mean or average of X. One uses the alternative symbols μX and X̄ to denote E(X). More generally, let X1, . . . , Xn be n random variables with joint probability distribution/density function f(x1, . . . , xn). Also let g : ℝn → ℝ. We define the following expectations:

X is discrete: E(g(X1, . . . , Xn)) := Σ g(x1, . . . , xn) f(x1, . . . , xn), the sum ranging over all tuples (x1, . . . , xn) in the joint sample space.

X is continuous: E(g(X1, . . . , Xn)) := ∫ · · · ∫ g(x1, . . . , xn) f(x1, . . . , xn) dx1 · · · dxn.

Let g(X) and h(Y) be real polynomial functions of the random variables X and Y, and let α ∈ ℝ. Then

E(g(X) + h(Y)) = E(g(X)) + E(h(Y)),
E(g(X)h(Y)) = E(g(X)) E(h(Y)) if X and Y are independent,
E(αg(X)) = α E(g(X)).

Let us derive the sum and product formulas for discrete variables X and Y. For the sum,

E(g(X) + h(Y)) = Σx Σy (g(x) + h(y)) f(x, y) = Σx g(x) Σy f(x, y) + Σy h(y) Σx f(x, y) = Σx g(x) fX(x) + Σy h(y) fY (y) = E(g(X)) + E(h(Y)).

If X and Y are independent, then

E(g(X)h(Y)) = Σx Σy g(x)h(y) fX(x) fY (y) = (Σx g(x) fX(x)) (Σy h(y) fY (y)) = E(g(X)) E(h(Y)).

The variance Var(X) of a random variable X is defined as

Var (X) := E[(X – E(X))2].

From the observation that E[(X – E(X))2] = E[X2 – 2 E(X)X + [E(X)]2] = E(X2) – 2 E(X) E(X) + [E(X)]2, we derive the computational formula:

Var (X) = E[X2] – [E(X)]2.

Var(X) is a measure of how the values of X are dispersed about the mean E(X) and is always a non-negative quantity. The (non-negative) square root of Var(X) is called the standard deviation σX of X:

σX := √Var(X).

The following formulas can be easily verified:

Var(X + α) = Var(X),
Var(αX) = α^2 Var(X),
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y),

where α ∈ ℝ, and where the covariance Cov(X, Y) of X and Y is defined as:

Cov(X, Y) := E[(X – E(X))(Y – E(Y))] = E(XY) – E(X) E(Y).

Normalized covariance is a measure of correlation between the two random variables X and Y. More precisely, the correlation coefficient ρX,Y is defined as:

ρX,Y := Cov(X, Y)/(σX σY).

If X and Y are independent, E(XY) = E(X) E(Y), so that Cov(X, Y) = 0 and hence ρX,Y = 0. The converse is, however, not true, that is, ρX,Y = 0 does not necessarily imply that X and Y are independent. ρX,Y is a real value in the interval [–1, 1] and is a measure of the linear relationship between X and Y. If larger (resp. smaller) values of X are (in general) associated with larger (resp. smaller) values of Y, then ρX,Y is positive. On the other hand, if larger (resp. smaller) values of X are (in general) associated with smaller (resp. larger) values of Y, then ρX,Y is negative.

Example 2.39.

Once again consider the drawing of two balls from an urn containing three balls labelled {1, 2, 3} (Examples 2.36, 2.37 and 2.38). Look at the second case (drawing without replacement). We use the shorthand notation Pxy for Pr(X = x, Y = y). The individual probability distributions of X and Y can be obtained from the joint distribution as follows:

Pr(X = 1) = P11 + P12 + P13 = 0 + (1/6) + (1/6) = 1/3
Pr(X = 2) = P21 + P22 + P23 = (1/6) + 0 + (1/6) = 1/3
Pr(X = 3) = P31 + P32 + P33 = (1/6) + (1/6) + 0 = 1/3

Pr(Y = 1) = P11 + P21 + P31 = 0 + (1/6) + (1/6) = 1/3
Pr(Y = 2) = P12 + P22 + P32 = (1/6) + 0 + (1/6) = 1/3
Pr(Y = 3) = P13 + P23 + P33 = (1/6) + (1/6) + 0 = 1/3

Thus E(X) = 1 × (1/3) + 2 × (1/3) + 3 × (1/3) = 2. Similarly, E(Y) = 2. Therefore, E(X + Y) = E(X) + E(Y) = 4. This can also be verified by direct calculations: E(X + Y) = 3 × (1/3) + 4 × (1/3) + 5 × (1/3) = 4.

E(X^2) = E(Y^2) = 1^2 × (1/3) + 2^2 × (1/3) + 3^2 × (1/3) = 14/3 and Var(X) = Var(Y) = (14/3) – 2^2 = 2/3. The probability distribution for XY is

Pr(XY = 2) = P12 + P21 = 1/3
Pr(XY = 3) = P13 + P31 = 1/3
Pr(XY = 6) = P23 + P32 = 1/3,

so that E(XY) = 2 × (1/3) + 3 × (1/3) + 6 × (1/3) = 11/3. Therefore, Cov(X, Y) = E(XY) – E(X) E(Y) = (11/3) – 2 × 2 = –1/3, that is,

ρX,Y = (–1/3)/(√(2/3) √(2/3)) = –1/2.

The negative correlation between X and Y is expected. If X = 1 (small), Y takes the bigger values (2, 3). On the other hand, if X = 3 (large), Y assumes the smaller values (1, 2). Of course, the correlation is not perfect, since for X = 2 the value of Y can be smaller (1) or larger (3). A moderate negative correlation of –1/2 between X and Y is therefore reasonable.
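All the moments in this example can be computed mechanically from the joint table. A sketch in exact arithmetic (the helper E is our own shorthand for expectation under the joint distribution):

```python
from fractions import Fraction

# Joint distribution of (X, Y) when drawing without replacement.
P = {(x, y): Fraction(0) if x == y else Fraction(1, 6)
     for x in (1, 2, 3) for y in (1, 2, 3)}

def E(g):
    """Expectation of g(X, Y) under the joint distribution P."""
    return sum(pr * g(x, y) for (x, y), pr in P.items())

EX, EY = E(lambda x, y: x), E(lambda x, y: y)
VarX = E(lambda x, y: x * x) - EX ** 2
VarY = E(lambda x, y: y * y) - EY ** 2
Cov = E(lambda x, y: x * y) - EX * EY
rho = Cov / VarX            # valid here because Var(X) = Var(Y)
print(EX, VarX, Cov, rho)   # 2 2/3 -1/3 -1/2
```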

2.15.4. Some Famous Probability Distributions

We now describe some probability distributions that occur frequently in statistical theory and in practice. Some other useful probability distributions are considered in Exercises 2.169, 2.170 and 2.171.

Uniform distribution

A discrete uniform random variable U has sample space SU := {x1, . . . , xn} and probability distribution fU(xi) = 1/n for each i = 1, . . . , n.

A continuous uniform random variable U has sample space SU and probability density function fU(x) = 1/A for x ∈ SU,

where A > 0 is the size[23] of SU. For example, if SU is the real interval [a, b] for a < b, we have fU(x) = 1/(b – a) for a ≤ x ≤ b.

[23] If SU ⊆ ℝ, “size” means length. If SU ⊆ ℝ^2 or SU ⊆ ℝ^3, “size” refers to area or volume respectively. We assume that the size of SU is “measurable”.

In this case, we have

E(U) = (a + b)/2 and Var(U) = (b – a)^2/12.

Uniform random variables often occur naturally. For example, if we throw an unbiased die, the six possible outcomes (1 through 6) are equally likely, that is, each possible outcome has the probability 1/6. Similarly, if a real number is chosen randomly in the interval [0, 1], we have a continuous uniform random variable. The built-in C library call rand() (pretends to) return an integer between 0 and 2^31 – 1, each with equal probability (namely, 2^–31).

Bernoulli distribution

The Bernoulli random variable B = B(n, p) is a discrete random variable characterized by two parameters n ∈ ℕ and p ∈ [0, 1], where p stands for the probability of a certain event E and n represents the number of (independent) trials. It is assumed that the probability of E remains constant (namely, p) in each of the n trials. The sample space SB = {0, 1, . . . , n} comprises the (exact) numbers of occurrences of E in the n trials. B has the probability distribution

fB(x) = C(n, x) p^x (1 – p)^(n–x) for x = 0, 1, . . . , n,

as follows from simple combinatorial arguments. Here C(n, x) denotes the binomial coefficient. The mean and variance of B are:

E(B) = np and Var(B) = np(1 – p).

The Bernoulli distribution is also called the binomial distribution.
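The stated mean and variance can be checked by summing directly over the distribution. A brief sketch (the parameter values n = 10, p = 0.3 are our own choice):

```python
from math import comb

def binomial_pmf(n, p):
    """f_B(x) = C(n, x) p^x (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

n, p = 10, 0.3
f = binomial_pmf(n, p)
mean = sum(x * f[x] for x in range(n + 1))
var = sum(x * x * f[x] for x in range(n + 1)) - mean ** 2
print(mean, var)  # close to np = 3.0 and np(1 - p) = 2.1
```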

Normal distribution

The normal random variable or the Gaussian random variable N = N(μ, σ^2) is a continuous random variable characterized by two real parameters μ and σ with σ > 0. The density function of N is

fN(x) = (1/(σ√(2π))) e^(–(x – μ)^2/(2σ^2)), x ∈ ℝ.

The cumulative distribution for N can be expressed in terms of the error function erf():

FN(x) = (1/2) [1 + erf((x – μ)/(σ√2))].

The error function does not have a known closed-form expression. Figure 2.3 shows the curves for fN(x) and FN(x) for the parameter values μ = 0 and σ = 1 (in this case, N is called the standard normal variable).

Figure 2.3. Standard normal distribution


Some statistical properties of N are:

E(N) = μ and Var(N) = σ^2.

The curve fN (x) is symmetric about x = μ. Most of the area under the curve is concentrated in the region μ – 3σ ≤ x ≤ μ + 3σ. More precisely:

Pr(μ – σ ≤ N ≤ μ + σ) ≈ 0.68,
Pr(μ – 2σ ≤ N ≤ μ + 2σ) ≈ 0.95,
Pr(μ – 3σ ≤ N ≤ μ + 3σ) ≈ 0.997.
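These three probabilities follow from the erf expression for FN; a sketch using math.erf from the standard library:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F_N(x) = (1/2)(1 + erf((x - mu)/(sigma * sqrt(2))))."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Pr(mu - k*sigma <= N <= mu + k*sigma) for k = 1, 2, 3.
for k in (1, 2, 3):
    print(k, round(normal_cdf(k) - normal_cdf(-k), 4))
```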

Many distributions occurring in practice (and in nature) approximately follow normal distributions. For example, the height of (adult) people in a given community is roughly normally distributed. Of course, the height of a person cannot be negative, whereas a normal random variable may assume negative values. But, in practice, the probability that such an approximating normal variable assumes a negative value is typically negligibly low.

2.15.5. Sample Mean, Variation and Correlation

In practice, we often do not know a priori the probability distribution or density function of a random variable X. In some cases, we do not have the complete data, whereas in some other cases we would need an infinite amount of data to obtain the actual probability distribution of a random variable. For example, let X represent the life of an electric bulb manufactured by a given company in the last ten years. Even though there are only finitely many such bulbs, and even if we assume that it is possible to trace the working of every such bulb, we have to wait until all these bulbs burn out before we know the actual distribution of X. That is certainly impractical. Instead, if we have data on the life-times of some sample bulbs, we can approximate the properties of X by those of the samples.

Suppose that S := (x1, x2, . . . , xn) is a sample of size n. We assume that all xi are real numbers. We define the following quantities for S:

mean(S) := (x1 + x2 + · · · + xn)/n and Var(S) := ((x1 – mean(S))^2 + · · · + (xn – mean(S))^2)/n = mean(S2) – (mean(S))^2.

Here mean(S2) is the mean of the collection S2 := (x1^2, x2^2, . . . , xn^2).

If T := (y1, y2, . . . , yn) is another sample of real numbers of the same size n, the (linear) relationship between S and T is measured by the following quantities:

Cov(S, T) := mean(ST) – mean(S) mean(T) and r(S, T) := Cov(S, T)/(√Var(S) √Var(T)).

Here mean(ST) is the mean of the collection ST := (x1y1, x2y2, . . . , xnyn).
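These sample quantities translate directly into code. A minimal sketch (the helper names mean, variance and correlation are ours, and the 1/n population convention is used, matching the formulas above):

```python
from math import sqrt

def mean(xs):
    """Arithmetic mean of a sample."""
    return sum(xs) / len(xs)

def variance(xs):
    """Sample variance with the 1/n convention."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def correlation(xs, ys):
    """Sample correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (sqrt(variance(xs)) * sqrt(variance(ys)))

S = [1.0, 2.0, 3.0, 4.0]
T = [2.0, 4.1, 5.9, 8.2]
print(mean(S), variance(S), round(correlation(S, T), 3))
```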

An important property of the normal distribution is the following:

Theorem 2.65. Central limit theorem

Let X be any random variable with mean μ and variance σ^2, and let n ∈ ℕ. The mean of a random sample S of size n chosen according to the distribution of X approximately follows the normal distribution N(μ, σ^2/n). The larger the sample size n, the better this approximation.
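The theorem can be illustrated by simulation: draw many samples of size n from a uniform distribution and check that the sample means cluster around μ with variance close to σ^2/n. A seeded sketch (the sample and trial counts are our own choices):

```python
import random

random.seed(1)

# X uniform on [0, 1]: mu = 1/2, sigma^2 = 1/12.
mu, sigma2 = 0.5, 1.0 / 12.0
n, trials = 100, 2000

# Record the mean of each of many samples of size n.
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

emp_mean = sum(means) / trials
emp_var = sum((m - emp_mean) ** 2 for m in means) / trials
print(emp_mean, emp_var, sigma2 / n)  # emp_var should be close to sigma^2/n
```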

Exercise Set 2.15

2.162An urn contains n1 red balls and n2 black balls. We draw k balls sequentially and randomly from the urn, where 1 ≤ kn1 + n2.
  1. If the balls are drawn with replacement, what is the probability that the k-th ball drawn from the urn is red?

  2. If the balls are drawn without replacement, what is the probability that the k-th ball drawn from the urn is red?

2.163Let X and Y be the random variables of Example 2.36. For each of the two cases, calculate the probability distribution functions, expectations and variances of the following random variables:
  1. XY

  2. 2X + 3Y

  3. X2

  4. X2 + 2XY + Y2

  5. (X + Y)2

2.164Let X and Y be continuous random variables, g(X) and h(Y) non-constant real polynomials and α, β, γ ∈ ℝ. Prove that:
E(g(X) + h(Y)) = E(g(X)) + E(h(Y)).
E(g(X)h(Y)) = E(g(X)) E(h(Y)), if X and Y are independent.
E(αg(X)) = α E(g(X)).
Var(αX + βY + γ) = α^2 Var(X) + β^2 Var(Y), if X and Y are independent.

2.165Let X be a random variable and Y := αX + β for some α, β ∈ ℝ with α ≠ 0. What is ρX,Y?
2.166
  1. Let X and Y be discrete random variables with joint probability distribution function f(x, y). Show that the probability distributions of X and Y can be obtained as

fX(x) = Σy∈SY f(x, y) and fY (y) = Σx∈SX f(x, y).

  2. If X and Y are continuous random variables with joint density function f(x, y), show that the density functions of X and Y are given by

fX(x) = ∫ f(x, y) dy and fY (y) = ∫ f(x, y) dx.

    The functions fX and fY are called the marginal probability distribution (or density function) of X and Y respectively.

2.167Let X and Y be continuous random variables whose joint distribution is the uniform distribution on the triangle 0 ≤ x ≤ y ≤ 1.
  1. Compute the marginal distributions fX and fY.

  2. Compute E(X), E(Y), Var(X), Var(Y), Cov(X, Y) and ρX,Y.

2.168Let X, Y, Z be random variables. Show that:
Cov(X, Y) = Cov(Y, X).
ρX,Y = ρY,X.
Cov(X, X) = Var(X).
Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).
Cov(X, X + Y) = Var(X) + Cov(X, Y).
Cov(X, X + Y) = Var(X), if X and Y are independent.

2.169

Geometric distribution Assume that in each trial of an experiment, an event E has a constant probability p > 0 of occurrence. Let G = G(p) denote the random variable with SG = {1, 2, 3, . . .} and with fG(x) equal to the probability that E occurs for the first time in the x-th trial (that is, after exactly x – 1 failures). Show that:

fG(x) = (1 – p)^(x–1) p, E(G) = 1/p and Var(G) = (1 – p)/p^2.

What if p = 0?
2.170

Poisson distribution Let P = P(λ) be the discrete random variable with SP = {0, 1, 2, . . .} and with fP(x) = e^–λ λ^x/x!, where λ is a positive real constant. Show that E(P) = Var(P) = λ.

2.171Exponential distribution
  1. Let X = X(λ) be the continuous random variable with density

fX(x) = λ e^(–λx) for x ≥ 0 (and fX(x) = 0 for x < 0),

    where λ is a positive real constant. Show that:

E(X) = 1/λ and Var(X) = 1/λ^2.

  2. A random variable Y with SY = [0, ∞) is said to be memoryless, if

    Pr(Y > s + t | Y > s) = Pr(Y > t) for all s, t ≥ 0.

Show that the exponential variable X of Part (a) is memoryless.

2.172

The birthday paradox Let S be a finite set of cardinality n.

  1. Show that the probability that k < n elements, drawn at random from S (with replacement), are (pairwise) distinct is

p = (1 – 1/n)(1 – 2/n) · · · (1 – (k – 1)/n).

  2. Use the inequality 1 – x ≤ e^–x for any real number x to show that p ≤ e^(–k(k–1)/(2n)).

  3. Deduce that p ≤ 1/2 if k(k – 1) ≥ (2 ln 2)n, and that p ≤ 0.136 for k(k – 1) ≥ 4n.

    (The birthday paradox states that if only 23 people are chosen at random, there is a chance as high as 50 per cent that at least two of them have the same birthday.)
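The bound and the 23-people figure are easy to check numerically; a sketch (the helper name p_distinct is ours):

```python
from math import exp

def p_distinct(n, k):
    """Probability that k draws (with replacement) from n values are pairwise distinct."""
    p = 1.0
    for i in range(1, k):
        p *= 1.0 - i / n
    return p

p = p_distinct(365, 23)
print(round(1 - p, 4))                  # chance of a shared birthday, just over 1/2
print(p <= exp(-23 * 22 / (2 * 365)))   # the e^(-k(k-1)/(2n)) bound holds
```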

Chapter Summary

This chapter provides the foundations of public-key cryptology. The long compilation of mathematical concepts presented here is indispensable for understanding the topics developed in the following chapters.

This chapter begins with the basic concepts of sets, functions and relations. We also present the fundamental axioms of mathematics. Although the curricula of plus-two courses of many examination boards do include these topics, we have discussed them here in order to make our treatment self-contained.

Next comes a study of groups which are sets with binary operations satisfying some nice properties (associativity, identity, inverse and optionally commutativity). Groups are extremely important for cryptology. In particular, all discrete-log-based cryptosystems use suitable groups. Subgroups, cosets and formation of quotient groups constitute a prototypical feature that illustrates the basic paradigm of modern algebra. Secure cryptographic algorithms on groups rely on the availability of elements of large orders: for example, generators of big cyclic groups. We study these topics at length. Finally, we present Sylow’s theorem. For us, this theorem has only theoretical significance; it is used for proving some other theorems.

A set with a single operation (like a group) is often too restrictive. Many mathematical structures we are familiar with (like integers, polynomials) are endowed with two basic operations: addition and multiplication. A set with two such (compatible) operations is called a ring. A study of rings, fields, ideals and quotient rings is essential in algebra (and so in cryptography too). Three important types of rings, namely unique factorization domains, principal ideal domains and Euclidean domains, are also discussed. Euclidean division is an important property of integers and polynomials, and is useful from a computational perspective.

Then, as a specific example, we study the properties of , the ring of integers. We concentrate mostly on elementary properties of integers like divisibility, congruence, Chinese remainder theorem, Fermat’s and Euler’s theorems, quadratic residues and the law of quadratic reciprocity. We finally discuss some assorted topics from analytic number theory. In cryptography, we require many big randomly generated primes. The prime number theorem guarantees that there is essentially an abundant source of primes. Smooth integers (that is, integers having only small prime divisors) are useful for modern algorithms that compute factorization and discrete logarithms. We present an estimate on the density of smooth integers. The last topic we study is the Riemann hypothesis and its generalizations. This yet unproven hypothesis has a bearing on the running times of many number-theoretic algorithms relevant to cryptology.

The next example is the ring of polynomials over a ring. Polynomials over a field admit Euclidean division and consequently unique factorization. Irreducible polynomials are useful for constructing field extensions. Extension fields of characteristic 2 are quite frequently used in cryptographic systems.

We subsequently study the theory of vector spaces. Linear transformations are appropriate maps between vector spaces and necessitate the theory of matrices. Matrix algebra is widely useful in cryptology as it is in any other branch of algorithmic computer science. Algorithms to solve linear systems over rings and fields constitute a basic computational tool. A study of modules and algebras at the end of this section is mostly theoretical and can be avoided if the reader is willing to accept some theorems without proofs.

In the next section, we discuss the theory of field extensions. As mentioned earlier, cryptography relies heavily on extension fields of characteristic 2. Some related topics include splitting fields and algebraic closure of fields. At the end of this section, we have a short theoretical treatment of Galois theory.

Many popular cryptosystems are based on the multiplicative groups of finite fields. We study these fields as the next topic. Polynomials over finite fields are extremely useful for the construction and representation of finite fields. At the end of this section, we discuss several ways in which (elements of) finite fields can be represented in a computer’s memory. This study expedites the design, analysis and efficient implementation of finite-field arithmetic.

With elliptic- and hyperelliptic-curve cryptography having gained popularity in recent years, one needs to study the theory of plane algebraic curves. This is what we do in the next three sections. To start with, we define affine and projective spaces and curves. Going from the affine space to the projective space is necessitated by a systematic (algebraic) inclusion of points at infinity on a plane curve. We also discuss the theory of divisors and the Jacobian on plane curves. For elliptic curves, the Jacobian can be replaced by the equivalent group described in terms of the chord and tangent rule. For hyperelliptic curves, on the other hand, we have little option other than understanding the Jacobian itself.

Two kinds of elliptic curves that must be avoided in cryptography are supersingular curves and anomalous curves. The elliptic curve group (over a finite field) is the basic set used in elliptic curve cryptosystems. The possible orders (cardinalities) of these groups are constrained by Hasse’s theorem. The structure theorem establishes that an elliptic curve group (over a finite field) is not necessarily cyclic, but has a rank of at most two.

We then study Jacobians of hyperelliptic curves over finite fields. This study supplements the theory of divisors on general curves. Reduced and semi-reduced divisors are expedient for the representation of the elements in the Jacobian of a hyperelliptic curve.

Many popular cryptosystems (including RSA) derive their security (presumably) from the intractability of the integer factorization problem. The best algorithm known to date for factoring integers is the number-field sieve method. An understanding of this algorithm requires knowledge of number fields and number rings. We devote a section to the study of these mathematical objects. We start with some necessary commutative algebra including localization, integral dependence and Noetherian rings. Next, we deal with Dedekind domains. All number rings are Dedekind domains in which ideals admit unique factorization. We also discuss the factorization of ideals in number rings generated by rational primes and the structure of units in number rings (Dirichlet’s unit theorem).

The next section is a gentle introduction to the theory of p-adic numbers. These numbers are useful, for example, for designing attacks against elliptic curve cryptosystems.

In the last section, we summarize some statistical tools. Under the assumption that the reader is already familiar with the elementary notion of probability, we discuss properties of random variables and of some common probability distributions (including the uniform and normal distributions). The birthday paradox, described in an exercise, is often useful in cryptographic contexts (for example, in collision attacks on hash functions).

That is the end of this chapter. The compilation may initially look long and boring, perhaps intimidating too. The unfortunate reality is that public-key cryptology is mathematical, and it is arguably better to treat it in the formal way. If the reader is not comfortable with mathematics (in general), cryptology is perhaps not her cup of tea. An elementary approach to cryptology is what many other books have adopted. This book aims at being different in that respect. It is up to the reader to decide to what level of detail she is willing to study cryptography.

Suggestions for Further Reading

Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.

—Samuel Johnson

In this chapter, we have summarized the basic mathematical facts that cryptologists are expected to know in order to have a decent understanding of present-day public-key technology. Our discussion has often been more intuitive than mathematically complete. A reader willing to gain further insight into these areas should look at materials written specifically to deal with these specialized topics. Here are our (biased) suggestions.

There are numerous textbooks on introductory algebra. The books by Herstein [125], Fraleigh [96], Dummit and Foote [81], Hungerford [133] and Adkins and Weintraub [1] are some of our favourites. The algebra of commutative rings with identity (rings by our definition) is called commutative algebra and is the basis for learning advanced areas of mathematics like algebraic geometry and algebraic number theory. A serious study of these disciplines demands more in-depth knowledge of commutative algebra than we have presented in Section 2.13.1. Atiyah and MacDonald’s book [14] is a de facto standard on commutative algebra. Hoffman and Kunze’s book [127] is a good reference for linear algebra and matrix algebra.

Elementary number theory deals with the theory of (natural) numbers without using sophisticated techniques from complex analysis and algebra. Zuckerman et al. [316] can be consulted for a lucid introduction to this subject. The books by Burton [42] and Mollin [207] are good alternatives.

A thorough mathematical treatment of finite fields can be found in the books by Lidl and Niederreiter [179, 180], of which the second also deals with computational issues. Other books with a computational flavour include those by Menezes [191] and by Shparlinski [274]. Also see the paper [273] by Shparlinski.

The use of elliptic curves in cryptography was proposed by Koblitz [150] and Miller [205], and that of hyperelliptic curves by Koblitz [151]. A fair mathematical understanding of elliptic curves relies on knowledge of commutative algebra (see above) and algebraic geometry. Hartshorne’s book [124] is a detailed introduction to algebraic geometry. Fulton’s book [99] on algebraic curves is another good reference. Rigorous mathematical treatments of elliptic curves can be found in Silverman’s books [275, 276]. The book by Koblitz [152] is elementary, but has a somewhat different focus than needed in cryptology. By far, the best short-cut is the recent textbook by Washington [298]. Some other books by Koblitz [150, 153, 154], Blake et al. [24], Menezes [192] and Hankerson et al. [123] are written for non-experts in algebraic geometry (and hence lack mathematical details), but are good from a computational viewpoint. The expository reports [46, 47] by Charlap et al. provide a nice elementary introduction to elliptic curves. For hyperelliptic curves, on the other hand, no such books are available. Koblitz’s book [154] includes a chapter on hyperelliptic curves. In addition, an appendix in the same book, written by Menezes et al. much in the style of Charlap et al. [46, 47], provides an introductory and elementary coverage.

In an oversimplified sense, algebraic number theory deals with the study of number fields. The books by Janusz [140], Lang [160], Mollin [208] and Ribenboim [251] go well beyond what we cover in Section 2.13. Also see [89]. For a more modern and sophisticated treatment, look at Neukirch’s book [216]. A book dedicated to p-adic numbers is due to Koblitz [149]. Course notes from one of the authors of this book can also be useful in this regard. The notes are freely downloadable from:

http://www.facweb.iitkgp.ernet.in/~adas/IITK/course/MTH617/SS02/

Analytic number theory deals with the application of complex analytic techniques to solve problems in number theory. Although we do not explicitly need this branch of mathematics (apart from a few theorems that we mention without proofs), it is rather important for the study of numbers. Consult the books by Apostol [12] and by Ireland and Rosen [136] for this. Also see [249]. For complex analysis, we recommend the book by Ahlfors [6].

Feller’s celebrated book [92] is a classical reference on probability theory. Grinstead and Snell’s book [121] is available on the Internet.

3. Algebraic and Number-theoretic Computations

3.1 Introduction
3.2 Complexity Issues
3.3 Multiple-precision Integer Arithmetic
3.4 Elementary Number-theoretic Computations
3.5 Arithmetic in Finite Fields
3.6 Arithmetic on Elliptic Curves
3.7 Arithmetic on Hyperelliptic Curves
3.8 Random Numbers
 Chapter Summary
 Suggestions for Further Reading

From the start there has been a curious affinity between mathematics, mind and computing . . . It is perhaps no accident that Pascal and Leibniz in the seventeenth century, Babbage and George Boole in the nineteenth, and Alan Turing and John von Neumann in the twentieth – seminal figures in the history of computing – were all, among their other accomplishments, mathematicians, possessing a natural affinity for symbol, representation, abstraction and logic.

—Doron Swade [295]

. . . the laws of physics and of logic . . . the number system . . . the principle of algebraic substitution. These are ghosts. We just believe in them so thoroughly they seem real.

—Robert M. Pirsig [233]

The world is continuous, but the mind is discrete.

—David Mumford

3.1. Introduction

Now that we have studied the properties of important mathematical objects that play vital roles in public-key cryptology, it is time to concentrate on the algorithmic and implementation issues for working with these objects. We need well-defined schemes (data structures) to represent these objects and well-defined procedures (algorithms) to manipulate them. While a theoretical analysis of the performance of our data structures and algorithms is of great concern, it still leaves us in the abstract domain. In the long run, one has to translate the abstract statements in the algorithms to machine code that the computer understands, and this is where the implementation tidbits come into the picture. It is our personal experience that a naive implementation of an algorithm may run a hundred times slower than a carefully optimized implementation of the same algorithm. In certain specific applications (like those based on smart cards), where memory is a scarce resource, one should also pay attention to the storage requirements of the data structures and code segments. This chapter is an introduction to all these specialized topics.

Before we proceed further, certain comments are in order. In this book, we describe algorithms using a pseudocode that closely resembles the syntax of the programming language C. The biggest difference between C and our pseudocode is that we have given preference to mathematical notations in place of C syntax. For example, = means equality in our codes, whereas assignment is denoted by :=. Similarly, our while and for loops look more human-readable, for example, for i = 0, 1, . . . , m – 1 instead of C’s for (i=0; i<m; i++). In order to understand our pseudocode, a knowledge of C (or a similar programming language) is helpful, but not essential, on the part of the reader.

For certain implementations, we assume that the target machine carries out 32-bit 2’s-complement arithmetic. This is indeed true for most modern PCs and workstations. By the term word, we mean a 32-bit unit in the computer memory. We will also assume that the compiler provides facilities for storing and doing arithmetic with unsigned 64-bit integers. Though this is not an ANSI C feature, most popular compilers used today do support such a built-in data type (examples: unsigned __int64 for the Microsoft Visual C++ compiler and unsigned long long for the GNU C compiler). Though it is apparently desirable to be more generic and to avoid these specific assumptions about the machine and the compiler, our exposition highlights the power of fine-tuning based on knowledge of the underlying system.
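As an illustration of how such a 64-bit type is used, the following C99 sketch (using the standard uint64_t in place of the compiler-specific types just mentioned; the function name is ours) splits the exact product of two 32-bit words into its high and low words:

```c
#include <stdint.h>

/* Split the exact product of two 32-bit words into high and low
   words.  The cast to uint64_t is essential: without it the
   multiplication is performed in 32 bits and the high word is lost. */
void mul32(uint32_t x, uint32_t y, uint32_t *hi, uint32_t *lo)
{
    uint64_t p = (uint64_t)x * y;   /* full 64-bit product */
    *lo = (uint32_t)p;              /* least significant 32 bits */
    *hi = (uint32_t)(p >> 32);      /* most significant 32 bits */
}
```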

3.2. Complexity Issues

Given an algorithm (or an implementation of the same), the time and space required for the execution of the algorithm on a machine depend very much on the machine’s architecture and on the compiler. But this does not mean that we cannot make some general theoretical estimates. The so-called asymptotic estimates that we are going to introduce now tend to approach the real situation as the input size tends to infinity. For finite input sizes (which is always the case in practice), these theoretical predictions turn out to provide valuable guidelines.

3.2.1. Order Notations

We start with the following important definitions.

Definition 3.1.

Let f and g be positive real-valued functions of natural numbers.

  1. f is said to be bounded above by g or of the order of g, denoted f = O(g), if there exist an integer n0 and a positive real constant c such that f(n) ≤ cg(n) for all n ≥ n0. In this case, we also say that g is bounded below by f and denote this by g = Ω(f).

  2. If f = O(g) and g = O(f), we say that f and g are of the same order and denote this by f = Θ(g) (or by g = Θ(f)). Equivalently, f = Θ(g) if and only if f = O(g) and f = Ω(g); that is, if and only if there exist an integer n0 and real positive constants c1, c2 such that c1 g(n) ≤ f(n) ≤ c2 g(n) for all n ≥ n0.

  3. f is said to be of strictly lower order than g, denoted f = o(g), if f(n)/g(n) tends to 0 as n tends to infinity. In other words, f = o(g) if and only if for every real positive constant c (however small it may be) there exists an integer nc such that f(n) < cg(n) for all n ≥ nc. If f = o(g), we also say that g is of strictly higher order than f and denote this by g = ω(f). Thus g = ω(f) if and only if for every real positive constant c (however large it may be) there exists an integer nc such that g(n) > cf(n) for all n ≥ nc.

Example 3.1.
  1. Let f(n) := a_d n^d + · · · + a_1 n + a_0 with d ≥ 0, real coefficients a_0, . . . , a_d, and a_d > 0. Then f = Θ(n^d). This heuristically means that as n becomes sufficiently large, the leading term a_d n^d dominates over the other terms, and apart from the constant of proportionality a_d the function f(n) grows with n as n^d does. If f = Θ(n^d) for some integer d > 0, we say that f is of polynomial order in n.[1] A Θ(1) function is often called a constant function.

    [1] This is not the complete truth. Functions like n^2.3 or n^3(log n)^2 would be better included in the polynomial family. Thus, we may define f to be of polynomial order (in n) if f = O(n^d) and f = Ω(n^d′) for some positive real constants d, d′. Similar comments hold for poly-logarithmic and exponential orders.

  2. If f = Θ((log n)^a) for some real a > 0, we say that f is of poly-logarithmic order in n. By Exercise 3.2(b), any function of poly-logarithmic order grows asymptotically slower than any function of polynomial order.

  3. If f = Θ(a^n) for some real a > 1, f is said to be of exponential order in n. Again by Exercise 3.2(b), any function of exponential order grows asymptotically faster than any function of polynomial order.

  4. Now, consider a function of the form

    Equation 3.1

    f(n) = exp(c · n^α · (ln n)^(1−α))

    for real c > 0 and for 0 ≤ α ≤ 1. For α = 0, we have f = Θ(n^c); that is, f is of polynomial order. On the other extreme, if α = 1, f = Θ(a^n), where a := exp(c); that is, f is of exponential order. If 0 < α < 1, we say that f is of subexponential order in n, since the order of f is somewhere in between polynomial and exponential. We will come across functions of subexponential orders quite frequently in the rest of the book. Note that as α increases from 0 to 1, the order of f also increases monotonically from polynomial to exponential.

  5. A function f = O(n^a (log n)^b) with a > 0 and b ≥ 0 is often denoted by the soft O-notation: f = O~(n^a). This means that, up to multiplication by a polynomial in log n, the function f is of the order of n^a. Similarly, if f = O(a^n g(n)) for a > 1 and for some g(n) of polynomial order, we say that f = O~(a^n). Intuitively speaking, the O-notation hides constant multipliers, whereas the soft O-notation also suppresses multipliers of strictly lower order (like powers of log n).

  6. The notion of order can be readily extended to functions with two or more input variables. For example, for positive real-valued functions f, g of two positive integer variables m, n one says f = O(g) if for some m0, n0 and for some positive real constant c one has f(m, n) ≤ cg(m, n) for all m ≥ m0 and n ≥ n0. The function f(m, n) = m^3 2^n is of polynomial order in m, but of exponential order in n.

The order notation is used to analyse algorithms in the following way. For an algorithm, the input size is defined as the total number of bits needed to represent the input of the algorithm. We find asymptotic estimates of the running time and the memory requirement of the algorithm in terms of its input size. Let f(n) denote the running time[2] of an algorithm A for an input of size n. If f(n) = Θ(n^a) (or, more generally, if f = O(n^a)) for some a > 0, A is called a polynomial-time algorithm. If a = 1 (resp. 2, 3, . . .), then A is specifically called a linear-time (resp. quadratic-time, cubic-time, . . .) algorithm. A Θ(1) algorithm is often called a constant-time algorithm. If f = Θ(b^n) for some b > 1, A is called an exponential-time algorithm. Similarly, if f satisfies Equation (3.1) with 0 < α < 1, A is called a subexponential-time algorithm.

[2] The practical running time of an algorithm may vary widely depending on its implementation and also on the processor, the compiler and even on run-time conditions. Since we are talking about the order of growth of running times in relation to the input size, we neglect the constants of proportionality and so these variations are usually not a problem. If one plans to be more concrete, one may measure the running time by the number of bit operations needed by the algorithm.

One has similar classifications of an algorithm in terms of its space requirements, namely, polynomial-space, linear-space, exponential-space, and so on. We can afford to be lazy and drop -time from the adjectives introduced in the previous paragraph. Thus, an exponential algorithm is an exponential-time algorithm, not an exponential-space algorithm.

It is expedient to note here that the running time of an algorithm may depend on the particular instance of the input, even when the input size is kept fixed. For an example, see Exercise 3.3. We should, therefore, be prepared to distinguish, for a given algorithm and for a given input size n, between the best (that is, shortest) running time fb(n), the worst (that is, longest) running time fw(n), the average running time fa(n) on all possible inputs (of size n) and the expected running time fe(n) for a randomly chosen input (of size n). In typical situations, fw(n), fa(n) and fe(n) are of the same order, in which case we simply denote, by running time, one of these functions. If this is not the case, an unqualified use of the phrase running time would denote the worst running time fw(n).

The order notation, though apparently attractive and useful, has certain drawbacks. First, it depicts the behaviour of functions (like running times) as the input size tends to infinity. In practice, one always has finite input sizes. One can check that if f(n) = n^100 and g(n) = (1.01)^n are the running times of two algorithms A and B respectively (for solving the same problem), then f(n) ≤ g(n) if and only if n = 1 or n ≥ 117,309. But then if the input size is only 1,000, one would prefer the exponential-time algorithm B over the polynomial-time algorithm A. Thus asymptotic estimates need not guarantee correct suggestions at practical ranges of interest. On the other hand, an algorithm which is a product of human intellect does not tend to have such extreme values for the parameters; that is, in a polynomial-time algorithm, the degree is usually ≤ 10 and the base for an exponential-time algorithm is usually not as close to 1 as 1.01 is. If we have f(n) = n^5 and g(n) = 2^n as the respective running times of the algorithms A and B, then A outperforms B (in terms of speed) for all n ≥ 23.

The second drawback of the order notation is that it suppresses the constant of proportionality; that is, an algorithm whose running time is 100n2 has the same order as one whose running time is n2. This is, however, a situation that we cannot neglect in practice. In particular, when we compare two different implementations of the same algorithm, the one with a smaller constant of proportionality is more desirable than the one with a larger constant. This is where implementation tricks prove to be important and even indispensable for large-scale applications.

3.2.2. Randomized Algorithms

A deterministic algorithm is one that always follows the same sequence of computations (and thereby produces the same output) for a given input. The deterministic running time of a computational problem P is the fastest of the running times (in order notation) of the known algorithms to solve P.

If an algorithm makes some random choices during execution, we call the algorithm randomized or probabilistic. The exact sequence of computations followed by the algorithm depends on these random choices and as a result different executions of the same algorithm may produce different outputs for a given input. At first glance, randomized algorithms look useless, because getting different outputs for a given input is apparently not what one would really want. But there are situations where this is desirable. For example, in an implementation of the RSA protocol, one generates random primes p and q of given bit lengths. Here we require our prime generation procedure to produce different primes during different executions (that is, for different entities on the net).

More importantly, randomized algorithms often provide practical computational solutions for many problems for which no practical deterministic algorithms are known. We will shortly encounter many such situations where randomized algorithms are the simplest and/or fastest known algorithms. However, this sudden enhancement in performance by random choices does not come for free. To explain the so-called darker sides of randomization, we describe two different types of randomized algorithms.

A Monte Carlo algorithm is a randomized algorithm that may produce incorrect outputs. However, for such an algorithm to be useful, we require that the running time be always small and the probability of an error sufficiently low. A good example of a Monte Carlo algorithm is the Miller–Rabin algorithm (Algorithm 3.13) for testing the primality of an integer. For an integer of bit size n, the Miller–Rabin test with t iterations runs in time O(tn^3). Whenever the algorithm outputs false, it is always correct. But an answer of true is incorrect with an error probability ≤ 2^(−2t); that is, it certifies a composite integer as a prime with probability ≤ 2^(−2t). For t = 20, an error is expected to occur less than once in every 10^12 executions. With this little sacrifice we achieve a running time of O(n^3) (for a fixed t), whereas the best deterministic primality testing algorithm (known to the authors at the time of writing this book) takes time O(n^(7.5)) and hence is not practical.
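The Monte Carlo structure can be made concrete with a minimal Miller–Rabin-style sketch restricted to 32-bit operands (this is our own illustrative code, not the book's Algorithm 3.13; the function names and the choice of random bases via rand() are ours):

```c
#include <stdint.h>
#include <stdlib.h>

/* b^e mod m for m < 2^32; intermediate products fit in 64 bits */
static uint32_t powmod32(uint32_t b, uint32_t e, uint32_t m)
{
    uint64_t r = 1, x = b % m;
    while (e) {
        if (e & 1) r = r * x % m;
        x = x * x % m;
        e >>= 1;
    }
    return (uint32_t)r;
}

/* Monte Carlo primality test: 0 means "certainly composite",
   1 means "prime with error probability at most 4^(-t)" */
int miller_rabin(uint32_t n, int t)
{
    if (n < 4) return n == 2 || n == 3;
    if (n % 2 == 0) return 0;
    uint32_t d = n - 1;
    int s = 0;
    while (d % 2 == 0) { d /= 2; s++; }   /* n - 1 = 2^s * d, d odd */
    for (int i = 0; i < t; i++) {
        uint32_t a = 2 + (uint32_t)rand() % (n - 3);  /* base in [2, n-2] */
        uint32_t x = powmod32(a, d, n);
        if (x == 1 || x == n - 1) continue;
        int witness = 1;
        for (int r = 1; r < s; r++) {
            x = (uint32_t)((uint64_t)x * x % n);
            if (x == n - 1) { witness = 0; break; }
        }
        if (witness) return 0;   /* a proves n composite */
    }
    return 1;                    /* probably prime */
}
```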

A Las Vegas algorithm is a randomized algorithm which always produces the correct output. However, the running time of such an algorithm depends on the random choices made. For such an algorithm to be useful, we expect that for most random choices the running time is small. As an example, consider the problem of finding a random (monic) irreducible polynomial of degree n over a finite field. Algorithm 3.22 tests the irreducibility of a polynomial over a finite field in deterministic polynomial time. We generate random monic polynomials of degree n and check the irreducibility of these polynomials by Algorithm 3.22. From Section 2.9.2, we know that a randomly chosen monic polynomial of degree n over a finite field is irreducible with an approximate probability of 1/n. This implies that after O(n) random polynomials are tried, one expects to find an irreducible polynomial. The resulting Las Vegas algorithm (Algorithm 3.23) runs in expected polynomial time. It may, however, happen that for certain random choices we keep on generating reducible polynomials an exponential number of times, but the likelihood of such an accident is very, very low (Exercise 3.5).
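The Las Vegas structure (keep drawing random candidates until a deterministic test succeeds) can be sketched in C for the field with two elements. For brevity we replace the book's polynomial-time irreducibility test (Algorithm 3.22) with naive trial division by all lower-degree polynomials, which is exponential in n but fine for small degrees; the retry loop is the point. Polynomials over GF(2) are encoded as bit masks, and all names and encodings are ours:

```c
#include <stdint.h>
#include <stdlib.h>

/* A polynomial over GF(2) is stored as a bit mask: bit i holds the
   coefficient of x^i.  E.g., 0x13 = 10011b is x^4 + x + 1. */

static int deg(uint64_t p) { int d = -1; while (p) { d++; p >>= 1; } return d; }

/* Remainder of a modulo m (polynomial division over GF(2)) */
static uint64_t poly_rem(uint64_t a, uint64_t m)
{
    int dm = deg(m);
    while (deg(a) >= dm) a ^= m << (deg(a) - dm);
    return a;
}

/* Deterministic test: f is irreducible iff no polynomial of degree
   between 1 and deg(f)/2 divides it (naive, exponential in deg f) */
int irreducible(uint64_t f)
{
    int n = deg(f);
    if (n <= 0) return 0;
    for (uint64_t d = 2; deg(d) <= n / 2; d++)
        if (poly_rem(f, d) == 0) return 0;
    return 1;
}

/* Las Vegas loop: always correct, expected O(n) iterations */
uint64_t random_irreducible(int n)
{
    for (;;) {
        /* random monic polynomial of degree n with constant term 1
           (multiples of x are reducible anyway, so nothing is lost) */
        uint64_t f = (1ULL << n) | ((uint64_t)rand() & ((1ULL << n) - 1)) | 1ULL;
        if (irreducible(f)) return f;
    }
}
```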

An algorithm is said to be a probabilistic or randomized polynomial-time algorithm, if it is either a Monte Carlo algorithm with polynomial worst running time or a Las Vegas algorithm with polynomial expected running time. Both the above examples of randomized algorithms are probabilistic polynomial-time algorithms. A combination of these two types of algorithms can also be conceived; namely, algorithms that produce correct outputs with high probability and have polynomial expected running time. Some computational problems are so challenging that even such probably correct and probably fast algorithms are quite welcome.

We finally note that there are certain computational problems for which the deterministic running time is exponential and for which randomization also does not help much. In some cases, we have subexponential randomized algorithms which are still too slow to be of reasonable practical use. Some of these so-called intractable problems are at the heart of the security of many public-key cryptographic protocols.

3.2.3. Reduction Between Computational Problems

In the last two sections, we have introduced theoretical measures (the order notations) for estimating the (known) difficulty of solving computational problems. In this section, we introduce another concept by which we can compare the relative difficulty of two computational problems.

Let P1 and P2 be two computational problems. We say that P1 is polynomial-time reducible to P2 and denote this as P1 ≤P P2, if there is a polynomial-time algorithm which, given a solution of P2, provides a solution for P1. This means that if P1 ≤P P2, then the problem P1 is no more difficult than P2 apart from the extra polynomial-time reduction effort. In that case, if we know an algorithm to solve P2 in polynomial time, then we have a polynomial-time algorithm for P1 too. If P1 ≤P P2 and P2 ≤P P1, we say that the problems P1 and P2 are polynomial-time equivalent and write P1 ≅ P2.

In order to give an example of these concepts, we let G be a finite cyclic multiplicative group of order n and g a generator of G. The discrete logarithm problem (DLP) is the problem of computing, for a given a ∈ G, an integer x such that a = g^x. The Diffie–Hellman problem (DHP), on the other hand, is the problem of computing g^(xy) from the given values of g^x and g^y. If one can compute y from g^y, one can also compute g^(xy) = (g^x)^y by performing an exponentiation in the group G. Therefore, DHP ≤P DLP, if exponentiations in G can be computed in polynomial time. In other words, if a solution for DLP is known, a solution for DHP is also available; that is, DHP is no more difficult than DLP except for the additional exponentiation effort. However, the reverse implication (that is, whether DLP ≤P DHP) is not known for many groups.
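The reduction of DHP to DLP can be made concrete in a toy setting. The following C sketch works in the group (Z/101Z)* with generator 2; all parameter choices are ours and purely illustrative, and the brute-force DLP "oracle" is of course exponential-time:

```c
#include <stdint.h>

/* Toy demonstration of DHP reducing to DLP in the group (Z/101Z)*
   with generator G = 2 (2 generates this group of order 100). */
static const uint32_t P = 101, G = 2;

static uint32_t powmod(uint32_t b, uint32_t e)
{
    uint64_t r = 1, x = b % P;
    while (e) {
        if (e & 1) r = r * x % P;
        x = x * x % P;
        e >>= 1;
    }
    return (uint32_t)r;
}

/* DLP "oracle": exhaustive search, exponential in the bit size of P,
   but enough to exhibit the reduction for P = 101 */
static uint32_t dlog(uint32_t a)
{
    uint32_t x = 0, t = 1;
    while (t != a) {
        t = (uint32_t)((uint64_t)t * G % P);
        x++;
    }
    return x;
}

/* DHP solved via one oracle call plus one exponentiation:
   g^(xy) = (g^x)^y */
uint32_t dhp(uint32_t gx, uint32_t gy)
{
    return powmod(gx, dlog(gy));
}
```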

So far we have assumed that our reduction algorithms are deterministic. If we allow randomized (that is, probabilistic) polynomial-time reduction algorithms, we can similarly introduce the concepts of randomized polynomial-time reducibility and of randomized polynomial-time equivalence. We urge the reader to formulate the formal definitions for these concepts.

Exercise Set 3.2

3.1
  1. Sort the following functions in increasing order of growth. (Don’t mind if some of these functions are not defined for a few values of n.)

    10^12, 2^n, 2^(2n), 2^(n^2), 100n^2, 10^(−3)n^3, 1/n, , n!, n^n,

    log n, (log n)/n, n/log n, n^2 log n, n(log n)^2, (0.1)^(log n), (log n)^n,

    1/log n, , 10^6(log n)^100, log log n, 2^(log log n), n^(log log n),

    , , ,

    exp(n^(1/3)(ln n)^(2/3)), exp((ln n)^(1/3)(ln ln n)^(2/3)).

  2. Evaluate the functions of Part (a) at n = 10^i for i = 1, 2, . . . , 10 and conclude that as n gets larger, the asymptotic ordering tallies with the actual ordering more closely.

3.2
  1. Show that for any real a > 1 and b > 0 one has n^b = o(a^n).

  2. For any positive real c, d, show that (log n)^c = o(n^d).

  3. Show that if f = O(g) and g = O(h), then f = O(h).

  4. Give an example to show that f = O(g) does not necessarily imply f = Θ(g).

  5. Give an example of a function f with f = O(n^(1+ε)) for every ε > 0, but f is not O(n).

3.3 Suppose that an algorithm A takes as input a bit string and runs in time g(t), where t is the number of one-bits in the input string. Let fb(n), fw(n), fa(n) and fe(n) respectively denote the best, worst, average and expected running times of A for inputs of size n. Derive the following table under the assumption that each of the 2^n bit strings of length n is equally likely.

    Running times

    g(t)    fb(n)    fw(n)    fa(n)       fe(n)
    t       0        n        n/2         n/2
    t^2     0        n^2      n(n+1)/4    n^2/4
    2^t     1        2^n      (3/2)^n

3.4
  1. Show that an exponential-space (resp. subexponential-space) algorithm must be (at least) exponential-time (resp. subexponential-time) too. You may assume that at a time a computing device can access (read/write) at most a constant number of memory locations.

  2. Give an example of an algorithm that is exponential-time but polynomial-space.

3.5 Consider the Las Vegas algorithm discussed in Section 3.2.2 for generating a random irreducible polynomial of degree n over a finite field. Assume that a randomly chosen monic polynomial of degree n over this field has (an exact) probability of 1/n of being irreducible. Find out the probability pr that r polynomials chosen randomly (with repetition) are all reducible. For n = 1000, calculate the numerical values of pr for r = 10^i, i = 1, . . . , 6, and find the smallest integers r for which pr ≤ 1/2 and pr ≤ 10^(−12). Find the expected number of polynomials tested for irreducibility before the algorithm terminates.
3.6 Let n = pq be the product of two distinct primes p and q. Show that factoring n is polynomial-time equivalent to computing φ(n) = (p–1)(q–1), where φ is Euler’s totient function. (Assume that an arithmetic operation (including computation of integer square roots) on integers of bit size t can be performed in polynomial time (in t).)
3.7 Let G be a finite cyclic multiplicative group and let H be the subgroup of G generated by an element h, where the order of H is known. The generalized discrete logarithm problem (GDLP) is the following: Given a ∈ G, find out if a ∈ H and, if so, find an integer x for which a = h^x. Show that GDLP ≅ DLP, if exponentiations in G can be carried out in polynomial time and if DLP in H is polynomial-time equivalent to DLP in G. [H]

3.3. Multiple-precision Integer Arithmetic

Cryptographic protocols based on the rings Zn and the fields Fp demand n and p to be sufficiently large (of bit length ≥ 512) in order to achieve the desired level of security. However, standard compilers do not support data types that can hold integers of this size with full precision. For example, C compilers support integers of size ≤ 64 bits. So one must employ custom-designed data types for representing and working with such big integers. Many libraries that can handle integers of arbitrary length are already available; FREELIP, GMP, LiDIA, NTL and ZEN are some such libraries that are freely available.

Alternatively, one may design one’s own functions for multiple-precision integers. Such a programming exercise is not very difficult, but making the functions run efficiently is a huge challenge. Several tricks and optimization techniques can turn a naive implementation into a much faster and more memory-efficient code, and it takes years of experimental experience to discover the subtleties. Theoretical asymptotic estimates might serve as a guideline, but only experimentation can settle the relative merits and demerits of the available algorithms for input sizes of practical interest. For example, the theoretically fastest algorithm known for multiplying two multiple-precision integers is based on the so-called fast Fourier transform (FFT) techniques. But our experience shows that this algorithm starts to outperform other common but asymptotically slower algorithms only when the input size is at least several thousand bits. Since such very large integers are rarely needed by cryptographic protocols, FFT-based multiplication is not useful in this context.

3.3.1. Representation of Large Integers

In order to represent a large integer, we break it up into small parts and store each part in a memory word[3] accessible by built-in data types. The simplest way to break up a (positive) integer a is to predetermine a radix ℜ and compute the ℜ-ary representation (a_{s–1}, . . . , a_0) of a (see Exercise 3.8). One should have ℜ ≤ 2^32 so that each ℜ-ary digit a_i can be stored in a memory word. For the sake of efficiency, it is advisable to take ℜ to be a power of 2. It is also expedient to take ℜ as large as possible, because smaller values of ℜ lead to (possibly) longer sizes s and thereby add to the storage requirement and also to the running time of arithmetic functions. The best choice is ℜ = 2^32. We denote by ulong a built-in unsigned integer data type provided by the compiler (like the ANSI C standard unsigned long). We use an array of ulong for storing the digits. The array can be static or dynamic. Though dynamic arrays are more storage-efficient (because they can be allocated only as much memory as needed), they have memory allocation and deallocation overheads and are somewhat more complicated to programme than static arrays. Moreover, for cryptographic protocols one typically needs integers no longer than 4096 bits. Since the product of two integers of bit size t has bit size ≤ 2t, a static array of 8192/32 = 256 ulong suffices for storing cryptographic integers. It is also necessary to keep track of the actual size of an integer, since filling up with leading 0 digits is not an efficient strategy. Finally, it is often useful to have a signed representation of integers. A sign bit is also necessary for this case. We state three possible declarations in Exercise 3.11.

[3] We assume that a word in the memory is 32 bits long.

3.3.2. Basic Arithmetic Operations

We now describe the implementations of addition, subtraction, multiplication and Euclidean division of multiple-precision integers. Every other complex operation (like modular arithmetic, gcd) is based on these primitives. It is, therefore, of utmost importance to write efficient codes for these basic operations.

For integers of cryptographic sizes, the most efficient algorithms are the standard ones we use for doing arithmetic on decimal numbers, that is, for two positive integers a = as–1 . . . a0 and b = bt–1 . . . b0 we compute the sum c = a + b = cr–1 . . . c0 as follows. We first compute a0 + b0. If this sum is ≥ ℜ, then c0 = a0 + b0 – ℜ and the carry is 1, otherwise c0 = a0 + b0 and the carry is 0. We then compute a1 + b1 plus the carry available from the previous digit, and compute c1 and the next carry as before.

For computing the product d = ab = dl–1 . . . d0, we do the usual quadratic procedure; namely, we initialize all the digits of d to 0 and for each i = 0, . . . , s – 1 and j = 0, . . . , t – 1 we compute aibj and add it to the (i + j)-th digit of d. If this sum (call it σ) at the (i + j)-th location exceeds ℜ – 1, we find out q, r with σ = qℜ + r, r < ℜ. Then di+j is assigned r, and q is added to the (i + j + 1)-st location. If that addition results in a carry, we propagate the carry to higher locations until it gets fully absorbed in some word of d.

All this sounds simple, but complications arise when we consider the fact that the sum of two 32-bit words (and a possible carry from the previous location) may be 33 bits long. For multiplication, the situation is even worse, because the product aibj can be 64 bits long. Since our machine word can hold only 32 bits, it becomes problematic to hold these intermediate sums and products to full precision. We assume that the least significant 32 bits are correctly returned and assigned to the output variable (ulong), whereas the leading 32 bits are lost.[4] The most efficient way to keep track of these overflows is to use assembly instructions, and this is what many number theory packages (like PARI and UBASIC) do. But this means that for every target architecture we have to write different assembly code. Here we describe certain tricks that make it possible to grab the overflow information using only high-level languages, without significantly degrading the performance compared to assembly instructions.

[4] This is the typical behaviour of a CPU that supports 2’s complement arithmetic.

Addition and subtraction

First consider the sum ai + bi. We compute the least significant 32 bits by assigning ci := ai + bi. It is easy to see that an overflow occurs during this sum if and only if ci < ai. We set the output carry accordingly. Now, let us consider the situation when we have an input carry: that is, when we compute the sum ci := ai + bi + 1. Here an overflow occurs if and only if ci ≤ ai. Algorithm 3.1 performs this addition of words.

Algorithm 3.1. Addition of words

Input: Words ai and bi and the input carry γi ∈ {0, 1}.

Output: Word ci and the output carry δi ∈ {0, 1} with ai + bi + γi = ci + δiℜ.

Steps:

ci := ai + bi.

if (γi) { ci ++, δi := ( (ci ≤ ai) ? 1 : 0 ). } else { δi := ( (ci < ai) ? 1 : 0 ). }

Algorithm 3.1 assumes that ci and ai are stored in different memory words. If this is not the case, we should store ai + bi in a temporary variable and, after the second line, ci should be assigned the value of this temporary variable. Note also that many processors provide an increment primitive which is faster than the general addition primitive. In that case, the statement ci ++ is preferable to ci := ci + 1.
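Algorithm 3.1 can be sketched in C99 types as follows. The function name and the convention of returning the output carry are our own choices for this sketch, not part of any particular library.

```c
#include <stdint.h>

/* Add the words a and b and the input carry g (0 or 1); store the low
   32 bits in *c and return the output carry, so that
   a + b + g == *c + carry * 2^32. */
uint32_t add_words(uint32_t a, uint32_t b, uint32_t g, uint32_t *c)
{
    uint32_t t = a + b;              /* low 32 bits of a + b; may wrap around */
    uint32_t carry;
    if (g) {
        t++;
        carry = (t <= a) ? 1 : 0;    /* overflow iff t <= a when a carry came in */
    } else {
        carry = (t < a) ? 1 : 0;     /* overflow iff t < a */
    }
    *c = t;
    return carry;
}
```

Note that the overflow tests rely on the wrap-around (modulo 2^32) behaviour of unsigned arithmetic, which the C standard guarantees.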

For subtraction, we proceed analogously from right to left and keep track of the borrow. Here the check for overflow can be done before the subtraction of words is carried out (and, therefore, no temporary variable is needed, if we assume that the output carry is not stored in the location of the operands).

Algorithm 3.2. Subtraction of words

Input: Words ai and bi and the input borrow γi ∈ {0, 1}.

Output: Word ci and the output borrow δi ∈ {0, 1} with ai – bi – γi = ci – δiℜ.

Steps:

if (γi) { δi := ( (ai ≤ bi) ? 1 : 0 ), ci := ai – bi, ci – –. }

else { δi := ( (ai < bi) ? 1 : 0 ), ci := ai – bi. }

We urge the reader to develop the complete addition and subtraction procedures for multiple-precision integers, based on the above primitives for words.
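As a starting point for that exercise, here is one possible C sketch, assuming both operands are stored as little-endian arrays (least significant word first) of a common length n; the names mp_add and mp_sub are illustrative, not the book's API.

```c
#include <stdint.h>
#include <stddef.h>

/* c := a + b for n-word operands; returns the final carry. */
uint32_t mp_add(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = a[i] + b[i];        /* may wrap around */
        uint32_t d = (t < a[i]);         /* overflow of a[i] + b[i] */
        t += carry;
        d += (t < carry);                /* overflow of adding the carry */
        c[i] = t;
        carry = d;                       /* d is always 0 or 1 */
    }
    return carry;
}

/* c := a - b (assuming a >= b as n-word numbers); returns the final borrow. */
uint32_t mp_sub(uint32_t *c, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t borrow = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t t = a[i] - b[i];        /* may wrap around */
        uint32_t d = (a[i] < b[i]);      /* borrow from a[i] - b[i] */
        c[i] = t - borrow;
        d += (t < borrow);               /* borrow from subtracting the carry-in */
        borrow = d;
    }
    return borrow;
}
```

In each iteration the two possible overflows (from the word addition and from the incoming carry) cannot both occur, so the running carry stays 0 or 1, exactly as in Algorithms 3.1 and 3.2.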

Multiplication

The product of two 32-bit words can be as long as 64 bits, and we plan to (compute and) store this product in two words. Assuming the availability of a built-in 64-bit unsigned integer data type (which we will henceforth denote as ullong), this can be performed as in Algorithm 3.3.

Algorithm 3.3. Multiplication of words

Input: Words a and b.

Output: Words c and d with ab = cℜ + d.

Steps:

/* We use a temporary variable t of data type ullong */

t := (ullong)(a) * (ullong)(b), c := (ulong)(t ≫ 32), d := (ulong)t.

We use a temporary 64-bit integer variable t to store the product ab. The lower 32 bits of t are stored in d by simple typecasting, whereas the higher 32 bits of t are obtained by right-shifting t (the operator ≫) by 32 bits. This is a reasonable strategy given that we do not explore assembly-level instructions. Algorithm 3.4 describes a multiplication algorithm for two multiple-precision integer operands that does not directly use the word-multiplying primitive of Algorithm 3.3.
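In C99 types, Algorithm 3.3 may be sketched as follows, with uint32_t and uint64_t playing the roles of ulong and ullong (the function name is ours):

```c
#include <stdint.h>

/* Split the 64-bit product a*b into the high word c and the low word d,
   so that a*b == c*2^32 + d. */
void mul_words(uint32_t a, uint32_t b, uint32_t *c, uint32_t *d)
{
    uint64_t t = (uint64_t)a * (uint64_t)b;  /* full 64-bit product */
    *c = (uint32_t)(t >> 32);                /* high 32 bits */
    *d = (uint32_t)t;                        /* low 32 bits (truncating cast) */
}
```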

The reader can verify easily that this code properly computes the product. We now highlight how this makes the computation efficient. The intermediate results are stored in the array t of 64-bit ullong. This means that after the 64-bit product aibj of the words ai and bj is computed (in the temporary variable T), we directly add T to the location ti+j. If the sum exceeds ℜ^2 – 1 = 2^64 – 1, that is, if an overflow occurs, we should add ℜ to ti+j+1 or, equivalently, 1 at position i + j + 2. This last addition is one of ullong integers and can be made more efficient if it is replaced by ulong increments, and this is what we do using the temporary array u. Since the quadratic loop is the bottleneck of the multiplication procedure, it is absolutely necessary to make this loop as efficient as possible.

Algorithm 3.4. Multiplication of multiple-precision integers

Input: Integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0)

Output: The product c = (cr+s–1 . . . c0) = ab.

Steps:

/* Let T be a variable and t0, . . . , tr+s–1 an array of ullong variables */

/* Let v be a variable and u0, . . . , ur+s–1 an array of ulong variables */

Initialize the array locations ci, ti and ui to 0 for all i = 0, . . . , r + s – 1.

/* The quadratic loop */
for (i = 0, . . . , r – 1) and (j = 0, . . . , s – 1) {
   T := (ullong)(ai) * (ullong)(bj).
   if ((ti+j += T) < T) ui+j+2 ++.
}

/* Deferred normalization */
for (i = 0, . . . , r + s – 1) {
    if ((ci += ui) < ui) ui+1 ++.
    v := (ulong)(ti), if ((ci += v) < v) ui+1 ++.
    v := (ulong)(ti ≫ 32), if ((ci+1 += v) < v) ui+2 ++.
}

After the quadratic loop, we do deferred normalization from the array of 64-bit double-words ti to the array of 32-bit words ci. This is done using the typecasting and right-shift strategy mentioned in Algorithm 3.3. We should also take care of the intermediate carries stored in the array u. The normalization loop takes a total time of O(r + s), whereas the quadratic loop takes time O(rs). If we had done normalization inside the quadratic loop itself, that would incur an additional O(rs) cost (which is significantly more than that of deferred normalization).
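A possible C99 rendering of Algorithm 3.4 is sketched below, with the 64-bit accumulator array t and the deferred-carry array u. The bound MP_MAX, the function name and the guard on the topmost word are our choices for the sketch.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define MP_MAX 64   /* sketch-only cap on operand lengths */

/* c := a * b for little-endian arrays a (r words) and b (s words);
   c must have room for r + s words. */
void mp_mul(uint32_t *c, const uint32_t *a, size_t r, const uint32_t *b, size_t s)
{
    uint64_t t[2 * MP_MAX] = {0};
    uint32_t u[2 * MP_MAX + 2] = {0};
    size_t i, j;

    /* the quadratic loop: accumulate a[i]*b[j] into t[i+j], recording
       64-bit overflows as increments of u[i+j+2] */
    for (i = 0; i < r; i++)
        for (j = 0; j < s; j++) {
            uint64_t T = (uint64_t)a[i] * (uint64_t)b[j];
            if ((t[i + j] += T) < T)
                u[i + j + 2]++;
        }

    /* deferred normalization: fold t[] and u[] into the 32-bit result c[] */
    memset(c, 0, (r + s) * sizeof(uint32_t));
    for (i = 0; i < r + s; i++) {
        uint32_t v;
        if ((c[i] += u[i]) < u[i]) u[i + 1]++;
        v = (uint32_t)t[i];                    /* low word of t[i] */
        if ((c[i] += v) < v) u[i + 1]++;
        v = (uint32_t)(t[i] >> 32);            /* high word of t[i] */
        if (i + 1 < r + s) {
            if ((c[i + 1] += v) < v) u[i + 2]++;
        }
    }
}
```

Since the product fits in r + s words, the top 64-bit accumulator contributes no high word, so the guard on i + 1 only skips an addition of zero.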

Squaring

If both the operands a and b of multiplication are the same, it is not necessary to compute aibj and ajbi separately. We should add to ti+j the product ai^2, if i = j, or the product 2aiaj, if i < j. Note that 2aiaj can be computed by left-shifting aiaj by one bit. This might result in an overflow, which can be checked before shifting by looking at the 64th (most significant) bit of aiaj. Algorithm 3.5 incorporates these changes.

Fast multiplication

For the multiplication of two multiple-precision integers, there are algorithms that are asymptotically faster than the quadratic Algorithms 3.4 and 3.5. However, not all these theoretically faster algorithms are practical for the sizes of integers used in cryptology. Our practical experience shows that a strategy due to Karatsuba outperforms the quadratic algorithm, if both the operands are of roughly equal sizes and if the bit lengths of the operands are 300 or more. We describe Karatsuba’s algorithm in connection with squaring, where the two operands are the same (and hence of the same size). Suppose we want to compute a^2 for a multiple-precision integer a = (ar–1 . . . a0). We first break a into two integers of almost equal sizes, namely, α := (ar–1 . . . at) and β := (at–1 . . . a0), so that a = ℜ^t α + β. Now, a^2 = α^2 ℜ^(2t) + 2αβℜ^t + β^2 and 2αβ = (α^2 + β^2) – (α – β)^2. We recursively invoke Karatsuba’s multiplication with operands α, β and α – β. Recursion continues as long as the operands are not too small and the depth of recursion is within a prescribed limit. One can check that Karatsuba’s algorithm runs in time O(r^(lg 3) lg r) = O(r^1.585 lg r), which is a definite improvement over the O(r^2) running time taken by the quadratic algorithm.
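The identity 2αβ = (α^2 + β^2) – (α – β)^2 can be illustrated at a single level of recursion by squaring one 32-bit word split into 16-bit halves. This is only a toy sketch (a real implementation recurses on arrays of words); the function name is ours.

```c
#include <stdint.h>

/* Square a 32-bit integer a = 2^16*hi + lo using Karatsuba's identity:
   a^2 = hi^2*2^32 + [(hi^2 + lo^2) - (hi - lo)^2]*2^16 + lo^2,
   so the middle term 2*hi*lo costs no extra general multiplication. */
uint64_t karatsuba_square32(uint32_t a)
{
    uint32_t hi = a >> 16, lo = a & 0xFFFFu;
    uint64_t hi2 = (uint64_t)hi * hi;     /* alpha^2 */
    uint64_t lo2 = (uint64_t)lo * lo;     /* beta^2 */
    int64_t  d   = (int64_t)hi - (int64_t)lo;
    uint64_t d2  = (uint64_t)(d * d);     /* (alpha - beta)^2 */
    uint64_t mid = hi2 + lo2 - d2;        /* equals 2*alpha*beta */
    return (hi2 << 32) + (mid << 16) + lo2;
}
```

All three products here are squarings of quantities at most 16 bits (in absolute value), which is exactly why the identity is attractive for recursive squaring.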

Algorithm 3.5. The quadratic loop for squaring

for (i = 0, . . . , r – 1) and (j = i, . . . , r – 1) {
   T := (ullong)(ai) * (ullong)(aj).
   if (i ≠ j) {
      if (the 64th bit of T is 1) ui+j+2 ++.
      T ≪= 1.
   }
   if ((ti+j += T) < T) ui+j+2 ++.
}

The best-known algorithm for multiplication of two multiple-precision integers is based on the fast Fourier transform (FFT) techniques and has running time Õ(r), that is, linear in r up to logarithmic factors. However, for integers used in cryptology this algorithm is usually not practical. Therefore, we will not discuss FFT multiplication in this book.

Division

Euclidean division with remainder of multiple-precision integers is somewhat cumbersome, although conceptually as difficult (that is, as simple) as the division procedure for decimal integers taught in the early days of school. The most challenging part of the procedure is guessing the next digit of the quotient. For decimal integers, we usually do this by looking at the first few (decimal) digits of the divisor and the dividend. This need not give us the correct digit, but something close to it. In the case of ℜ-ary digits, we also make a guess of the quotient digit based on a few leading ℜ-ary digits of the divisor and the dividend, but certain precautions have to be taken to ensure that the guess is not too far from the correct value.

Suppose we are given positive integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0) with ar–1 ≠ 0 and bs–1 ≠ 0, and we want to compute the integers x = (xr–s . . . x0) and y = (ys–1 . . . y0) with a = xb + y, 0 ≤ y < b. First, we want that bs–1 ≥ ℜ/2 (you will see why later). If this condition is not already met, we force it by multiplying both a and b by 2^t for some suitable t, 0 < t < 32. In that case, the quotient remains the same, but the remainder gets multiplied by 2^t. The desired remainder can later be recovered easily by right-shifting the computed remainder by t bits. The process of making bs–1 ≥ ℜ/2 is often called normalization (of b). Henceforth, we will assume that b is normalized. Note that normalization may increase the word-size of a by 1.

Algorithm 3.6. Euclidean division of multiple-precision integers

Input: Integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0) with r ≥ 3, s ≥ 2, ar–1 ≠ 0, bs–1 ≥ ℜ/2 and a ≥ b.

Output: The quotient x = (xr–s . . . x0) = a quot b and the remainder y = (ys–1 . . . y0) = a rem b of Euclidean division of a by b.

Steps:

Initialize the quotient digits xi to 0 for i = 0, . . . , r – s.

/* The main loop */
for (i = r – 1, . . . , s) {
   /* Initial check */
   if ((ai ≥ bs–1) and (a ≥ bℜ^(i–s+1))) { xi–s+1 ++, a := a – bℜ^(i–s+1). }

   /* Guess the next digit of quotient */
   if (ai = bs–1) xi–s := ℜ – 1, else xi–s := ⌊(aiℜ + ai–1)/bs–1⌋.
   if (xi–s ≠ 0)
       while (xi–s(bs–1ℜ + bs–2) > aiℜ^2 + ai–1ℜ + ai–2) xi–s – –.

   /* Modify the guess to the correct value */
   z := xi–s bℜ^(i–s).
   if (a < z) { xi–s – –, z := z – bℜ^(i–s). }
   a := a – z.
}

/* Here the quotient may be one less than the actual value */
if (a ≥ b) { a := a – bx := x+1. }
y := a.

Algorithm 3.6 implements multiple-precision division. It is not difficult to prove the correctness of the algorithm. We refrain from doing so, but make some useful comments. The initial check inside the main loop may cause the increment of xi–s+1. This may produce a carry which has to be propagated to higher digits. This carry propagation is not shown in the code, for simplicity. Since b is assumed to be normalized, this initial check needs to be carried out only once; for a non-normalized b, we would have to replace the if statement by a while loop. This is the first advantage of normalization. In the first step of guessing the quotient digit xi–s, we compute ⌊(aiℜ + ai–1)/bs–1⌋ using ullong arithmetic. At this point, the guess is based only on two leading digits of a and one leading digit of b. In the while loop, we refine this guess by considering one more digit of each of a and b. Since b is normalized, this while loop is executed no more than twice (the second advantage of normalization). The guess for xi–s made in this way is either equal to or one more than the correct value, which is then computed by comparing a with xi–s bℜ^(i–s). The running time of the algorithm is O(s(r – s)). For a fixed r, this is maximum (namely O(r^2)) when s ≈ r/2.

Bit-wise operations

Multiplication and division by a power of 2 can be carried out more efficiently using bit operations (on words) instead of calling the general procedures just described. It is also often necessary to compute the bit length of a non-zero multiple-precision integer and the multiplicity of 2 in it. In these cases also, one should use bit operations for efficiency. For these implementations, it is advantageous to maintain precomputed tables of the constants 2i, i = 0, . . . , 31, and of 2i – 1, i = 0, . . . , 32, rather than computing them in situ every time they are needed. In Algorithm 3.7, we describe an implementation of multiplication by a power of 2 (that is, the left shift operation). We use the symbols OR, ≫ and ≪ to denote bit-wise or, right shift and left shift operations on 32-bit integers.

Algorithm 3.7. Left-shift of multiple-precision integers

Input: Integer a = (ar–1 . . . a0) ≠ 0 with ar–1 ≠ 0, and the shift amount t ∈ ℕ.

Output: The integer c = (cs–1 . . . c0) = a · 2t, cs–1 ≠ 0.

Steps:

u := t quot 32, v := t rem 32.
if (v = 0) { /* Word-by-word copy */
    s := r + u.
    for (i = r – 1, . . . , 0) ci+u := ai.
}
else { /* Use shifts of individual words */
    s := r + u + 1, cs–1 := 0.
    for (i = r – 1, . . . , 0) { ci+u+1 := ci+u+1 OR (ai ≫ (32 – v)), ci+u := (ai ≪ v). }
    if (cs–1 = 0) s– –.
}
for (i = u – 1, . . . , 0) ci := 0.
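Algorithm 3.7 translates to C99 almost line by line. In this sketch (function name ours) the word length s of the result is returned; the destination must have room for r + t/32 + 1 words.

```c
#include <stdint.h>
#include <stddef.h>

/* c := a * 2^t for a little-endian r-word integer a with a[r-1] != 0;
   returns the word length of the result. */
size_t mp_shift_left(uint32_t *c, const uint32_t *a, size_t r, unsigned t)
{
    size_t u = t / 32, s, i;
    unsigned v = t % 32;

    if (v == 0) {                         /* word-by-word copy */
        s = r + u;
        for (i = r; i-- > 0; )
            c[i + u] = a[i];
    } else {                              /* shift individual words */
        s = r + u + 1;
        c[s - 1] = 0;
        for (i = r; i-- > 0; ) {          /* descending, as in Algorithm 3.7 */
            c[i + u + 1] |= a[i] >> (32 - v);
            c[i + u] = a[i] << v;
        }
        if (c[s - 1] == 0) s--;           /* drop a leading zero word */
    }
    for (i = 0; i < u; i++)               /* clear the low words */
        c[i] = 0;
    return s;
}
```

The descending loop matters: c[i + u] is written at iteration i and then OR-ed into at iteration i – 1, so no word is clobbered before its high bits are consumed.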

Unless otherwise mentioned, we will henceforth forget about the above structural representation of multiple-precision integers and denote arithmetic operations on them by the standard symbols (+, –, * or · or ×, quot, rem and so on).

3.3.3. GCD

Computing the greatest common divisor of two (multiple-precision) integers has important applications. In this section, we assume that we want to compute the (positive) gcd of two positive integers a and b. The Euclidean gcd loop comprising repeated division (Proposition 2.15) is not usually the most efficient way to compute integer gcds. We describe the binary gcd algorithm, which turns out to be faster for practical bit sizes of the operands a and b. If a = 2^r a′ and b = 2^s b′ with a′ and b′ odd, then gcd(a, b) = 2^min(r,s) gcd(a′, b′). Therefore, we may assume that a and b are odd. In that case, if a > b, then gcd(a, b) = gcd(a – b, b) = gcd((a – b)/2^t, b), where t := v2(a – b) is the multiplicity of 2 in a – b. Since the sum of the bit sizes of (a – b)/2^t and b is strictly smaller than that of a and b, repeating the above computation terminates the algorithm after finitely many iterations.
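These reductions can be sketched for word-sized operands as follows (extending the same loop to arrays of words, with the shift and subtraction primitives of Section 3.3.2, gives the multiple-precision version):

```c
#include <stdint.h>

/* Binary gcd: only subtractions, comparisons and shifts are used. */
uint64_t binary_gcd(uint64_t a, uint64_t b)
{
    unsigned shift = 0;
    if (a == 0) return b;
    if (b == 0) return a;
    /* factor out the common power of 2: gcd(a,b) = 2^min(r,s) * gcd(a',b') */
    while (((a | b) & 1) == 0) { a >>= 1; b >>= 1; shift++; }
    while ((a & 1) == 0) a >>= 1;
    while ((b & 1) == 0) b >>= 1;
    /* now a and b are odd: replace the larger by the odd part of the
       difference until the two become equal */
    while (a != b) {
        if (a > b) {
            a -= b;                              /* even and nonzero */
            do a >>= 1; while ((a & 1) == 0);
        } else {
            b -= a;
            do b >>= 1; while ((b & 1) == 0);
        }
    }
    return a << shift;
}
```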

Algorithm 3.8. Extended binary gcd

Input: Two positive integers a, b with a ≥ b and b odd.

Output: Integers d, u and v with d = gcd(a, b) = ua + vb > 0. If (a, b) ≠ (1, 1), then |u| < b and |v| < a.

Steps:

/* Initial reduction */
Compute integers q and r satisfying a = bq + r with 0 ≤ r < b.
if (r = 0) { (d, u, v) := (b, 0, 1), return. }

/* Initialize */
(x, y) := (b, r).
v1 := 0, v2 := 1.

/* Main loop */
while (1) {
   if (x ≥ y) {
      x := x – y.   /* x is even here except perhaps in the first iteration */
      v1 := v1 – v2.
      if (x = 0) {   /* End loop and return du and v */
         u2 := (y – v2r)/b.
         (d, u, v) := (y, v2, u2 – v2q).
         Return.
      } else if (x is even) {
         t := v2(x), x := x/2^t.    /* x is odd here */
         for (i = 1, . . . , t) {
            if (v1 is odd) v1 := v1 + b.
            v1 := v1/2.
         }
       }
     } else { /* if (x < y) */
       y := y – x, v2 := v2 – v1.    /* y is even here */
       t := v2(y), y := y/2^t.   /* y is odd here */
       for (i = 1, . . . , t) {
          if (v2 is odd) v2 := v2 + b.
          v2 := v2/2.
       }
   }
}

Multiple-precision division is much costlier than subtraction followed by division by a power of 2. This is why the binary gcd algorithm outperforms the Euclidean gcd algorithm. However, if the bit sizes of a and b differ reasonably, it is preferable to use Euclidean division once and replace the pair (a, b) by (b, a rem b), before entering the binary gcd loop. Even when the original bit sizes of a and b are not much different, one may carry out this initial reduction, because in this case Euclidean division does not take much time.

Recall from Proposition 2.16 that if d := gcd(a, b), then for some integers u and v we have d = ua + vb. Computation of d along with a pair of integers u, v is called the extended gcd computation. Both the Euclidean and the binary gcd loops can be augmented to compute these integers u and v. Since binary gcd is faster than Euclidean gcd, we describe an implementation of the extended binary gcd algorithm. We assume that 0 < ba and compute u and v in such a way that if (a, b) ≠ (1, 1), then |u| < b and |v| < a. Algorithm 3.8, which shows the details, requires b to be odd. The other operand a may also be odd, though the working of the algorithm does not require this.
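The structure of Algorithm 3.8 can be sketched for word-sized operands as follows. The operands must stay below 2^31 here so that intermediate products such as v2·r fit in int64_t; the multiple-precision version replaces these machine operations by the primitives of Section 3.3.2. The function name is ours.

```c
#include <stdint.h>

/* Given a >= b > 0 with b odd, compute d = gcd(a, b) and u, v with
   d = u*a + v*b (a word-sized sketch of Algorithm 3.8). */
void ext_binary_gcd(int64_t a, int64_t b, int64_t *d, int64_t *u, int64_t *v)
{
    int64_t q = a / b, r = a % b;           /* initial reduction */
    int64_t x, y, v1 = 0, v2 = 1;

    if (r == 0) { *d = b; *u = 0; *v = 1; return; }

    x = b; y = r;                           /* invariant: v1*r = x, v2*r = y (mod b) */
    for (;;) {
        if (x >= y) {
            x -= y; v1 -= v2;
            if (x == 0) {
                int64_t u2 = (y - v2 * r) / b;   /* exact, from y = u2*b + v2*r */
                *d = y; *u = v2; *v = u2 - v2 * q;
                return;
            }
            while ((x & 1) == 0) {          /* strip factors of 2 from x */
                x >>= 1;
                if (v1 & 1) v1 += b;        /* make v1 even before halving */
                v1 >>= 1;
            }
        } else {
            y -= x; v2 -= v1;
            while ((y & 1) == 0) {          /* strip factors of 2 from y */
                y >>= 1;
                if (v2 & 1) v2 += b;
                v2 >>= 1;
            }
        }
    }
}
```

The while loops simply skip when the freshly computed difference happens to be odd (which, as noted in the algorithm's comments, can occur when the initial remainder r is even).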

In order to prove the correctness of Algorithm 3.8, we introduce the sequence of integers xk, yk, u1,k, u2,k, v1,k and v2,k for k = 0, 1, 2, . . . , initialized as:

x0 := b,   u1,0 := 1,   v1,0 := 0,
y0 := r,   u2,0 := 0,   v2,0 := 1.

During the k-th iteration of the main loop, k = 1, 2, . . . , we modify the values xk–1, yk–1, u1,k–1, u2,k–1, v1,k–1 and v2,k–1 to xk, yk, u1,k, u2,k, v1,k and v2,k in such a way that we always maintain the relations:

u1,k x0 + v1,k y0 = xk,
u2,k x0 + v2,k y0 = yk.

The main loop terminates when xk = 0, and at that point we have the desired relation yk = gcd(b, r) = u2,kb + v2,kr. For the updating during the k-th iteration, we assume that xk–1yk–1. (The converse inequality can be handled analogously.) The x and y values are updated as xk := (xk–1yk–1)/2tk, yk := yk–1, where tk := v2(xk–1yk–1). Thus, we have u2,k = u2,k–1 and v2,k = v2,k–1, whereas if tk > 0, we write

All the expressions within square brackets in the last equation are integers, since x0 = b is odd. Note that updating the variables in the loop requires only the values of these variables available from the previous iteration. Therefore, we may drop the prefix k and call these variables x, y, u1, u2, v1 and v2. Moreover, the variables u1 and u2 need not be maintained and updated in every iteration, since the updating procedure for the other variables does not depend on the values of u1 and u2. We need the value of u2 only at the end of the main loop, and this is available from the relation y = u2b + v2r maintained throughout the loop. The formula u2b + v2r = y = gcd(b, r) is then combined with the relations a = qb + r and gcd(a, b) = gcd(b, r) to get the final relation gcd(a, b) = v2a + (u2v2q)b.

Algorithm 3.8 continues to work even when a < b, but in that case the initial reduction simply interchanges a and b and we forfeit the possibility of the reduction in size of the arguments (x and y) caused by the initial Euclidean division.

Finally, we remove the restriction that b is odd. We write a = 2^r a′ and b = 2^s b′ with a′, b′ odd and call Algorithm 3.8 with a′ and b′ as parameters (swapping a′ and b′, if a′ < b′) to compute integers d′, u′, v′ with d′ = gcd(a′, b′) = u′a′ + v′b′. Without loss of generality, assume that r ≥ s. Then d := gcd(a, b) = 2^s d′ = u′(2^s a′) + v′b. If r = s, then 2^s a′ = a and we are done. So assume that r > s. If u′ is even, we can extract a power of 2 from u′ and multiply 2^s a′ by this power. So let us say that we have a situation of the form d = ũ(2^t a′) + ṽb for some integers ũ and ṽ, with ũ odd, and with s ≤ t < r. We can rewrite this as d = (ũ – b)(2^t a′) + (ṽ + 2^t a′)b. Since ũ – b is even, this gives us d = ũ′(2^τ a′) + ṽ′b, where τ > t and where ũ′ is odd or τ = r. Proceeding in this way, we eventually reach a relation of the form d = u(2^r a′) + vb = ua + vb. It is easy to check that if (a′, b′) ≠ (1, 1), then the integers u and v obtained as above satisfy |u| < b and |v| < a.

3.3.4. Modular Arithmetic

So far, we have described how we can represent and work with the elements of ℤ. In cryptology, we are more interested in the arithmetic of the rings ℤn for multiple-precision integers n. We canonically represent the elements of ℤn by integers between 0 and n – 1.

Let a, b ∈ ℤn. In order to compute a + b in ℤn, we compute the integer sum a + b, and, if a + b ≥ n, we subtract n from a + b. This gives us the desired canonical representative in ℤn. Similarly, for computing a – b in ℤn, we subtract b from a as integers, and, if the difference is negative, we add n to it. For computing ab in ℤn, we multiply a and b as integers and then take the remainder of Euclidean division of this product by n.

Note that a ∈ ℤn is invertible (that is, a ∈ ℤn*) if and only if gcd(a, n) = 1. For a ∈ ℤn, a ≠ 0, we call the extended (binary) gcd algorithm with a and n as the arguments and get integers d, u, v satisfying d = gcd(a, n) = ua + vn. If d > 1, then a is not invertible modulo n. Otherwise, we have ua ≡ 1 (mod n), that is, a^(–1) ≡ u (mod n). The extended gcd algorithm indeed returns a value of u satisfying |u| < n. Thus if u > 0, it is the canonical representative of a^(–1), whereas if u < 0, then u + n is the canonical representative of a^(–1).
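For word-sized operands, this inversion procedure may be sketched as follows; for brevity the sketch uses the plain (not binary) extended Euclidean loop, and the function name and the error convention are ours.

```c
#include <stdint.h>

/* Inverse of a modulo n, via the extended Euclidean relation
   u*a + v*n = gcd(a, n); returns the canonical representative in
   {1, ..., n-1}, or -1 if a is not invertible modulo n. */
int64_t mod_inverse(int64_t a, int64_t n)
{
    int64_t u0 = 0, u1 = 1;           /* coefficients of a, modulo n */
    int64_t r0 = n, r1 = a % n;       /* invariant: ui * a = ri (mod n) */
    while (r1 != 0) {
        int64_t q = r0 / r1, t;
        t = r0 - q * r1; r0 = r1; r1 = t;
        t = u0 - q * u1; u0 = u1; u1 = t;
    }
    if (r0 != 1) return -1;           /* gcd(a, n) > 1: not invertible */
    return u0 < 0 ? u0 + n : u0;      /* normalize a negative u */
}
```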

Modular exponentiation

Another frequently needed operation in ℤn is modular exponentiation, that is, the computation of a^e for some a ∈ ℤn and e ∈ ℤ. Since a^0 = 1 for all a ∈ ℤn and since a^e = (a^(–1))^(–e) for e < 0 and a ∈ ℤn*, we may assume, without loss of generality, that e ∈ ℕ. Computing the integral power a^e followed by taking the remainder of Euclidean division by n is not an efficient way to compute a^e in ℤn. Instead, after every multiplication, we reduce the product modulo n. This keeps the sizes of the intermediate products small. Furthermore, it is also a bad idea to compute a^e as (· · ·((a · a)a) · · · a), which involves e – 1 multiplications. It is possible to compute a^e using O(lg e) multiplications and O(lg e) squarings in ℤn, as Algorithm 3.9 suggests. This algorithm requires the bits of the binary expansion of the exponent e, which are easily obtained by bit operations on the words of e.

The for loop iteratively computes bi := a^((er–1 . . . ei)2) (mod n) starting from the initial value br := 1. Since (er–1 . . . ei)2 = 2(er–1 . . . ei+1)2 + ei, we have bi ≡ bi+1^2 a^ei (mod n). This establishes the correctness of the algorithm. The squaring (b^2) and multiplication (ba) inside the for loop of the algorithm are computed in ℤn (that is, as integer multiplication followed by reduction modulo n). If we assume that er–1 = 1, then r = ⌈lg e⌉. The algorithm carries out r squarings and ρ ≤ r multiplications in ℤn, where ρ is the number of bits of e that are 1. On an average, ρ = r/2. Algorithm 3.9 runs in time O((log e)(log n)^2). Typically, e = O(n), so this running time is O((log n)^3).

Algorithm 3.9. Modular exponentiation: square-and-multiply algorithm

Input: a ∈ ℤn, e ∈ ℕ.

Output: b = a^e (mod n).

Steps:

Let the binary expansion of e be e = (er–1 . . . e1e0)2, where each ei ∈ {0, 1}.
b := 1.
for (i = r – 1, . . . , 0) {
   b := b2 (mod n).    /* Squaring */
   if (ei = 1) b := ba (mod n).    /* Multiplication */
}
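For word-sized operands, Algorithm 3.9 may be sketched in C99 as follows; n must fit in 32 bits so that each intermediate product fits in 64 bits (the function name is ours).

```c
#include <stdint.h>

/* Left-to-right square-and-multiply: returns a^e mod n for n >= 1. */
uint32_t mod_exp(uint32_t a, uint32_t e, uint32_t n)
{
    uint64_t b = 1 % n;                          /* handles n == 1 */
    int i;
    for (i = 31; i >= 0; i--) {                  /* leading zero bits are harmless */
        b = (b * b) % n;                         /* squaring */
        if ((e >> i) & 1)
            b = (b * (uint64_t)a) % n;           /* multiplication */
    }
    return (uint32_t)b;
}
```

Scanning all 32 exponent positions (instead of starting at the most significant one-bit) only wastes a few squarings of 1 and keeps the sketch branch-free in structure.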

Now, we describe a simple variant of this square-and-multiply algorithm, in which we choose a small t and use the 2^t-ary representation of the exponent e. The case t = 1 corresponds to Algorithm 3.9. In practical situations, t = 4 is a good choice. As in Algorithm 3.9, multiplication and squaring are done in ℤn.

Algorithm 3.10. Modular exponentiation: windowed square-and-multiply algorithm

Input: a ∈ ℤn, e ∈ ℕ.

Output: b = a^e (mod n).

Steps:

Let e = (er–1 . . . e1e0)2^t, where each ei ∈ {0, 1, . . . , 2^t – 1}.
Compute and store a^l (mod n) for l = 0, 1, . . . , 2^t – 1.   /* Precomputation */
b := 1.
for (i = r – 1, . . . , 0) {
   for (j = 1, . . . , t) b := b2 (mod n).    /* Squaring */
   b := b a^ei (mod n).     /* Multiplication: Read a^ei from the precomputed table */
}

In Algorithm 3.10, the powers a^l, l = 0, 1, . . . , 2^t – 1, are precomputed using the formulas a^0 = 1, a^1 = a and a^l = a^(l–1) · a for l ≥ 2. The number of squarings inside the for loop remains (almost) the same as in Algorithm 3.9. However, the number of multiplications in this loop reduces at the expense of the precomputation step. For example, let n be an integer of bit length 1024 and let e < n. A randomly chosen e of this size has about 512 one-bits. Therefore, the for loop of Algorithm 3.9 does about 512 multiplications, whereas with t = 4 Algorithm 3.10 does only 1024/4 = 256 multiplications, with the precomputation step requiring 14 multiplications. Thus, the total number of multiplications reduces from (about) 512 to 14 + 256 = 270.
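A word-sized C99 sketch of Algorithm 3.10 with t = 4, so that a 32-bit exponent is consumed in eight radix-16 digits (the function name is ours; n must fit in 32 bits):

```c
#include <stdint.h>

/* 2^t-ary square-and-multiply with t = 4: returns a^e mod n for n >= 1. */
uint32_t mod_exp_window(uint32_t a, uint32_t e, uint32_t n)
{
    uint64_t table[16], b = 1 % n;
    int i, j, l;

    table[0] = 1 % n;                        /* precompute a^l (mod n) */
    for (l = 1; l < 16; l++)
        table[l] = (table[l - 1] * a) % n;

    for (i = 28; i >= 0; i -= 4) {           /* the 8 radix-16 digits of e */
        for (j = 0; j < 4; j++)
            b = (b * b) % n;                 /* t squarings */
        b = (b * table[(e >> i) & 0xF]) % n; /* one table multiplication */
    }
    return (uint32_t)b;
}
```

Multiplying by table[0] = 1 when a digit is zero wastes a multiplication; Exercise 3.16 shows how a sliding window avoids exactly this waste.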

Montgomery exponentiation

During a modular exponentiation in ℤn, every reduction (computation of remainder) is done by the fixed modulus n. Montgomery exponentiation exploits this fact and speeds up each modular reduction at the cost of some preprocessing overhead.

Assume that the storage of n requires s ℜ-ary digits, that is, n = (ns–1 . . . n0) (with ns–1 ≠ 0). Take R := ℜ^s = 2^(32s), so that R > n. As is typical in most cryptographic situations, n is an odd integer (for example, a big prime or a product of two big primes). Then gcd(ℜ, n) = gcd(R, n) = 1. Use the extended gcd algorithm to precompute n′ := –n^(–1) (mod ℜ).

We associate x ∈ ℤn with x̄ ∈ ℤn, where x̄ ≡ xR (mod n). Since R is invertible modulo n, this association gives a bijection of ℤn onto itself. This bijection respects the addition in ℤn: that is, x̄ + ȳ is the Montgomery representation of x + y. Multiplication in ℤn, on the other hand, corresponds to the computation of x̄ȳR^(–1) (mod n), which is the Montgomery representation of xy, and can be implemented as Algorithm 3.11 suggests.

Algorithm 3.11. Montgomery multiplication

Input: x̄ and ȳ (Montgomery representations of x, y ∈ ℤn).

Output: The Montgomery representation of xy ∈ ℤn.

Steps:

w := x̄ȳ.    /* The integer product, with ℜ-ary digits wi */
for (i = 0, . . . , s – 1) w := w + ((wi n′) rem ℜ) n ℜ^i.
w := w/R.
if (w ≥ n) w := w – n.
Return w.

Montgomery multiplication works as follows. In the first step, it computes the integer product w := x̄ȳ. The subsequent for loop adds suitable multiples of n to w. Since n′ ≡ –n^(–1) (mod ℜ), the i-th iteration of the loop makes wi = 0 (and leaves wi–1, . . . , w0 unchanged). So when the for loop terminates, we have w0 = w1 = · · · = ws–1 = 0: that is, w is a multiple of ℜ^s = R. Therefore, w/R is an integer. Furthermore, this w is obtained by adding to x̄ȳ a multiple of n: that is, w = x̄ȳ + kn for some integer k ≥ 0. Since R is coprime to n, it follows that w/R ≡ x̄ȳR^(–1) (mod n). But this value may be bigger than the canonical representative of x̄ȳR^(–1). Since k is an integer with s ℜ-ary digits (so that k < R) and x̄ < n, ȳ < n and n < R, it follows that w/R = (x̄ȳ + kn)/R < 2n. Therefore, if w/R exceeds n – 1, a single subtraction suffices.

Computation of the product x̄ȳ requires ≤ s^2 single-precision multiplications. One can use the optimized Algorithm 3.4 for that purpose. In the case of squaring (x̄ = ȳ), nearly half of these word products are repeated, and further optimizations (say, in the form of Karatsuba’s method) can be employed.

Each iteration of the for loop carries out s + 1 single-precision multiplications. (The reduction modulo ℜ is just returning the less significant word in the two-word product wi n′.) Since the for loop is executed s times, Algorithm 3.11 performs a total of ≤ s^2 + s(s + 1) = 2s^2 + s single-precision multiplications.

Integer multiplication (Algorithm 3.4) followed by classical modular reduction (Algorithm 3.6) does almost an equal number of single-precision multiplications, but also O(s) divisions of double-precision integers by single-precision ones. It turns out that the complicated for loop of Algorithm 3.6 is slower than the much simpler loop in Algorithm 3.11. But if the precomputations in Montgomery multiplication are taken into account, we do not, in general, achieve a speed-up with this new technique. For modular exponentiations, however, the precomputations need to be done only once, that is, outside the square-and-multiply loop, and Montgomery multiplication pays off. In Algorithm 3.12, we rewrite Algorithm 3.9 in terms of Montgomery arithmetic. A similar rewriting applies to Algorithm 3.10.
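The case s = 1 of Algorithm 3.11 (a one-word modulus, R = 2^32) already exhibits the whole structure. The sketch below also computes n′ by Hensel (Newton) lifting instead of the extended gcd, a standard alternative; the names are ours, and n must stay below 2^31 here so that w + mn fits in 64 bits.

```c
#include <stdint.h>

/* One-word Montgomery multiplication: given xb = x*R mod n and
   yb = y*R mod n (with R = 2^32, n odd, n < 2^31), return x*y*R mod n. */
uint32_t mont_mul(uint32_t xb, uint32_t yb, uint32_t n, uint32_t nprime)
{
    uint64_t w = (uint64_t)xb * yb;              /* the integer product */
    uint32_t m = (uint32_t)w * nprime;           /* m = w * n' mod 2^32 */
    uint64_t z = (w + (uint64_t)m * n) >> 32;    /* low word is 0: exact /R */
    return (uint32_t)(z >= n ? z - n : z);       /* single conditional subtract */
}

/* n' = -n^{-1} mod 2^32 for odd n, by Hensel lifting. */
uint32_t mont_nprime(uint32_t n)
{
    uint32_t inv = n;                 /* n*n = 1 (mod 8): correct to 3 bits */
    for (int i = 0; i < 4; i++)
        inv *= 2 - n * inv;           /* each step doubles the correct bits */
    return (uint32_t)(0u - inv);      /* negate: n' = -n^{-1} mod 2^32 */
}
```

Since m·n ≡ –w (mod 2^32), the low word of w + mn vanishes, so the right shift is an exact division by R, mirroring the for loop of Algorithm 3.11 collapsed to one iteration.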

Algorithm 3.12. Montgomery exponentiation

Input: a ∈ ℤn, e ∈ ℕ.

Output: b = ae (mod n).

Steps:

/* Precomputations */
n′ := –n^(–1) (mod ℜ), ā := aR (mod n), b̄ := R (mod n).

/* The square-and-multiply loop, with e = (er–1 . . . e1e0)2 and Mont(·, ·) denoting Algorithm 3.11 */
for (i = r – 1, . . . , 0) {
   b̄ := Mont(b̄, b̄).    /* Squaring */
   if (ei = 1) b̄ := Mont(b̄, ā).    /* Multiplication */
}
b := Mont(b̄, 1).    /* Convert back from the Montgomery representation */

Exercise Set 3.3

3.8 Let ℜ ∈ ℤ, ℜ > 1. Show that every positive integer a can be represented uniquely as a tuple (as–1, . . . , a1, a0) for some s ∈ ℕ (depending on a) with

a = as–1s–1 + · · · + a1ℜ + a0,

0 ≤ ai < ℜ for all i and as–1 ≠ 0. In this case, we write a as (as–1 . . . a0) or simply as as–1 . . . a0, when ℜ is understood from the context. ℜ is called the radix or base of this representation, as–1, . . . , a0 the (ℜ-ary) digits of a, as–1 the most significant digit, a0 the least significant digit and s the size of a with respect to the radix ℜ.

3.9Let . Show that every can be written uniquely as

a = asR^s + as–1R^(s–1) + · · · + a1R + a0

with each .

3.10

Negative radix Show that every integer a can be written as

a = as(–2)^s + as–1(–2)^(s–1) + · · · + a1(–2) + a0

with each ai ∈ {0, 1}. Moreover, if we force that as ≠ 0 for a ≠ 0 and that s = 0 for a = 0, argue that this representation is unique.

3.11 Investigate the relative merits and demerits of the following three representations (in C) of multiple-precision integers needed for cryptography. In each case, we have room for storing 256 ℜ-ary words, the actual size and a sign indicator. In the second and third representations, we use two extra locations (sizeIdx and signIdx) in the digit array for holding the size and sign information.
/* Representation 1 */
typedef struct {
   int size;
   boolean sign;
   ulong digits[256];
} cryptInt1;
/* Representation 2 */
typedef ulong cryptInt2[258];
#define signIdx 0
#define sizeIdx 1
/* Representation 3 */
typedef ulong cryptInt3[258];
#define signIdx 256
#define sizeIdx 257

Remark: We recommend the third representation.

3.12 Write an algorithm that prints a multiple-precision integer in decimal and an algorithm that accepts a string of decimal digits (optionally preceded by a + or – sign) and stores the corresponding integer as a multiple-precision integer. Also write algorithms for input and output of multiple-precision integers in hexadecimal, octal and binary.
3.13 Write an algorithm which, given two multiple-precision integers a and b, compares the absolute values |a| and |b|. Also write an algorithm to compare a and b as signed integers.
3.14
  1. Write an algorithm that uses the Euclidean gcd loop (Proposition 2.15) to compute the gcd d of two integers a and b. (Observe that gcd(a, b) = gcd(b, a rem b) for b ≠ 0.)

  2. Modify the Euclidean gcd algorithm of Part (a), so that for given integers a, b we obtain d, u, v with d = gcd(a, b) = ua + vb.

3.15 Describe a representation of rational numbers with exact multiple-precision numerators and denominators. Implement the arithmetic (addition, subtraction, multiplication and division) of rational numbers under this representation.
3.16

Sliding window exponentiation Suppose we want to compute the modular exponentiation a^e (mod n). Consider the following variant of the square-and-multiply algorithm: Choose a small t (say, t = 4) and precompute a^(2^(t–1)), a^(2^(t–1)+1), . . . , a^(2^t–1) modulo n. Do a squaring for every bit of e, but skip the multiplication for zero bits in e. Whenever a 1 bit is found, consider the next t bits of e (including the 1 bit). Let these t bits represent the integer l, 2^(t–1) ≤ l ≤ 2^t – 1. Multiply by a^l (mod n) (after computing the usual t squarings) and move right in e by t bit positions. Argue that this method works and write an algorithm based on this strategy. What are the advantages and disadvantages of this method over Algorithm 3.10?

3.17 Suppose we want to compute a^e b^f (mod n), where both e and f are positive r-bit integers. One possibility is to compute a^e and b^f modulo n individually, followed by a modular multiplication. This strategy requires the running time of two exponentiations (neglecting the time for the final multiplication). In this exercise, we investigate a trick to reduce this running time to something close to 1.25 times the time for one exponentiation. Precompute ab (mod n). Inside the square-and-multiply loop, either skip the multiplication or multiply by a, b or ab, depending upon the next bits in the two exponents e and f. Complete the details of this algorithm. Deduce that, on average, the running time of this algorithm is as declared above.
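The trick of this exercise can be sketched for single-precision operands as follows (the function name dual_expmod and the 64-bit types are ours):

```c
#include <stdint.h>

typedef unsigned __int128 u128;

static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (uint64_t)((u128)x * y % n);
}

/* Simultaneous square-and-multiply for a^e * b^f (mod n): one squaring
   per bit position, and a multiplication by a, b or the precomputed ab
   according to the current bit pair of (e, f). */
uint64_t dual_expmod(uint64_t a, uint64_t e, uint64_t b, uint64_t f, uint64_t n)
{
    uint64_t ab = mulmod(a % n, b % n, n);   /* the single precomputation */
    uint64_t x = 1 % n;
    for (int i = 63; i >= 0; i--) {
        x = mulmod(x, x, n);
        int ei = (int)((e >> i) & 1), fi = (int)((f >> i) & 1);
        if (ei && fi)      x = mulmod(x, ab, n);
        else if (ei)       x = mulmod(x, a % n, n);
        else if (fi)       x = mulmod(x, b % n, n);
    }
    return x;
}
```

For random exponents, three of the four bit pairs trigger a multiplication, so each bit costs one squaring plus 3/4 of a multiplication, which gives the declared 1.25 factor.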
3.18 Let m ∈ ℕ, m ≠ 1. An addition chain for m of length l is a sequence 1 = a_1, a_2, . . . , a_l = m of natural numbers such that for every index i, 2 ≤ i ≤ l, there exist indices i_1, i_2 < i with a_i = a_(i_1) + a_(i_2). (It is allowed to have i_1 = i_2.)
  1. If 1 = a_1, a_2, . . . , a_l = m is an addition chain for m and if j_1, j_2, . . . , j_l is a permutation of 1, 2, . . . , l with a_(j_1) ≤ a_(j_2) ≤ · · · ≤ a_(j_l), show that a_(j_1), a_(j_2), . . . , a_(j_l) is also an addition chain for m. It, therefore, suffices to consider sorted addition chains only.

  2. Show that m has an addition chain of length ≤ 2 ⌈lg m⌉. [H]

  3. Let G be a (multiplicative) group and g ∈ G. Design an algorithm for computing g^m given an addition chain for m. What is the complexity of the algorithm (in terms of the length of the given addition chain)?

  4. Show that Algorithms 3.9 and 3.10 use addition chains for e of lengths ≤ 2 ⌈lg e⌉.

3.4. Elementary Number-theoretic Computations

Now that we know how to work in ℤ and in the residue class rings ℤn, n ∈ ℕ, we address some important computational problems associated with these rings. In this chapter, we restrict ourselves only to those problems that are needed for setting up various cryptographic protocols.

3.4.1. Primality Testing

One of the simplest and oldest questions in algorithmic number theory is to decide if a given integer n ∈ ℕ, n > 1, is prime or composite. Practical primality testing algorithms are based on randomization techniques. In this section, we describe the Monte Carlo algorithm due to Miller and Rabin. The obvious question that comes next is to find one (or all) of the prime factors of an integer, deterministically or probabilistically proven to be composite. This is the celebrated integer factorization problem and will be formally introduced in Section 4.2. In spite of the apparent proximity between the primality testing and the integer factoring problems, they currently have widely different (known) complexities. Primality testing is easy and thereby promotes efficient setting up of cryptographic protocols. On the other hand, the difficulty of factoring integers protects these protocols against cryptanalytic attacks.

Definition 3.2.

Let n be an odd integer greater than 1 and let a ∈ ℤ with gcd(a, n) = 1. Then n is called a pseudoprime to the base a, if a^(n–1) ≡ 1 (mod n).

By Fermat’s little theorem, a prime p is a pseudoprime to every base a with gcd(a, p) = 1. However, the converse of this is not true. By Exercise 3.19, n is not a pseudoprime to at least half of the bases in ℤn*, provided that there is at least one such base in ℤn*. Unfortunately, there exist composite integers m, known as Carmichael numbers, such that m is a pseudoprime to every base a ∈ ℤm*. The smallest Carmichael number is 561 = 3 × 11 × 17. Exercises 3.21 and 3.22 investigate some properties of these numbers. Though Carmichael numbers are not very abundant in nature, they are still infinite in number. So a robust primality test requires n to satisfy certain constraints in addition to being a pseudoprime to one or more bases. The following constraint is due to Solovay and Strassen.

Definition 3.3.

Let n be an odd integer > 1 and let a ∈ ℤ with gcd(a, n) = 1. Then n is called an Euler pseudoprime or a Solovay–Strassen pseudoprime to the base a, if a^((n–1)/2) ≡ (a/n) (mod n), where (a/n) is the Jacobi symbol (Definition 2.32). Clearly, an Euler pseudoprime to the base a is also a pseudoprime to the base a.

By Euler’s criterion (Proposition 2.21), if p is a prime and gcd(a, p) = 1, then p is an Euler pseudoprime to the base a. The converse is not true in general, but if n is composite, then n is an Euler pseudoprime to at most φ(n)/2 bases in ℤn* (Exercise 3.20). This, in turn, implies that if n is an Euler pseudoprime to t randomly chosen bases in ℤn*, then the chance that n is composite is no more than 1/2^t. This observation leads to a Monte Carlo algorithm for testing the primality of an integer, where the probability of error (1/2^t) can be made arbitrarily small by choosing large values of t. A more efficient algorithm can be developed using the following concept due to Miller and Rabin.

Definition 3.4.

Let n be an odd integer > 1 with n – 1 = 2^r n′, r := v_2(n – 1) > 0, n′ odd, and let a ∈ ℤ with gcd(a, n) = 1. Then n is called a strong pseudoprime to the base a, if either a^n′ ≡ 1 (mod n) or a^(2^i n′) ≡ –1 (mod n) for some i, 0 ≤ i < r. It is clear that if n is a strong pseudoprime to the base a, then n is also a pseudoprime to the base a. What is less evident but still true is that if n is a strong pseudoprime to the base a, then n is also an Euler pseudoprime to the base a.

The rationale behind this definition is the following. If for some a with gcd(a, n) = 1 we have a^(n–1) ≢ 1 (mod n), we conclude with certainty that n is composite. So assume that a^(n–1) ≡ 1 (mod n) and consider the powers b_i := a^(2^i n′) (mod n) for i = 0, 1, . . . , r to see how the sequence b_0, b_1, . . . eventually reaches b_r ≡ 1 (mod n). If b_0 ≡ 1 (mod n) already, this dynamics is clear. If, on the other hand, we have an i such that b_i ≢ 1 (mod n), whereas b_(i+1) ≡ 1 (mod n), then b_i is a square root of 1 modulo n. If n is a prime, the only square roots of 1 modulo n are ±1 and so n must be a strong pseudoprime to the base a. On the other hand, if n is composite but not the power of a prime, then 1 has at least two non-trivial square roots (that is, square roots other than ±1) modulo n (Exercise 3.30). We hope to find one such non-trivial square root of 1 in the sequence b_0, b_1, . . . , b_(r–1) and if we are successful, the compositeness of n is proved with certainty.

A complete residue system modulo an odd composite n contains at most n/4 bases to which n is a strong pseudoprime. The proof of this fact is somewhat involved (though elementary) and can be found elsewhere, for example, in Chapter V of Koblitz [153]. Here, we concentrate on the Monte Carlo Algorithm 3.13 known as the Miller–Rabin primality test and based on this observation.

Algorithm 3.13. Miller–Rabin primality test

Input: An odd integer n ∈ ℕ and an acceptable probability δ of failure.

Output: A certificate that either “n is composite” or “n is prime”.

Steps:

Find out n′ and r such that n – 1 = 2^r n′ with r ≥ 1 and n′ odd.
Determine the number t of iterations, so that the probability of failure is ≤ δ.
for (j = 1, . . . , t) {
   Choose a random base a, 1 < a < n.
   b := a^n′  (mod n).   /* Compute b_0 */
   if (b ≢ 1 (mod n)) {
      i := 0.
      while (i < r – 1) and (b ≢ –1 (mod n)) {
         i++, b := b^2 (mod n).    /* Compute b_i by squaring b_(i–1) */
         if (b ≡ 1 (mod n)) { Return “n is composite”. }
      }
      if (b ≢ –1 (mod n)) { Return “n is composite”. }
   }
}
Return “n is prime”.

Whenever Algorithm 3.13 outputs “n is composite”, it is correct. On the other hand, if it certifies n as prime, there is a probability ≤ δ that n is composite. This probability can be made very small by choosing a suitably large value of the iteration count t. For cryptographic applications, δ ≤ 1/2^80 is considered sufficiently safe. In view of the first statement of the last paragraph, we can take t = 40 to meet this error bound. In practice, much smaller values of t offer the desired confidence. For example, if n is of bit length 250, 500, 750 or 1000, the respective values t = 12, 6, 4 and 3 suffice.

Although, in Algorithm 3.13, we have chosen a to be an arbitrary integer between 2 and n – 2, there is apparently no harm, if we choose a randomly in the interval 2 ≤ a < 2^32. In fact, such a choice of single-precision bases is desirable, because that makes the exponentiation a^n′ (mod n) more efficient (see Algorithm 3.9). A typical cryptographic application loads at start-up a precalculated table of small primes (say, the first thousand primes). Choosing the bases randomly from this list of small primes is indeed a good idea.
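Algorithm 3.13 can be sketched for single-precision n as follows; the random bases are replaced by the small-prime bases suggested above (with this particular fixed base set the test happens to be correct for all 64-bit integers, though the text's analysis only needs random bases), and the names are ours:

```c
#include <stdint.h>

typedef unsigned __int128 u128;

static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (uint64_t)((u128)x * y % n);
}

static uint64_t powmod(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t x = 1 % n;
    for (a %= n; e; e >>= 1) {
        if (e & 1) x = mulmod(x, a, n);
        a = mulmod(a, a, n);
    }
    return x;
}

/* One iteration of the Miller-Rabin loop for a fixed base a: returns 0
   if the base proves n composite, 1 if n is a strong pseudoprime to a. */
static int strong_probable_prime(uint64_t n, uint64_t a)
{
    uint64_t np = n - 1;
    int r = 0;
    while (!(np & 1)) { np >>= 1; r++; }      /* n - 1 = 2^r * n'      */
    uint64_t b = powmod(a, np, n);            /* b_0 = a^n' mod n      */
    if (b == 1 || b == n - 1) return 1;
    for (int i = 1; i < r; i++) {
        b = mulmod(b, b, n);
        if (b == n - 1) return 1;
        if (b == 1) return 0;   /* non-trivial square root of 1 found  */
    }
    return 0;
}

int is_probable_prime(uint64_t n)
{
    static const uint64_t bases[] = {2,3,5,7,11,13,17,19,23,29,31,37};
    if (n < 2) return 0;
    for (int i = 0; i < 12; i++) {
        if (n == bases[i]) return 1;
        if (n % bases[i] == 0) return 0;
        if (!strong_probable_prime(n, bases[i])) return 0;
    }
    return 1;
}
```

Note that the Carmichael number 561 is rejected immediately: it is a pseudoprime to many bases but not a strong pseudoprime to the base 2.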

Deterministic primality proving

While the Miller–Rabin algorithm settles the primality testing problem in a practical sense, it is, after all, a randomized algorithm. It is interesting, at the minimum theoretically, to investigate the deterministic complexity of primality testing. There has been a good amount of research in this line. Let us sketch here the history of deterministic primality proving, without going to rigorous mathematical details.

One natural strategy to check the primality of a positive integer n is to factor it. However, factoring integers is a computationally difficult problem. Primality proving has turned out to be a much easier computational exercise. That is, one need not factor n explicitly in order to settle the primality of n.

The (seemingly) first modern primality testing algorithm is due to Miller [204]. This algorithm is deterministic polynomial-time, provided that the extended Riemann hypothesis or ERH (Conjecture 2.3) is true. Since the ERH is still an unsolved problem in mathematics, it cannot be claimed with certainty that Miller’s test is really a polynomial-time algorithm. Rabin [248] provided a version of Miller’s test which is unconditionally polynomial-time, but is, at the same time, randomized. This is what we have discussed earlier under the name Miller–Rabin primality test. This is a Monte Carlo algorithm which produces the answer no (composite) with certainty, but the answer yes (prime) with some (small) probability of error. Solovay and Strassen’s test [287] based on Definition 3.3 is another no-biased randomized polynomial-time primality test and can be made deterministic polynomial-time under the ERH.

Adleman and Huang [3], using the work of Goldwasser and Kilian [116], provide a yes-biased randomized primality-proving algorithm that runs in expected polynomial time unconditionally. Adleman et al. [4] propose the first deterministic algorithm that runs unconditionally in time less than fully exponential (in log n). Its (worst-case) running time is (ln n)^(O(ln ln ln n)), which is still not polynomial. (The exponent ln ln ln n grows very slowly with n, but still is not a constant.)

In August 2002, Agrawal, Kayal and Saxena came up with the first deterministic primality testing algorithm that runs in polynomial time unconditionally, that is, under no unproven assumptions. This algorithm, popularly abbreviated as the AKS algorithm, is based on the observation that n is prime if and only if (X + a)^n ≡ X^n + a (mod n) for every a ∈ ℤ (Exercise 3.26). A naive application of this observation requires computing an exponential number of coefficients in the binomial expansion of (X + a)^n. The AKS algorithm gets around this difficulty by checking the new congruence

Equation 3.2

(X + a)^n ≡ X^n + a (mod n, h(X))
for some polynomial h(X) of small degree. Here the notation (mod n, h(X)) means modulo the ideal ⟨n, h(X)⟩ of ℤ[X]. If deg h(X) is bounded by a polynomial in log n, (X + a)^n (and also X^n + a) can be computed modulo (n, h(X)) in polynomial time. However, reduction modulo h(X) may allow a composite n to satisfy the new congruence. Agrawal et al. took h(X) := X^r – 1 for some prime r = O(ln^6 n) with r – 1 having a suitably large prime divisor. From a result in analytic number theory due to Fouvry, such a prime r always exists. Congruence (3.2) is verified for this h(X) and for only polynomially many (in ln n) values of a. An elementary proof presented in Agrawal et al. [5] demonstrates that this suffices to conclude deterministically and unconditionally about the primality of n. The AKS algorithm in this form runs in time O~(ln^12 n).

Lenstra and Pomerance [175] have reduced the running time of the AKS algorithm to O~(ln^6 n). The AKS paper comes with another conjecture which, if true, yields an O~(ln^3 n) deterministic primality-proving algorithm.

Conjecture 3.1. AKS conjecture

Let n be an odd integer > 1, and let r ∈ ℕ with r ∤ n. If

(X – 1)^n ≡ X^n – 1 (mod n, X^r – 1),

then either n is prime or n^2 ≡ 1 (mod r).

It remains an open question whether a future version of the AKS algorithm would supersede the Miller–Rabin test in terms of performance. As long as the answers are not favourable to the AKS algorithm, these new theoretical endeavours do not seem to have sufficient impact on cryptography. Primes certified by the Miller–Rabin test are at present secure enough for all applications. Nonetheless, the AKS breakthrough has solid theoretical implications and deserves mention in a prime context.

3.4.2. Generating Random Primes

If a random prime of a given bit length t is called for, we can keep on generating random odd integers of bit length t and check these integers for primality using the Miller–Rabin test. The prime number theorem (Theorem 2.20) ascertains that we expect to find a prime after O(t) iterations. A somewhat similar but reasonably faster algorithm is discussed in Exercise 4.14. We will henceforth call random primes of a given bit length and having no additional imposed properties naive primes. Naive primes are often not cryptographically secure, because the primes used in many protocols should satisfy certain properties in order to preclude some known cryptanalytic attacks.

Definition 3.5.

Let p be an odd prime. Then p is called a safe prime, if (p – 1)/2 is also a prime, whereas p is called a strong prime, if

  1. p – 1 has a large prime divisor, say, q,

  2. p + 1 has a large prime divisor, say, q′, and

  3. q – 1 has a large prime divisor, say, q″.

In cryptography, a large prime divisor typically refers to one with bit length ≥ 160.

A random safe prime of a given bit length t can be found by generating a random sequence of natural numbers n congruent to 3 modulo 4 and of bit length t, until one is found for which both n and (n – 1)/2 are prime (as certified by the Miller–Rabin primality test). The prime number theorem once again implies that this search is expected to terminate after O(t^2) iterations.
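A toy version of this search can be sketched as follows; trial division stands in for the Miller–Rabin certificate, and we scan upward from a starting point instead of sampling at random, so both simplifications (and the names) are ours:

```c
#include <stdint.h>

/* Trial-division primality test: adequate only for this small
   demonstration, where the text intends the Miller-Rabin test. */
static int is_prime(uint64_t n)
{
    if (n < 2) return 0;
    for (uint64_t d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

/* Scan upward from `start` until both n and (n - 1)/2 are prime.
   A safe prime > 5 is necessarily congruent to 3 (mod 4), because
   (n - 1)/2 must be odd, so only that residue class is examined. */
uint64_t next_safe_prime(uint64_t start)
{
    uint64_t n = start;
    while (n % 4 != 3) n++;
    for (;; n += 4)
        if (is_prime(n) && is_prime((n - 1) / 2))
            return n;
}
```

Restricting attention to n ≡ 3 (mod 4) quarters the number of candidates without losing any safe prime above 5.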

For generating a random strong prime p of bit length t, we first generate q′ and q″ and then q and finally p. (See the notations of Definition 3.5.) Algorithm 3.14 describes Gordon’s algorithm in which the bit lengths l and l′ of q and q′ are nearly t/2 and the bit length l″ of q″ is slightly smaller than l′. In our concrete implementation of the algorithm, we choose l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20 and l″ := ⌈t/2⌉ – 22. If t is sufficiently large (say, t ≥ 400), the prime divisors q, q′ and q″ are then cryptographically large.

The simple check that Gordon’s algorithm correctly computes a strong prime of bit length t with q, q′ and q″ as in Definition 3.5 is based on Fermat’s little theorem and is left to the reader. Note that with our choice of l, l′ and l″, the loop variables i and j run through single-precision values only, thereby making arithmetic involving them efficient. Also note that the ranges over which i and j vary are sufficiently large so that we expect the (outer) while loop to be executed only once. This implementation has a tendency to generate smaller values of q and p (with the given bit sizes). In practice, this is not a serious problem and can be avoided, if desired, by choosing random values of i and j from the indicated ranges.

Algorithm 3.14. Gordon’s strong-prime generator

Input: t ∈ ℕ, t ≥ 400.

Output: A strong prime p of bit length t.

Steps:

l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20, l″ := ⌈t/2⌉ – 22.

while (1) {
    Find a (random) naive prime q′ of bit length l′.
    Find a (random) naive prime q″ of bit length l″.
    for (i = ⌈(2^(l–1) – 1)/2q″⌉, . . . , ⌊(2^l – 2)/2q″⌋) {                 /* Search for q */
       q := 2iq″ + 1.
       if (q is prime) {
          p′ := 2((q′)^(q–2) mod q)q′ – 1.
          for (j = ⌈(2^(t–1) – p′)/2qq′⌉, . . . , ⌊(2^t – 1 – p′)/2qq′⌋) {     /* Search for p */
             p := p′ + 2jqq′.
             if (p is prime) { Return p. }
          }
       }
    }
}

Gordon’s algorithm takes only nominally more expected running time than that needed by the algorithm discussed at the beginning of Section 3.4.2 for generating naive primes of the same bit length. On the other hand, safe primes are much costlier to generate and may be avoided, unless the situation specifically demands their usage.

3.4.3. Modular Square Roots

Determination of square roots modulo a prime p is frequently needed in cryptographic applications. In this section, we assume that p is an odd prime and want to compute the square roots of a ∈ ℤ, gcd(a, p) = 1, modulo p, provided that a is a quadratic residue modulo p, that is, if (a/p) = 1. Using the Jacobi symbol, the value (a/p) can be computed efficiently, as Algorithm 3.15 suggests.

The correctness of Algorithm 3.15 follows from the properties of the Jacobi symbol (Proposition 2.22 and Theorem 2.19). The value of (–1)^((b^2–1)/8) is determined by the value of b modulo 8, that is, by the three least significant bits of b: it equals +1 for b ≡ ±1 (mod 8) and –1 for b ≡ ±3 (mod 8).

Similarly, (–1)^((a–1)(b–1)/4) can be computed using only the second least significant bits of a and b: it equals –1 if and only if a ≡ b ≡ 3 (mod 4), and +1 otherwise.

If (a/p) = 1, our next task is to compute x ∈ ℤ with x^2 ≡ a (mod p). If one such x is found, the other square root of a modulo p is –x ≡ p – x (mod p). If p ≡ 3 (mod 4) or p ≡ 5 (mod 8), we have explicit formulas for a square root x. The remaining case, namely p ≡ 1 (mod 8), is somewhat complicated. In this case, we use the probabilistic algorithm due to Tonelli and Shanks. The details are given in Algorithm 3.16. The explicit formulas for the first two cases are easy to verify. We now prove the correctness of the algorithm in the remaining case.

Algorithm 3.15. Computation of the Legendre symbol

Input: An odd prime p and an integer a, 1 ≤ a < p.

Output: The Legendre symbol (a/p).

Steps:

b := p, k := 1.    /* Initialize */

/* The Euclidean loop */
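The Euclidean loop indicated above can be sketched in C on word-sized operands; the book's version works on multiple-precision integers, and the iterative structure below (factoring out 2s with the b mod 8 rule, then swapping by reciprocity using the bits of a mod 4 and b mod 4) is the standard one:

```c
#include <stdint.h>

/* Jacobi symbol (a/b) for odd b > 0; for prime b this is the Legendre
   symbol computed by Algorithm 3.15.  The sign k is updated from the
   three least significant bits of b (the (2/b) rule) and from the
   second least significant bits of a and b (quadratic reciprocity). */
int jacobi(uint64_t a, uint64_t b)
{
    int k = 1;
    a %= b;
    while (a != 0) {
        while (!(a & 1)) {                 /* (2/b) = -1 iff b = +-3 (mod 8) */
            a >>= 1;
            if ((b & 7) == 3 || (b & 7) == 5) k = -k;
        }
        uint64_t t = a; a = b; b = t;      /* reciprocity: flip the sign     */
        if ((a & 3) == 3 && (b & 3) == 3) k = -k;  /* iff a = b = 3 (mod 4)  */
        a %= b;
    }
    return (b == 1) ? k : 0;               /* 0 when gcd(a, b) > 1 */
}
```

For example, (2/7) = +1 since 7 ≡ –1 (mod 8), while (3/7) = –1.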

Since ℤp* is cyclic and has order p – 1 = 2^v q, the 2-Sylow subgroup G of ℤp* has order 2^v and is also cyclic. Let g be a generator of G. By Euler’s criterion, a^q is a square in G and, therefore, a^q g^e = 1 (in G) for some even integer e, 0 ≤ e < 2^v, and x ≡ a^((q+1)/2) g^(e/2) (mod p) is a square root of a modulo p.

A generator g of G can be obtained by choosing random elements b from ℤp* and computing the Legendre symbol (b/p). It is easy to see that b^q ∈ G. Furthermore, b^q is a generator of G if and only if (b/p) = –1. Finding a quadratic non-residue in ℤp* is the probabilistic part of the algorithm. Since exactly half of the elements of ℤp* are quadratic non-residues, one expects to find one after a few random trials. In order to make the exponentiation b^q (mod p) efficient, b should be chosen as a single-precision integer. The while loop of the algorithm computes the multiplier g^(e/2) in x using O(v) iterations by successively locating the 1 bits of e starting from the least significant end.

To sum up, square roots modulo a prime can be computed in probabilistic polynomial time. Computing square roots modulo a composite integer n is, on the other hand, a very difficult problem, unless the complete factorization of n is known (see Section 4.2 and Exercise 3.29).

Exercise Set 3.4

3.19 Let n ∈ ℕ be odd and composite and suppose that there exists (at least) one a ∈ ℤn* with a^(n–1) ≢ 1 (mod n). Show that b^(n–1) ≢ 1 (mod n) for at least half of the bases b ∈ ℤn*. [H]

Algorithm 3.16. Modular square root

Input: An odd prime p and an integer a, 1 ≤ a < p.

Output: A square root of a modulo p (if existent).

Steps:

if { Returna does not have a square root modulo p”. }

if (p ≡ 3 (mod 4)) { Return (mod p). }

if (p ≡ 5 (mod 8))
   if  { Return  (mod p) }
   else { Return  (mod p). }

/* The case p ≡ 1 (mod 8) */
v := v_2(p – 1), q := (p – 1)/2^v.    /* q is odd */
Find a random quadratic non-residue b modulo p and set g := b^q (mod p).
x := a^((q+1)/2) (mod p).
Precompute a^(–1) (mod p).
while (1) {
   Find the smallest i ≥ 0 for which (x^2 a^(–1))^(2^i) ≡ 1 (mod p).
   if (i = 0) { Return x. }
   x := xg^(2^(v–i–1)) (mod p).
}
}
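Algorithm 3.16 can be sketched for single-precision p as follows. We deviate from the text in three labelled places: the p ≡ 5 (mod 8) case is folded into the general Tonelli–Shanks branch, the non-residue b is found by scanning 2, 3, 4, . . . instead of by random trials, and a^(–1) is obtained via Fermat's little theorem; the names are ours:

```c
#include <stdint.h>

typedef unsigned __int128 u128;

static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (uint64_t)((u128)x * y % n);
}

static uint64_t powmod(uint64_t a, uint64_t e, uint64_t n)
{
    uint64_t x = 1 % n;
    for (a %= n; e; e >>= 1) {
        if (e & 1) x = mulmod(x, a, n);
        a = mulmod(a, a, n);
    }
    return x;
}

/* Square root of a modulo an odd prime p; returns 0 when a is a
   quadratic non-residue (or a = 0 (mod p)). */
uint64_t sqrtmod(uint64_t a, uint64_t p)
{
    a %= p;
    if (a == 0) return 0;
    if (powmod(a, (p - 1) / 2, p) != 1) return 0;   /* Euler's criterion */
    if (p % 4 == 3) return powmod(a, (p + 1) / 4, p);  /* explicit case  */

    /* Tonelli-Shanks for p = 1 (mod 4) */
    uint64_t q = p - 1;
    int v = 0;
    while (!(q & 1)) { q >>= 1; v++; }              /* p - 1 = 2^v q     */
    uint64_t b = 2;
    while (powmod(b, (p - 1) / 2, p) != p - 1) b++; /* non-residue       */
    uint64_t g = powmod(b, q, p);       /* generator of the 2-Sylow subgroup */
    uint64_t x = powmod(a, (q + 1) / 2, p);
    uint64_t ainv = powmod(a, p - 2, p);            /* a^(-1) by Fermat  */
    for (;;) {
        uint64_t t = mulmod(mulmod(x, x, p), ainv, p);
        int i = 0;
        while (t != 1) { t = mulmod(t, t, p); i++; }  /* smallest such i */
        if (i == 0) return x;
        x = mulmod(x, powmod(g, 1ULL << (v - i - 1), p), p);
    }
}
```

Each pass through the loop strictly reduces the order of x^2 a^(–1) in the 2-Sylow subgroup, which is why the loop needs only O(v) iterations.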

3.20 Let n ∈ ℕ be odd and composite.
  1. Show that there exists a ∈ ℤn* such that a^((n–1)/2) ≢ (a/n) (mod n). [H]

  2. Show that a^((n–1)/2) ≢ (a/n) (mod n) for at least half of the bases a ∈ ℤn*. [H]

3.21 Let n ∈ ℕ be a Carmichael number, that is, a composite integer for which a^(n–1) ≡ 1 (mod n) for all a coprime to n, that is, ord_n(a) | (n – 1) for all a ∈ ℤn*. Prove that:
  1. (p – 1)|(n – 1) for every prime divisor p of n. [H]

  2. n is odd. [H]

  3. n is square-free. [H]

  4. n has at least three distinct prime divisors.

3.22
  1. Let n ∈ ℕ be a square-free composite integer, such that (p – 1)|(n – 1) for every prime divisor p of n. Show that n is a Carmichael number.

  2. Demonstrate that 561 = 3 × 11 × 17; 2,821 = 7 × 13 × 31; and 172,081 = 7 × 13 × 31 × 61 are Carmichael numbers.

  3. Assume that for some k ∈ ℕ the integers p1 := 6k + 1, p2 := 12k + 1 and p3 := 18k + 1 are prime. Prove that p1p2p3 is a Carmichael number.

  4. Deduce that 1,729 = 7 × 13 × 19 and 294,409 = 37 × 73 × 109 are Carmichael numbers.

3.23

Fermat’s test for prime numbers Let n ∈ ℕ and let n – 1 = p1^(e1) · · · pr^(er), with distinct primes p1, . . . , pr, be the prime factorization of n – 1. Suppose that there exist integers a1, . . . , ar such that for each i we have a_i^(n–1) ≡ 1 (mod n) and a_i^((n–1)/p_i) ≢ 1 (mod n). Show that n is prime.

3.24

Pépin’s test for Fermat numbers Show that the Fermat number n := 2^(2^k) + 1 is prime if and only if 3^((n–1)/2) ≡ –1 (mod n).

3.25 Write an algorithm that, given natural numbers t, l with l < t, outputs a (probable) prime p of bit length t such that p – 1 has a (probable) prime divisor q of bit length l.
3.26 Let n ∈ ℕ.
  1. Show that the ring ℤn[X] is (canonically) isomorphic to the ring ℤ[X]/⟨n⟩. In view of this, we write f(X) ≡ g(X) (mod n) to mean either that the coefficients of f are congruent modulo n to the respective coefficients of g or that the polynomials f(X) and g(X) are congruent modulo the principal ideal of ℤ[X] generated by n.

  2. Prove that if n is a prime, then (X + a)^n ≡ X^n + a (mod n) for every a ∈ ℤ.

  3. Prove that for composite n there exists k ∈ ℕ, 1 < k < n, with the binomial coefficient C(n, k) ≢ 0 (mod n). Deduce that in this case (X + a)^n ≢ X^n + a (mod n) for some a ∈ ℤ.

  4. Let h(X) ∈ ℤ[X] and let h̄(X) be the canonical image of h(X) in ℤn[X]. Show that the ring ℤ[X]/⟨n, h(X)⟩ is isomorphic to the ring ℤn[X]/⟨h̄(X)⟩.

3.27 Modify Algorithm 3.15 to compute the (generalized) Jacobi symbol (a/b) for odd b ∈ ℕ and for arbitrary a ∈ ℤ.
3.28A Implement the Chinese remainder theorem for integers, that is, write an algorithm that takes as input pairwise relatively prime moduli n_i ∈ ℕ and integers a_i for i = 1, . . . , r and that outputs a ∈ ℤ with a ≡ a_i (mod n_i) for all i = 1, . . . , r. [H]
3.29 Let f(X) be a non-constant polynomial in ℤ[X].
  1. Let the congruence f(x) ≡ 0 (mod p^e), e ∈ ℕ, have a solution x ≡ a (mod p^e). Show that if an integer a′ := a + kp^e solves the congruence f(x) ≡ 0 (mod p^(e+1)), then k satisfies the congruence

    f′(a)k ≡ –f(a)/p^e (mod p).

    Here f(a)/p^e means integer division. Demonstrate that this congruence may have 0, 1 or p solutions (for k) depending on the values of f′(a) and f(a)/p^e. Each such k gives a solution a′ of f(x) ≡ 0 (mod p^(e+1)) with a′ ≡ a (mod p^e). We say that the solution a′ (modulo p^(e+1)) is obtained from the solution a (modulo p^e) by (Hensel) lifting.

  2. Lifting together with the Chinese remainder theorem allows us to reduce the problem of solving a polynomial congruence modulo an arbitrary modulus n ∈ ℕ to the problem of solving the same congruence modulo the prime divisors of n. More precisely, if the prime factorization of n and all the solutions of the congruences f(x) ≡ 0 (mod p_i) for all i = 1, . . . , r are given, design an algorithm to compute all the solutions of the congruence f(x) ≡ 0 (mod n).

3.30 Let n ∈ ℕ be odd and let a ∈ ℤn* be a quadratic residue modulo n. Deduce that the congruence x^2 ≡ a (mod n) has exactly 2^(ω(n)) solutions modulo n, where ω(n) denotes the number of distinct prime divisors of n.
3.31 Show that Algorithm 3.17 correctly computes ⌊√n⌋ for n ∈ ℕ. Specify a strategy to initialize a before the while loop. Determine how Algorithm 3.17 can be used to check if a given n ∈ ℕ is a perfect square. [H]
Algorithm 3.17. Integer square root

Input: n ∈ ℕ.

Output: ⌊√n⌋.

Steps:

Using bit operations, initialize a to an integral value x ≥ ⌊√n⌋.
while (1) {    /* Newton’s iteration loop */
   b := ⌊(a + ⌊n/a⌋)/2⌋.
   if (a ≤ b) { Return a. }
   a := b.
}
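Algorithm 3.17 can be rendered on 64-bit integers as follows; the initial value 2^⌈(bit length)/2⌉ is one easy way to satisfy a ≥ ⌊√n⌋ (cf. Exercise 3.31), and the name isqrt is ours:

```c
#include <stdint.h>

/* Newton's iteration of Algorithm 3.17 on 64-bit integers. */
uint64_t isqrt(uint64_t n)
{
    if (n < 2) return n;
    int bits = 0;                            /* bit length of n */
    for (uint64_t t = n; t; t >>= 1) bits++;
    uint64_t a = 1ULL << ((bits + 1) / 2);   /* a >= floor(sqrt(n)) */
    for (;;) {                               /* Newton's iteration loop */
        uint64_t b = (a + n / a) / 2;
        if (a <= b) return a;                /* a has reached floor(sqrt(n)) */
        a = b;
    }
}
```

Starting from above, each iterate strictly decreases until it reaches ⌊√n⌋, at which point the test a ≤ b fires and the loop returns.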

3.32
  1. Design an algorithm that, given n, k ∈ ℕ, computes ⌊n^(1/k)⌋. [H]

  2. Design an algorithm to check if a given n ∈ ℕ is an integral power of another integer.

3.5. Arithmetic in Finite Fields

Many cryptographic protocols are based on the (apparent) intractability of the discrete logarithm problem (Section 4.2) in the multiplicative group of a finite field F_q. The arithmetic of the finite fields F_p, p prime, and F_(2^n), n ∈ ℕ, is easy to implement and runs efficiently. In view of this, these two kinds of finite fields are most popular in cryptography and we concentrate our algorithmic study on these fields only.

A prime field F_p is the quotient ring ℤ/pℤ = ℤp. In Section 3.3.4, we have already made a thorough study of the arithmetic of the rings ℤn, n ∈ ℕ. We recall that the elements of ℤp are represented as integers from the set {0, 1, . . . , p – 1} and the arithmetic in ℤp is the modulo p integer arithmetic. Since p is typically multiple-precision, the characteristic p of F_p is odd. The fields of even characteristic that we will study are the non-prime fields F_(2^n).

Section 2.9.3 explains several representations of extension fields. The most common one is the polynomial-basis representation for an irreducible polynomial f(X) of degree n in F_2[X]. In that case, an element of F_(2^n) has the canonical representation as a polynomial a_0 + a_1 X + · · · + a_(n–1) X^(n–1), a_i ∈ F_2, of degree < n. An arithmetic operation on two elements of F_(2^n) is the same operation in F_2[X] followed by reduction modulo the defining polynomial f(X). So we start with the implementation of the polynomial arithmetic over F_2.

3.5.1. Arithmetic in the Ring F_2[X]

A polynomial over F_2 (or any field) is identified by its coefficients, of which only finitely many are non-zero. Thus for storing a polynomial g(X) = a_d X^d + a_(d–1) X^(d–1) + · · · + a_1 X + a_0 it is sufficient to store the finite ordered sequence a_d a_(d–1) . . . a_1 a_0. It is not necessary to demand a_d ≠ 0, but the shortest sequence representing a non-zero polynomial corresponds to a_d ≠ 0 and in this case deg g = d. On the other hand, as we see later, it is often useful to pad such a sequence with leading zero coefficients. As an example, the polynomial X^2 + 1 is representable as 101 or as 0101 or as 00101 or · · ·.

Since F_2 can be viewed as the set {0, 1} with operations modulo 2, a polynomial in F_2[X] is essentially a bit string unique up to insertion (and deletion) of leading zero bits. As in the case of multiple-precision integers, we pack these coefficients in an array of 32-bit words and maintain the number of coefficients belonging to the polynomial. For example, the polynomial g(X) = X^64 + X^31 + X^7 + 1 can be stored in an array w2w1w0 of three 32-bit words. w0 consists of the coefficients of X^0, X^1, . . . , X^31, w1 consists of the coefficients of X^32, X^33, . . . , X^63, and w2 consists of the coefficient of X^64. It is up to the implementation scheme to decide whether the coefficients are to be stored from left to right or from right to left in the bits of a word. We assume that less significant coefficients go to the less significant bits of a word. For the polynomial g above, the word w0 viewed as an unsigned integer will then be w0 = 2^31 + 2^7 + 1, whereas we have w1 = 0. The least significant bit of w2 would be 1. The remaining 31 bits of w2 are not important and can be assigned any value as long as we maintain the information that only the coefficients of X^i, 0 ≤ i ≤ 64, need to be considered. On the other hand, if we want to store the coefficients of g up to that of X^80, then the bits of w2 at locations 1, . . . , 16 must be zero, whereas those at locations 17, . . . , 31 may be of any value. We, however, always recommend the use of leading zero bits to fill the portion of the leading word not belonging to the polynomial.

Such a representation of elements of F_2[X], in addition to being compact, facilitates efficient implementation of arithmetic functions. As we will shortly see, we often need not extract the individual coefficients of a polynomial, but apply bit operations on entire words to process 32 coefficients simultaneously per operation. We usually do not need polynomials of degrees > 4096 for cryptographic applications. It is, therefore, sufficient to declare a static array capable of storing all the 8193 coefficients of a product of two such largest polynomials. The zero polynomial may be represented as one with zero word size, whereas the degree of the zero polynomial is taken to be –∞, which may be representable as –1.
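The packing convention just described can be made concrete in C; the helper names set_coeff/get_coeff and the fixed three-word array are ours, chosen to reproduce the example g(X) = X^64 + X^31 + X^7 + 1:

```c
#include <stdint.h>

enum { WORDS = 3 };   /* enough for coefficients of X^0 ... X^95 */

/* Less significant coefficients go to less significant bits of a word. */
static void set_coeff(uint32_t w[], int i)
{
    w[i / 32] |= (uint32_t)1 << (i % 32);
}

static int get_coeff(const uint32_t w[], int i)
{
    return (w[i / 32] >> (i % 32)) & 1;
}

/* Pack g(X) = X^64 + X^31 + X^7 + 1 into the word array w2 w1 w0,
   padding the unused leading bits of w2 with zeros as recommended. */
void pack_example(uint32_t w[WORDS])
{
    for (int i = 0; i < WORDS; i++) w[i] = 0;
    set_coeff(w, 64);
    set_coeff(w, 31);
    set_coeff(w, 7);
    set_coeff(w, 0);
}
```

After packing, w0 viewed as an unsigned integer is 2^31 + 2^7 + 1 and the least significant bit of w2 is 1, exactly as in the text's example.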

We now describe the arithmetic functions on two non-zero polynomials

Equation 3.3

a(X) = a_r X^r + · · · + a_1 X + a_0,    b(X) = b_s X^s + · · · + b_1 X + b_0.
Under our implementation, a and b demand ρ := ⌈(r + 1)/32⌉ and σ := ⌈(s + 1)/32⌉ machine words α_(ρ–1) . . . α1α0 and β_(σ–1) . . . β1β0, respectively. We also assume paddings with leading zero bits in the areas not belonging to the operands.

Note that the addition of F_2 is the same as the XOR (⊕) of two bits. Applying this bit operation on the words αi and βi adds 32 coefficients of the operand polynomials simultaneously (see Algorithm 3.18). Finally, note that –1 = 1 in any field of characteristic 2, that is, subtraction is the same as addition in such a field.

The product a(X)b(X) can be computed as in Algorithm 3.19. Once again, using word-wise operations yields a faster implementation. By AND and OR, we denote the bit-wise and and or operations on 32-bit words. The easy verification of the correctness of this algorithm is left to the reader. As in the case of addition, one might want to make the polynomial c compact after its words γ_(τ–1), . . . , γ0 are computed.

Algorithm 3.18. Polynomial addition

Input: a(X) and b(X), as in Equation (3.3).

Output: c(X) = a(X) + b(X) (to be stored in the array γ_(τ–1) . . . γ1γ0).

Steps:

τ := max(ρ, σ).
for (i = 0, . . . , min(ρ, σ) – 1) γi := αi ⊕ βi.
if (ρ > σ) for (i = σ, . . . , ρ – 1) γi := αi,
else if (ρ < σ) for (i = ρ, . . . , σ – 1) γi := βi.
while (τ > 0) and (γ_(τ–1) = 0) τ––.       /* Make c compact (optional) */

Algorithm 3.19. Polynomial multiplication

Input: a(X) and b(X), as in Equation (3.3).

Output: c(X) = a(X)b(X) (to be stored in the array γ_(τ–1) . . . γ1γ0).

Steps:

τ := ρ + σ.     /* The size of the product: ρ + σ words accommodate all product bits */
for (i = 0, . . . , τ – 1) γi := 0.     /* Initialize the product */

/* The quadratic multiplication loop */
for (k = 0, . . . , 31) {    /* For each bit position in a word */
   for (j = 0, . . . , σ – 1) {     /* For each word of b */
      if (βj AND 2^k) {     /* if the k-th bit of βj is 1 */
         for (i = 0, . . . , ρ – 1) {    /* For each word of a */
            set γ_(i+j) := γ_(i+j) ⊕ (αi ≪ k) and γ_(i+j+1) := γ_(i+j+1) ⊕ (αi ≫ (32 – k)).    /* the second XOR is vacuous for k = 0 */
         }
      }
   }
}
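The heart of Algorithm 3.19 is the shift-and-XOR update; collapsed to single 64-bit operands (so the word loops over i and j disappear and only the bit loop over b remains), it can be sketched as follows, with the name gf2_mul and the 128-bit product type being ours:

```c
#include <stdint.h>

typedef unsigned __int128 u128;

/* Carry-less multiplication of two polynomials over GF(2), each packed
   into a 64-bit word.  For every 1 bit of b(X) at position k, the
   partial product a(X) * X^k is added (XORed) in, with no carries. */
u128 gf2_mul(uint64_t a, uint64_t b)
{
    u128 c = 0;
    for (int k = 0; k < 64; k++)
        if ((b >> k) & 1)
            c ^= (u128)a << k;
    return c;
}
```

For example, (X + 1)(X + 1) = X^2 + 1 over F_2 because the cross terms cancel, which the carry-less product reproduces as 3 · 3 = 5.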

The square of a polynomial a(X) ∈ F_2[X] can be computed very easily using the fact that

a(X)^2 = (a_r X^r + · · · + a_1 X + a_0)^2 = a_r X^(2r) + · · · + a_1 X^2 + a_0.

This gives us a linear-time (in terms of r or ρ) algorithm instead of the quadratic general-purpose multiplication Algorithm 3.19. We leave the implementational details to the reader.
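Concretely, squaring just spreads the coefficient bits apart, inserting a zero between consecutive bits. For a 32-bit word of coefficients this spreading can even be done in a logarithmic number of mask-and-shift steps; this particular bit trick is a standard one, not taken from the text:

```c
#include <stdint.h>

/* Square a polynomial over GF(2) whose coefficients fill one 32-bit
   word: a(X)^2 has the same coefficients, moved from bit i to bit 2i.
   Each step doubles the gap between surviving bit groups. */
uint64_t gf2_sqr32(uint32_t a)
{
    uint64_t x = a;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}
```

For instance, (X + 1)^2 = X^2 + 1 appears as gf2_sqr32(3) = 5, and (X^2 + X + 1)^2 = X^4 + X^2 + 1 as gf2_sqr32(7) = 21.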

Division with remainder in F_2[X] is implemented in Algorithm 3.20. As before, we continue to work with the operands a(X) and b(X) as in Equation (3.3). But now we make the further assumptions that b_s = 1, so that β_(σ–1) ≠ 0, and that s ≤ r. When the Euclidean division loop of Algorithm 3.20 terminates, the array locations δ_(σ–1), . . . , δ1, δ0 contain the remainder. The arrays γ and δ may be made compact to discard the leading zero bits, if any.

Algorithm 3.20. Euclidean division of polynomials

Input: a(X), as in Equation (3.3).

Output: c(X) = a(X) quot b(X) (to be stored in the array γ_(τ–1) . . . γ1γ0) and d(X) = a(X) rem b(X) (to be stored in the array δ_(ρ–1) . . . δ1δ0).

Steps:

τ := ⌈(r – s + 1)/32⌉.    /* The size of the quotient */
for i = 0, . . . , τ – 1 { γi := 0 }    /* Initialize c(Xto 0 */

for i = 0, . . . , ρ – 1 { δi := αi }   /* Copy a(Xto d(X*/

/* Euclidean division loop */
for i = r, r – 1, . . . , s {
   if (the coefficient of Xi in d(Xis 1) {
       j := (i – s) quot 32, k := (i – s) rem 32.

       /* Set the coefficient of Xis of c(X*/
       γj := γj OR 2k.

       /* Update d(X) := d(X) – Xisb(X*/
       for l = 0, . . . , σ – 1 {
          δl + j := δl + j ⊕ (bl ≪ k).
          δl + j + 1 := δl + j + 1 ⊕ (bl ≫ (32 – k)).
       }
    }
}
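The Euclidean division loop may be sketched as follows; as in the earlier sketches, a Python integer encodes the coefficient bits, and the degree is read off via bit_length.

```python
def gf2_divmod(a, b):
    """Euclidean division in GF(2)[X], mirroring the division loop of
    Algorithm 3.20: for every set coefficient X^i of the running
    remainder, i going down from deg a to deg b, set bit i - deg b of
    the quotient and XOR b shifted by i - deg b into the remainder."""
    q, r = 0, a
    db = b.bit_length() - 1            # deg b
    while r and r.bit_length() - 1 >= db:
        shift = (r.bit_length() - 1) - db
        q |= 1 << shift
        r ^= b << shift
    return q, r
```

For instance, dividing X^5 + X^2 + 1 by X^2 + X gives quotient X^3 + X^2 + X and remainder 1.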

Computing modular inverses requires computation of extended gcds of polynomials in F2[X]. We again start with the non-zero polynomials a(X) and b(X), and compute polynomials d(X), u(X) and v(X) in F2[X] with d(X) = gcd(a(X), b(X)) = u(X)a(X) + v(X)b(X), deg u < deg b and deg v < deg a. For polynomials, we do not have an equivalent of the binary gcd algorithm (Algorithm 3.8). We use repeated Euclidean divisions instead.

The proof of correctness of Algorithm 3.21 is similar to that for Algorithm 3.8. Here, we introduce the variables rk, Uk and Vk for k = 0, 1, 2, . . . . The initialization goes as: r0 := a, r1 := b, U0 := 1, U1 := 0, V0 := 0 and V1 := 1. During the k-th iteration (k = 1, 2, . . .), we first use Euclidean division to get rk–1 = qkrk + rk+1, which gives rk+1 = rk–1 – qkrk. We also compute Uk+1 = Uk–1 – qkUk and Vk+1 = Vk–1 – qkVk using the values available from the previous two iterations, so as to maintain the relation rk+1 = Uk+1r0 + Vk+1r1 for all k = 1, 2, . . . . In Algorithm 3.21, the k-th iteration of the while loop begins with x = rk–1, y = rk, u1 = Uk and u2 = Uk–1 and ends after updating the values to x = rk, y = rk+1, u1 = Uk+1 and u2 = Uk. It is not necessary to maintain the values Vk in the main loop. After the loop terminates, one computes Vk = (rk – Ukr0)/r1.

Modular arithmetic in F2[X] is very similar to modular arithmetic in Z. If f(X) is a non-constant polynomial of F2[X] of degree n (not necessarily irreducible), we represent elements of F2[X]/⟨f(X)⟩ as polynomials in F2[X] of degrees < n. Given two such polynomials a and b, we compute the sum a + b simply as the sum in F2[X]. The product ab is computed by first computing the product ab in F2[X] and then computing the remainder of Euclidean division of this product by f. The inverse of a modulo f exists if and only if gcd(a, f) = 1 (in F2[X]). In that case, extended gcd computation gives us polynomials u, v such that 1 = ua + vf, so that ua ≡ 1 (mod f). If a ≠ 0, then Algorithm 3.21 computes u with deg u < deg f = n, so that we take this u to be the canonical representative of a^(–1). Finally, the modular exponentiation a^e (mod f) can be computed using an algorithm very similar to Algorithm 3.9 or Algorithm 3.10. We leave the details to the reader.
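The modular exponentiation left to the reader may be sketched as a square-and-multiply in the style of Algorithm 3.9; the helpers pmul and pmod below are minimal Python stand-ins for the multiplication and division routines of this section.

```python
def pmul(a, b):
    """Carry-less product of two GF(2)[X] polynomials encoded as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    """Remainder of a modulo f in GF(2)[X]."""
    df = f.bit_length() - 1
    while a and a.bit_length() - 1 >= df:
        a ^= f << ((a.bit_length() - 1) - df)
    return a

def ppowmod(a, e, f):
    """a^e rem f by repeated squaring, keeping everything reduced mod f."""
    r, a = 1, pmod(a, f)
    while e:
        if e & 1:
            r = pmod(pmul(r, a), f)
        a = pmod(pmul(a, a), f)
        e >>= 1
    return r
```

As a check, in F2[X]/⟨X^3 + X + 1⟩ (a representation of F8) the element X satisfies X^7 ≡ 1.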

Algorithm 3.21. Extended gcd of polynomials

Input: Non-zero polynomials a, b ∈ F2[X].

Output: Polynomials d, u, v satisfying

d = gcd(a, b) = ua + vb, deg u < deg b, deg v < deg a.

Steps:

/* Initialize */
x := a, y := b, u1 := 0, u2 := 1.

/* Repeated Euclidean division */
while (y ≠ 0) {
   Simultaneously compute q := x quot y and r := x rem y (Algorithm 3.20).
   u := u2 – qu1, u2 := u1, u1 := u,
   x := y, y := r.
}
d := x, u := u2, v := (d – ua)/b.
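Algorithm 3.21 can be sketched as follows; Python integers encode the coefficient bits, and over F2 both addition and subtraction are XOR.

```python
def pmul(a, b):
    """Carry-less product in GF(2)[X]."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):
    """Euclidean division in GF(2)[X] (quotient, remainder)."""
    q, r = 0, a
    db = b.bit_length() - 1
    while r and r.bit_length() - 1 >= db:
        s = (r.bit_length() - 1) - db
        q |= 1 << s
        r ^= b << s
    return q, r

def pgcdext(a, b):
    """Extended gcd by repeated Euclidean division: returns (d, u, v)
    with d = gcd(a, b) = u*a + v*b.  u1, u2 track the coefficients U_k,
    U_{k-1} of the text, initialized from U_0 = 1, U_1 = 0."""
    x, y = a, b
    u1, u2 = 0, 1
    while y:
        q, r = pdivmod(x, y)
        u1, u2 = u2 ^ pmul(q, u1), u1
        x, y = y, r
    d, u = x, u2
    v = pdivmod(d ^ pmul(u, a), b)[0]     # v = (d - u*a)/b
    return d, u, v
```

When gcd(a, b) = 1, the coefficient u is the canonical representative of a^(–1) modulo b; for instance, the inverse of X modulo X^3 + X + 1 comes out as X^2 + 1.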

3.5.2. Finite Fields of Characteristic 2

For the polynomial-basis representation of F2^n, we need an irreducible polynomial of degree n. We shortly present a probabilistic algorithm that generates a random monic irreducible polynomial in Fq[X] of a given degree n. Although we are interested only in the case q = 2, this algorithm works for any prime or prime power q.

First, we describe a deterministic polynomial-time algorithm for checking the irreducibility of a non-constant polynomial f ∈ Fq[X] of degree n. If f is reducible, it has a factor of degree i ≤ ⌊n/2⌋. Also recall (Theorem 2.40, p 82) that X^(q^i) – X is the product of all monic irreducible polynomials of Fq[X] of degrees dividing i. Therefore, if f has an irreducible factor of degree i, then gcd(f, X^(q^i) – X) = gcd(f, (X^(q^i) – X) rem f) is a non-constant polynomial. Algorithm 3.22 employs these simple observations.

Now, recall from Section 2.9.2 that a random monic polynomial of Fq[X] of degree n is irreducible with probability approximately 1/n. Therefore, if we keep checking random monic polynomials in Fq[X] of degree n for irreducibility, then after O(n) checks we expect to find an irreducible polynomial. This leads to the Las Vegas probabilistic Algorithm 3.23.

Algorithm 3.22. Check for irreducibility of a polynomial

Input: A non-constant polynomial f ∈ Fq[X].

Output: A (deterministic) certificate whether f is irreducible or not.

Steps:

n := deg f, g := X.
for i = 1, . . . , ⌊n/2⌋ {
   g := g^q (mod f).   /* Here g = X^(q^i) rem f */
   if (deg(gcd(f, g – X)) > 0) { Return “f is reducible”. }
}
Return “f is irreducible”.

Algorithm 3.23. Generation of a random irreducible polynomial

Input: The field Fq and an integer n ≥ 2.

Output: A random monic irreducible polynomial of degree n.

Steps:

while (1) {
   f := a random monic polynomial in Fq[X] of degree n.
   if (f is irreducible) { Return f. }    /* Algorithm 3.22 */
}
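For q = 2, Algorithms 3.22 and 3.23 specialize to the following sketch. Forcing the constant term of the random candidate to 1 is a small shortcut of ours (otherwise X divides f); the helpers are the bit-level GF(2)[X] routines of Section 3.5.1.

```python
import random

def pmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):
    df = f.bit_length() - 1
    while a and a.bit_length() - 1 >= df:
        a ^= f << ((a.bit_length() - 1) - df)
    return a

def pgcd(a, b):
    while b:
        a, b = b, pmod(a, b)
    return a

def is_irreducible(f):
    """Algorithm 3.22 for q = 2: f is irreducible iff
    gcd(f, X^(2^i) - X) is constant for i = 1, ..., deg(f)//2."""
    n = f.bit_length() - 1
    g = 0b10                          # g = X
    for _ in range(n // 2):
        g = pmod(pmul(g, g), f)       # g = X^(2^i) rem f
        if pgcd(f, g ^ 0b10).bit_length() - 1 > 0:
            return False
    return True

def random_irreducible(n):
    """Algorithm 3.23: sample random monic degree-n polynomials with
    constant term 1 until one passes the irreducibility check."""
    while True:
        f = (1 << n) | (random.getrandbits(n - 1) << 1) | 1
        if is_irreducible(f):
            return f
```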

Once the defining irreducible polynomial f is available, we carry out the arithmetic in F2^n as modular polynomial arithmetic with respect to the modulus f, as described at the end of Section 3.5.1. Since this modular arithmetic involves taking the remainder of Euclidean division by f, it is sometimes expedient to choose f to be an irreducible polynomial of a certain special type. The randomized algorithm described above gives a random monic irreducible polynomial f of degree n having on average ≈ n/2 non-zero coefficients. The division algorithm (Algorithm 3.20) in that case takes time O(n^2). On the other hand, if f is a sparse polynomial (like a trinomial), the Euclidean division loop can be rewritten to exploit this sparsity, thereby bringing down the running time of the division procedure to O(n). (See Exercise 3.34. Also see Exercise 3.38 for computing isomorphisms between different polynomial-basis representations of the same field.)

Let p be a prime and let . We have seen how to implement arithmetic in and hence, by Exercise 3.35, that in too. If is an irreducible polynomial of degree n and if q = p^n, then and we implement the arithmetic of as the polynomial arithmetic of modulo f. Again by Exercise 3.35, this gives us the arithmetic of . Now, for and a monic irreducible polynomial we have a representation . Instead of having such a two-way representation of we may also represent as , where is a monic irreducible polynomial of degree nm. It usually turns out that the second representation of is more efficient. However, there are some situations where the two-way representation performs better. This is, in particular, the case when the arithmetic of can be made more efficient than the modular polynomial arithmetic of . For example, we might precompute tables of arithmetic operations of and use table lookups for performing the coefficient arithmetic of . This demands O(q^2) storage and is feasible only when q is small. On the other hand, if we find a primitive element γ of and precompute a table that maps i ↦ γ^i and another that maps γ^i ↦ i, then products in can be computed in time O(1) using table lookups. If, in addition, we store the Zech’s logarithm table (Section 2.9.3) for , then addition in can also be performed in O(1) time with table lookup. All these tables take O(q) memory, which (though better than the O(q^2) storage of the previous scheme) is feasible only for small q.

3.5.3. Selecting Suitable Finite Fields

Not all finite fields are suitable for cryptographic applications. In this section, we discuss the desirable properties of a field Fq on which secure protocols can be developed. We first note that such protocols are usually based on the apparent intractability of the so-called discrete logarithm problem (DLP) (Section 4.2). As a result, the selection of suitable fields is dictated by the known cryptanalytic algorithms for solving the DLP (see Section 4.4). We shall mostly concentrate on Fq with either q = p a prime or q = 2^n for some n ∈ N. By the bit size of q, denoted |q|, we mean the number of bits in the binary representation of q, that is, |q| = ⌈lg q⌉. As we have seen, each element of Fq is representable using O(|q|) bits and, therefore, |q| is often also called the size of Fq.

The first requirement on a cryptographically suitable field is that the size |q| should be sufficiently large. Recent cryptanalytic studies show that sizes |q| ≤ 512 are not secure enough. Sizes |q| ≥ 768 are recommended for secure applications. For long-term security, one might even require |q| ≥ 2048.

Not every field of the recommended size is, however, adequately secure. The cardinality #Fq = q must be such that q – 1 has at least one large prime divisor q′ (see the Pohlig–Hellman method in Section 4.4). By large, we usually mean |q′| ≥ 160. In addition, this prime factor q′ of q – 1 should be known to us. If q = p is a prime, then a safe prime or a strong prime serves our purpose (Definition 3.5, Algorithm 3.14). Also see Exercise 3.25. On the other hand, if q = 2^n, the only way to obtain q′ is by factoring the Mersenne number Mn := q – 1 = 2^n – 1. Factoring Mn for n ≥ 768 is a very difficult task. Luckily, extensive tables of complete or partial factorizations of Mn are available. For example, for n = 769 (a prime number), we have

M769 = 2^769 – 1 = 1,591,805,393 × 6,123,566,623,856,435,977,170,641 × q′,

where q′ is a 657-bit prime. These tables should be consulted for choosing a suitable value of n.

The multiplicative group Fq* is cyclic (Theorem 2.38). If the complete integer factorization of q – 1 is known, then it is possible to find, in polynomial time (in |q|), a primitive element of Fq*. Algorithm 3.24 performs r = O(lg m) exponentiations in G in order to conclude whether a given element is a generator of G. For G = Fq*, we have polynomial-time exponentiation algorithms, so Algorithm 3.24 runs in deterministic polynomial time. By Exercise 2.47, the probability of a randomly chosen element of G being primitive is φ(m)/m. In view of the lower bound on φ(m)/m given in Theorem 3.1, proved by Rosser and Schoenfeld [253], Algorithm 3.25 is expected to return a random primitive element of G after O(ln ln m) iterations.

Theorem 3.1.

Let m ∈ N, m ≥ 5. Then φ(m)/m ≥ 1/(6 ln ln m).

Algorithm 3.24. Check for primitive element

Input: A cyclic group G of cardinality #G = m with known prime factorization m = p1^e1 · · · pr^er, and an element a ∈ G.

Output: A deterministic certificate stating whether a is a generator of G.

Steps:

/* We assume that G is multiplicatively written and has the identity e */
for i = 1, . . . , r {
   if (a^(m/pi) = e) { Return “a is not a generator of G”. }
}
Return “a is a generator of G”.
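For the common case G = (Z/pZ)* with p prime (so m = p – 1), Algorithm 3.24 amounts to the following sketch:

```python
def is_generator(a, p, prime_factors):
    """Algorithm 3.24 specialized to G = (Z/pZ)* with #G = m = p - 1:
    a generates G iff a^(m/l) != 1 (mod p) for every distinct prime l
    dividing m.  prime_factors must list those primes."""
    m = p - 1
    return all(pow(a, m // l, p) != 1 for l in prime_factors)
```

Algorithm 3.25 then simply samples random elements of G until this check succeeds. For p = 13 (m = 12 = 2^2 · 3), the generators found this way are 2, 6, 7 and 11.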

Algorithm 3.25. Computation of a generator of a finite cyclic group

Input: A cyclic group G of cardinality #G = m with known prime factorization m = p1^e1 · · · pr^er.

Output: A generator g of G.

Steps:

while (1) {
    g := a random element of G.
    if (g is a generator of G) /* Algorithm 3.24 */ { Return g. }
}

If, however, the factorization of #G = m is not known, there are no known (deterministic or probabilistic) algorithms for finding a random generator of G or even for checking if a given element of G is primitive. This is indeed one of the intractable problems of computational algebraic number theory. For G = Fq*, this problem can be bypassed as follows.

Recall that we have chosen q in such a way that q – 1 has a large known prime factor q′. Let H be the unique subgroup of G of order q′. Then H is also cyclic and we choose to work in H (using the arithmetic of G). It turns out that if q′ ≥ 2^160 and if H is not contained in a proper subfield of Fq, the security of cryptographic protocols over Fq does not degrade too much by the use of H (instead of the full G) as the ground group. But we now face a new problem, namely, the problem of finding a generator of H. Since #H = q′ is a prime, every element of H \ {1} is a generator of H. So the problem essentially reduces to that of finding any non-identity element of H. This latter problem has a simple probabilistic solution. First of all, if q – 1 = q′ is itself prime, choosing any random non-identity element of G will do. So assume q′ < q – 1. Choose a random a ∈ G and let b := a^((q – 1)/q′). By Lagrange’s theorem (Theorem 2.2, p 24), b^q′ = a^(q–1) = 1 and, therefore, by Proposition 2.5, b ∈ H. Now, Fq being a field, the polynomial X^((q–1)/q′) – 1 can have at most (q – 1)/q′ roots in Fq (that is, in G), and hence the probability that b = 1 is ≤ ((q – 1)/q′)/(q – 1) = 1/q′. This justifies the randomized polynomial running time of the Las Vegas Algorithm 3.26. Indeed, if q′ ≥ 2^160, the while loop of the algorithm is almost always executed only once.

Algorithm 3.26. Computation of an element of given order

Input: A finite field Fq and an (odd) prime factor q′ of q – 1 with q′ < q – 1.

Output: An element of Fq* of multiplicative order q′.

Steps:

while (1) {
   a := a random element of Fq \ {0, ±1}.
   b := a^((q – 1)/q′).
   if (b ≠ 1) { Return b. }
}
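For a prime field (q = p a prime, so that the group arithmetic is ordinary modular arithmetic), Algorithm 3.26 can be sketched as follows:

```python
import random

def element_of_order(q, qp):
    """Sketch of Algorithm 3.26 for prime q: qp is a known prime factor
    of q - 1 with qp < q - 1.  Returns b != 1 with b^qp = 1, i.e. a
    generator of the unique subgroup H of order qp."""
    while True:
        a = random.randrange(2, q - 1)       # avoid 0 and +-1
        b = pow(a, (q - 1) // qp, q)
        if b != 1:
            return b
```

Each trial fails with probability at most 1/qp, so the loop almost always runs exactly once.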

3.5.4. Factoring Polynomials over Finite Fields

Polynomial factorization over finite fields is an interesting computational problem. All deterministic algorithms known for this purpose are quite poor, that is, fully exponential in the size of the field. However, if randomization is allowed, we have reasonably efficient (polynomial-time) algorithms. In this section, we outline the basic working of the modern probabilistic algorithms for polynomial factorization over finite fields. We assume that a non-constant polynomial f ∈ Fq[X] is to be factored. Without loss of generality, we can take f to be monic. We assume further that the arithmetic of Fq and that of Fq[X] is available. We work with a general value of q = p^n, p prime and n ∈ N, though in some cases we have to treat the case p = 2 separately. Irreducibility (or otherwise) in this section always means irreducibility over Fq.

The factorization algorithm we are going to discuss is a generalization of the root finding algorithm (see Exercise 3.36) and consists of three steps:

Square-free factorization (SFF) Decompose the input polynomial f into a product of square-free polynomials.

Distinct-degree factorization (DDF) Given a square-free polynomial f of degree d, compute f = f1 . . . fd with each fi being a product of irreducible polynomials of degree i.

Equal-degree factorization (EDF) Given a product f of irreducible polynomials of the same degree, find out the irreducible factors of f.

We now provide a separate detailed discussion for each of these three steps.

Square-free factorization

Theorem 3.2 is at the very heart of the square-free factorization algorithm and is a generalization of Exercise 2.61.

Theorem 3.2.

Let K be a field and f ∈ K[X] a non-constant monic polynomial. Then the polynomial f / gcd(f, f′) is square-free, where f′ is the formal derivative of f. In particular, f is square-free if and only if gcd(f, f′) = 1.

Proof

Let f = f1^α1 f2^α2 · · · fr^αr be the factorization of f into pairwise distinct monic irreducible polynomials f1, . . . , fr with exponents αi ≥ 1. In order to determine vf1(f′), we employ the usual rules for derivatives to get f′ = α1f1^(α1–1)f1′u + f1^α1 w for some u, w ∈ K[X] with f1 ∤ u. If char K divides α1, then vf1(f′) ≥ α1. Otherwise, vf1(f′) = α1 – 1, since f1 divides neither f1′ nor the fi, i > 1. Similar is the case for vfi(f′) for i = 2, . . . , r. It follows that gcd(f, f′) = f1^β1 · · · fr^βr, where each βi ∈ {αi – 1, αi}, so that f/gcd(f, f′), being a product of distinct irreducible polynomials fi, is square-free.

The algorithm for SFF over Fq is now almost immediate, except for one subtlety, namely, the consideration of the case f/gcd(f, f′) = 1, or equivalently, f′ = 0. In order to see when this case can occur, let us write the non-zero terms of f as f = a1X^e1 + · · · + atX^et with distinct exponents e1, . . . , et and non-zero coefficients ai. Then f′ = a1e1X^(e1 – 1) + · · · + atetX^(et – 1) = 0 if and only if ei ≡ 0 (mod p) for all i, that is, if p divides all of e1, . . . , et. But then f(X) = h(X)^p for some h ∈ Fq[X], since every ai ∈ Fq is itself a p-th power in Fq. These observations motivate the recursive Algorithm 3.27. It is easy to check that this (deterministic) algorithm runs in time polynomially bounded by deg f and log q.

Algorithm 3.27. Square-free factorization

Input: A monic non-constant polynomial f ∈ Fq[X], q = p^n, p prime, n ∈ N.

Output: A square-free factorization of f.

Steps:

Compute f′.
if (f′ = 0) {
    Compute h ∈ Fq[X] such that f = h^p.
    Recursively compute a SFF h = h1 · · · hs of h.
    Return the SFF of f as f = (h1 · · · hs)(h1 · · · hs) · · · (h1 · · · hs)    (p times).
} else {
    Recursively compute a SFF gcd(f, f′) = g1 · · · gs of gcd(f, f′).
    Return the SFF of f as f = (f/gcd(f, f′))g1 · · · gs.
}
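The central step, the square-free part f/gcd(f, f′) of Theorem 3.2, can be computed as in the following sketch for q = 2 (the recursion of Algorithm 3.27 is omitted; we only signal the case f′ = 0, in which f is the square h^2 of a smaller polynomial):

```python
def pdivmod(a, b):
    """Euclidean division in GF(2)[X] (ints encode coefficient bits)."""
    q, r = 0, a
    db = b.bit_length() - 1
    while r and r.bit_length() - 1 >= db:
        s = (r.bit_length() - 1) - db
        q |= 1 << s
        r ^= b << s
    return q, r

def pgcd(a, b):
    while b:
        a, b = b, pdivmod(a, b)[1]
    return a

def pderiv(a):
    """Formal derivative in F2[X]: only odd-degree coefficients survive,
    each shifted down one position."""
    r = 0
    for i in range(1, a.bit_length(), 2):
        if (a >> i) & 1:
            r |= 1 << (i - 1)
    return r

def squarefree_part(f):
    """f / gcd(f, f') as in Theorem 3.2.  If f' = 0 then (for p = 2)
    f = h^2, the case Algorithm 3.27 handles by recursing on h."""
    d = pderiv(f)
    if d == 0:
        return None          # caller should extract the square root h
    return pdivmod(f, pgcd(f, d))[0]
```

For example, f = (X^2 + X + 1)^2 (X + 1) has square-free part X + 1, and (X^2 + X + 1)^2 itself has vanishing derivative.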

Distinct-degree factorization

Let f ∈ Fq[X] be a square-free polynomial of degree d. We can write f = f1 · · · fd, where for each i the polynomial fi is the product of all the irreducible factors of f of degree i. If f does not have an irreducible factor of degree i, then we take fi = 1 as usual.[5] In order to compute the polynomials fi, we make use of the fact that X^(q^i) – X is the product of all monic irreducible polynomials in Fq[X] whose degrees divide i (see Theorem 2.40 on p 82). It immediately follows that fi = gcd(X^(q^i) – X, f/(f1 · · · fi–1)). Thus a few (at most d) gcd computations give us all the fi. The polynomials X^(q^i) – X are of rather large degrees, but since gcd(X^(q^i) – X, g) = gcd((X^(q^i) – X) rem f, g) for every factor g of f, keeping polynomials reduced modulo f implies that we take gcds of polynomials of degrees ≤ d. This, in turn, implies that the DDF can be performed in (deterministic) polynomial time (in d and ln q).

[5] Conventionally, an empty product is taken to be the multiplicative identity and an empty sum to be the additive identity.

Algorithm 3.28 shows an implementation of the DDF. Though the algorithm does not require f to be monic, there is no harm in assuming so.

Algorithm 3.28. Distinct-degree factorization

Input: A (non-constant) square-free polynomial .

Output: The DDF of f, that is, the polynomials f1, . . . , fd as explained above.

Steps:

g := f.   /* Make a local copy of f */
h := X, i := 1.
while (deg g ≠ 0) {
   h := h^q (mod f).   /* Modular exponentiation */
   fi := gcd(h – X, g).
   g := g/fi.    /* Factor out fi from g */
   i++.
}
if (i ≤ d) { fi := 1, . . . , fd := 1. }
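Algorithm 3.28 for q = 2 may be sketched as follows; only the non-trivial parts fi are recorded, and subtraction of X is again an XOR.

```python
def pmul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):
    q, r = 0, a
    db = b.bit_length() - 1
    while r and r.bit_length() - 1 >= db:
        s = (r.bit_length() - 1) - db
        q |= 1 << s
        r ^= b << s
    return q, r

def pgcd(a, b):
    while b:
        a, b = b, pdivmod(a, b)[1]
    return a

def ddf(f):
    """DDF of a square-free f in F2[X] (ints encode coefficient bits):
    returns the list of pairs (i, f_i) with f_i non-constant."""
    parts = []
    g, h, i = f, 0b10, 1                   # h = X
    while g.bit_length() - 1 > 0:
        h = pdivmod(pmul(h, h), f)[1]      # h = X^(2^i) rem f
        fi = pgcd(h ^ 0b10, g)             # gcd(h - X, g)
        if fi.bit_length() - 1 > 0:
            parts.append((i, fi))
            g = pdivmod(g, fi)[0]          # factor f_i out of g
        i += 1
    return parts
```

For instance, X^3 + 1 = (X + 1)(X^2 + X + 1) decomposes into a degree-1 part and a degree-2 part.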

This simple-minded implementation of the DDF is not the theoretically most efficient one known. In fact, it turns out that the DDF (and not the seemingly more complicated EDF) is the bottleneck of the entire polynomial factorization process. Therefore, making the DDF more efficient is important, and many improvements have been suggested in the literature. All these improved algorithms essentially do the same thing as above (that is, the computation of the gcds gcd(X^(q^i) – X, g)), but they optimize the computation of the polynomials X^(q^i) rem f. The best-known method (due to Kaltofen and Shoup) is based on the observation that, in general, most of the fi are 1. Therefore, instead of computing each gcd individually, one may break the interval 1, . . . , d into several subintervals I1, I2, . . . , Il and compute one gcd per subinterval Ij, j = 1, . . . , l. Only those gcds that turn out to be non-constant are further decomposed.

For cryptographic purposes, we will, however, deal with rather small values of d = deg f. (Typically d is at most a few thousand.) The asymptotically better algorithms usually do not outperform the simple Algorithm 3.28 for these values of d.

Equal-degree factorization

Equal-degree factorization, the last step of the polynomial factorization process, is the only probabilistic part of the algorithm. We may assume that f is a (monic) square-free polynomial of degree d and that each irreducible factor of f has the same (known) degree, say δ. If d = δ, then f is irreducible. So we assume that d > δ, that is, d = rδ for some integer r ≥ 2. Theorem 3.3 provides the basic foundations for the EDF.

Theorem 3.3.

Let g be any polynomial in Fq[X] and let δ ∈ N. Then X^(q^δ) – X divides g^(q^δ) – g.

Proof

If g = 0, there is nothing to prove. If g = alX^l + · · · + a1X + a0 ≠ 0 with ai ∈ Fq, then g^(q^δ) – g = al(X^(lq^δ) – X^l) + · · · + a1(X^(q^δ) – X). It is easy to verify that X^(q^δ) – X divides X^(iq^δ) – X^i for every i ≥ 1.

Now, we have to separate two cases, namely, q odd and q even. Theorem 3.3 is valid for any q, even or odd, but taking q odd allows us to write g^(q^δ) – g = g(g^((q^δ–1)/2) – 1)(g^((q^δ–1)/2) + 1). With the above assumptions on f, we have f | (X^(q^δ) – X) and, therefore, f | (g^(q^δ) – g), so that f = gcd(g^(q^δ) – g, f) = gcd(g, f) gcd(g^((q^δ–1)/2) – 1, f) gcd(g^((q^δ–1)/2) + 1, f). If g is randomly chosen, then gcd(g^((q^δ–1)/2) – 1, f) is a non-trivial factor of f with probability ≈ 1/2. The idea is, therefore, to keep choosing random g and computing g1 := gcd(g^((q^δ–1)/2) – 1, f) until one gets 1 ≤ deg g1 < deg f. One then recursively applies the algorithm to g1 and f/g1. It is sufficient to choose g with deg g < 2δ. Obviously, the exponentiation g^(q^δ) has to be carried out modulo f. We leave the details to the reader, but note that trying O(1) random polynomials g is expected to split f and, therefore, the EDF runs in expected polynomial time.

For the case q = 2^n, essentially the same algorithm works, but we have to use the split g^(2^(nδ)) + g = (g + g^2 + g^4 + · · · + g^(2^(nδ–1)))(g + g^2 + g^4 + · · · + g^(2^(nδ–1)) + 1). Once again, computing gcd(g + g^2 + g^4 + · · · + g^(2^(nδ–1)), f) for a random g splits f with probability ≈ 1/2 and, thus, we get an EDF algorithm that runs in expected polynomial time.
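The splitting step for odd q can be sketched as follows. The toy field F7 and δ = 1 are our own choices, purely for illustration; polynomials are little-endian coefficient lists.

```python
import random

P = 7                       # a small odd prime; the field is F_P

def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

def pmulm(f, g):
    r = [0] * (len(f) + len(g) - 1) if f and g else []
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            r[i + j] = (r[i + j] + a * b) % P
    return trim(r)

def pmodm(f, g):
    f = trim(f[:])
    dg, inv = len(g) - 1, pow(g[-1], -1, P)
    while f and len(f) - 1 >= dg:
        c, s = f[-1] * inv % P, (len(f) - 1) - dg
        for i, b in enumerate(g):
            f[s + i] = (f[s + i] - c * b) % P
        f = trim(f)
    return f

def pgcdm(f, g):
    while g:
        f, g = g, pmodm(f, g)
    inv = pow(f[-1], -1, P)
    return [c * inv % P for c in f]          # normalize monic

def ppowm(g, e, f):
    r, g = [1], pmodm(g, f)
    while e:
        if e & 1:
            r = pmodm(pmulm(r, g), f)
        g = pmodm(pmulm(g, g), f)
        e >>= 1
    return r

def edf_split(f, delta=1):
    """One splitting step of the EDF for odd q = P: for a random g of
    degree < 2*delta, gcd(g^((q^delta - 1)/2) - 1, f) is a proper
    factor of f with probability about 1/2, so we retry until it is."""
    e = (P ** delta - 1) // 2
    while True:
        g = trim([random.randrange(P) for _ in range(2 * delta)])
        if not g:
            continue
        h = ppowm(g, e, f)
        h = trim([(h[0] - 1) % P] + h[1:]) if h else [P - 1]
        d = pgcdm(h, f)
        if 0 < len(d) - 1 < len(f) - 1:
            return d
```

On f = (X – 1)(X – 2) over F7 this returns one of the two monic linear factors.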

Exercise Set 3.5

3.33 Find a (polynomial-basis) representation of . Compute a primitive element in this representation.
3.34
  1. Show that the running time of Algorithm 3.20 is O(s(r – s)), which reaches the maximum order of O(r^2) = O(s^2) when s ≈ r/2.

  2. Suppose b is known to have e non-zero coefficients. Modify the Euclidean division loop of Algorithm 3.20 so that the algorithm runs in time O((r – s)e). [H] In particular, if e = O(1), the running time of Algorithm 3.20 becomes linear, namely O(r).

3.35 Implement the polynomial arithmetic of given that of .
3.36 Let q = p^n (p prime and n ∈ N), f ∈ Fq[X] a non-constant polynomial and let g := gcd(f, X^q – X).
  1. If S is the set of all roots of f in Fq, show that g = ∏a∈S (X – a). Thus, g is a square-free polynomial which splits over Fq and has the same roots (over Fq) as f. If deg g = 0 or 1, then we know all the roots of g and hence of f. So, for the rest of this exercise, we assume that deg g ≥ 2.

  2. Consider the case that p is odd. Let b ∈ Fq be arbitrary. Show that

    (X + b)((X + b)^((q–1)/2) – 1)((X + b)^((q–1)/2) + 1) = X^q – X

    and that

    g = gcd(g, X + b) gcd(g, (X + b)^((q–1)/2) – 1) gcd(g, (X + b)^((q–1)/2) + 1).

    Explain how Algorithm 3.29 produces two non-trivial factors of g (over Fq) in probabilistic polynomial time. [H] Write an algorithm to compute all the roots of f in Fq.

    Algorithm 3.29. Computing roots of a polynomial: odd characteristic

    Input: A square-free polynomial g ∈ Fq[X] that splits over Fq.

    Output: Polynomials g1, g2 with g = g1g2 and deg gi ≥ 1 for i = 1, 2.

    Steps:

    if (g(0) = 0) { (g1, g2) := (X, g(X)/X), return. }
    while (1) {
      Select a random element b ∈ Fq.
      h := (X + b)^((q–1)/2) – 1 (mod g).
      g1 := gcd(g, h).
      if (1 ≤ deg g1 < deg g) { g2 := g/g1, return. }
    }

  3. Now, assume that p = 2 and define the polynomial H(X) := X + X^2 + X^4 + · · · + X^(2^(n–1)).

    Let b ∈ Fq be arbitrary. Show that

    H(X + b)(H(X + b) + 1) = X^q – X

    [H] and that

    g(X) = gcd(g(X), H(X + b)) gcd(g(X), H(X + b) + 1).

Explain how Algorithm 3.30 produces two non-trivial factors of g (over Fq) in probabilistic polynomial time. Write an algorithm to compute all the roots of f in Fq.

Algorithm 3.30. Computing roots of a polynomial: characteristic 2

Input: A square-free polynomial g ∈ Fq[X] (q = 2^n) that splits over Fq.

Output: Polynomials g1, g2 with g = g1g2 and deg gi ≥ 1 for i = 1, 2.

Steps:

if (g(0) = 0) { (g1, g2) := (X, g(X)/X), return. }
while (1) {
   Select a random element b ∈ Fq.
   h := (X + b) + (X + b)^2 + (X + b)^4 + · · · + (X + b)^(2^(n–1)) (mod g).
   g1 := gcd(g, h).
   if (1 ≤ deg g1 < deg g) { g2 := g/g1, return. }
}

3.37 Use Exercise 3.36 to compute all the roots of the following polynomials:
  1. X^6 + 6X^4 + 4X^2 + 6 in .

  2. X^3 + (α^2 + α)X^2 + (α^2 + α + 1) in , where is represented as , α being a root of the polynomial X^3 + X + 1.

3.38 Let f and g be two monic irreducible polynomials over Fq and of the same degree n. Consider the two representations Fq[X]/⟨f(X)⟩ and Fq[Y]/⟨g(Y)⟩ of Fq^n. In this exercise, we study how we can compute an isomorphism between these two representations. The polynomial f(Y) splits into linear factors over Fq[Y]/⟨g(Y)⟩. Consider a root α = α(Y) of f(Y) in Fq[Y]/⟨g(Y)⟩. Show that 1, α, α^2, . . . , α^(n–1) is an Fq-basis of (the Fq-vector space) Fq[Y]/⟨g(Y)⟩. For i = 0, . . . , n – 1, write (uniquely) α^i = αi0 + αi1Y + · · · + αi,n–1Y^(n–1) with αij ∈ Fq, and consider the matrix A = (αij), 0 ≤ i, j ≤ n – 1. Show that the map that maps (the equivalence class of) a0 + a1X + · · · + an–1X^(n–1) to (the equivalence class of) b0 + b1Y + · · · + bn–1Y^(n–1), where (b0 b1 . . . bn–1) = (a0 a1 . . . an–1)A, is an Fq-isomorphism.
3.39 Let q = p^n for a prime p and n ∈ N. We have seen that the elements of Fp can be represented as integers between 0 and p – 1, whereas the elements of Fq can be represented as polynomials modulo some irreducible polynomial of degree n, that is, as polynomials of Fp[X] of degrees < n. Show that the substitution X = p in the polynomial representation of elements of Fq gives a representation of elements of Fq as integers between 0 and q – 1. We call this latter representation the packed representation. Compare the advantages and disadvantages of the packed representation over the polynomial representation.
3.40 Let G be a cyclic multiplicatively written group of order m (and with the identity element e). Assume that the factorization of m is known. Devise an algorithm that computes the order of an arbitrary element in G. [H]
3.41

Berlekamp’s Q-matrix factorization Let f ∈ Fq[X] be a monic square-free polynomial of degree d that admits a factorization f(X) = f1(X) · · · fr(X) with each fi monic, non-constant and irreducible. (Note that the fi are pairwise distinct, since f is square-free.) Let di be the degree of fi.

  1. Consider the ring A := Fq[X]/⟨f(X)⟩.

    Show that A ≅ Fq[X]/⟨f1(X)⟩ × · · · × Fq[X]/⟨fr(X)⟩. [H] A is an Fq-vector space of dimension d.

  2. Consider the map 𝒬 : A → A that maps a to a^q – a. Show that 𝒬 is an Fq-linear transformation with Ker 𝒬 ≅ Fq^r, and so the nullity of 𝒬 equals the number r of irreducible factors of f.

  3. Let Q be the matrix of 𝒬 with respect to the basis 1, x, . . . , x^(d–1), where x = X + ⟨f(X)⟩. Describe an algorithm to compute Q. Also design an algorithm to compute a basis of Ker 𝒬.

  4. Show that if h(X) + ⟨f(X)⟩ ∈ Ker 𝒬, then

    f(X) = ∏c∈Fq gcd(f(X), h(X) – c).

    For a suitable h(X), this is a non-trivial factorization of f. This procedure is efficient when q is small.

  5. Use Berlekamp’s method to factor X^6 + X^5 + X^2 + 1 over F2.

*3.6. Arithmetic on Elliptic Curves

The recent popularity of cryptographic systems based on elliptic curve groups over Fq stems from two considerations. First, discrete logarithms in Fq* can be computed in subexponential time. This demands that q be sufficiently large, typically of length 768 bits or more. On the other hand, if the elliptic curve E over Fq is carefully chosen, the only known algorithms for solving the discrete logarithm problem in E(Fq) are fully exponential in lg q. As a result, smaller values of q suffice to achieve the desired level of security. In practice, the length of q is required to be between 160 and 400 bits. This leads to smaller key sizes for elliptic curve cryptosystems. The second advantage of using elliptic curves is that for a given prime power q, there is only one group Fq*, whereas there are many elliptic curve groups (over the same field Fq), with orders ranging over the Hasse interval from q + 1 – 2√q to q + 1 + 2√q. If a particular group E(Fq) is compromised, we can switch to another curve without changing the base field Fq.

In this section, we start with a description of the efficient implementation of the arithmetic in the groups E(Fq). Then we concentrate on some algorithms for counting the order #E(Fq). Knowledge of this order is necessary to find cryptographically suitable elliptic curves. We consider only prime fields or fields of characteristic 2. So we assume that the curve is defined by Equation (2.8) or Equation (2.9) on p 100 (supersingular curves are not used in cryptography) instead of by the general Weierstrass Equation (2.6) on p 98.

3.6.1. Point Arithmetic

Let us first see how we can efficiently represent points on an elliptic curve E over Fq. Since a finite point P = (h, k) corresponds to two elements h, k ∈ Fq, and since each element of Fq can be represented using ≤ s = ⌈lg q⌉ bits, 2s bits suffice to represent P. We can do better than this. Substituting X = h in the equation for E leaves us with a quadratic equation in Y. This equation has two roots, of which k is one. If we adopt a convention (for example, see Section 6.2.1) that identifies, using a single bit, which of the two roots the coordinate k is, the storage requirement for P drops to s + 1 bits. During an on-line computation this compressed representation incurs some overhead and may be avoided. However, for off-line storage and transmission (of public keys, for example), this compression may be helpful.

Explicit formulas for the sum of two points and for the opposite of a point on an elliptic curve E are given in Section 2.11.2. These operations in E(Fq) can be implemented using a few operations in the ground field Fq.

Computation of mP for m ∈ N and P ∈ E(Fq) (or, more generally, for m ∈ Z) can be performed using a repeated double-and-add algorithm similar to the repeated square-and-multiply Algorithm 3.9. We leave out the trivial modifications and urge the reader to carry out the details.
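The double-and-add computation of mP can be sketched as follows, in affine coordinates over a prime field, with None standing for the point at infinity O; the addition formulas are the chord-and-tangent formulas of Section 2.11.2.

```python
def ec_add(P, Q, a, p):
    """Chord-and-tangent addition on Y^2 = X^3 + aX + b over F_p.
    Points are (x, y) tuples; None is the point at infinity O."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                       # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(m, P, a, p):
    """Repeated double-and-add: scan the bits of m, doubling at each
    step, exactly as in the square-and-multiply Algorithm 3.9."""
    R = None
    while m:
        if m & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        m >>= 1
    return R
```

On the toy curve Y^2 = X^3 + X + 1 over F5 (a group of order 9), the point (0, 1) has order 9, so 9·(0, 1) = O.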

Finding a random point P ∈ E(Fq) is another useful problem. If q = p is an odd prime and we use the short Weierstrass Equation (2.8), we first choose a random h ∈ Fp and substitute X by h to get Y^2 = h^3 + ah + b. This equation has 2, 0 or 1 solution(s) depending on whether h^3 + ah + b is a quadratic residue, a quadratic non-residue, or 0 modulo p. Quadratic residuosity can be checked by computing the Legendre symbol (Algorithm 3.15), whereas square roots modulo p can be computed using Tonelli and Shanks’ Algorithm 3.16.
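For primes p ≡ 3 (mod 4), the square-root step degenerates to a single exponentiation, which gives the following sketch of random point generation (the general case would use the full Tonelli–Shanks Algorithm 3.16):

```python
import random

def random_point(a, b, p):
    """Sketch of random point sampling on Y^2 = X^3 + aX + b over F_p.
    Assumes p = 3 (mod 4), in which case a square root of a quadratic
    residue z is simply z^((p+1)/4)."""
    assert p % 4 == 3
    while True:
        h = random.randrange(p)
        z = (h * h * h + a * h + b) % p
        if z == 0:
            return (h, 0)
        if pow(z, (p - 1) // 2, p) == 1:   # Euler's criterion: z is a QR
            return (h, pow(z, (p + 1) // 4, p))
```

Each random h succeeds with probability about 1/2, so only a couple of trials are expected.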

For a non-supersingular curve E over F2^n defined by Equation (2.9), a random point is chosen by first choosing a random h ∈ F2^n. Substituting X = h in the defining equation gives Y^2 + hY + (h^3 + ah^2 + b) = 0. If h = 0, then the unique solution for k is b^(2^(n–1)). If h ≠ 0, replacing Y by hY and dividing by h^2 transforms the equation to the form Y^2 + Y + α = 0 for some α ∈ F2^n. This equation has two or zero solutions depending on whether the absolute trace Tr(α) is 0 or 1. If k is a solution, the other solution is k + 1. In order to find a solution (if it exists), one may use the (probabilistic) root-finding algorithm of Exercise 3.36. Another possibility is discussed now.

We consider two separate cases. First, if n is odd, then the half-trace k = α + α^4 + α^16 + · · · + α^(2^(n–1)) is a solution, since then k^2 + k = α + Tr(α) = α. On the other hand, if n is even, we first find a β ∈ F2^n with Tr(β) = 1. Since Tr is a homomorphism of the additive groups and Tr(1) = 1, exactly half of the elements of F2^n have trace 1. Therefore, a desired β can be quickly found by selecting elements of F2^n at random and computing their traces. Now, it is easy to check that a solution of Y^2 + Y + α = 0 can be written down explicitly in terms of α and β.

**3.6.2. Counting Points on Elliptic Curves

Counting points on elliptic curves is a challenging problem, both theoretically and computationally. The first polynomial-time (in log q) algorithm, invented by Schoof and later made efficient by Elkies and Atkin (and many others), is popularly called the SEA algorithm. Unfortunately, even the most efficient implementations of this algorithm remain rather slow, but it is the only known reasonable strategy, in particular, when q = p is a large (odd) prime of a size of cryptographic interest. The more recent Satoh–FGH algorithm, named after its discoverer Satoh and after Fouquet, Gaudry and Harley, who proposed its generalized and efficient versions, is a remarkable breakthrough for the case q = 2^n. Both the SEA and the Satoh–FGH algorithms are mathematically quite sophisticated. We now present a brief overview of these algorithms.

The SEA algorithm

We assume that q = p is a large odd prime, this being the typical situation when we apply the SEA algorithm. We also assume that E is given by the short Weierstrass equation Y^2 = X^3 + aX + b. Let q1 = 2, q2 = 3, q3 = 5, . . . be the sequence of prime numbers and t the Frobenius trace of E at p. By Hasse’s theorem (Theorem 2.48, p 106), #E(Fp) = p + 1 – t with |t| ≤ 2√p. A knowledge of t modulo sufficiently many small primes l allows us to reconstruct t using the Chinese remainder theorem. Because of the Hasse bound on t, it is sufficient to choose l from the primes q1, q2, . . . in succession, until the product q1q2 · · · qr exceeds 4√p. By the prime number theorem (Theorem 2.20, p 53), we have r = O(ln p) and also qi = O(ln p) for each i = 1, . . . , r.

The most innovative idea of Algorithm 3.31 is the determination of the integers ti. For l = q1 = 2, the process is easy. We have t1 = t rem 2 = 0 if and only if E(Fp) contains a point of order 2 (a point of the form (h, 0)), or equivalently, if and only if the polynomial X^3 + aX + b has a root in Fp. We compute the polynomial gcd g(X) := gcd(X^3 + aX + b, X^p – X) over Fp and conclude that t1 = 0 if and only if deg g(X) > 0.

Algorithm 3.31. SEA algorithm for elliptic curve point counting

Input: A prime field F_p, p odd, and an elliptic curve E defined over F_p.

Output: The order of the group E(F_p).

Steps:

Find (the smallest) r such that the product q1q2 · · · qr > 4√p.
for i = 1, 2, . . . , r { Compute ti ∈ {0, 1, . . . , qi – 1} with t ≡ ti (mod qi). }
Compute t by combining t1, t2, . . . , tr using the Chinese remainder theorem, and return #E(F_p) = p + 1 – t.
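The CRT step of the algorithm can be sketched in code. The function names below are ours, and the per-prime residues are assumed to come from Schoof's computations; the toy curve over F_23 is one whose trace is known to be –4.

```python
# Sketch of the CRT step of Algorithm 3.31: recover the Frobenius trace t
# from its residues t_i modulo small primes q_i, then shift the CRT result
# into the Hasse interval |t| <= 2*sqrt(p).
import math

def crt(residues, moduli):
    """Chinese remainder theorem for pairwise coprime moduli."""
    M = math.prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        x += r * Mi * pow(Mi, -1, m)    # pow(., -1, m) is the inverse mod m
    return x % M, M

def recover_trace(residues, moduli, p):
    x, M = crt(residues, moduli)
    assert M > 4 * math.sqrt(p)         # needed for uniqueness of t
    if x > M // 2:                      # pick the representative of small absolute value
        x -= M
    return x

# Toy example: Y^2 = X^3 + X + 1 over F_23 has 28 points, so t = -4.
# Residues of t modulo 2, 3, 5, 7 (product 210 > 4*sqrt(23)):
t = recover_trace([0, 2, 1, 3], [2, 3, 5, 7], 23)
print(t, 23 + 1 - t)    # -4 28
```

Note that once the product of the small primes exceeds 4√p, the centred representative is forced to be the true trace.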

Determination of ti for i > 1 involves more work. We explain here the original idea due to Schoof. We denote by l the i-th prime qi and by E[l] the set of all l-torsion points of E (Definition 2.78, p 105). The Frobenius endomorphism φ of E, which fixes the point at infinity O and maps (h, k) to (h^p, k^p), satisfies the relation φ^2 – tφ + p = 0. If we restrict our attention only to the group E[l], then this relation reduces to φ^2 – tiφ + pi = 0, where ti = t rem l and pi = p rem l, that is, φ^2(P) – tiφ(P) + piP = O for all P ∈ E[l].

In terms of polynomials, the last relation is equivalent to

Equation 3.4

(X^(p^2), Y^(p^2)) + pi(X, Y) – ti(X^p, Y^p) = O,
where the sum and difference follow the addition formulas for the elliptic curve E, and O is the point at infinity. Now, one has to calculate symbolically rather than numerically, since X and Y are indeterminates. These computations can be carried out in the ring F_p[X, Y]/⟨f, fl⟩ (instead of in F_p[X, Y]), where f(X, Y) = Y^2 – (X^3 + aX + b) is the defining polynomial of E and fl = fl(X) is the l-th division polynomial of E (Section 2.11.2 and Theorem 2.47, p 106). Reduction of a polynomial in F_p[X, Y] modulo f makes its Y-degree ≤ 1, whereas reduction modulo fl makes the X-degree less than deg fl, which is O(l^2). We can try the values ti = 0, 1, . . . , l – 1 successively until the desired value satisfying Equation (3.4) is found.
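The l = 2 case described earlier gives a small, self-contained taste of these symbolic calculations. The sketch below (our own illustrative code, not the book's) computes X^p modulo the cubic by square-and-multiply and then takes a polynomial gcd over F_p, never expanding X^p explicitly.

```python
# Decide the parity of t by testing whether gcd(X^3 + aX + b, X^p - X)
# is nontrivial over F_p.  Polynomials are coefficient lists, lowest
# degree first.  Illustrative code, not an optimized implementation.

def poly_trim(f, p):
    f = [c % p for c in f]
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_mod(f, g, p):
    f, g = poly_trim(f, p), poly_trim(g, p)
    inv = pow(g[-1], -1, p)
    while len(f) >= len(g):
        c = f[-1] * inv % p
        shift = len(f) - len(g)
        for i, gi in enumerate(g):
            f[shift + i] = (f[shift + i] - c * gi) % p
        f = poly_trim(f, p)
    return f

def poly_gcd(f, g, p):
    f, g = poly_trim(f, p), poly_trim(g, p)
    while g:
        f, g = g, poly_mod(f, g, p)
    return f

def poly_mulmod(f, g, m, p):
    """Product of f and g reduced modulo the monic polynomial m and mod p."""
    res = [0] * max(1, len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            res[i + j] = (res[i + j] + a * b) % p
    dm = len(m) - 1
    for k in range(len(res) - 1, dm - 1, -1):
        c = res[k]
        if c:
            for t in range(dm + 1):
                res[k - dm + t] = (res[k - dm + t] - c * m[t]) % p
    return poly_trim(res[:dm], p)

def poly_powmod(base, e, m, p):
    result = [1]
    while e:
        if e & 1:
            result = poly_mulmod(result, base, m, p)
        base = poly_mulmod(base, base, m, p)
        e >>= 1
    return result

def trace_parity(a, b, p):
    """t mod 2 for E: Y^2 = X^3 + aX + b over F_p (p an odd prime)."""
    cubic = [b % p, a % p, 0, 1]             # X^3 + aX + b
    xp = poly_powmod([0, 1], p, cubic, p)    # X^p reduced mod the cubic
    while len(xp) < 2:
        xp.append(0)
    xp[1] = (xp[1] - 1) % p                  # X^p - X, reduced
    g = poly_gcd(cubic, xp, p)
    return 0 if len(g) != 1 else 1           # nontrivial gcd <=> 2 | t

# Y^2 = X^3 + X + 1 over F_23 has 28 points (t = -4), so t is even:
print(trace_parity(1, 1, 23))   # 0
```

For larger l one works with pairs (X, Y) modulo both the curve equation and the division polynomial, but the flavour is the same: exponentiation and gcds of polynomials of bounded degree.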

It is not difficult to verify that Schoof's algorithm runs in time O(log^8 p) (under standard arithmetic in F_p) and is thus a deterministic polynomial-time algorithm for the point-counting problem. Essentially the same algorithm works for fields of cardinality q = 2^n and has the same running time. Unfortunately, the big exponent (8) in the running time makes Schoof's algorithm quite impractical. Numerous improvements have been suggested to bring down this exponent. Elkies and Atkin's modification for the case q = p gives rise to the SEA algorithm, which has a running time of O(log^6 p) under standard arithmetic in F_p. This speed-up is achieved by working in the ring F_p[X, Y]/⟨f, gl⟩, where gl is a suitable factor of fl of degree O(l). Couveignes suggests improvements for fields of characteristic 2. Efficient implementations of the SEA algorithm have been reported by Morain, Müller, Dewaghe, Vercauteren and many others. At the time of writing this book, the largest values of q for which the algorithm has been successfully applied are 10^499 + 153 (a prime) and 2^1999 (a power of 2).
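For contrast, the naive method, enumerating every x-coordinate and counting square roots, takes time exponential in log p; a short sketch shows why point-counting algorithms such as SEA are needed at all.

```python
# Baseline for comparison: naive point counting by enumerating x and
# counting the square roots of x^3 + a*x + b modulo p.  Usable only for
# tiny p.

def naive_count(a, b, p):
    sqrt_count = {}
    for y in range(p):
        v = y * y % p
        sqrt_count[v] = sqrt_count.get(v, 0) + 1
    n = 1                                    # the point at infinity
    for x in range(p):
        n += sqrt_count.get((x * x * x + a * x + b) % p, 0)
    return n

print(naive_count(1, 1, 23))   # 28 points, trace t = -4
```

The loop over all p residues is exactly what SEA avoids by determining t modulo small primes instead.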

The Satoh–FGH algorithm

The Satoh–FGH algorithm is well suited for fields of small characteristic p and, in particular, for the fields of characteristic 2. This algorithm has enabled point counting over fields of cardinality 2^n for very large n. A generic description of the Satoh–FGH algorithm now follows, after the introduction of some mathematical notions. Though our practical interest concentrates on the fields of characteristic 2 only, we consider curves over a general F_q with q = p^n, p a prime.

Recall from Section 2.14 that the ring Z_p of p-adic integers is a discrete valuation ring (Exercises 2.133 and 2.148) with the unique maximal ideal generated by p, and that the residue field Z_p/⟨p⟩ is isomorphic to F_p.

We represent F_q as a polynomial algebra over F_p. We analogously define the p-adic ring Z_q := Z_p[X]/⟨f⟩, where f is an irreducible polynomial of degree n in Z_p[X]. The elements of Z_q can be viewed as polynomials of degrees < n with p-adic integers as coefficients. The arithmetic operations in Z_q are polynomial operations in Z_p[X] modulo the defining polynomial f. The ring Z_p is canonically embedded in the ring Z_q (consider constant polynomials).

Z_q turns out to be a discrete valuation ring with maximal ideal ⟨p⟩, and the residue field Z_q/⟨p⟩ is isomorphic to F_q.

Definition 3.6.

The projection map π : Z_p → F_p is defined as the map that takes a p-adic integer α = (a1, a2, . . .) to a1; it can be canonically extended to a map on polynomials by π(α0 + α1X + · · · + αdX^d) := π(α0) + π(α1)X + · · · + π(αd)X^d. In particular, this defines a projection map π : Z_q → F_q.

The (Teichmüller) lift is the map F_q → Z_q that takes 0 ↦ 0 and 0 ≠ a ↦ ω(a), where ω(a) is the unique (q – 1)-th root of unity in Z_q satisfying π(ω(a)) = a (cf. Exercise 2.160).

The semi-Witt decomposition of an element α ∈ Z_q is defined to be the unique sequence a0, a1, . . . with ai ∈ F_q such that α has the p-adic expansion α = ω(a0) + ω(a1)p + ω(a2)p^2 + · · ·.
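For the base case q = p, the Teichmüller lift of Definition 3.6 can be approximated numerically by working modulo p^m (the truncation discussed later in this section): since a^(p^k) converges p-adically to the (p – 1)-th root of unity above a, a few rounds of p-th powering suffice. A sketch, with our own function name:

```python
# Approximate the Teichmueller lift omega(a) in Z_p by computing modulo
# p^m.  m rounds of p-th powering modulo p^m give omega(a) to the full
# working precision.  Illustrative code.

def teichmuller(a, p, m):
    """The (p-1)-th root of unity in Z/p^m that reduces to a mod p (a != 0)."""
    M = p ** m
    x = a % M
    for _ in range(m):          # each powering gains at least one p-adic digit
        x = pow(x, p, M)
    return x

w = teichmuller(3, 7, 5)
print(w % 7)                    # 3: omega(3) projects back to 3
print(pow(w, 6, 7 ** 5))        # 1: a 6th root of unity modulo 7^5
```

The two printed checks mirror the defining properties π(ω(a)) = a and ω(a)^(q–1) = 1.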

The p-th power Frobenius endomorphism σ : F_q → F_q, a ↦ a^p, can now be extended to an endomorphism Σ : Z_q → Z_q as follows. Let α ∈ Z_q have the semi-Witt decomposition a0, a1, . . . with ai ∈ F_q. Then, Σ(α) is the unique element of Z_q having the semi-Witt decomposition a0^p, a1^p, . . . . One can show that Σ is a ring endomorphism of Z_q satisfying π ∘ Σ = σ ∘ π. We have Σ^n = id on Z_q, and similarly σ^n = id on F_q.

Now, let E = E0 be an elliptic curve defined over F_q. Application of σ to the coefficients of E0 gives another elliptic curve E1 over F_q, whose rational points are (h^p, k^p), where (h, k) ∈ E0(F_q), together with the point at infinity. We may apply σ to E1 to get another curve E2 over F_q, and so on. Since σ^n = id, we get a cycle of elliptic curves defined over F_q:

Equation 3.5

E0 → E1 → E2 → · · · → En–1 → En = E0.
Similarly, if ε = ε0 is an elliptic curve defined over Z_q, application of Σ leads to a sequence of elliptic curves defined over Z_q:

Equation 3.6

ε0 → ε1 → ε2 → · · · → εn–1 → εn = ε0.
We need the canonical lifting of an elliptic curve E over F_q to a curve ε over Z_q. Explaining that requires some more mathematical concepts:

Definition 3.7.

Let K be a field and let E and E′ be two elliptic curves defined over K. A morphism ψ : E → E′ (Definition 2.72, p 95) that maps the point at infinity of E to the point at infinity of E′ is called an isogeny. The zero isogeny E → E′ maps every point of E to the point at infinity of E′. A non-zero isogeny is also called a non-constant isogeny. Two curves E and E′ are called isogenous if there exists a non-constant isogeny E → E′.

The kernel ker ψ of an isogeny ψ : E → E′ is defined to be the set of points of E mapped by ψ to the point at infinity of E′. For every non-constant isogeny ψ, the kernel ker ψ is a finite subgroup of E.

The set Hom(E, E′) of all isogenies E → E′ is an Abelian group under pointwise addition: (ψ1 + ψ2)(P) := ψ1(P) + ψ2(P). If E = E′, then End(E) := Hom(E, E) becomes a ring with multiplication defined by composition and is called the endomorphism ring of E.

The multiplication-by-m map of E is an isogeny. If End(E) contains an isogeny not of this type, we call E an elliptic curve with complex multiplication.

Theorem 3.4.

For each prime i, there exists a unique polynomial Φi(X, Y) with integer coefficients, symmetric and of degree i + 1 in each of X and Y, such that two curves E and E′ (defined over a field K) with j-invariants j and j′ satisfy Φi(j, j′) = 0 if and only if there is an isogeny E → E′ whose kernel is cyclic of order i.

Definition 3.8.

The polynomials Φi(X, Y), i = 2, 3, 5, . . . , of Theorem 3.4 are called modular polynomials. As an example,

Φ2(X, Y)=X3 + Y3X2Y2 + 1488(X2Y + XY2) –
  162,000(X2 + Y2) + 40,773,375XY + 8,748,000,000(X + Y) –
  157,464,000,000,000.
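Two quick sanity checks on the quoted polynomial Φ2: it is symmetric in X and Y, and it vanishes at a known pair of 2-isogenous j-invariants (j = 8000 is the j-invariant of a curve with complex multiplication by Z[√–2]; since 2 ramifies there, the curve is 2-isogenous to itself). Illustrative code:

```python
# Sanity checks on the modular polynomial Phi_2 quoted in the text.

def phi2(x, y):
    return (x**3 + y**3 - x**2 * y**2
            + 1488 * (x**2 * y + x * y**2)
            - 162000 * (x**2 + y**2)
            + 40773375 * x * y
            + 8748000000 * (x + y)
            - 157464000000000)

assert phi2(12, 34) == phi2(34, 12)   # symmetric in X and Y
assert phi2(8000, 8000) == 0          # a self-2-isogenous CM curve
print("Phi_2 checks pass")
```

Python's arbitrary-precision integers make such checks on the huge coefficients painless.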

The next theorem establishes the foundation for lifting curves from F_q to Z_q.

Theorem 3.5. Lubin–Serre–Tate

Let E be an ordinary elliptic curve defined over F_q, q = p^n, with j-invariant j(E). Then there exists an elliptic curve ε defined over Z_q, with a unique j-invariant J(ε) ∈ Z_q, such that π(J(ε)) = j(E) and End(ε) ≅ End(E). The curve ε is called the canonical lift of E and is unique up to isomorphism.

With this definition of lifting of elliptic curves, Cycles (3.5) and (3.6) fit into a commutative diagram in which εi is the canonical lift of Ei for each i = 0, 1, . . . , n, the projection π takes εi to Ei, and Σ and σ map εi to εi+1 and Ei to Ei+1, respectively.

Algorithm 3.32 outlines the Satoh–FGH algorithm. In order to complete the description of the algorithm, one should specify how to lift curves (that is, a procedural equivalent of Theorem 3.5) and their p-torsion points and how the lifted data can be used to compute the Frobenius trace t. We leave out the details here.

Algorithm 3.32. Satoh–FGH algorithm for elliptic curve point counting

Input: An elliptic curve E over F_q, q = p^n, p prime, with j-invariant j(E).

Output: The cardinality #E(F_q) or equivalently the trace t.

Steps:

Compute the curves E0, . . . , En–1 and their j-invariants j0, . . . , jn–1.
Compute the lifted j-invariants J0, . . . , Jn–1.
Compute the lifted curves ε0, . . . , εn–1.
Lift the p-torsion groups Ei[p] for i = 0, . . . , n – 1.
Compute t and hence #E(F_q) from the lifted data.

The elements of Z_p (and hence of Z_q) are infinite sequences and hence cannot be represented in computer memory. However, we make an approximate representation by considering only the first m terms of the sequences representing elements of Z_p. Working in Z_q with this approximate representation is then essentially the same as working in Z_q/⟨p^m⟩. For the Satoh–FGH algorithm, a precision of m ≈ n/2 suffices.

For small p (for example, p = 2) and with standard arithmetic in Z_q, the Satoh–FGH algorithm has a deterministic running time of O(n^5) and a space requirement of O(n^3). With Karatsuba arithmetic, the exponent in the running time drops from 5 to nearly 4.17. In addition, this algorithm is significantly easier to implement than optimized versions of the SEA algorithm. These facts are responsible for the superior performance of the Satoh–FGH algorithm over the SEA algorithm (for small p).

3.6.3. Choosing Good Elliptic Curves

Choosing cryptographically suitable elliptic curves is more difficult than choosing good finite fields. First, the order of the elliptic curve group must have a suitably large prime divisor, say, of bit length 160 or more. In addition, the MOV attack applies to supersingular curves and the anomalous attack to anomalous curves (Definition 2.80 and Section 4.5). So a secure curve must be non-supersingular and non-anomalous. Checking all these criteria for a random curve E over F_q requires the group order #E(F_q). One may use either the SEA algorithm or the Satoh–FGH algorithm to compute #E(F_q). Once #E(F_q) is known, it is easy to check whether E is supersingular or anomalous. But factoring #E(F_q) to find its largest prime divisor may be a difficult task and is not recommended. One may instead extract all the small prime factors of #E(F_q) by trial divisions with the primes q1 = 2, q2 = 3, q3 = 5, . . . , qr for a predetermined r and write #E(F_q) = m1m2, where m1 has all prime factors ≤ qr and m2 has all prime factors > qr. If m2 is prime and of the desired size, then E is treated as a good curve. Algorithm 3.33 illustrates these steps.
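The order-screening step (extract m1 by trial division, then test the cofactor m2 for primality) can be sketched as follows. The Miller–Rabin test shown is a compressed version of the primality test discussed earlier in the chapter, and the example order is invented for illustration:

```python
# Split a candidate group order N = m1*m2 by trial division with small
# primes, then probabilistically test whether the cofactor m2 is prime.
import random

def is_probable_prime(n, rounds=25):
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):                 # Miller-Rabin rounds
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False                    # a is a witness of compositeness
    return True

def screen_order(N, small_primes):
    """Return (m1, m2, m2_is_probable_prime) for N = m1*m2."""
    m1 = 1
    for q in small_primes:
        while N % q == 0:
            N //= q
            m1 *= q
    m2 = N
    return m1, m2, is_probable_prime(m2)

# Hypothetical order: small cofactor 4 times the prime 1000003.
print(screen_order(4 * 1000003, [2, 3, 5, 7, 11, 13]))
```

In practice one also checks that m2 exceeds the required bit length (160 or more) before accepting the curve.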

The computation of the group orders takes up most of the execution time of the above algorithm. It is, therefore, of utmost importance to employ good algorithms for point counting. The best algorithms known to date (the SEA and the Satoh–FGH algorithms) are only reasonable. Further research in this area may lead to better algorithms in future.

Algorithm 3.33. Selecting cryptographically suitable elliptic curves

Input: A suitably large finite field F_q.

Output: A cryptographically good elliptic curve E over F_q.

Steps:

while (1) {
   Generate a random elliptic curve E over F_q.
   Determine #E(F_q).
   if (E is neither supersingular nor anomalous) {
      Try to factorize #E(F_q) using trial division by small primes.
      if (#E(F_q) has a suitably large prime divisor) { Return E }
   }
}

There are ways of generating good curves without requiring the point-counting algorithms over large finite fields. One possibility is to use the so-called subfield curves. If F_q has a subfield F_q′ of relatively small cardinality q′, one can choose a random curve E over F_q′ and compute #E(F_q′). Since E is also a curve defined over F_q and #E(F_q) can be easily obtained from #E(F_q′) using Theorem 2.51 (p 107), we save the lengthy direct computation of #E(F_q). However, the drawback of this method is that since E is now chosen with coefficients from a small field F_q′, we do not have many choices. The second drawback is that q must have a proper prime-power divisor q′. If q is already a prime, this strategy does not work at all. If q = p^n, p a small prime, we need n to have a small divisor n′ that corresponds to q′ = p^(n′). Sometimes small odd primes p are suggested, but the arithmetic in a non-prime field of some odd characteristic is inherently much slower than that in a field of nearly equal size but of characteristic 2.

Specific curves with complex multiplication (Definition 3.7) over large prime fields have also been suggested in the literature. Finding good curves with complex multiplication involves less computational overhead than Algorithm 3.33, but (like subfield curves) offers limited choice. However, it is important to mention that no special attacks are currently known for subfield curves, nor for those chosen by the complex multiplication strategy.

3.7. Arithmetic on Hyperelliptic Curves

Let K = F_q be a finite field and C a hyperelliptic curve of genus g defined over K by Equation (2.13), that is, by

C : Y2 + u(X)Y = v(X)

for suitable polynomials u, v ∈ K[X]. We want to implement the arithmetic in the Jacobian J_C(K). Recall from Section 2.12 that an element of J_C(K) can be represented uniquely as a reduced divisor Div(a, b) for a pair of polynomials a(X), b(X) ∈ K[X] with a monic, deg a ≤ g, deg b < deg a and a | (b^2 + bu – v). Thus, each element of J_C(K) requires O(g log q) storage.

3.7.1. Arithmetic in the Jacobian

We first present Algorithm 3.34 that, given two elements Div(a1, b1), Div(a2, b2) of J_C(K), computes the reduced divisor Div(a, b) which satisfies Div(a, b) ~ Div(a1, b1) + Div(a2, b2). The algorithm proceeds in two steps:

  1. Compute a semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).

  2. Compute the reduced divisor Div(a, b) ~ Div(a′, b′).

Both these steps can be performed in (deterministic) polynomial time (in the input size, that is, g log q). Algorithm 3.34 implements the first step and continues to work even when the input divisors are semi-reduced (and not completely reduced).

Algorithm 3.34. Sum of semi-reduced divisors

Input: (Semi-)reduced divisors Div(a1, b1) and Div(a2, b2) defined over K.

Output: A semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).

Steps:

d1 := gcd(a1, a2) = u1a1 + u2a2.   /* Extended gcd in K[X] */
d2 := gcd(d1, b1 + b2 + u) = v1d1 + v2(b1 + b2 + u).   /* Extended gcd in K[X] */
s1 := v1u1, s2 := v1u2, s3 := v2.   /* So that d2 = s1a1 + s2a2 + s3(b1 + b2 + u) */
a′ := (a1a2)/d2^2.
b′ := ((s1a1b2 + s2a2b1 + s3(b1b2 + v))/d2) rem a′.

It is an easy check that the two expressions appearing between pairs of big parentheses in Algorithm 3.34 are polynomials. This algorithm does only a few gcd calculations and some elementary arithmetic operations on polynomials of K[X]. If the input polynomials (a1, a2, b1, b2) correspond to reduced divisors, then their degrees are ≤ g and hence this algorithm runs in polynomial time in the input size. Furthermore, in that case, the output polynomials a′ and b′ are of degrees ≤ 2g.

We now want to compute the unique reduced divisor Div(a, b) equivalent to the semi-reduced divisor Div(a′, b′). This can be performed using Algorithm 3.35. If the degrees of the input polynomials a′ and b′ are O(g) (as is the case with those output by Algorithm 3.34), Algorithm 3.35 takes time polynomial in g log q. To sum up, two elements of J_C(K) can be added in polynomial time. The correctness of the two algorithms is not difficult to establish, but the proof is long and involved and hence omitted. Interested readers might look at the appendix of Koblitz's book [154].

For an element α ∈ J_C(K) and an integer n ≥ 0, one can easily write an algorithm (similar to Algorithm 3.9) to compute nα using O(log n) additions and doublings in J_C(K).
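Such a double-and-add procedure is generic: it needs only the group's addition function and identity element, so the same code serves Jacobians, elliptic curves, or any other additive group. A sketch, tested on ordinary integer addition:

```python
# Generic double-and-add: computes n*alpha with O(log n) group operations,
# given an addition function and the identity of the group.

def scalar_multiple(n, alpha, add, identity):
    if n < 0:
        raise ValueError("negate alpha first for negative multipliers")
    result, addend = identity, alpha
    while n:
        if n & 1:
            result = add(result, addend)   # "add" step
        addend = add(addend, addend)       # "double" step
        n >>= 1
    return result

# Toy check in the additive group of integers:
print(scalar_multiple(1000003, 5, lambda x, y: x + y, 0))   # 5000015
```

For a Jacobian, `add` would be the composition-plus-reduction of Algorithms 3.34 and 3.35 and `identity` the zero divisor Div(1, 0).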

3.7.2. Counting Points in Jacobians of Hyperelliptic Curves

For a hyperelliptic curve C of genus g defined over a field K = F_q, we are interested in the order of the Jacobian J_C(F_q) rather than in the cardinality of the curve C(F_q). Algorithmic and implementational studies of counting #J_C(F_q) have not received enough research attention till date, and though polynomial-time algorithms are known to this effect (at least for curves of small genus), these algorithms are far from practical for hyperelliptic curves of cryptographic sizes. In this section, we look at some of these algorithms.

Algorithm 3.35. Reduction of a semi-reduced divisor

Input: A semi-reduced divisor Div(a′, b′) defined over K.

Output: The reduced divisor Div(a, b) ~ Div(a′, b′).

Steps:

(ab) := (a′, b′).
while (deg a > g) {
  a′ := (v – bu – b^2)/a.  /* a′ is a polynomial */
  b′ := –(u + b) rem a′.
  (ab) := (a′, b′).
}
a := [lc(a)]–1a.   /* Make a monic */

We start with some theoretical results which are generalizations of those for elliptic curves. The Frobenius endomorphism φ : x ↦ x^q is a (non-trivial) F_q-automorphism of the algebraic closure of F_q. The map φ naturally (that is, coordinate-wise) extends to the points on C and also to divisors and, in particular, to the Jacobian J_C as well as to J_C(F_q). For a reduced divisor Div(a, b), we have φ(Div(a, b)) = Div(φ(a), φ(b)), where for a polynomial h the polynomial φ(h) is obtained by applying the map φ to the coefficients of h. It is known that φ satisfies a monic polynomial χ(X) of degree 2g with integer coefficients. For example, for g = 1 (elliptic curves) we have

χ(X) = X2tX + q,

where t is the trace of Frobenius at q. For g = 2, we have

Equation 3.7

χ(X) = X^4 – t1X^3 + t2X^2 – qt1X + q^2
for integers t1, t2. The cardinality n := #J_C(F_q) is related to the polynomial χ(X) as n = χ(1), and satisfies the inequalities

Equation 3.8

(√q – 1)^(2g) ≤ n ≤ (√q + 1)^(2g).
Thus n lies in a rather narrow interval, called the Hasse–Weil interval, of width w := (√q + 1)^(2g) – (√q – 1)^(2g).

Theorem 2.50 can be generalized as follows:

Theorem 3.6. Structure theorem for J_C(F_q)

The Jacobian J_C(F_q) is the direct sum of at most 2g cyclic groups of orders n1, . . . , nr, with r ≤ 2g, n1, . . . , nr ≥ 2 and ni+1 | ni for each i = 1, 2, . . . , r – 1.

The exponent of J_C(F_q) (see Exercise 3.42) is clearly m := Exp J_C(F_q) = n1. Since m | n, there are at most ⌈(w + 1)/m⌉ possibilities for n for a given m (where w is the width of the Hasse–Weil interval). In particular, n is uniquely determined by m if m > w. It is possible to have m ≤ w (for instance, when the Jacobian is far from cyclic), though such curves are relatively rare. In the more frequent case (m > w), Algorithm 3.36 determines n.

Algorithm 3.36. Hyperelliptic curve point counting

Input: A hyperelliptic curve C of genus g defined over .

Output: The cardinality n of the Jacobian .

Steps:

m := 1.
while (m ≤ w) {
   Choose a random element x ∈ J_C(F_q).
   Determine ν := ord x.
   m := lcm(m, ν).
}
n := the unique multiple of m in the Hasse–Weil interval.

Since the accumulated m always divides Exp J_C(F_q) and random elements quickly supply the missing factors, the above algorithm eventually (in practice, after a few executions of the while loop) computes this exponent. However, if Exp J_C(F_q) ≤ w, the algorithm never terminates. Thus, we may forcibly terminate the algorithm by reporting failure after sufficiently many random elements x have been tried (while we continue to have m ≤ w). In order to complete the description of the algorithm, we must specify a strategy to compute ν := ord x for a randomly chosen x ∈ J_C(F_q). Instead of computing ν directly, we compute an (integral) multiple μ of ν, factorize μ and then determine ν. Since nx = 0, we search for a desired multiple μ in the Hasse–Weil interval. This search can be carried out using a baby-step–giant-step (Section 4.4) or a birthday-paradox (Exercise 2.172) method, but the resulting expected running time is exponential in the input size. This method, therefore, cannot be used except when n is small.
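The final step of Algorithm 3.36 deserves a remark: once m exceeds the interval width, picking n is a one-line computation. A sketch with invented numbers (q = 31, g = 2, and a hypothetical exponent multiple of 1500):

```python
# The closing step of Algorithm 3.36: when the accumulated lcm m exceeds
# the width of the Hasse-Weil interval, the order n is the unique multiple
# of m inside that interval.  The example numbers are illustrative only.
import math

def hasse_weil_interval(q, g):
    lo = math.ceil((math.sqrt(q) - 1) ** (2 * g))
    hi = math.floor((math.sqrt(q) + 1) ** (2 * g))
    return lo, hi

def unique_multiple_in_interval(m, lo, hi):
    if hi - lo + 1 > m:
        raise ValueError("interval too wide: the multiple need not be unique")
    n = -(-lo // m) * m        # first multiple of m that is >= lo
    if n > hi:
        raise ValueError("no multiple of m in the interval")
    return n

lo, hi = hasse_weil_interval(31, 2)
print(lo, hi)                                        # 436 1860
print(unique_multiple_in_interval(1500, lo, hi))     # 1500
```

The guard `hi - lo + 1 > m` enforces exactly the m > w condition under which the algorithm is guaranteed to succeed.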

For hyperelliptic curves of small genus g, generalizations of Schoof's algorithm (Algorithm 3.31) can be used. Gaudry and Harley [106] describe the case g = 2. One computes the polynomial χ(X) of Equation (3.7), that is, the values of t1 and t2 modulo sufficiently many small primes l. Since the roots of χ(X) are of absolute value √q, we have |t1| ≤ 4√q and |t2| ≤ 6q. Therefore, determination of t1 and t2 modulo O(log q) small primes l uniquely determines χ(X) (as well as n = χ(1)).

Let J[l] be the set of l-torsion points of the Jacobian J_C. The Frobenius map φ restricted to J[l] satisfies

Equation 3.9

φ^4(D) – t1,lφ^3(D) + t2,lφ^2(D) – qlt1,lφ(D) + ql^2 D = 0 for all D ∈ J[l],
where t1,l := t1 rem l, t2,l := t2 rem l and ql := q rem l. By exhaustively trying all (that is, ≤ l^2) possibilities for the pair (t1,l, t2,l), one can find their actual values, that is, those values that cause the left side of Equation (3.9) to vanish (symbolically).

A result by Kampkötter [144] allows us to consider only the reduced divisors of the form D = Div(a, b) with a(X) = X^2 + a1X + a0 and b(X) = b1X + b0. There exists an ideal I of the polynomial ring F_q[A1, A0, B1, B0] (one indeterminate for each of the coefficients a1, a0, b1, b0) such that a reduced divisor D of this special form lies in J[l] if and only if f(a1, a0, b1, b0) = 0 for all f ∈ I. Thus the computation of the left side of Equation (3.9) may be carried out in the ring F_q[A1, A0, B1, B0]/I. An explicit set of generators for I can be found in Kampkötter [144]. To sum up, we get a polynomial-time algorithm.

Working (modulo the ideal I) in the 4-variate polynomial ring is, indeed, expensive. Use of Cantor's division polynomials [43] essentially reduces the arithmetic to a single variable (instead of four). We do not explore further along this line, but only mention that for g = 2, Schoof's algorithm employing division polynomials runs in time O(log^9 q). Although this is a theoretical breakthrough, the prohibitively large exponent (9) in the running time precludes the feasibility of using the algorithm in the range of interest in cryptography.

Exercise Set 3.7

3.42Let G be a multiplicative group (not necessarily Abelian and/or finite) with identity e.

Let S := {m ∈ Z | x^m = e for all x ∈ G}.

  1. Show that S is a subgroup of the additive group Z.

  2. Show that every subgroup of Z is generated by a single element. In particular, S = 〈m〉 for some integer m. Without loss of generality, we can take m ≥ 0. This m is called the exponent of the group G and is denoted by Exp G.

  3. If G is finite, show that Exp G| ord G.

  4. If G is finite and Abelian, show that Exp G = max{ord x | x ∈ G}. Deduce that in this case there exists x ∈ G such that ord x = Exp G.

3.8. Random Numbers

So far we have met several situations where we needed random elements from a (finite) set S, for example, the set Z_n (or Z_n*), the set F_q (or F_q*), or the set of F_q-rational points on an elliptic (or hyperelliptic) curve. By randomness, we here mean that each element is equally likely to get selected, that is, if #S = n, then each element of S is selected with probability 1/n. Since elements of a set S of cardinality n can be represented as bit strings of length ≤ ⌈lg(n + 1)⌉, the problem of selecting a random element of S essentially reduces to the problem of generating (finite) random sequences of bits. A random sequence of bits is one in which every bit has a probability of 1/2 of being either 0 or 1 (irrespective of the other bits in the sequence).
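The reduction from "random element of S" to random bits can be made exact by rejection sampling: draw enough bits for an index and discard values ≥ n. A sketch (Python's secrets module stands in here for any random bit source):

```python
# Uniform random index in {0, ..., n-1} from a stream of random bits.
# Rejection of out-of-range values keeps the distribution exactly uniform.
import secrets

def random_index(n, randbit=lambda: secrets.randbits(1)):
    k = (n - 1).bit_length()          # number of bits needed per attempt
    while True:
        x = 0
        for _ in range(k):
            x = (x << 1) | randbit()
        if x < n:                     # reject x >= n and redraw
            return x

counts = [0] * 5
for _ in range(10000):
    counts[random_index(5)] += 1
print(counts)   # five roughly equal counts
```

Each attempt succeeds with probability n/2^k > 1/2, so the expected number of redraws is below two regardless of n.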

3.8.1. Pseudorandom Bit Generators

Generating a (truly) random sequence of bits seems to be an impossible task. Some natural phenomena, such as electronic noise from a specifically designed integrated circuit, can be used to generate random bit sequences. However, such systems are prone to malfunctioning, are often influenced by observation and are, of course, costly. A software solution is definitely the more practical alternative. Phenomena like the system clock or the work load or memory usage of a machine, which can be captured by programs, may be used to generate random bit sequences. But this strategy also suffers from various drawbacks. First of all, the sequences generated by these methods are not (truly) random. Moreover, they are vulnerable to attacks by adversaries (for example, if a random bit generator is based on the system clock and if the adversary knows the approximate time when a bit sequence was generated using that generator, she has to try only a few possibilities to generate the same sequence).

In order to obviate these difficulties, pseudorandom bit generators (PRBG) are commonly used. A bit string a0a1a2 . . . is generated by a PRBG following a specific strategy, which is more often than not a (mathematical) algorithm. The first bit a0 is based on a certain initial value, called a seed, whereas for i ≥ 1 the bit ai is generated as a predetermined function of some or all of the previous bits a0, . . . , ai–1. Since the resulting bit ai is now functionally dependent on the previous bits, the sequence is not at all random (but deterministic); still, we are happy if the sequence a0a1a2 . . . looks or behaves random. The random behaviour of a sequence is often examined by certain well-known statistical tests. If a generator generates bit sequences that pass these tests, we call it a PRBG and call the sequences available from such a generator pseudorandom bit sequences. Various kinds of PRBGs are used for generating pseudorandom bit sequences. We won't describe them here, but concentrate on a particular kind of generator that has a special significance in cryptography.

3.8.2. Cryptographically Strong Pseudorandom Bit Generators

A PRBG is called a cryptographically strong (or secure) pseudorandom bit generator, or a CSPRBG in short, if no polynomial-time algorithm exists (provably or otherwise) that, from a knowledge of the previous bits of a generated sequence (but without the knowledge of the seed), predicts a bit with probability significantly larger than 1/2. Usually, an intractable computational problem (see Section 4.2) is at the heart of the security of a CSPRBG. As an example, we now explain the Blum–Blum–Shub (or BBS) generator.

Algorithm 3.37. Blum–Blum–Shub pseudorandom bit generator

Input: A positive integer m (the bits a0, . . . , am are to be generated).

Output: A cryptographically strong pseudorandom bit sequence a0a1a2 . . . .

Steps:

Generate two (distinct) large primes p and q, each ≡ 3 (mod 4).
n := pq.
Generate a (random) seed s ∈ Z_n*.
x0 := s^2 (mod n).
for i = 0, . . . , m {
   ai := the least significant bit of xi.
   xi+1 := xi^2 (mod n).
}

In Algorithm 3.37, we have used indices for the sequence xi for the sake of clarity. In an actual implementation, all indices may be removed, that is, one may use a single variable x to store and update the sequence xi. Furthermore, if there is no harm in altering the value of s, one might even use the same variable for s and x.
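Such an indexless implementation can be sketched as follows; the tiny primes are purely illustrative, since real use requires large secret primes of adequate size.

```python
# Indexless BBS sketch: a single variable x is squared repeatedly, and the
# least significant bit of each x_i is emitted.

def bbs_bits(p, q, s, m):
    assert p % 4 == 3 and q % 4 == 3 and p != q
    n = p * q
    x = s * s % n                 # x_0 := s^2 (mod n)
    bits = []
    for _ in range(m + 1):        # bits a_0, a_1, ..., a_m
        bits.append(x & 1)        # a_i := least significant bit of x_i
        x = x * x % n             # x_{i+1} := x_i^2 (mod n)
    return bits

print(bbs_bits(11, 19, 100, 9))
```

With fixed inputs the output is of course deterministic; the unpredictability of a real deployment rests entirely on the secrecy of p, q and s.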

The cryptographic security of the BBS generator stems from the presumed intractability of factoring integers or of computing square roots modulo a composite integer (here n = pq) (see Exercise 3.43). Note that p, q and s have to be kept secret, whereas n can be made public. A knowledge of xm+1 is also not expected to help an opponent, so it too may be made public. For achieving the desired level of secrecy, p and q should be of nearly equal size, and the size of n should be sufficiently large (say, 768 bits or more). Generating each bit by the BBS generator involves a modular squaring and is, therefore, somewhat slow (compared to the traditional PRBGs, which do not guarantee cryptographic security). However, the BBS generator can be used for moderately infrequent purposes, for example, for the generation of a session key. Moreover, a maximum of lg lg n (least significant) bits (instead of 1 as in the above snippet) can be extracted from each xi without degrading the security of the generator.

It is evident that any (infinite) sequence a0a1 · · · generated by the BBS generator must be periodic. As an extreme example, if s = 1, then the BBS generator outputs a sequence of one-bits only. We are interested in rather short (sub)sequences (of such infinite sequences). Therefore, it suffices if the length of the period is reasonably large (for a random seed s). This is guaranteed if one uses strong primes (Definition 3.5).

3.8.3. Seeding Pseudorandom Bit Generators

The way we have defined PRBG (or CSPRBG) makes it evident that the unpredictability of a pseudorandom bit sequence essentially reduces to that of the seed. Care should, therefore, be taken in choosing the values of the seed. The seed need not be randomly or pseudorandomly generated, but should have a high degree of unpredictability, so that it is infeasible for an adversary to make a reasonably quick guess of it. As an example, assume that we intend to generate a suitable seed s for the BBS generator with a 1024-bit modulus n. If we employ for that purpose a specific algorithm (known to the opponent) using only the built-in random number generator of a standard compiler, and if this built-in generator has a 32-bit seed σ, then there are only 2^32 possibilities for s, even when s itself is 1024 bits long. Thus an adversary has to try at most 2^32 (2^31 on an average) values of σ in order to guess the correct value of s. So we must add further unpredictability to the resulting seed value s. This can be done by setting the bits of s depending on several factors, like the system clock, the system load, the memory usage, keyboard inputs from a human user and so on. Each of such factors might not be individually completely unpredictable, but their combined effect should preclude the feasibility of an exhaustive search by the opponent. After all, we have 1024 bits of s to fill up, and even if the total search space of possible values of s is as low as 2^160, it would be impossible for the opponent to guess s in a reasonable span of time. Note that more often than not the values of the seed need not be remembered, that is, need not be regenerated afterwards. As a result, there is no harm in introducing unpredictability in s caused by certain factors that we would not ourselves be able to reproduce in future.
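One common way to realize this advice is to mix several partially unpredictable sources through a cryptographic hash function and use the (stretched) digest as seed material. The particular sources in the sketch below are illustrative, not exhaustive:

```python
# Harden a seed by hashing together several partially unpredictable
# sources; any additional hard-to-guess inputs can be mixed in the same way.
import hashlib
import os
import time

def gather_seed(num_bytes=128):
    h = hashlib.sha256()
    h.update(os.urandom(32))                  # OS entropy pool
    h.update(str(time.time_ns()).encode())    # high-resolution clock
    h.update(str(os.getpid()).encode())       # process id
    # stretch the digest to the desired seed length
    material = b""
    counter = 0
    while len(material) < num_bytes:
        material += hashlib.sha256(h.digest() + bytes([counter])).digest()
        counter += 1
    return material[:num_bytes]

print(len(gather_seed()))   # 128
```

The hash ensures that an adversary must guess all the mixed inputs simultaneously, so the sources' unpredictability combines rather than averages.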

Exercise Set 3.8

3.43With the notations of Algorithm 3.37 show that:
  1. Every quadratic residue x modulo n has four distinct square roots modulo n, of which exactly one, say y, is a quadratic residue modulo n. [H]

  2. The square root y of x can be obtained by solving the simultaneous congruences y ≡ x^((p + 1)/4) (mod p) and y ≡ x^((q + 1)/4) (mod q).

  3. The bit sequence a0a1 . . . am is uniquely determined by (n and) xm+1.

  4. One can compute in polynomial (in log n and m) time the bit sequence a0a1 . . . am from the knowledge of n and xm+1, if either

    1. the primes p and q are known, or

    2. one can check in polynomial (in log n) time if an arbitrary element y ∈ Z_n* is a quadratic residue modulo n and, if so, compute in polynomial time the square roots of y modulo n.
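Parts 1 and 2 of this exercise can be checked numerically for a toy modulus; the sketch below (variable names are ours) combines the two prime-field square roots via the Chinese remainder theorem to exhibit all four roots.

```python
# Modulo a prime p = 3 (mod 4), a square root of a quadratic residue x is
# x^((p+1)/4) mod p; modulo n = p*q, a quadratic residue has four square
# roots, obtained by combining the two sign choices via the CRT.

def sqrt_mod_prime(x, p):          # assumes p = 3 (mod 4) and x a QR mod p
    return pow(x, (p + 1) // 4, p)

p, q = 11, 19
n = p * q
x = 5 * 5 % n                      # a quadratic residue modulo n
rp = sqrt_mod_prime(x % p, p)
rq = sqrt_mod_prime(x % q, q)
roots = set()
for sp in (rp, p - rp):
    for sq in (rq, q - rq):
        # CRT: y = sp (mod p) and y = sq (mod q)
        y = (sp * q * pow(q, -1, p) + sq * p * pow(p, -1, q)) % n
        roots.add(y)
print(sorted(roots))
assert len(roots) == 4 and all(y * y % n == x for y in roots)
```

Exactly one of the four roots is itself a quadratic residue modulo n, which is what makes the BBS squaring map a permutation on the residues.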

Chapter Summary

This chapter deals with the algorithmic details needed for setting up public-key cryptosystems. We study algorithms for selecting public-key parameters and for carrying out the basic cryptographic primitives. Algorithms required for cryptanalysis are dealt with in Chapters 4 and 7.

We start the chapter with a discussion on algorithms. Time and space complexities of algorithms are discussed first and the standard order notations are explained. Next we study the class of randomized algorithms which provide practical solutions to many computational problems that do not have known efficient deterministic algorithms. In the worst case, a randomized algorithm may take exponential running time and/or may output an incorrect answer. However, the probability of these bad behaviours of a randomized algorithm can be made arbitrarily low. We finally discuss reduction between computational problems. A reduction helps us conclude about the complexity of one problem relative to that for another problem.

Many popular public-key cryptosystems are based on working modulo big integers. These integers have sizes up to several thousand bits. One cannot represent such integers with full precision using the built-in data types supplied by common programming languages. So we require efficient ways of representing and doing arithmetic on big integers. We carefully deal with the implementation of the arithmetic on multiple-precision integers. We provide a special treatment of the computation of gcds and extended gcds of integers. We utilize these arithmetic functions in order to implement modular arithmetic. Most public-key primitives involve modular exponentiations as the most time-consuming steps. In addition to the standard square-and-multiply algorithm, certain special tricks (including Montgomery exponentiation) that help speed up modular exponentiation are described at length in this section.

In the next section, we deal with some other number-theoretic algorithms. One important topic is the determination of whether a given integer is prime. The Miller–Rabin primality test is an efficient algorithm for primality testing. This algorithm is, however, randomized in the sense that it may declare some composite integers as primes. Using suitable choices of the relevant parameters, the probability of this error may be reduced to very low values (≤ 2^–80). We also briefly introduce the deterministic polynomial-time AKS algorithm for primality testing. Since we can easily check for the primality of integers, we can generate random primes by essentially searching in a pool of randomly generated odd integers of a given size. Security in some cryptosystems requires such random primes to possess some special properties. We present Gordon's algorithm for generating cryptographically strong primes. The section ends with a study of the Tonelli–Shanks algorithm for computing square roots modulo a big prime.

Next, we concentrate on the implementation of finite field arithmetic. The arithmetic of a field of prime cardinality p is the same as integer arithmetic modulo p and is discussed in detail earlier. The other finite fields of interest to cryptology are extension fields of characteristic 2. In order to study the arithmetic of these fields, one first requires the arithmetic of the polynomial ring F_2[X]. We discuss the basic operations in this ring. Next we talk about algorithms for checking irreducibility of polynomials and for obtaining (random) irreducible polynomials in F_2[X]. If f(X) is such a polynomial of degree d, the arithmetic of the field F_(2^d) is the same as the arithmetic of F_2[X] modulo the defining polynomial f(X). In order that a finite field F_q is cryptographically safe, we require q − 1 to have a prime factor of sufficiently big size (160 bits or more). Suppose that the factorization of q − 1 is provided. We discuss algorithms that compute the order of elements of F_q^*, that check if a given element is a generator of the cyclic group F_q^*, and that produce random generators of F_q^*. We end the study of finite fields by discussing a way to factor polynomials over finite fields. The standard algorithm comprising the three steps square-free factorization, distinct-degree factorization and equal-degree factorization is explained in detail. The exercises cover the details of an algorithm to compute the roots of polynomials over finite fields.

The arithmetic of elliptic curves over finite fields is dealt with next. Each operation in the elliptic curve group can be realized by a sequence of operations in the underlying field. A multiple of a point on an elliptic curve can be computed by a repeated double-and-add algorithm, which is the square-and-multiply algorithm for modular exponentiation applied in an additive setting. We also discuss ways of selecting random points on elliptic curves. We then present two algorithms for counting the points in an elliptic curve group: the SEA algorithm is suitable for curves over prime fields, whereas the Satoh–FGH algorithm works efficiently for curves over fields of characteristic 2. Once we can determine the order of an elliptic curve group, we can choose good elliptic curves for cryptographic use.
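The double-and-add idea can be sketched for a curve y^2 = x^3 + ax + b over a prime field F_p in affine coordinates (an illustrative Python sketch with hypothetical toy parameters; curves of characteristic 2 use a different equation, and real implementations prefer projective coordinates):

```python
def ec_add(P, Q, a, p):
    """Add two points of the curve y^2 = x^3 + a*x + b over F_p
    (affine coordinates; None stands for the point at infinity O)."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                  # P + (-P) = O
    if P == Q:                                       # doubling: tangent slope
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                                            # addition: chord slope
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def ec_multiply(k, P, a, p):
    """Compute kP by double-and-add, scanning the bits of k."""
    R, Q = None, P                                   # R accumulates the result
    while k > 0:
        if k & 1:
            R = ec_add(R, Q, a, p)                   # "add" when the bit is 1
        Q = ec_add(Q, Q, a, p)                       # "double" at every bit
        k >>= 1
    return R

# toy example on y^2 = x^3 + 7 over F_17, with the point P = (1, 5):
P, a, p = (1, 5), 0, 17
assert ec_multiply(3, P, a, p) == ec_add(P, ec_add(P, P, a, p), a, p)
```

Each group operation costs one field inversion and a few field multiplications, so the cost of computing kP is dominated by the bit length of k, exactly as for modular exponentiation.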

In the next section, we study the arithmetic of hyperelliptic curves. We describe ways to represent elements of the Jacobian by pairs of polynomials and to do arithmetic on elements in this representation. We also discuss two algorithms for counting points in a Jacobian.

In the last section, we address the issue of generation of pseudorandom bits. We define the concept of cryptographically strong pseudorandom bit generator and provide an example, namely the Blum–Blum–Shub generator, which is cryptographically strong under the assumption that taking square roots modulo a big composite integer is computationally intractable.

Suggestions for Further Reading

The basic algorithmic issues discussed in Section 3.2 can be found in any textbook on data structures and algorithms; see, for example, [7, 8, 61]. However, most of these elementary books do not address randomization and parallelization issues. We refer to [214] for a recent treatise on randomized algorithms. Also see Rabin’s papers [247, 248].

Complexity theory deals with classifying computational problems based on the known algorithms for solving them and on reduction of one problem to another. A simple introduction to complexity theory is the book [280] by Sipser. Chapter 2 of Koblitz’s book [154] is also a compact introduction to computational complexity meant for cryptographers. Also see [113].

Knuth’s book [147] is seemingly the best resource for a comprehensive treatment of multiple-precision integer arithmetic. The proofs of correctness of many algorithms that we omitted in Section 3.3 can be found in this book. This can be supplemented by the more advanced algorithms and important practical tips compiled in the book [56] by Cohen, who designed a versatile computational number theory package known as PARI. Montgomery’s multiplication algorithm appeared in [210]. Also see Chapter 14 of Menezes et al. [194] for more algorithms and implementation issues.

Most of the important papers on primality testing [3, 4, 5, 116, 175, 204, 248, 287] have been referred to in Section 3.4.1. Also see the survey [164] by Lenstra and Lenstra. Gordon’s algorithm for generating strong primes appeared in [118]. The book [69] by Crandall and Pomerance is an interesting treatise on prime numbers, written from a computational perspective. The modular square-root Algorithm 3.16 is essentially due to Tonelli (1891). Algebraic number theory is treated from a computational perspective in Cohen [56] and Pohst and Zassenhaus [235].

Arithmetic in finite fields is discussed in many books, including [179, 191]. Finite fields find modern applications in cryptography and coding theory, and it is therefore necessary to have efficient software and hardware implementations of finite field arithmetic. A huge number of papers on these implementation issues have appeared in the last two decades. Chapter 5 of Menezes [191] discusses optimal normal bases (Section 2.9.3 of the current book), which speed up exponentiation in finite fields.

Factoring univariate polynomials over finite fields is a topic that has attracted a lot of research attention. Berlekamp’s Q-matrix method [21] is the first modern algorithm for this purpose. Computationally efficient versions of the algorithm discussed in Section 3.5.4 have been presented by Gathen and Shoup [104] and Kaltofen and Shoup [143]. The best-known running time for a deterministic algorithm for univariate factorization over finite fields is due to Shoup [272]. Shparlinski shows [274] that Shoup’s algorithm on a polynomial in F_q[X] of degree d uses O(q^(1/2)(log q)d^(2+ε)) bit operations. This is fully exponential in log q.

The book [103] by von zur Gathen and Gerhard is a detailed treatise on many topics discussed in Sections 3.2 to 3.5 of the current book. Mignotte’s book [203] and the one [108] by Geddes et al. also have interesting coverage. Also see Chapter 1 of Das [72] for a survey of algorithms for various computational problems on finite fields.

For elliptic curve arithmetic, look at Blake et al. [24], Hankerson et al. [123] and Menezes [192]. The first polynomial-time algorithm for counting the points of an elliptic curve over a finite field was proposed by Schoof. The original version of this algorithm runs in time O(log^8 q). Later, Elkies improved this running time to O(log^6 q) for most elliptic curves. Further modifications due to Atkin gave rise to what we call the SEA algorithm. Schoof’s paper [264] discusses this point-counting algorithm and includes the modifications due to Elkies and Atkin. Also look at the article [85] by Elkies.

The Satoh–FGH algorithm is originally due to Satoh [256]. Fouquet et al. [94] have proposed a modification of Satoh’s algorithm to work for fields of characteristic 2. They also report large-scale implementations of the modified algorithm. Also see Fouquet et al. [95] and Skjernaa [281].

Recently, there has been a lot of progress in point-counting algorithms, in particular for fields of characteristic 2. The most recent account of this can be found in Lercier and Lubicz [177]. The authors of this paper later reported an implementation of their algorithm for counting the points of an elliptic curve over a large field of characteristic 2. This computation took nearly 82 hours on a 731 MHz Alpha EV6 processor. With these new developments, the point-counting problem is practically solved for fields of small characteristic. However, for prime fields the known algorithms require further enhancements in order to be useful on a wide scale.

Finding good random elliptic curves for cryptographic purposes has also been an area of active research recently. With the current status of solving the elliptic curve discrete-log problem, the strategy we mentioned in Algorithm 3.33 is quite acceptable as long as good point-counting algorithms are at our disposal (they are now). For further discussions on this topic, we refer the reader to two papers [95, 176].

The appendix in Koblitz’s book [154] is seemingly the best source for learning hyperelliptic curve arithmetic. This is also available as a CACR technical report [195]. Gaudry and Harley’s paper [106] has more on the hyperelliptic curve point-counting algorithms we discussed in Section 3.7.2. Hess et al. [126] discuss methods for computing hyperelliptic curves for cryptographic usage.

Chapter 5 of Menezes et al. [194] is devoted to the generation of pseudorandom bits and sequences. This chapter lists the statistical tests for checking the randomness of a bit sequence. It also describes two cryptographically secure pseudorandom bit generators other than the BBS generator (Algorithm 3.37). The BBS generator was originally proposed by Blum et al. [26]. Also see Chapter 3 of Knuth [147].

4. The Intractable Mathematical Problems

4.1 Introduction
4.2 The Problems at a Glance
4.3 The Integer Factorization Problem
4.4 The Finite Field Discrete Logarithm Problem
4.5 The Elliptic Curve Discrete Logarithm Problem
4.6 The Hyperelliptic Curve Discrete Logarithm Problem
4.7 Solving Large Sparse Linear Systems over Finite Rings
4.8 The Subset Sum Problem
 Chapter Summary
 Suggestions for Further Reading

It is insufficient to protect ourselves with laws; we need to protect ourselves with mathematics.

—Bruce Schneier

Most number theorists considered the small group of colleagues that occupied themselves with these problems as being inflicted with an incurable but harmless obsession.

—Arjen K. Lenstra and Hendrik W. Lenstra, Jr. [164]

All mathematics is divided into three parts: cryptography (paid for by CIA, KGB and the like), hydrodynamics (supported by manufacturers of atomic submarines) and celestial mechanics (financed by military and other institutions dealing with missiles, such as NASA).

—V. I. Arnold [13]

4.1. Introduction

Public-key cryptographic systems are based on the apparent intractability of certain computational problems. However, there is very little evidence (if any) to corroborate the claim that these problems are really very difficult to solve algorithmically. In spite of intensive study over a long period, mathematicians and cryptologists have not come up with good algorithms, and it is this failure that justifies the attempts to go on building secure cryptographic protocols based on these problems. The inherent assumption is that it is infeasible for an opponent with practical amounts of computing resources to break these cryptosystems in a reasonable amount of time. Of course, the fear remains that someone may devise a fast algorithm, and our cryptosystems may then fail their security guarantees. At the other extreme, it is also possible that someone proves the theoretical (and, hence, practical) impossibility of solving such a problem in a small (say, polynomial) amount of time, and our cryptosystems become secure forever (well, at least until other paradigms of computing, like the as yet practically unimplementable quantum computing, solve the problems efficiently).

Whether you are a cryptographer or a cryptanalyst, it is important, if not essential, to be aware of the best methods available to date for attacking the intractable problems of cryptography. In the first place, this knowledge quantifies the practical security margins of the protocols, for instance, by dictating the choice of input sizes as a function of the security requirements. Let us take a specific example: with today’s computing power and known integer factorization algorithms, a message that needs to be kept secret for a day or two may be encrypted with a 768-bit RSA key, whereas if one wants to maintain secrecy for a year or more, much longer keys are needed. The second point in studying the known cryptanalytic algorithms is that though efficient general-purpose algorithms for solving these problems are still unknown, there are good algorithms for specific cases—the cases to be avoided by the designers of cryptographic applications. For example, there is a linear-time algorithm that attacks cryptographic systems based on anomalous elliptic curves. The moral is that one must not employ these curves in cryptographic applications. The third reason for studying cryptanalytic algorithms is sentimental. The fact that we are still unable to answer some simply stated questions, even after spending a considerable amount of collective effort, is indeed humbling. To worsen matters, cryptography thrives by exploiting this scientific inadequacy. Cryptanalysis, though seemingly unlawful from a cryptographer’s viewpoint, turns out to be a deep and beautiful area of applied mathematics. Ironically enough, it is quite common that the proponents of cryptographic protocols are themselves the most interested in seeing how the story ends. The journey goes on. . . Read on!

It may appear somewhat unusual to discuss the cryptanalytic algorithms before the cryptographic ones (see Chapter 5). We find this order convenient in that one must first know the intractable problems before applying them in cryptographic protocols. Moreover, the known attacks help one fix the parameters for use in the cryptographic algorithms. We defer until Chapter 7 other cryptanalytic techniques which do not directly involve solving these mathematical problems. The full power of the mathematical machinery of Chapters 2 and 3 is felt here in the science of cryptology, and understanding the various aspects of cryptology hence becomes easier.

4.2. The Problems at a Glance

Let us first introduce the intractable problems of cryptology. In the rest of this chapter, we describe some known methods to solve these problems.

The integer factorization problem (IFP) is perhaps the most studied one in the lot. We know that ℤ is a unique factorization domain (UFD) (Definition 2.25, p 40); that is, given a natural number n there are pairwise distinct primes p_1, . . . , p_r (unique up to rearrangement) such that n = p_1^(α_1) · · · p_r^(α_r) for some α_1, . . . , α_r ∈ ℕ. Broadly speaking, the IFP is the determination of these p_i and α_i from the knowledge of n. Note that once the prime divisors p_i of n are known, it is rather easy to compute the multiplicities α_i = v_(p_i)(n) by trial divisions. It is, therefore, sufficient to find the primes p_i only. It is easy (Algorithm 3.13) to check whether n is composite. If n is prime, then its prime factorization is already known. On the other hand, if n is known to be composite, an algorithm that splits n into two non-trivial factors, that is, that outputs n_1, n_2 with n = n_1 n_2, n_1 < n and n_2 < n, can be used repeatedly to compute the complete factorization of n. It is enough that a non-trivial factor n_1 of n is made available; the cofactor n_2 = n/n_1 is obtained by a single division. Finally, it is sometimes known a priori that n is the product of two (distinct odd) primes (as in the RSA protocols). In this case, a non-trivial split of n immediately gives the desired factorization of n. To sum up, the IFP can be stated in various versions, the presumed difficulty of all these versions being essentially the same.

Problem 4.1

General integer factorization problem Given an integer n ≥ 2, determine all the prime divisors of n.

Problem 4.2

Integer factorization problem (IFP) Given a composite integer n, find a non-trivial divisor n_1 of n (that is, a divisor n_1 of n in the range 1 < n_1 < n).

Problem 4.3

RSA integer factorization problem Given a product n = pq of two (distinct odd) primes p and q, find the prime divisors p and q of n.

Recall that if n = p_1^(α_1) · · · p_r^(α_r) is the prime factorization of n, then the Euler totient function of n is φ(n) = p_1^(α_1 − 1)(p_1 − 1) · · · p_r^(α_r − 1)(p_r − 1). Thus, if the prime factorization of n is known, it is easy to compute φ(n). The converse is not known to be true in general. However, if n = pq is the product of two primes, factoring n is polynomial-time equivalent to computing φ(n) (Exercise 3.6).
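For n = pq, the reduction from φ(n) to factoring is explicit: since p + q = n − φ(n) + 1 and pq = n, the primes are the roots of X^2 − (n − φ(n) + 1)X + n. A sketch (the numbers in the example are hypothetical toy values):

```python
from math import isqrt

def factor_from_totient(n, phi):
    """Recover p and q from n = p*q and phi = (p-1)*(q-1): they are
    the roots of X^2 - s*X + n, where s = p + q = n - phi + 1."""
    s = n - phi + 1                      # p + q
    disc = s * s - 4 * n                 # (p - q)^2
    root = isqrt(disc)
    assert root * root == disc, "inconsistent inputs"
    return (s - root) // 2, (s + root) // 2

# toy example: n = 3233 = 53 * 61, phi(n) = 52 * 60 = 3120
assert factor_from_totient(3233, 3120) == (53, 61)
```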
Problem 4.4

Totient problem Given a natural number n, compute φ(n).

Problem 4.5

RSA totient problem Given a product n = pq of two (distinct odd) primes p and q, compute φ(n).

Note that ℤ[X] is also a UFD. Quite interestingly, it is computationally easy to find a non-trivial factor g of a polynomial f ∈ ℤ[X] (that is, a factor with 0 < deg g < deg f). One might, for example, use the polynomial-time deterministic L³ algorithm named after Lenstra, Lenstra and Lovász (Section 4.8.2).

Square roots modulo an integer n can be computed in probabilistic polynomial time if n is a prime (Algorithm 3.16). If n is composite, the situation is different. If the factorization of n is known, then square roots can be computed modulo each prime divisor of n, lifted modulo the appropriate powers of the prime divisors, and subsequently combined using the Chinese remainder theorem. On the other hand, if the factorization of n is not known, then computing square roots modulo n turns out to be a very difficult task. Recall that the Blum–Blum–Shub algorithm (Algorithm 3.37) exploits this fact to design a cryptographically secure random number generator.
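For the easy case n = pq with p ≡ q ≡ 3 (mod 4), the compute-and-combine procedure can be sketched as follows (illustrative Python; the exponent (p + 1)/4 shortcut replaces the general Tonelli–Shanks computation, and no lifting modulo prime powers is needed here since n is square-free):

```python
def sqrt_mod_pq(a, p, q):
    """All square roots of a modulo n = p*q, given the factorization,
    in the easy case p ≡ q ≡ 3 (mod 4)."""
    n = p * q
    rp = pow(a, (p + 1) // 4, p)        # a square root of a modulo p
    rq = pow(a, (q + 1) // 4, q)        # a square root of a modulo q
    roots = set()
    for sp in (rp, p - rp):             # combine the four sign choices
        for sq in (rq, q - rq):         # via the Chinese remainder theorem
            x = (sp * q * pow(q, -1, p) + sq * p * pow(p, -1, q)) % n
            roots.add(x)
    return sorted(roots)

# the four square roots of 4 modulo 77 = 7 * 11:
assert sqrt_mod_pq(4, 7, 11) == [2, 9, 68, 75]
```

Note that a quadratic residue modulo n = pq has four square roots; producing all of them from one of them is, in fact, equivalent to factoring n.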

Problem 4.6

Modular square root problem (SQRTP) Given a composite integer n and an integer a, compute an integer x, if one exists, such that x^2 ≡ a (mod n).

Let us now look at another class of problems of an apparently distinct flavour. Let G be a finite cyclic group of order n := #G and let g be a generator of G. For the moment, let us assume that G is multiplicatively written. Any element a ∈ G can be written as a = g^x for some integer x, unique modulo n. In this case, x is called the discrete logarithm or the index of a with respect to the base g and is denoted by ind_g a.
Problem 4.7

Discrete logarithm problem (DLP) Given a finite cyclic group G, a generator g of G and an element a ∈ G, compute ind_g a.

If we now remove the restrictions that G is cyclic and/or that g is a generator of G (if G is cyclic), then we arrive at a generalized version of the DLP. Let us continue to assume that G is Abelian and finite. The subgroup H of G generated by g ∈ G is in any case cyclic. If a ∈ H, then the discrete logarithm or index of a with respect to the base g is the integer x, unique modulo m := ord H, such that a = g^x. In this case, we denote such an integer x by ind_g a. On the other hand, if a ∉ H, then we say that the discrete logarithm ind_g a is not defined. Recall from Proposition 2.5 that if G is cyclic and if m is known, then checking whether a belongs to H amounts to computing an exponentiation in G (that is, a ∈ H if and only if a^m is the identity of G). If G is not cyclic (or if m is not known), then it is not easy, in general, to develop such a nice criterion.
Problem 4.8

Generalized discrete logarithm problem (GDLP) Given a finite Abelian group G and elements g, a ∈ G, determine whether a belongs to the subgroup of G generated by g, and if so, compute ind_g a.

Note that the DLP (or the GDLP) need not be an inherently difficult problem. Its difficulty depends on the choice of the group G and also on the representation of the elements of G. For example, if G is the additive (cyclic) group ℤ_n and g is an integer with gcd(g, n) = 1, then for every integer a we have ind_g a ≡ g^(−1)a (mod n), where the modular inverse g^(−1) (mod n) can be computed efficiently by running the extended gcd algorithm (Algorithm 3.8) on g and n. Also note that if G is cyclic and each element a of G is represented as ind_g a for a given generator g of G (see, for example, Section 2.9.3), then computing discrete logarithms in G to the base g is a trivial problem. In that case, it is also trivial to compute discrete logarithms (when they exist) to any other base h (Exercise 4.3).
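The additive-group example above can be made concrete in a one-liner (a small Python sketch; Python's three-argument pow with exponent −1 computes the modular inverse, which internally amounts to an extended-gcd computation):

```python
def additive_dlog(g, a, n):
    """Index of a to the base g in the additive group Z_n: the x with
    x*g ≡ a (mod n), assuming gcd(g, n) = 1."""
    return (pow(g, -1, n) * a) % n      # pow(g, -1, n) is g^(-1) mod n

x = additive_dlog(7, 3, 23)             # solve 7x ≡ 3 (mod 23)
assert (x * 7) % 23 == 3
```

This is why the DLP in ℤ_n is easy: "exponentiation" there is just multiplication, and its inverse is division modulo n.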

On the other hand, there are certain groups G in which discrete logarithms cannot be computed so easily; that is, computing indices in G may demand time not bounded by any polynomial in log n, where n = ord G. However, if the group operation on any two elements of G can be performed in time bounded by a polynomial in log n, then cryptographic protocols can be based on G. Typical candidates for such groups are listed below together with the conventional names for the DLP over such groups.

Table 4.1. The discrete logarithm problem in various groups
Group | Name for the DLP
The (cyclic) multiplicative group F_q^* of a finite field F_q | The finite field discrete logarithm problem, or simply the DLP by an abuse of notation
The (not necessarily cyclic) additive group E(F_q) of points of an elliptic curve E defined over a finite field F_q | The elliptic curve discrete logarithm problem, or the ECDLP
The Jacobian J_C(F_q) of a hyperelliptic curve C defined over a finite field F_q | The hyperelliptic curve discrete logarithm problem, or the HECDLP

Note that if we are interested in computing indices to a base g ∈ G, we may indeed replace, at least theoretically, G by the subgroup H of G generated by g, and may assume, without loss of generality, that G is cyclic. Now, if we know an isomorphism G → ℤ_n, computing discrete logarithms in G is rather easy (Exercise 4.4). However, computing such an isomorphism is, in general, not an easy task and may demand exponential time and/or storage.

Another problem that is widely believed to be computationally equivalent to the DLP (at least for the groups mentioned in the above table) is called the Diffie–Hellman problem (DHP). Like the DLP, the DHP is presumably difficult to solve in the groups F_q^*, E(F_q) and J_C(F_q), and one may use the specific names DHP, ECDHP and HECDHP to designate this problem applied to these specific groups.

Problem 4.9

Diffie–Hellman problem (DHP) Let G be a multiplicative group and let g ∈ G. Given g^x and g^y for some (unknown) integers x and y, compute g^(xy).

Clearly, if a solution of the DLP is given, one may compute y = ind_g(g^y) and, subsequently, g^(xy) = (g^x)^y. That is, the DHP is no harder than the DLP. A proof of the validity or otherwise of the converse relation between these two problems is not known. It is also widely believed that the DLP is computationally equivalent to the IFP. A complete proof of this equivalence is not known, though certain partial results are available in the literature.

There are some other difficult problems on which cryptographic systems can be built. Problem 4.10 deserves specific mention in this regard.

Problem 4.10

Subset sum problem (SSP) Given a set A := {a_1, . . . , a_n} of natural numbers and a natural number s, find out if there exist x_1, . . . , x_n ∈ {0, 1} such that x_1 a_1 + · · · + x_n a_n = s, that is, if there is a subset B of A whose elements sum to s. The integers a_1, . . . , a_n are called the weights for the SSP.

The knapsack problem is a related combinatorial optimization problem. In view of this, the set {a_1, . . . , a_n} is often called a knapsack set, and the SSP is, by an abuse of notation, also referred to as the knapsack problem.
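For small n, the SSP can of course be solved by exhaustive search over all 2^n subsets, which also makes the exponential nature of the naive approach plain (illustrative Python; the weights in the example are arbitrary toy values):

```python
from itertools import combinations

def subset_sum(weights, s):
    """Exhaustive search for the SSP: a subset of `weights` summing
    to s, or None.  Visits up to 2^n subsets, so only for tiny n."""
    for r in range(len(weights) + 1):
        for combo in combinations(weights, r):
            if sum(combo) == s:
                return list(combo)
    return None

assert sum(subset_sum([3, 34, 4, 12, 5, 2], 9)) == 9   # e.g. {4, 5}
assert subset_sum([3, 34, 4], 100) is None
```

Cryptographic knapsack schemes use n large enough (say, a few hundred weights) that this search is hopeless; the attacks mentioned below instead exploit special structure in the weights.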

Some of the early cryptographic systems based on the SSP succumbed to efficient (even polynomial-time) cryptanalytic attacks. However, some schemes proposed in recent years seem to be resistant to such attacks, or, in other words, no good attacks on them are yet known. As a result, it is important to study the SSP in some detail.

The SSP is often mapped to problems on lattices. Let v_1, . . . , v_n be linearly independent vectors in ℝ^n. Consider the set of integer linear combinations of these vectors:

L := {c_1 v_1 + · · · + c_n v_n | c_1, . . . , c_n ∈ ℤ}.

L is called the lattice generated by v_1, . . . , v_n.

Problem 4.11

Shortest vector problem (SVP) Find a non-zero vector v ∈ L whose length ‖v‖ is smallest.

Problem 4.12

Closest vector problem (CVP) Given a vector w ∈ ℝ^n, find a vector v ∈ L such that the length ‖v − w‖ is smallest over all choices of v ∈ L.

For some other difficult computational problems and their applications to cryptography, we refer the reader to the references suggested at the end of this chapter and of Chapter 5.

Exercise Set 4.2

4.1
  1. Let n ≥ 2 be a square-free integer (that is, a product of pairwise distinct primes) and let a ∈ ℕ. Show that the exponentiation map ℤ_n → ℤ_n, x ↦ x^a, is bijective if and only if gcd(a, φ(n)) = 1. [H]

  2. Show that if n ∈ ℕ is not square-free, then for no integer a ≥ 2 is the exponentiation map ℤ_n → ℤ_n, x ↦ x^a, bijective. [H]

4.2 Show that the following problems are polynomial-time reducible to the IFP.
  1. RSA key inversion problem (RSAKIP) Let n = pq be a product of two (distinct odd) primes p and q. Given e ∈ ℕ with gcd(e, φ(n)) = 1, compute an integer d such that ed ≡ 1 (mod φ(n)).

  2. RSA problem (RSAP) Let n and e be as in Part (a). Given c ∈ ℤ_n, compute x ∈ ℤ_n such that c ≡ x^e (mod n). (By Exercise 4.1, such an x exists and is unique.)

  3. Quadratic residuosity problem (QRP) Given an odd integer n > 1 and an integer a with gcd(a, n) = 1, check if a is a quadratic residue modulo n. (Note that if n is a prime, then this problem reduces to the computation of the Legendre symbol (a/n). If, on the other hand, n is composite and the Jacobi symbol (a/n) equals 1, one cannot conclude that a is a quadratic residue modulo n.)

4.3 Let G be a finite cyclic group of order n and let g, g′ be two arbitrary generators of G.
  1. Show that ind_g g′ is invertible modulo n and that for every a ∈ G we have ind_(g′) a ≡ (ind_g a)(ind_g g′)^(−1) (mod n).

  2. Let h ∈ G, m := ord(h) and y := ind_g h. Show that m = n/gcd(y, n), that y/gcd(y, n) is invertible modulo m, and that for an arbitrary element a ∈ G the index ind_h a exists if and only if gcd(y, n) | ind_g a, in which case we have

     ind_h a ≡ (ind_g a / gcd(y, n))(y / gcd(y, n))^(−1) (mod m).

4.4 Let G be a finite cyclic multiplicatively written group of order n. An algorithm on G is said to be polynomial-time if it runs in time bounded above by a polynomial function of log n. Assume that the product of any two elements of G can be computed in polynomial time. Recall from Exercise 2.47 that G ≅ ℤ_n. Show that the computation of an isomorphism G → ℤ_n is polynomial-time equivalent to computing discrete logarithms in G. (That is, given a two-way black box that evaluates such an isomorphism and its inverse in polynomial time, discrete logarithms in G can be computed in polynomial time. Conversely, if discrete logarithms with respect to a primitive element can be computed in polynomial time, then such a black box can be realized.)
4.5 Let p be an (odd) prime and let g be a primitive root modulo p. Show that a ∈ ℤ_p^* is a quadratic residue modulo p if and only if the index ind_g a is even. Hence, conclude that there is a polynomial-time (in log p) algorithm that computes the least significant bit of ind_g a, given any a ∈ ℤ_p^*. More generally, let p − 1 = 2^r s, where r ∈ ℕ and s is odd. Show that there exists a polynomial-time algorithm that computes the r least significant bits of ind_g a, given any a ∈ ℤ_p^*. (This exercise shows that the DLP has a polynomial-time solution for Fermat primes F_n := 2^(2^n) + 1. Note that F_n is prime for n = 0, 1, 2, 3, 4. No other Fermat primes are known.)

4.3. The Integer Factorization Problem

The integer factorization problem (IFP) (Problems 4.1, 4.2 and 4.3) is one of the most easily stated and yet hopelessly difficult computational problems, one that has attracted researchers’ attention for ages and most notably in the age of electronic computers. A huge number of algorithms, varying widely in basic strategy, mathematical sophistication and implementation intricacy, have been suggested, and, in spite of these, factoring a general integer of only 1000 bits seems to be an impossible task today, even using the fastest computers on earth.

It is important to note here that even proving rigorous bounds on the running times of the integer-factoring algorithms is quite often a very difficult task. In many cases, we have to be satisfied with clever heuristic bounds based on one or more reasonable but unprovable assumptions.

This section highlights human achievements in the battle against the IFP. Before going into the details of this account, we want to mention some relevant points. Throughout this section we assume that we want to factor a (positive) integer n. Since such an integer can be represented by ⌈lg(n + 1)⌉ bits, the input size is taken to be lg n (or ln n, or log n). Most modern factorization algorithms take time given by the following subexponential expression in ln n:

L(n, α, c) := exp((c + o(1))(ln n)^α (ln ln n)^(1−α)),

where 0 < α < 1 and c > 0 are constants. As described in Section 3.2, the smaller the value of α, the closer the expression L(n, α, c) is to a polynomial expression (in ln n). If n is understood from the context, we write L(α, c) in place of L(n, α, c). Although the current best-known algorithms correspond to α = 1/3, the algorithms with α = 1/2 are also quite interesting. In this case, we use the shorter notation L[c] := L(1/2, c).
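To get a feel for these growth rates, one can evaluate L(n, α, c) numerically (a small Python sketch that ignores the o(1) term; the sample modulus size is an arbitrary choice):

```python
from math import exp, log

def L(n, alpha, c):
    """The subexponential expression L(n, alpha, c), with o(1) dropped."""
    ln_n = log(n)
    return exp(c * ln_n ** alpha * log(ln_n) ** (1 - alpha))

n = 2 ** 512                 # a sample (hypothetical) modulus size
# alpha = 0 gives a polynomial in ln n, alpha = 1 gives n^c;
# intermediate values of alpha interpolate between the two extremes:
assert L(n, 0, 2) < L(n, 0.5, 1) < L(n, 1, 1)
```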

Henceforth we will use, without explicit mention, the notation q_1 := 2, q_2 := 3, q_3 := 5, . . . to denote the sequence of primes. The concept of q_t-smoothness (for some t ∈ ℕ) will often be referred to as B-smoothness, where B = {q_1, . . . , q_t}. Recall from Theorem 2.21 that smaller integers have a higher probability of being B-smooth for a given B. This observation plays an important role in the design of integer-factoring algorithms. The following special case of Theorem 2.21 is often useful.

Corollary 4.1.

Let n ∈ ℕ, x = O(n^α) and y = L[β] = L(n, 1/2, β). Then the probability that a random positive integer ≤ x is y-smooth is, asymptotically, L[−α/(2β)].

Before any attempt at factoring n is made, it is worthwhile to check n for primality. Since probabilistic primality tests (like Algorithm 3.13) are quite efficient, we should first run one such test to make sure that n is really composite. Henceforth, we will assume that n is known to be composite.

4.3.1. Older Algorithms

“Factoring in the dark ages” (a phrase attributed to Hendrik Lenstra) used fully exponential algorithms, some of which we discuss now. Though the worst-case performances of these algorithms are quite poor, there are many situations in which they might factor even a large integer quite fast. It is, therefore, worthwhile to spend some time on these algorithms.

Trial division

A composite integer n admits a factor ≤ √n that can be found by trial divisions of n by the integers ≤ √n. This demands O(√n) trial divisions and is clearly impractical even when n contains only 30 decimal digits. It is also true that n has a prime divisor ≤ √n, so it suffices to carry out trial divisions by primes only. Though this modified strategy saves us many unnecessary divisions, the asymptotic complexity does not reduce much, since by the prime number theorem the number of primes ≤ √n is about 2√n/ln n. In addition, we need to have a list of primes ≤ √n or generate the primes on the fly, neither of which is really practical. A trade-off can be made by noting that an integer m ≥ 30 cannot be prime unless m ≡ 1, 7, 11, 13, 17, 19, 23, 29 (mod 30). This means that we need to perform trial divisions only by integers m congruent to one of these eight values modulo 30, which reduces the number of trial divisions to about 25 per cent of the naive count. Though trial division is not a practical general-purpose algorithm for factoring large integers, we recommend extracting all the small prime factors of n, if any, by dividing n by a predetermined set {q_1, . . . , q_t} of small primes. If n is indeed q_t-smooth, or has all prime factors ≤ q_t except only one, then the trial division method completely factors n quite fast. Even when n is not of this type, trial division might reduce its size, so that other algorithms run somewhat more efficiently.
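The mod-30 trade-off described above can be sketched as follows (illustrative Python; the bound on the trial divisors is an arbitrary cut-off, and the returned cofactor may itself be prime or composite):

```python
def trial_division(n, bound=10**6):
    """Strip the small prime factors of n by trial division.  Candidate
    divisors > 5 are restricted to the residues 1, 7, 11, 13, 17, 19,
    23, 29 modulo 30; the returned cofactor is the unfactored part."""
    factors = []
    for q in (2, 3, 5):
        while n % q == 0:
            factors.append(q)
            n //= q
    gaps = [4, 2, 4, 2, 4, 6, 2, 6]     # gaps between candidates mod 30
    m, i = 7, 0
    while m <= bound and m * m <= n:
        while n % m == 0:
            factors.append(m)
            n //= m
        m += gaps[i]
        i = (i + 1) % 8
    return factors, n

# 96864 = 2^5 * 3 * 1009; the cofactor 1009 survives because the loop
# stops once m^2 exceeds the remaining part (so it is in fact prime):
assert trial_division(96864) == ([2, 2, 2, 2, 2, 3], 1009)
```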

Pollard’s rho method

Pollard’s rho method solves the IFP in expected Õ(n^(1/4)) time and is based on the birthday paradox (Exercise 2.172).

Let p ≤ √n be an (unknown) prime divisor of n and let f : ℤ_n → ℤ_n be a random map. We start with an initial value x_0 ∈ ℤ_n and generate a sequence x_(i+1) = f(x_i), i ≥ 0, of elements of ℤ_n. Let y_i denote the smallest non-negative integer satisfying y_i ≡ x_i (mod p). By the birthday paradox, after about √p iterates x_1, . . . , x_t are generated, we have a high chance that y_i = y_j, that is, x_i ≡ x_j (mod p), for some 1 ≤ i < j ≤ t. This means that p | (x_i − x_j), and computing gcd(x_i − x_j, n) splits n into two non-trivial factors with high probability. The method fails if this gcd is n. For a random n, this incident of having a gcd equal to n has very low probability.

Algorithm 4.1 gives a specific implementation of this method. Computing gcds for all the pairs (xi – xj, n) is a massive investment of time. Instead we store (in the variable ξ) the values x_r, r = 2^t, for t = 0, 1, 2, . . . and compute only gcd(x_{r+s} – x_r, n) for s = 1, . . . , r. Since the sequence yi, i ≥ 0, is ultimately periodic with expected period length τ = O(√p), we eventually reach a t with r = 2^t ≥ max(μ, τ). In that case, the for loop detects a match. Typically, the update function f is taken to be f(x) = x^2 – 1 (mod n), which, though not a random function, behaves like one. Note that the iterates yi, i ≥ 0, may be visualized as being located on the Greek letter ρ as shown in Figure 4.1 (with a tail of the first μ iterates followed by a cycle of length τ). This is how this method derives its name.

Figure 4.1. Iterates in Pollard’s rho method


Algorithm 4.1 takes an expected running time of O~(√p). Since p ≤ √n for the smallest prime divisor p of n, Pollard’s rho method runs in expected time O~(n^(1/4)).

Algorithm 4.1. Pollard’s rho method

Input: A composite integer n.

Output: A non-trivial factor of n.

Steps:

Choose a random element x ∈ Z_n and set ξ := x and r := 1.

while (1) {
   for s = 1, . . . , r {
       x := f(x).
       d := gcd(x – ξ, n).
       if (1 < d < n) { Return d. }
   }
   ξ := x.
   r := 2r.
}
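A sketch of Algorithm 4.1 in Python follows. We use f(x) = x^2 + c (mod n) and restart with a new constant c whenever the gcd turns out to be n — a common practical variant, not the text’s exact prescription:

```python
from math import gcd
import random

def pollard_rho(n):
    """Pollard's rho with the doubling trick of Algorithm 4.1.

    Uses f(x) = x^2 + c (mod n); on the rare event gcd = n we restart
    with a different constant c (a common practical variant)."""
    if n % 2 == 0:
        return 2
    for c in range(1, 20):              # restart with a new f on failure
        f = lambda x: (x * x + c) % n
        x = random.randrange(2, n)
        xi, r = x, 1                    # xi plays the role of ξ
        while True:
            restart = False
            for _ in range(r):
                x = f(x)
                d = gcd(x - xi, n)
                if d == n:              # sequence collapsed modulo n
                    restart = True
                    break
                if d > 1:
                    return d
            if restart:
                break
            xi, r = x, 2 * r            # store x_r for r = 2^t
    return None
```

The doubling of r mirrors the listing above: gcds are computed only against the stored iterate ξ = x_r, so each iterate costs one application of f and one gcd.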

Many modifications of Pollard’s rho method have been proposed in the literature. Perhaps the most notable one is an idea due to R. P. Brent. All these modifications considerably speed up Algorithm 4.1, though leaving the complexity essentially the same, that is, O~(n^(1/4)). We will not describe these modifications in this book.

Pollard’s p – 1 method

Pollard’s p – 1 method is dependent on the prime factors of p – 1 for a prime divisor p of n. Indeed if p – 1 is rather smooth, this method may extract a (non-trivial) factor of n pretty fast, even when p itself is quite large. To start with we extend the definition of smoothness as follows.

Definition 4.1.

Let y ∈ N. An integer x is called y-power-smooth if, whenever a prime power p^e divides x, we have p^e ≤ y. Clearly, a y-power-smooth integer is y-smooth, but not necessarily conversely.

Let p be an (unknown) prime divisor of n. We may assume, without loss of generality, that p ≤ √n. Assume that p – 1 is M-power-smooth. Then (p – 1)|lcm(1, . . . , M) and, therefore, for an integer a with gcd(a, n) = 1 (and hence with gcd(a, p) = 1), we have a^lcm(1,...,M) ≡ 1 (mod p) by Fermat’s little theorem, that is, d := gcd(a^lcm(1,...,M) – 1, n) > 1. If d ≠ n, then d is a non-trivial factor of n. In case we have d = n (a very rare occurrence), we may try with another a or declare failure.

The problem with this method is that p and so M are not known in advance. One may proceed by guessing successively increasing values of M, till the method succeeds. In the worst case, that is, when p is a safe prime, we have M = (p – 1)/2. Since p can be as large as about √n, this algorithm runs in a worst-case time of O~(n^(1/2)). However, if M is quite small, then this algorithm is rather efficient, irrespective of how large p itself is.

In Algorithm 4.2, we give a variant of the p – 1 method, where we supply a predetermined value of the bound M. We also assume that we have at our disposal a precalculated list of all primes q1, . . . , qt ≤ M.

There is a modification of this algorithm known as Stage 2 or the second stage. For this, we choose a second bound M′ larger than M. Assume that p – 1 = rq, where r is M-power-smooth and q is a prime in the range M < q ≤ M′. In this case, Stage 2 computes with high probability a factor of n after doing an expected O(√M′) operations as follows. When Algorithm 4.2 returns “failure” at the last step, it has already computed the value A := a^m (mod n), where m = q1^e1 · · · qt^et, ei = ⌊ln M/ln qi⌋. In this case, A has multiplicative order q modulo p, that is, the subgroup H of Z_p* generated by A has order q. We choose s = O(√M′) random integers l1, . . . , ls in the range 1 ≤ li ≤ M′. By the birthday paradox (Exercise 2.172), we have with high probability A^li ≡ A^lj (mod p) for some i ≠ j. In that case, d := gcd(A^li – A^lj, n) is divisible by p and is a desired factor of n (unless d = n, a case that occurs with a very low probability). In practice, we do not know q and so we determine s and the integers l1, . . . , ls using the bound M′ instead of q.

Algorithm 4.2. Pollard’s p – 1 method

Input: A composite integer n, a bound M and all primes q1, . . . , qt ≤ M.

Output: A non-trivial factor d of n or “failure”.

Steps:

Select a random integer a, 1 < a < n. /* For example, we may take a := 2 */

if ((d := gcd(a, n)) ≠ 1) { Return d. }
for i = 1, . . . , t {
    ei := ⌊ln M/ln qi⌋.
    a := a^(qi^ei) (mod n).
    d := gcd(a – 1, n).
    if (1 < d < n) { Return d. }
    if (d = n) { Return “failure”. }  /* Or repeat the for loop with another a */
    if (d = 1) { Return “failure”. }
}
Return “failure”.
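Algorithm 4.2 may be sketched in Python as follows (the function names are ours; the example modulus in the test below is chosen so that one prime factor p has p – 1 power-smooth to the bound M = 11):

```python
from math import gcd, log

def small_primes_upto(M):
    """Primes q <= M by the sieve of Eratosthenes."""
    flags = [True] * (M + 1)
    flags[0:2] = [False, False]
    for i in range(2, int(M ** 0.5) + 1):
        if flags[i]:
            flags[i * i :: i] = [False] * len(flags[i * i :: i])
    return [i for i, f in enumerate(flags) if f]

def pollard_p_minus_1(n, M, a=2):
    """Stage 1 of Pollard's p-1 method (Algorithm 4.2).

    Returns a non-trivial factor of n, or None on failure."""
    d = gcd(a, n)
    if d != 1:
        return d
    for q in small_primes_upto(M):
        e = int(log(M) / log(q))        # e_i = floor(ln M / ln q_i)
        a = pow(a, q ** e, n)           # a := a^(q^e) (mod n)
        d = gcd(a - 1, n)
        if 1 < d < n:
            return d
        if d == n:
            return None                 # failure; could retry with new a
    return None
```

For example, with n = 2311 · 2003 the factor 2311 is found with M = 11, since 2310 = 2 · 3 · 5 · 7 · 11 is 11-power-smooth while 2002 = 2 · 7 · 11 · 13 is not.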

In another variant of Stage 2, we compute the powers A^q_{t+1}, . . . , A^q_{t′} (mod n), where q_{t+1}, . . . , q_{t′} are all the primes qj satisfying M < qj ≤ M′. If p – 1 = rq is of the desired form, we would find q = qj for some t < j ≤ t′, and then gcd(A^qj – 1, n), if not equal to n, would be a non-trivial factor of n.
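The prime-by-prime variant of Stage 2 can be sketched as follows (self-contained, so Stage 1 is repeated; the helper names are ours):

```python
from math import gcd, log

def p_minus_1_with_stage2(n, M, M2, a=2):
    """Pollard p-1: Stage 1 with bound M, then the prime-by-prime
    Stage 2 over the primes in (M, M2].  Returns a factor or None."""
    primes = [p for p in range(2, M2 + 1)
              if all(p % r for r in range(2, int(p ** 0.5) + 1))]
    # Stage 1: raise a to q^e for every prime q <= M
    A = a % n
    for q in (p for p in primes if p <= M):
        e = int(log(M) / log(q))
        A = pow(A, q ** e, n)
    d = gcd(A - 1, n)
    if 1 < d < n:
        return d
    # Stage 2: try gcd(A^q - 1, n) for each prime M < q <= M2
    for q in (p for p in primes if p > M):
        d = gcd(pow(A, q, n) - 1, n)
        if 1 < d < n:
            return d
    return None
```

As an example, n = 607 · 613 resists Stage 1 with M = 10 (since 606 = 2 · 3 · 101 has the large factor 101), but Stage 2 with M′ = 200 succeeds.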

In practice, one may try one’s luck using this algorithm for some M in the range 10^5 ≤ M ≤ 10^6 (and possibly also the second stage with 10^6 ≤ M′ ≤ 10^8) before attempting a more sophisticated algorithm like the MPQSM, the ECM or the NFSM.

Williams’ p + 1 method

As always, we assume that n is a composite integer and that p is an (unknown) prime divisor of n. Pollard’s p – 1 method uses an element a in the group Z_p* whose multiplicative order divides p – 1. The idea of Williams’ p + 1 method is very similar, that is, it works with an element a, this time in F_{p^2}*, whose multiplicative order is p + 1. If p + 1 is M-power-smooth for a reasonably small bound M, then computing d := gcd(a^lcm(1,...,M) – 1, n) > 1 splits n with high probability.

In order to find an element of order p + 1, we proceed as follows. Let α be an integer such that α^2 – 4 is a quadratic non-residue modulo p. Then the polynomial f(X) = X^2 – αX + 1 is irreducible modulo p and F_{p^2} ≅ F_p[X]/⟨f(X)⟩. Let a, b be the two roots of f in F_{p^2}. Then ab = 1 and a + b = α. Since f(a^p) = 0 (check it!) and since a ∉ F_p, we have a^p = b = a^(–1), that is, a^(p+1) = 1.

Unfortunately, p is not known in advance. Therefore, we represent elements of F_p as integers modulo n and the elements of F_{p^2} as polynomials c0 + c1X with c0, c1 ∈ Z_n. Multiplying two such elements of F_{p^2} is accomplished by multiplying the two polynomials representing these elements modulo the defining polynomial f(X), the coefficient arithmetic being that of Z_n. This gives us a way to do exponentiations in F_{p^2} in order to compute a^m – 1 for a suitable m (for example, m = lcm(1, . . . , M)).

However, the absence of knowledge of p has a graver consequence, namely, it is impossible to decide whether α^2 – 4 is a quadratic non-residue modulo p for a given integer α. The only thing we can do is to try several random values of α. This is justified, because if k random integers α are tried, then the probability that for all of these α the integers α^2 – 4 are quadratic residues modulo p is only 1/2^k.

The code for the p + 1 method is very similar to Algorithm 4.2. We urge the reader to complete the details. Since p^3 – 1 = (p – 1)(p^2 + p + 1), p^4 – 1 = (p^2 – 1)(p^2 + 1) and so on, we can work in higher extensions like F_{p^3}, F_{p^4} to find elements of order p^2 + p + 1, p^2 + 1 and so on, and generalize the p ± 1 methods. However, the integers p^2 + p + 1, p^2 + 1, being large (compared to p ± 1), have smaller chance of being M-smooth (or M-power-smooth) for a given bound M.
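A sketch of the p + 1 method under the representation described above: elements of F_{p^2} are stored as pairs (c0, c1) modulo n, multiplied modulo f(X) = X^2 – αX + 1, and the root a = X is raised to m = lcm(1, . . . , M). The names and the failure handling are ours:

```python
from math import gcd, lcm

def mulmod(u, v, alpha, n):
    """Multiply u and v, each a pair (c0, c1) representing c0 + c1*X
    in Z_n[X]/(X^2 - alpha*X + 1), using the reduction X^2 = alpha*X - 1."""
    c0, c1 = u
    d0, d1 = v
    return ((c0 * d0 - c1 * d1) % n,
            (c0 * d1 + c1 * d0 + alpha * c1 * d1) % n)

def williams_p_plus_1(n, M, alphas=range(3, 200)):
    """Williams' p+1 method: raise the root a = X of X^2 - alpha*X + 1
    to m = lcm(1..M); a^m ≡ 1 (mod p) forces the X-coefficient to
    vanish modulo p, so its gcd with n can reveal a factor."""
    m = lcm(*range(1, M + 1))
    for alpha in alphas:                   # try several alpha values
        a, base = (1, 0), (0, 1)           # the elements 1 and X
        e = m
        while e:                           # square-and-multiply
            if e & 1:
                a = mulmod(a, base, alpha, n)
            base = mulmod(base, base, alpha, n)
            e >>= 1
        d = gcd(a[1], n)
        if 1 < d < n:
            return d
    return None
```

For example, n = 863 · 1009 splits with M = 32, since 863 + 1 = 864 = 2^5 · 3^3 is 32-power-smooth.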

The reader should have recognized why we paid attention to strong primes and safe primes (Definition 3.5, p 199, and Algorithm 3.14, p 200). Let us now concentrate on the recent developments in the IFP arena.

4.3.2. The Quadratic Sieve Method

Carl Pomerance’s quadratic sieve method (QSM) is one of the (reasonably) successful modern methods of factoring integers. Though the number field sieve factoring method is the current champion, there was a time in the recent past when the quadratic sieve method and the elliptic curve method were known to be the fastest algorithms for solving the IFP.

The basic algorithm

We assume that n is a composite integer which is not a perfect square (because it is easy to detect whether n is a perfect square and, if so, we replace n by √n). The basic idea is to arrive at a congruence of the form

Equation 4.1

x^2 ≡ y^2 (mod n)

with x ≢ ±y (mod n). In that case, gcd(xy, n) is a non-trivial factor of n.
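A toy instance of this principle (the numbers are ours):

```python
from math import gcd

# A toy instance of Congruence (4.1): modulo n = 91 we have
# 10^2 = 100 ≡ 9 = 3^2 (mod 91), with 10 ≢ ±3 (mod 91).
n, x, y = 91, 10, 3
assert (x * x - y * y) % n == 0
assert x % n != y % n and x % n != (-y) % n

d = gcd(x - y, n)      # gcd(7, 91) = 7, a non-trivial factor
print(d, n // d)       # prints: 7 13
```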

We start with a factor base B = {q1, . . . , qt} comprising the first t primes and let H := ⌈√n⌉ and J := H^2 – n. Then H and J are each O(√n) and hence for a small integer c the right side of the congruence

(H + c)2J + 2cH + c2 (mod n)

is also O(√n). We try to factor T(c) := J + 2cH + c^2 using trial divisions by elements of B. If the factorization is successful, that is, if T(c) is B-smooth, then we get a relation of the form

Equation 4.2

(H + c)^2 ≡ q1^α1 q2^α2 · · · qt^αt (mod n)

where each αi ≥ 0. (Note that T(c) ≠ 0, since n is assumed not to be a perfect square.) If all αi are even, say, αi = 2βi, then we get the desired Congruence (4.1) with x = q1^β1 · · · qt^βt and y = H + c. But this is rarely the case. So we keep on generating other relations. After sufficiently many relations are available, we combine these together (by multiplication) to get Congruence (4.1) and compute gcd(x – y, n). If this does not give a non-trivial factor, we try to recombine the collected relations in order to get another Congruence (4.1). This is how Pomerance’s QSM works.

In order to find suitable combinations for yielding Congruence (4.1), we employ a method similar to Gaussian elimination. Assume that we have collected r relations of the form

(H + cj)^2 ≡ q1^α1j · · · qt^αtj (mod n), j = 1, . . . , r.

We search for integers β1, . . . , βr ∈ {0, 1} such that the product

Π_{j=1}^{r} (H + cj)^(2βj) ≡ Π_{i=1}^{t} qi^(α_{i1}β1 + · · · + α_{ir}βr) (mod n)

is a desired Congruence (4.1). The left side of this congruence is already a square. In order to make the right side a square too, we have to essentially solve the following system of linear congruences modulo 2:

α11β1 + α12β2 + · · · + α1rβr ≡ 0 (mod 2)
. . .
αt1β1 + αt2β2 + · · · + αtrβr ≡ 0 (mod 2)

This is a system of t equations over F_2 in r unknowns β1, . . . , βr and is expected to have solutions if r is slightly larger than t. Note that only the values of αij modulo 2 are needed for solving the above linear system. This means that we can have a compact representation of the coefficient matrix (αij) by packing 32 of the coefficients as bits per word. Gaussian elimination (over F_2) can be done using bit operations only.
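The packed representation can be sketched in Python, where an arbitrary-precision integer plays the role of a row of bits and XOR is addition over F_2 (the function is our illustration, using an echelon-style elimination):

```python
def find_dependency(rows):
    """Given exponent vectors modulo 2 packed as Python ints (bit i of
    rows[j] is alpha_{ij} mod 2), return a set of row indices whose
    vectors XOR to zero, or None.  XOR of bitmasks is exactly addition
    over F_2, so one machine word carries many coefficients at once."""
    basis = {}                     # pivot bit -> (mask, contributing rows)
    for j, row in enumerate(rows):
        combo = {j}
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = (row, combo)
                break
            brow, bcombo = basis[pivot]
            row ^= brow            # eliminate the pivot bit
            combo ^= bcombo        # symmetric difference of index sets
        else:
            return combo           # row reduced to zero: a dependency
    return None

# Example: the first three rows satisfy r0 XOR r1 XOR r2 = 0.
deps = find_dependency([0b101, 0b011, 0b110, 0b001])
```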

The running time of this method can be derived using Corollary 4.1. Note that the integers T(c) that are tested for B-smoothness are O(n^(1/2)), which corresponds to α = 1/2 in the corollary. We take qt = L[1/2] (so that t ≈ L[1/2]/ln L[1/2] = L[1/2] by the prime number theorem), which corresponds to β = 1/2. Assuming that the integers T(c) behave as random integers of magnitude O(n^(1/2)), the probability that one such T(c) is B-smooth is L[–1/2]. Therefore, if L[1] values of c are tried, we expect to get L[1/2] relations involving the L[1/2] primes q1, . . . , qt. Combining these relations by Gaussian elimination is now expected to produce a non-trivial Congruence (4.1). This gives us a running time of the order of L[3/2] for the relation collection stage. Gaussian elimination on L[1/2] unknowns also takes asymptotically the same time. However, each T(c) can have at most O(log n) distinct prime factors, implying that Relation (4.2) is necessarily sparse. This sparsity can be effectively exploited and the Gaussian elimination can be done essentially in time L[1]. Nevertheless, the entire procedure runs in time L[3/2], a subexponential expression in ln n.

Sieving

In order to reduce the running time from L[3/2] to L[1], we employ what is known as sieving (and from which the algorithm derives its name). Let us fix a priori the sieving interval, that is, the values of c for which T(c) is tested for B-smoothness, to be –M ≤ c ≤ M, where M = L[1]. Let q ∈ B be a small prime (that is, q = qi for some i = 1, . . . , t). We intend to find out the values of c such that q^h|T(c) for small exponents h = 1, 2, . . . . Since T(c) = J + 2cH + c^2 = (c + H)^2 – n, the solvability for c of the condition q^h|T(c) or of q|T(c) is equivalent to the solvability of the congruence (c + H)^2 ≡ n (mod q). If n is a quadratic non-residue modulo q, no c satisfies the above condition. Consequently, the factor base B may comprise only those primes q for which n is a quadratic residue modulo q (instead of all primes ≤ qt). So we assume that q meets this condition. We may also assume that q ∤ n, because it is a good strategy to perform trial divisions of n by all the primes in B before we go for sieving. The sieving process makes use of an array A indexed by c. We initialize the array location A[c] to ln |T(c)| for each c, –M ≤ c ≤ M.

We explain the sieving process only for an odd prime q. The modifications for the case q = 2 are left to the reader as an easy exercise. The congruence x^2 – n ≡ 0 (mod q) has two distinct solutions for x, say, x1 and x1′ := –x1 mod q. These correspond to two solutions for c of (H + c)^2 ≡ n (mod q), namely, c1 ≡ x1 – H (mod q) and c1′ ≡ x1′ – H (mod q). For each value of c in the interval –M ≤ c ≤ M that is congruent either to c1 or c1′ modulo q, we subtract ln q from A[c]. We then lift the solutions x1 and x1′ to the (unique) solutions x2 and x2′ of the congruence x^2 – n ≡ 0 (mod q^2) (Exercise 3.29), compute c2 ≡ x2 – H (mod q^2) and c2′ ≡ x2′ – H (mod q^2), and for each c in the range –M ≤ c ≤ M congruent to c2 or c2′ modulo q^2 subtract ln q from A[c]. We then again lift to obtain the solutions modulo q^3 and proceed as above. We repeat this process of lifting and subtracting ln q from appropriate locations of A until we reach a sufficiently large h for which neither ch nor ch′ corresponds to any value of c in the range –M ≤ c ≤ M. We then choose another q from the factor base and repeat the procedure explained in this paragraph for this q.

After the sieving procedure is carried out for all small primes q in the factor base B, we check for which c, –M ≤ c ≤ M, the array location A[c] is (close to) 0. These are precisely the values of c in the indicated range for which T(c) is B-smooth. For each smooth T(c), we then compute Relation (4.2) using trial division (by primes of B).
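The sieving procedure may be sketched as follows on a toy n. For brevity, the roots modulo q^h are found by brute-force scanning instead of by computing a square root modulo q and lifting it, and negative values of T(c) are handled simply through |T(c)|:

```python
import math

def sieve_smooth(n, M, qmax):
    """Complete sieving for the QSM: interval -M <= c <= M, factor base
    of primes q <= qmax modulo which n is a quadratic residue.  Roots
    modulo q^h are found by brute force here; a real implementation
    computes a square root mod q and lifts it to higher powers."""
    assert math.isqrt(n) ** 2 != n          # n must not be a perfect square
    H = math.isqrt(n) + 1
    T = lambda c: (H + c) ** 2 - n
    A = {c: math.log(abs(T(c))) for c in range(-M, M + 1)}
    primes = [q for q in range(2, qmax + 1)
              if all(q % r for r in range(2, int(q ** 0.5) + 1))]
    base = [q for q in primes
            if any((x * x - n) % q == 0 for x in range(q))]
    maxT = max(abs(T(c)) for c in range(-M, M + 1))
    for q in base:
        qh = q
        while qh <= maxT:                    # every relevant power q^h
            for x in (x for x in range(qh) if (x * x - n) % qh == 0):
                start = (x - H + M) % qh - M # least c >= -M with c ≡ x - H
                for c in range(start, M + 1, qh):
                    A[c] -= math.log(q)
            qh *= q
    # near-zero leftovers are exactly the B-smooth values of |T(c)|
    return sorted(c for c, v in A.items() if v < 1.0), base
```

The returned indices c can then be confirmed and turned into relations by trial division over the factor base, as described above.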

The sieving process replaces trial divisions (of every T(c) by every q) by subtractions (of ln q from appropriate A[c]). This is intuitively the reason why sieving speeds up the relation collection stage. For a more rigorous analysis of the running time, note that in order to get the desired ci and ci′ modulo q^i for each q ∈ B and for each i = 1, . . . , h, we have either to compute a square root modulo q (for i = 1) or to solve a linear congruence (during lifting for i ≥ 2), each of which can be done in polynomial time. Also the bound h on the exponent of q satisfies q^h ≤ max |T(c)|, that is, h = O(log n). Finally, there are L[1/2] primes in B. Therefore, the computation of the ci and ci′ for all q and i takes a total of L[1/2] time.

Now, we count the total number ν of subtractions of different ln q values from all the locations of the array A. The size of A is 2M + 1. For each qi, we need to subtract ln qi from at most 2⌈(2M + 1)/qi^h⌉ locations for the modulus qi^h (for odd qi), and summing this over all the relevant powers h gives at most about 4⌈(2M + 1)/qi⌉ subtractions for qi. Therefore, ν is of the order of (2M + 1)H_Q, where Q is the maximum of all the qi and is L[1/2], and where Hm := 1 + 1/2 + · · · + 1/m, m ≥ 1, denote the harmonic numbers (Exercise 4.6). But Hm = O(ln m), and so ν = O((2M + 1) log n) = L[1], since M = L[1].

The logarithms ln q (as well as the initial array values ln |T(c)|) are irrational numbers and hence need infinite precision for storing. We, however, need to work with only crude approximations of these logarithms, say up to three places after the decimal point. In that case, we cannot take A[c] = 0 as the criterion for selecting smooth values of T(c), because the approximate representation of logarithms leads to truncation (and/or rounding) errors. In practice, this is not a severe problem, because T(c) is not smooth if and only if it has a prime factor at least as large as q_{t+1} (the smallest prime not in B). This implies that at the end of the sieving operation the values of A[c] for smooth T(c) are close to 0, whereas those for non-smooth T(c) are much larger (close to a number at least as large as ln q_{t+1}). Thus we may set the selection criterion for smooth integers as A[c] < 1 or as A[c] < 0.1 ln q_{t+1}. It is also possible to replace floating point subtraction by integer subtraction by doing the arithmetic on 1000 times the logarithm values. To sum up, the ν = L[1] subtractions the sieving procedure does would be only single-precision operations and hence take a total of L[1] time.

As mentioned earlier, Gaussian elimination with sparse equations can also be performed in time L[1]. So Pomerance’s algorithm with sieving takes time L[1].

Incomplete sieving

Numerous modifications over this basic strategy speed up the algorithm considerably. One possibility is to do sieving every time only for h = 1 and ignore all higher powers of q. That is, for every q we check which of the integers T(c) are divisible by q and then subtract ln q from the corresponding locations A[c] of the array. If some T(c) is divisible by a higher power of q, this strategy fails to subtract ln q the required number of times. As a result, this T(c), even if smooth, may fail to pass the smoothness criterion. This problem can be overcome by increasing the cut-off from 1 (or 0.1 ln q_{t+1}) to a value ξ ln qt for some ξ ≥ 1. But then some non-smooth T(c) will pass the selection criterion in addition to some smooth ones that could not otherwise be detected. This is reasonable, because the non-smooth ones can later be filtered out from the smooth ones, and one might even use trial divisions to do so. Experimentation shows that values of ξ ≤ 2.5 work quite well in practice.

The reason why this strategy performs well is as follows. If q is small, for example q = 2, we should subtract only 0.693 from A[c] for every power of 2 dividing T(c). On the other hand, if q is much larger, say q = 1,299,709 (the 10^5-th prime), then ln q ≈ 14.078 is large. But T(c) would not, in general, be divisible by a high power of this q. This modification, therefore, leads to a situation where the probability that a smooth T(c) is actually detected as smooth is quite high. A few relations would still be missed out even with the modified selection criterion, but that is more than compensated by the speed-up gained by the method. Henceforth, we will call this modified strategy incomplete sieving and the original strategy (of considering all powers of q) complete sieving.

Large prime variation

Another trick known as large prime variation also tends to give more usable relations than are available from the original (complete or incomplete) sieving. In this context, we call a prime q′ large if q′ ∉ B. A value of T(c) is often expected to be B-smooth except for a single large prime factor:

Equation 4.3

T(c) = q1^α1 q2^α2 · · · qt^αt q′

with q′ ∉ B. Such a value of T(c) can be easily detected. For example, incomplete sieving with the relaxed selection criterion is expected to give many such relations naturally, whereas for complete sieving, if the left-over of ln |T(c)| in A[c] at the end of the subtraction steps is < 2 ln qt, then this must correspond to a large prime factor < qt^2. Instead of throwing away an apparently unusable Equation (4.3), we may keep track of them. If a large prime q′ is not large enough (that is, not much larger than qt), then it might appear on the right side of Equation (4.3) for more than one value of c, and if that is the case, all these relations taken together become usable for the subsequent Gaussian elimination stage (after including q′ in the factor base). This means that for each large prime occurring more frequently than once, the factor base size increases by 1, whereas the number of relations increases by at least 2. Thus with a little additional effort we enrich the factor base and the relations collected, and this, in turn, increases the probability of finding a useful Congruence (4.1), our ultimate goal. Viewed from another angle, the strategy of large prime variation allows us to start with smaller values of t and/or M and thereby speed up the sieving stage and still end up with a system capable of yielding the desired Congruence (4.1). Note that an increased factor base size leads to a larger system to solve by Gaussian elimination. But this is not a serious problem in practice, because the sieving stage (and not the Gaussian elimination stage) is usually the bottleneck of the running time of the algorithm.
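The bookkeeping for single large primes can be sketched as follows (the relation format is our illustration):

```python
from collections import defaultdict

def combine_large_prime_relations(relations):
    """Each relation is (c, exponents, lp) as in Equation (4.3), where
    `exponents` is the exponent vector over the factor base and `lp` is
    the single large prime (None if T(c) is fully smooth).  Relations
    sharing a large prime q' are paired: in their product q' occurs to
    an even power, so the pair becomes usable for the elimination."""
    usable = [r for r in relations if r[2] is None]
    by_prime = defaultdict(list)
    for r in relations:
        if r[2] is not None:
            by_prime[r[2]].append(r)
    pairs = [(grp[0], grp[1]) for grp in by_prime.values() if len(grp) >= 2]
    return usable, pairs
```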

It is natural that the above discussion on handling one large prime is applicable to situations where a T(c) value has more than one large prime factor, say q′ and q″. Such a T(c) value leads to a usable relation if the large primes involved reappear in other relations. This situation can be detected by a compositeness test on the non-smooth part of T(c). Subsequently, we have to factor the non-smooth part to obtain the two large primes q′ and q″. This is called the two large prime variation. As the size of the integer n to be factored becomes larger, one may go for three and four large prime variations.

We will shortly encounter many other instances of sieving (for solving the IFP and the DLP). Both incomplete sieving and the use of large primes, if carefully applied, help speed up most of these sieving methods much in the same way as they do in connection with the QSM.

The multiple polynomial quadratic sieve

Easy computations (Exercise 4.11) show that the average and maximum of the integers |T(c)| checked for smoothness in the QSM are approximately MH and 2MH respectively. Though these values are theoretically n^(1/2 + o(1)), in practice the factor of M (or 2M) makes the integers |T(c)| somewhat large, leading to a poor yield of B-smooth integers for larger values of |c| in the sieving interval. The multiple-polynomial quadratic sieve method (MPQSM) applies a nice trick to reduce these average and maximum values. In the original QSM, we work with a single polynomial in c, namely,

T(c) = J + 2cH + c^2 = (H + c)^2 – n.

Now, we work with a more general quadratic polynomial

g(c) = U + 2Vc + Wc^2

with W > 0 and V^2 – UW = n. (The original T(c) corresponds to U = J, V = H and W = 1.) Then we have (V + Wc)^2 ≡ W g(c) (mod n), that is, in this case a relation looks like

(V + Wc)^2 ≡ W q1^α1 · · · qt^αt (mod n).

This relation has an additional factor of W that was absent in Relation (4.2). However, if W is chosen to be a prime (possibly a large one), then the Gaussian elimination stage proceeds exactly as in the original method. Indeed in this case W appears in every relation and hence poses no problem. Only the integers g(c) need to be checked for B-smoothness and hence should have small values. The sieving procedure (that is, computing the appropriate locations of A for subtracting ln q, q ∈ B) for the general polynomial g(c) is very much similar to that for T(c). The details are left to the reader as an easy exercise.

Let us now explain how we can choose the parameters U, V, W. To start with, we fix a suitable sieving interval –M′ ≤ c ≤ M′ and then choose W to be a prime close to √(2n)/M′ such that n is a quadratic residue modulo W. Then we compute a square root V of n modulo W (Algorithm 3.16) and finally take U := (V^2 – n)/W. This choice clearly gives V^2 – UW = n and 0 < V < W. (Indeed one may choose 0 < V < W/2, but this is not an important issue.) Now, the maximum value of |g(c)| becomes about M′√(n/2). Thus even for M′ = M, this maximum value is smaller by a factor of 2√2 than the maximum value of |T(c)| in the original QSM. Moreover, we may choose somewhat smaller values of M′ (compared to M) by working with several polynomials corresponding to different choices for the prime W. This is why the MPQSM, despite having the same theoretical running time (L[1]) as the original QSM, runs faster in practice.
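The algebraic identity (V + Wc)^2 – n = W g(c) underlying this choice can be checked directly (the toy parameters are ours, and the square root modulo W is found by brute force instead of Algorithm 3.16):

```python
def mpqs_poly(n, W):
    """Given a prime W with n a quadratic residue modulo W, build the
    MPQSM polynomial g(c) = U + 2Vc + Wc^2 with V^2 - UW = n."""
    V = next(x for x in range(W) if (x * x - n) % W == 0)
    U = (V * V - n) // W           # exact: V^2 ≡ n (mod W)
    return U, V

n, W = 8051, 103                   # toy example; n is a QR modulo 103
U, V = mpqs_poly(n, W)
# the identity behind the relation: (V + W*c)^2 - n = W * g(c)
for c in range(-5, 6):
    g = U + 2 * V * c + W * c * c
    assert (V + W * c) ** 2 - n == W * g
```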

Parallelization

The QSM is highly parallelizable. More specifically, different processors can handle pairwise disjoint subsets of B during the sieving process. That is, each processor P maintains a local array A_P indexed by c, –M ≤ c ≤ M. The (local) sieving process at P starts with initializing all the locations A_P[c] to 0. For each prime q in the subset B_P of the factor base B assigned to P, one adds ln q to appropriate locations (an appropriate number of times). After all these processors finish local sieving, a central processor computes, for each c in the sieving interval, the value ln |T(c)| – Σ_P A_P[c] (where the sum extends over all processors P which have done local sieving), based on which T(c) is recognized as smooth or not. For the multiple-polynomial variant of the QSM, different processors might handle different polynomials and/or different subsets of B.

TWINKLE: Shamir’s factoring device

Adi Shamir has proposed the complete design of a (hardware) device, TWINKLE (The Weizmann INstitute Key Location Engine), that can perform the sieving stage of the QSM a hundred to a thousand times faster than software implementations on the usual PCs available nowadays. This speed-up is obtained by using a high clock speed (10 GHz) and opto-electronic technology for detecting smooth integers. Each TWINKLE, if mass produced, has an estimated cost of US $5,000.

The working of TWINKLE is described in Figure 4.2. It uses an opaque cylinder of a height of about 10 inches and a diameter of about 6 inches. At the bottom of the cylinder is an array of LEDs,[1] each LED representing a prime in the factor base. The i-th LED (corresponding to the i-th prime qi) emits light of intensity proportional to log qi. The device is clocked and the i-th LED emits light only during the clock cycles c for which qi|T(c). The light emitted by all the active LEDs at a given clock cycle is focused by a lens and a photo-detector senses the total emitted light. If this total light exceeds a certain threshold, the corresponding clock cycle (that is, the time c) is reported to a PC attached to TWINKLE. The PC then analyses the particular T(c) for smoothness over {q1, . . . , qt} by trial division.

[1] An LED (light emitting diode) is an electronic device that emits light, when current passes through it. A GaAs(Gallium arsenide)-based LED emits (infra-red) light of wavelength ~870 nano-meters. In the operational range of an LED, the intensity of emitted light is roughly proportional to the current passing through the LED.

Figure 4.2. Working of TWINKLE


Thus, TWINKLE implements incomplete sieving by opto-electronic means. The major difference between TWINKLE’s sieving and software sieving is that in the latter we used an array of times (the c values) and the iteration went over the set of small primes. In TWINKLE, we use an array of small primes and allow time to iterate over the different values of c in the sieving interval –M ≤ c ≤ M. An electronic circuit in TWINKLE computes for each LED the cycles c at which that LED is expected to emanate light. That is to say that the i-th LED emits light only in the clock cycles c congruent modulo qi to either of the two solutions c1 and c1′ of T(c) ≡ 0 (mod qi). Shamir’s original design uses two LEDs for each prime qi, one corresponding to c1, the other to c1′. In that case, each LED emits light at regularly spaced clock cycles and this simplifies the electronic circuitry (at the cost of having twice the number of LEDs).

Another difference of TWINKLE from software sieving is that here we add the log qi values (to zero) instead of subtracting them from log |T(c)|. By Exercise 4.11, the values |T(c)| typically vary by small constant factors only. Taking logs reduces this variation further and, therefore, comparing the sum of the active log qi values for a given c with a fixed predefined threshold (say log MH) independent of c is a neat way of bypassing the computation of all log |T(c)|, –M ≤ c ≤ M. (This strategy can also be used for software sieving.)

The reasons, why TWINKLE speeds up the sieving procedure over software implementations in conventional PCs, are the following:

  1. Silicon-based PC chips at present can withstand clock frequencies on the order of 1 GHz. On the contrary a GaAs-based wafer containing the LED array can be clocked faster than 10 GHz.

  2. There is no need to initialize the array (to log |T(c)| or zero). Similarly at the end, there is no need to compare the final values in all these array locations with a threshold.

  3. The addition of all the log qi values effective at a given c is done instantly by analog optical means. We do not require an explicit electronic adder.

Shamir [269] reports the full details of a VLSI[2] design of TWINKLE.

[2] very large-scale integration

*4.3.3. Factorization Using Elliptic Curves

H. W. Lenstra’s elliptic curve method (ECM) is another modern algorithm to solve the IFP and runs in expected time L(p, 1/2, √2), where p is the smallest prime factor of n (the integer to be factored). Since p can be as large as about √n, this running time is, in the worst case, L[1] = L(n, 1/2, 1), that is, the same as that of the QSM. However, if p is small (that is, if p = O(n^α) for some α < 1/2), then the ECM is expected to outperform the QSM, since the working of the QSM is incapable of exploiting smaller values of p.

As before, let n be a composite natural number having no small prime divisors and let p be the smallest prime divisor of n. For denoting subexponential expressions in ln p, we use the symbol L_p[c] := L(p, 1/2, c), whereas the unsubscripted symbol L[c] stands for L(n, 1/2, c). We work with random elliptic curves

E : Y^2 = X^3 + aX + b

modulo n and consider the group E(F_p) of rational points on E modulo p. However, since p is not known a priori, we intend to work modulo n. The canonical surjection Z_n → F_p allows us to identify the points on E modulo n as points on E over F_p. We now define a bound M := L_p[1/√2] and let B = {q1, . . . , qt} be all the primes smaller than or equal to M, so that by the prime number theorem (Theorem 2.20) #B ≈ M/ln M. Of course, p is not known in advance, so that M and B are also not known. We will discuss the choice of M and B later. For the time being, let us assume that we know some approximate value of p, so that M and B can be fixed, at least approximately, at the beginning of the algorithm.

By Hasse’s theorem (Theorem 2.48, p 106), the cardinality ν := #E(F_p) satisfies |ν – (p + 1)| ≤ 2√p, that is, ν = O(p). If we make the heuristic assumption that ν behaves as a random integer of the order O(p), then Corollary 4.1 tells us that ν is B-smooth with probability L_p[–1/√2]. This assumption is certainly not rigorous, but accepting it gives us a way to analyse the running time of the algorithm.

If L_p[1/√2] random curves are tried, then we expect to find one B-smooth value of ν. In this case, a non-trivial factor of n can be computed with high probability as follows. Define ei := ⌊ln n/ln qi⌋ for i = 1, . . . , t, and m := q1^e1 · · · qt^et, where t is the number of primes in B. If ν is B-smooth, then ν|m and, therefore, for any point P ∈ E(F_p) we have mP = O, the point at infinity. Computation of mP involves computation of many sums P1 + P2 of points P1 := (h1, k1) and P2 := (h2, k2). At some point of time, we would certainly compute a sum P1 + P2 = O, that is, P1 = –P2, that is, h1 ≡ h2 (mod p) and k1 ≡ –k2 (mod p). Since p was unknown, we worked modulo n, that is, the values of h1, h2, k1 and k2 are known modulo n. Let d := gcd(h1 – h2, n). Then p|d and if d ≠ n (the case d = n has a very small probability!), we have the non-trivial factor d of n. The computation of the coordinates of P1 + P2 (assuming P1 ≠ P2) demands computing the inverse of h1 – h2 modulo n (Section 2.11.2). However, if d = gcd(h1 – h2, n) ≠ 1, then this inverse does not exist, so the computation of P1 + P2 fails, and we have a non-trivial factor of n. If ν is B-smooth, then the computation of mP is bound to fail. The basic steps of the ECM are then as shown in Algorithm 4.3.

Algorithm 4.3. Elliptic curve method (ECM)

Input: A composite integer n (with no small prime factors).

Output: A non-trivial divisor d of n.

Steps:

while (1) {
   Select a random curve E : Y2 = X3 + aX + b modulo n.
   Choose a point P on E modulo n.
   Try to compute mP.   /* where m is as defined in the text */
   if (the computation of mP fails) {
       /* We have found a divisor d > 1 of n */
       if (d ≠ n) { Return d. }
   }
}
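Algorithm 4.3 may be sketched in Python as follows. The failed inversion is surfaced as an exception carrying the divisor; the curve parameters follow the practical choices discussed in the text (b = 1, P = (0, 1), successive values of a), and the names are ours:

```python
from math import gcd, lcm

class FactorFound(Exception):
    def __init__(self, d):
        self.d = d

def inv_mod(x, n):
    """Inverse of x modulo n; a failure reveals a divisor of n."""
    g = gcd(x, n)
    if g != 1:
        raise FactorFound(g)
    return pow(x, -1, n)

def ec_add(P, Q, a, n):
    """Affine addition on Y^2 = X^3 + aX + b modulo n (None = infinity)."""
    if P is None: return Q
    if Q is None: return P
    x1, y1 = P; x2, y2 = Q
    if (x1 - x2) % n == 0:
        if (y1 + y2) % n == 0:
            return None                       # P = -Q
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)

def ecm(n, M=100):
    """Lenstra's ECM with b = 1 and P = (0, 1), trying a = 1, 2, 3, ..."""
    m = lcm(*range(1, M + 1))
    for a in range(1, 10000):                 # successive curves Y^2=X^3+aX+1
        try:
            P, Q = (0, 1), None
            e = m
            while e:                          # double-and-add towards m*P
                if e & 1:
                    Q = ec_add(Q, P, a, n)
                P = ec_add(P, P, a, n)
                e >>= 1
        except FactorFound as f:
            if f.d < n:
                return f.d                    # else d = n: try next curve
    return None
```

The computation of mP is deliberately allowed to fail: the non-invertible denominator is precisely where the divisor of n appears.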

Before we derive the running time of the ECM, some comments are in order. A random curve E is chosen by selecting random integers a and b modulo n. It turns out that taking a to be a single-precision integer and b = 1 works quite well in practice. Indeed one can keep on trying the values a = 0, 1, 2, . . . successively. Note that the curve E is an elliptic curve, that is, non-singular, if and only if δ := gcd(n, 4a^3 + 27b^2) = 1. However, having δ > 1 is an extremely rare occurrence and one might skip the computation of δ before starting the trial with a curve. The choice b = 1 is attractive, because in that case we may take the point P = (0, 1). In Section 3.6, we have described a strategy to find a random point on an elliptic curve over a field K. This is based on the assumption that computing square roots in K is easy. The same method can be applied to curves modulo n, but n being composite, it is difficult to compute square roots modulo n. So taking b to be 1 (or the square of a known integer) is indeed a pragmatic decision. After all, we do not need P to be a random point on E.

Recall that we have taken m := q1^e1 ··· qt^et, where ei = ⌊ln n/ln qi⌋. If instead we take ei := ⌊ln M/ln qi⌋ (where M is the bound mentioned earlier), the computation of mP per trial becomes much cheaper, whereas the probability of a successful trial (that is, of a failure while computing mP) does not decrease much. The integer m can be quite large. One, however, need not compute m explicitly, but may proceed as follows: first take Q0 := P and subsequently, for each i = 1, . . . , t, compute Qi := qi^ei Qi–1. One finally gets mP = Qt.

Now comes the analysis of the running time of the ECM. We have fixed the parameter M to be L(p, 1/2, 1/√2), so that B contains L(p, 1/2, 1/√2) small primes. The most expensive part of a trial with a random elliptic curve is the (attempted) computation of the point mP. This involves L(p, 1/2, 1/√2) additions of points. Since an expected number L(p, 1/2, 1/√2) of elliptic curves needs to be tried for finding a non-trivial factor of n, the algorithm performs an expected number L(p, 1/2, √2) of additions of points on curves modulo n. Since each such addition can be done in polynomial time, the announced running time follows.

Note that L(p, 1/2, √2) is the optimal running time of the ECM, and it can be shown to be achieved by taking M = L(p, 1/2, 1/√2). But, in practice, p is not known a priori. Various ad hoc ways may be adopted to get around this difficulty. One possibility is to use the worst-case bound p ≤ √n. For example, for factoring integers of the form n = pq, where p and q are primes of roughly the same size, √n is a good approximation for p. Another strategy is to start with a small value of M and increase M gradually with the number of trials performed. For larger values of M, the probability of a successful trial increases, implying that fewer elliptic curves need to be tried, whereas the time per trial (that is, for the computation of mP) increases. In other words, the total running time of the ECM is apparently not very sensitive to the choice of M.

A second stage can be used for each elliptic curve in order to increase the probability of a trial being successful. A strategy very similar to the second stage of the p – 1 method can be employed. The reader is urged to fill out the details. Employing the second stage leads to reasonable speed-up in practice, though it does not affect the asymptotic running time.

The ECM can be effectively parallelized, since different processors can carry out the trials, that is, computations of mP (together with the second stage) with different sets of (random) elliptic curves.

4.3.4. The Number Field Sieve Method

The number field sieve method (NFSM) is to date the most successful of all integer-factoring algorithms. Under certain heuristic assumptions it achieves a running time of the form L(n, 1/3, c), which is better than the L(n, 1/2, c′) algorithms described so far. The NFSM was first designed for integers of a special form. This variant of the NFSM is called the special NFS method (SNFSM); it was later modified to the general NFS method (GNFSM), which can handle arbitrary integers. The running time of the SNFSM has c = (32/9)^{1/3} ≈ 1.526, whereas that of the GNFSM has c = (64/9)^{1/3} ≈ 1.923. For the sake of simplicity, we describe only the SNFSM in this book (see Cohen [56] and Lenstra and Lenstra [165] for further details).

We choose an integer m and a polynomial f(X) with integer coefficients such that f(m) ≡ 0 (mod n). We assume that f is irreducible in ℤ[X]; otherwise a non-trivial factor of f yields a non-trivial factor of n. Consider the number field K := ℚ(α), where α is a root of f. Let d := deg f be the degree of the number field K. We use the complex embedding of K that sends α to a fixed complex root of f. The special NFS method makes certain simplifying assumptions:

  1. f is monic, so that α is an algebraic integer.

  2. The ring of integers of K is ℤ[α], that is, K is monogenic.

  3. ℤ[α] is a PID.

Consider the ring homomorphism

Φ : ℤ[α] → ℤn determined by Φ(α) = m (mod n).

This is well-defined, since f(m) ≡ 0 (mod n). We choose small coprime (rational) integers a, b and note that Φ(a + bα) ≡ a + bm (mod n). Let B be a predetermined smoothness bound. Assume that for a given pair (a, b), both a + bm and a + bα are B-smooth. For the rational integer a + bm, this means

a + bm = ± ∏ q^{eq}, the product being over the set of all rational primes q ≤ B. On the other hand, smoothness of the algebraic integer a + bα means that the principal ideal ⟨a + bα⟩ of ℤ[α] is a product of prime ideals of prime norms ≤ B; that is, we have a factorization

where the product runs over the set of all prime ideals of ℤ[α] of prime norms ≤ B. By assumption, each such prime ideal is principal. Let us fix a set of generators, one for each of these prime ideals. Further, let us fix a set of generators of the multiplicative group of units of ℤ[α]. The smoothness of a + bα can, therefore, be rephrased as

Equation 4.4


Applying Φ then yields

This is a relation for the SNFSM. After sufficiently many relations are available, Gaussian elimination modulo 2 (as in the case of the QSM) is expected to give us a congruence of the form

x² ≡ y² (mod n),

and gcd(x – y, n) is possibly a non-trivial factor of n. This is the basic strategy of the SNFSM. We now clarify some details.

Selecting the polynomial f(X)

There is no clearly specified way to select the polynomial f for defining the number field K. We require f to have small coefficients. Typically, m is much smaller than n, and one writes the expansion of n in base m as n = b_t m^t + b_{t–1} m^{t–1} + ··· + b_1 m + b_0 with 0 ≤ b_i < m. Taking f(X) = b_t X^t + b_{t–1} X^{t–1} + ··· + b_1 X + b_0 is often suggested.
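The base-m construction can be sketched as follows (the function name is ours). Since f is built from the base-m digits of n, f(m) = n, and hence f(m) ≡ 0 (mod n) as required.

```python
def base_m_coeffs(n, m):
    """Digits of n in base m: n = b_t m^t + ... + b_1 m + b_0 with
    0 <= b_i < m.  Using these digits as the coefficients of f
    guarantees f(m) = n, hence f(m) ≡ 0 (mod n)."""
    coeffs = []
    while n:
        n, b = divmod(n, m)
        coeffs.append(b)                  # b_0 first, b_t last
    return coeffs

# A toy example; real NFS parameters are far larger.
m = 2**13
coeffs = base_m_coeffs(2**64 + 1, m)
assert sum(b * m**i for i, b in enumerate(coeffs)) == 2**64 + 1
```

Here deg f = 4, since (2^13)^4 ≤ 2^64 + 1 < (2^13)^5.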

For integers n of certain special forms, we have natural choices for f. The seminal paper on the NFSM by Lenstra et al. [167] assumes that n = r^e – s for a small integer r > 1 and a non-zero integer s with small absolute value. In this case, one first chooses a small extension degree d and sets m := r^⌈e/d⌉ and f(X) := X^d – s·r^{d⌈e/d⌉–e}. Typically, d = 5 works quite well in practice. Lenstra et al. report the implementation of the SNFSM for factoring n = 3^239 – 1. The parameters chosen are d = 5, m = 3^48 and f(X) = X^5 – 3. In this case, the ring of integers of K is ℤ[α], and it is a PID.
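The parameter choice for n = 3^239 – 1 can be checked directly: f(m) = (3^48)^5 – 3 = 3^240 – 3 = 3(3^239 – 1) ≡ 0 (mod n).

```python
# SNFS parameters of Lenstra et al. for n = 3^239 - 1 with d = 5:
# m = 3^48 and f(X) = X^5 - 3, and indeed f(m) ≡ 0 (mod n).
n = 3**239 - 1
m = 3**48
assert (m**5 - 3) % n == 0
```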

Construction of the prime ideals of small norm

Take a small rational prime p ≤ B. From Section 2.13, it follows that if f(X) ≡ f_1(X)^{d_1} ··· f_r(X)^{d_r} (mod p) is the factorization of the canonical image of f(X) modulo p (the f_i distinct and irreducible modulo p), then the ideals ⟨p, f_i(α)⟩, i = 1, . . . , r, are all the primes lying over p. We have also seen that the norm of ⟨p, f_i(α)⟩ is p^{deg f_i}, which is prime if and only if deg f_i = 1, that is, f_i(X) = X – cp for some cp ∈ {0, 1, . . . , p – 1}. Thus, each root cp of f(X) modulo p corresponds to a prime ideal of ℤ[α] of prime norm p.

To sum up, a prime ideal in ℤ[α] of prime norm is specified by a pair (p, cp) of values, where p ≤ B is a rational prime and cp is a root of f modulo p. All such ideals can be precomputed by finding the roots of the defining polynomial f(X) modulo the small primes p ≤ B. One can use the root-finding algorithms of Exercise 3.29.
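A minimal sketch of this precomputation (names are ours; brute-force root finding stands in for the algorithms of Exercise 3.29, which is adequate for the small primes involved):

```python
def roots_mod_p(coeffs, p):
    """All roots of f modulo a small prime p by exhaustive search;
    coeffs = [b_0, b_1, ..., b_d] in ascending order."""
    def f_mod(x):
        r = 0
        for b in reversed(coeffs):         # Horner evaluation modulo p
            r = (r * x + b) % p
        return r
    return [c for c in range(p) if f_mod(c) == 0]

def prime_ideal_pairs(coeffs, B):
    """All pairs (p, c_p) with p <= B prime and f(c_p) ≡ 0 (mod p);
    each pair labels a prime ideal of prime norm p."""
    primes = [p for p in range(2, B + 1) if all(p % q for q in range(2, p))]
    return [(p, c) for p in primes for c in roots_mod_p(coeffs, p)]

# f(X) = X^5 - 3 from the example above:
pairs = prime_ideal_pairs([-3, 0, 0, 0, 0, 1], 20)
```

For f(X) = X^5 – 3 the list contains, for instance, (3, 0) and (5, 3), since 0^5 ≡ 3 (mod 3) and 3^5 = 243 ≡ 3 (mod 5).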

Construction of the generators gp,cp

Constructing a set of generators of the ideals of small prime norm is a costly operation. We have just seen that each such prime ideal of ℤ[α] corresponds to a pair (p, cp) and is a principal ideal by assumption. A generator of such an ideal is an element gp,cp = h0 + h1α + ··· + h_{d–1}α^{d–1}, with integer coefficients hi, satisfying N(gp,cp) = ±p. Algorithm 4.4 (quoted from Lenstra et al. [167]) computes the generators gp,cp for all relevant pairs (p, cp). The first for loop exhaustively searches over all small polynomials h(α) in order to locate, for each (p, cp), an element of norm kp with |k| as small as possible. If the smallest k (stored in ap,cp) is ±1, the stored element is already a generator gp,cp of the ideal; else some additional adjustments need to be performed.

Algorithm 4.4. Construction of generators of ideals for the SNFSM

Choose two suitable positive constants aB and CB (depending on B and K).

Initialize an array ap,cp := aB indexed by the relevant pairs (p, cp).

for each h = h0 + h1α + ··· + h_{d–1}α^{d–1} with |hi| ≤ CB,
     N(h) = kp for a prime p ≤ B and an integer k ≠ 0, |k| < min(p, aB) {
    Find cp such that h(cp) ≡ 0 (mod p).    /* Root finding */
    if (|k| < |ap,cp|) {
       /* Store the least k and the corresponding h found so far */
       ap,cp := k. hp,cp := h.
    }
}
for each relevant pair (p, cp) {
    if (ap,cp = ±1) { gp,cp := hp,cp. }    /* The more frequent case */
    else {
       Locate a g ∈ ℤ[α] with N(g) = ap,cp.
       gp,cp := hp,cp/g.
    }
}

Construction of the units

Let K have the signature (r1, r2). Write ρ = r1 + r2 – 1. By Dirichlet’s unit theorem, the group of units of ℤ[α] is generated by an appropriate root u0 of unity and ρ multiplicatively independent[3] elements u1, . . . , uρ of infinite order. Each unit u of ℤ[α] has norm N(u) = ±1. Thus, one may keep on generating elements h0 + h1α + ··· + h_{d–1}α^{d–1}, with small integers hi, of norm ±1, until ρ independent elements are found. Many elements of norm ±1 are available as a by-product during the construction of the generators gp,cp, which involves the computation of norms of many elements in ℤ[α]. For a more general exposition on this topic, see Algorithm 6.5.9 of Cohen [56].

[3] The elements u1, . . . , uρ in a (multiplicatively written) group are called (multiplicatively) independent if u1^{n1} ··· uρ^{nρ}, with integers ni, is the group identity only for n1 = ··· = nρ = 0.

Computing the factorization of a + bα

In order to compute the factorization of Equation (4.4), we first factor the integer N(a + bα) = ±b^d f(–a/b). If the prime factorization of ⟨a + bα⟩ involves the pairwise distinct prime ideals P1, . . . , Pk of ℤ[α] with multiplicities e1, . . . , ek, then by the multiplicative property of norms we obtain |N(a + bα)| = N(P1)^{e1} ··· N(Pk)^{ek}.

Now, let p ≤ B be a small prime. If p ∤ N(a + bα), it is clear that no prime ideal of ℤ[α] of norm p (or a power of p) appears in the factorization of ⟨a + bα⟩. On the other hand, if p | N(a + bα), then some prime ideal lying over p appears in the factorization. The smoothness of a + bα implies that the inertial degree of this ideal is 1, that is, its norm is p itself, that is, there is a cp with f(cp) ≡ 0 (mod p) such that the prime ideal corresponds to the pair (p, cp). In this case, we have a ≡ –cpb (mod p). Assume that another prime ideal of norm p appears in the prime factorization of ⟨a + bα⟩. If this second ideal corresponds to the pair (p, c′p), then a ≡ –c′pb (mod p). Since cp and c′p are distinct modulo p, it follows that p | gcd(a, b), a contradiction, since gcd(a, b) = 1. Thus, a unique ideal of norm p appears in the factorization of ⟨a + bα⟩. Moreover, the multiplicity of this ideal in the factorization of ⟨a + bα⟩ is the same as the multiplicity vp(N(a + bα)) of p in N(a + bα).

Thus, one may attempt to factorize N(a + bα) using trial divisions by primes ≤ B. If the factorization is successful, that is, if N(a + bα) is B-smooth, then for each prime divisor p of N(a + bα) we find out the corresponding ideal (p, cp) and its multiplicity in the factorization of ⟨a + bα⟩, as explained above. Since we know a generator gp,cp of each such ideal, we eventually compute a factorization of a + bα as a unit u of ℤ[α] times a product of the generators gp,cp. What remains is to factor u as a product of the unit generators. We do not discuss this step here, but refer the reader to Lenstra et al. [167].
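The norm computation and smoothness test can be sketched as follows (names are ours). Expanding |b^d f(–a/b)| for monic f with coefficients b_0, . . . , b_d gives the integer expression used in the code.

```python
def abs_norm(a, b, coeffs):
    """|N(a + b*alpha)| = |b^d f(-a/b)| for monic f; expanding the
    fraction gives |sum_i c_i (-a)^i b^(d-i)| with coeffs = [c_0,...,c_d]."""
    d = len(coeffs) - 1
    return abs(sum(c * (-a)**i * b**(d - i) for i, c in enumerate(coeffs)))

def trial_factor(N, B):
    """Trial division by the primes <= B; returns (exponents, cofactor).
    N is B-smooth exactly when the cofactor is 1."""
    exps = {}
    for p in range(2, B + 1):
        while N % p == 0:
            N //= p
            exps[p] = exps.get(p, 0) + 1
    return exps, N
```

With f(X) = X^5 – 3 as before, N(1 + α) has absolute value |f(–1)| = 4 = 2², so a = b = 1 gives a 2-smooth norm.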

Sieving

In the QSM, we check the smoothness of a single integer T(c) per trial, whereas in the NFS method we do so for two integers, namely, a + bm and N(a + bα). However, both these integers are much smaller than T(c), and the probability that they are simultaneously smooth is larger than the probability that T(c) alone is smooth. This accounts for the better asymptotic performance of the NFS method compared with the QSM.

One has to check the smoothness of a + bm and N(a + bα) for each coprime pair (a, b) in a predetermined range. This check can be carried out efficiently using sieves. We have to use two sieves, one for filtering out the non-smooth a + bm values and the other for filtering out the non-smooth a + bα values. We should have gcd(a, b) = 1, but computing gcd(a, b) for all values of a and b is rather costly. We may instead use a third sieve to throw away, for a given b, those values of a for which gcd(a, b) is divisible by primes ≤ B. This still leaves us with some pairs (a, b) for which gcd(a, b) > 1. But this is not a serious problem, since such pairs are small in number and can be discarded later from the list of pairs (a, b) selected by the smoothness test.

We fix b and allow a to vary in the interval –M ≤ a ≤ M for a predetermined bound M. We use an array A indexed by a. Before the first sieve, we initialize the entry Aa to ln |a + mb|. We may set Aa := +∞ for those values of a for which gcd(a, b) is known to be > 1 (where +∞ stands for a suitably large positive value). For each small prime p ≤ B and small exponent h, we compute a′ := –mb (mod p^h) and subtract ln p from Aa for each a, –M ≤ a ≤ M, with a ≡ a′ (mod p^h). Finally, for each value of a for which Aa is not (close to) 0, that is, for which a + mb is not B-smooth, we set Aa := +∞. For the other values of a, we set Aa := ln |N(a + bα)|. One may use incomplete sieving (with a liberal selection criterion) during the first sieve.
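The first (rational) sieve can be sketched as follows; this is a toy, self-contained version (names are ours) that sieves with prime powers and reports the surviving values of a.

```python
import math

def first_sieve(b, m, M, B, tol=0.5):
    """Log-sieve the values a + m*b for -M <= a <= M over the primes
    <= B; returns the values of a whose entry is driven (close) to 0,
    that is, whose a + m*b is B-smooth.  tol absorbs round-off."""
    primes = [p for p in range(2, B + 1) if all(p % q for q in range(2, p))]
    A = [math.log(abs(a + m * b)) if a + m * b != 0 else float("inf")
         for a in range(-M, M + 1)]
    bound = abs(m * b) + M                 # largest value that can occur
    for p in primes:
        ph = p
        while ph <= bound:                 # sieve with p, p^2, p^3, ...
            i0 = (-m * b + M) % ph         # index of least a ≡ -m*b (mod p^h)
            for i in range(i0, 2 * M + 1, ph):
                A[i] -= math.log(p)
            ph *= p
    return [i - M for i in range(2 * M + 1) if A[i] < tol]

# m = 100, b = 1, M = 10, B = 7: pick out the 7-smooth values in 90..110.
smooth_a = first_sieve(1, 100, 10, 7)
```

Here `smooth_a` is `[-10, -4, -2, 0, 5, 8]`, corresponding to the 7-smooth values 90, 96, 98, 100, 105 and 108.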

The second sieve proceeds as follows. We continue to work with the value of b fixed before the first sieve and with the array A available from the first sieve. For each prime ideal (p, cp), we compute a″ := –bcp (mod p) and subtract ln p from each location Aa for which a ≡ a″ (mod p). For those a for which Aa ≤ ξ ln B for some real ξ ≥ 1, say ξ = 2, we try to factorize a + bα over the generators gp,cp and the units. If the attempt is successful, both a + bm and a + bα are smooth. This second sieve is an incomplete one and, therefore, we must use a liberal selection criterion.

The running time of the SNFSM

For deriving the running time of the SNFSM, take d ≤ (3 ln n/(2 ln ln n))^{1/3}, m = L(n, 2/3, (2/3)^{1/3}), B = L(n, 1/3, (2/3)^{2/3}) and M = L(n, 1/3, (2/3)^{2/3}). From the prime number theorem and from the fact that d is small, it follows that the numbers of prime ideals, of their generators and of the unit generators all have the same asymptotic bound as B, as does the number of rational primes ≤ B. We then have L(n, 1/3, (2/3)^{2/3}) unknown quantities on which we have to do Gaussian elimination.

The integers a + mb have absolute values ≤ L(n, 2/3, (2/3)^{1/3}). If the coefficients of f are small, then

|N(a + bα)| = |b^d f(–a/b)| ≤ L(n, 1/3, (2/3)^{2/3})^d = L(n, 2/3, (2/3)^{1/3}).

Under the heuristic assumption that a + mb and N(a + bα) behave as random integers of magnitude L(n, 2/3, (2/3)^{1/3}), the probability that both of these are B-smooth turns out to be L(n, 1/3, –(2/3)^{2/3}), and so trying L(n, 1/3, 2(2/3)^{2/3}) pairs (a, b) is expected to give us L(n, 1/3, (2/3)^{2/3}) relations. The entire sieving process takes time L(n, 1/3, 2(2/3)^{2/3}), whereas solving a sparse system in L(n, 1/3, (2/3)^{2/3}) unknowns can be done in essentially the same time. Thus the running time of the SNFSM is L(n, 1/3, 2(2/3)^{2/3}) = L(n, 1/3, (32/9)^{1/3}).

Exercise Set 4.3

4.6 For m ≥ 1, define the harmonic numbers Hm := 1 + 1/2 + 1/3 + ··· + 1/m. Show that for each m we have ln(m + 1) ≤ Hm ≤ 1 + ln m. [H] Deduce that the sequence Hm, m = 1, 2, . . ., is not convergent. (Note, however, that the sequence Hm – ln m converges to the constant γ = 0.57721566 . . ., known as the Euler constant. It is not known whether γ is rational.)
4.7Let k, c, c′, α be positive constants with α < 1. Prove the following assertions.
  1. .

  2. L(n, α, c)L(n, α, c′) is of the form L(n, α, c + c′).

  3. (ln n)^k L(n, α, c) is again of the form L(n, α, c).

  4. L(n, α, c)·n^k is of the form n^{k+o(1)}.

4.8 Let us assume that an adversary C has the computing power to carry out 10^12 floating point operations (flops) per second. Let A be an algorithm that computes a certain function P(n) using T(n) flops for an input n. We say that it is infeasible for C to compute P(n) using algorithm A if the computation takes ≥ 100 years or, equivalently, if T(n) ≥ 3.1536 × 10^21. Find, for the following expressions of T(n), the smallest values of n that make the computation of P(n) by Algorithm A infeasible: T(n) = (ln n)^3, T(n) = (ln n)^10, T(n) = n, T(n) = n^{1/4}, T(n) = L[2], T(n) = L[1], T(n) = L[0.5], T(n) = L(n, 1/3, 2) and T(n) = L(n, 1/3, 1). (Neglect the o(1) terms in the definitions of L( ) and L[ ].)
4.9 Let n be an odd integer and let r be the total number of distinct (odd) prime divisors of n. Show that for each integer a the congruence x² ≡ a² (mod n) has ≤ 2^r solutions for x modulo n. If gcd(a, n) = 1, show that this congruence has exactly 2^r solutions. [H]
4.10Show that the problems IFP and SQRTP are probabilistic polynomial-time equivalent. [H]
4.11 In this exercise, we use the notations introduced in connection with the quadratic sieve method for factoring integers (Section 4.3.2). We assume that M ≪ H, since H ≈ √n, whereas M = L[1].
  1. Show that J ≤ 2H – 1.

  2. Prove that the average of the integers |T(c)|, –M ≤ c ≤ M, is approximately J + MH, and that the maximum of the same integers is |T(M)| = J + 2MH + M² ≈ J + 2MH.

  3. Prove that the average and the maximum of the integers |T(c)|, 0 ≤ c ≤ 2M, are respectively J + 2MH + M(4M + 1)/3 ≈ J + 2MH and |T(2M)| = J + 4MH + 4M² ≈ J + 4MH.

  4. Conclude that it is better to choose the sieving interval as –M ≤ c ≤ M instead of as 0 ≤ c ≤ 2M.

4.12

Reyneri’s cubic sieve method (CSM) Suppose that we want to factor an odd integer n. Suppose also that we know a triple (x, y, z) of integers satisfying x³ ≡ y²z (mod n) with x³ ≠ y²z (as integers). We assume further that |x|, |y|, |z| are all O(n^ξ) for some ξ, 1/3 < ξ < 1/2.

  1. Show that for integers a, b, c with a + b + c = 0 one has

    (x + ay)(x + by)(x + cy) ≡ y2T(a, b, c) (mod n),

    where T(a, b, c) := z + (ab + ac + bc)x + (abc)y = –b(b + c)(x + cy) + (z – c²x). If x, y, z = O(n^ξ), then T(a, b, c) is O(n^ξ) for small values of a, b, c.

  2. Let α be a suitable positive constant (to be fixed by the analysis). Choose a factor base comprising all primes q1, . . . , qt with t = L[α] together with the integers x + ay, –M ≤ a ≤ M, M = L[α]. The size of the factor base is then L[α].

    If T(a, b, c) with –M ≤ a, b, c ≤ M and a + b + c = 0 is qt-smooth, we get a relation for the CSM. Show that trying out the L[2α] triples (a, b, c) gives us a set of linear congruences of the desired size under the heuristic assumption that the T(a, b, c) values behave as random integers on the order of n^ξ.

  3. Propose a strategy how these linear congruences can be combined (by Gaussian elimination) to get a quadratic congruence of the form u2v2 (mod n).

  4. Design a sieve for checking the smoothness of the expressions T(a, b, c). [H]

  5. Show that the running time of the CSM is . Since ξ < 1/2, the CSM is more efficient than the QSM. For ξ ≈ 1/3, the running time is .

    (Remark: It is not known how we can efficiently obtain a solution of x³ ≡ y²z (mod n) with x³ ≠ y²z and |x|, |y|, |z| = O(n^ξ), ξ being as small as possible. For some particular values of n, say, for n of the form x³ – z with small z, a solution is naturally available.)

4.13Sieve of Eratosthenes Two hundred years before Christ, Eratosthenes proposed a sieve (Algorithm 4.5) for computing all primes between 1 and a positive integer n. Prove the correctness of this algorithm and compute its running time. [H]
Algorithm 4.5. The sieve of Eratosthenes

Initialize to zero an array A indexed 2, . . . , n.
for k = 2, . . . , ⌊√n⌋ {
   if (Ak = 0) { for l = 2, . . . , ⌊n/k⌋ { Alk := 1. } }
}
for k = 2, . . . , n { if (Ak = 0) { Print “k is a prime”. } }
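Algorithm 4.5 translates directly into the following sketch (the function name is ours); the loop on k stops at ⌊√n⌋, since every composite ≤ n has a prime divisor ≤ √n.

```python
def eratosthenes(n):
    """Sieve of Eratosthenes (Algorithm 4.5): A[k] = 1 marks k as
    composite; the surviving indices >= 2 are exactly the primes <= n."""
    A = [0] * (n + 1)
    k = 2
    while k * k <= n:                      # k runs up to floor(sqrt(n))
        if A[k] == 0:
            for l in range(2, n // k + 1): # mark the multiples l*k of k
                A[l * k] = 1
        k += 1
    return [k for k in range(2, n + 1) if A[k] == 0]
```

For example, `eratosthenes(30)` yields `[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]`.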

4.14 This exercise proposes an adaptation of the sieve of Eratosthenes for computing a random prime of a given bit length l. In Section 3.4.2, we have described an algorithm for this computation, which generates random (odd) integers of bit length l and checks the primality of each such integer until a (probable) prime is found. An alternative strategy is to generate a random l-bit odd integer n and check the integers n, n + 2, n + 4, . . . for primality.
  1. Use sieving to design an algorithm that generalizes this second strategy in the sense that it checks for primality only those integers n + r, r = 0, 1, 2, . . . , M (n a random l-bit integer), which are not divisible by any of the first t primes. In practice, the values 100 ≤ t ≤ 10,000 and M = 10l work quite well. For cryptographic sizes, sieving typically speeds up naive prime generation by a factor of 10 to 100.

  2. Generalize the sieve of Part (a) for the computation of safe and strong primes.
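The sieved search of Part 1 can be sketched as follows (names are ours; a Miller–Rabin test stands in for the primality test of Section 3.4.2):

```python
import random

def miller_rabin(n, rounds=20):
    """Probabilistic primality test (stand-in for Section 3.4.2)."""
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):
        if n % p == 0:
            return n == p
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        x = pow(random.randrange(2, n - 1), d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def sieved_random_prime(l, t=100, M=None):
    """Sieve the window n, n+1, ..., n+M by the first t primes, then
    run the primality test only on the survivors."""
    M = M if M is not None else 10 * l
    small, k = [], 2
    while len(small) < t:                  # collect the first t primes
        if all(k % q for q in small):
            small.append(k)
        k += 1
    while True:                            # retry if the window has no prime
        n = random.randrange(1 << (l - 1), (1 << l) - M)
        alive = [True] * (M + 1)
        for q in small:
            for r in range((-n) % q, M + 1, q):
                alive[r] = False           # q divides n + r
        for r in range(M + 1):
            if alive[r] and miller_rabin(n + r):
                return n + r
```

The sieve is much cheaper per candidate than an exponentiation-based primality test, which is the source of the speed-up claimed above.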

4.4. The Finite Field Discrete Logarithm Problem

The discrete logarithm problem (DLP) has attracted somewhat less attention from the research community than the IFP. Nonetheless, many algorithms exist to solve the DLP, most of which are direct adaptations of algorithms for solving the IFP. We start with the older algorithms, collectively known as the square-root methods, since the worst-case running time of each of them is of the order of the square root of the group size. The newer family of algorithms based on the index calculus method provides subexponential solutions to the DLP and is described next. For the sake of simplicity, we assume in this section that we want to compute the discrete logarithm indg a of an element a with respect to a primitive element g of the field. We concentrate only on the prime fields (p an odd prime) and the fields of characteristic 2, since non-prime fields of odd characteristic are only rarely used in cryptography.

4.4.1. Square Root Methods

Square-root methods are applicable to any finite (cyclic) group. To avoid repetition, we provide a generic description here. That is, we assume that G is a multiplicatively written cyclic group of order n, generated by an element g, and that a ∈ G. The identity of G is denoted by 1. It is not necessary to assume that G is cyclic or that g is a generator of G. However, these assumptions make the descriptions of the algorithms somewhat easier, and hence we stick to them. The necessary modifications for non-cyclic groups G or non-primitive elements g are rather easy, and the reader is requested to fill in the details. We assume that each element of G can be represented by O(lg n) bits (so that the input size is taken to be lg n) and that multiplications, exponentiations and inverses in G can be computed in time polynomially bounded by this input size.

Shanks’ baby-step–giant-step method

Let us assume that the elements of G can be (totally) ordered in such a way that comparing two elements of G with respect to this order can be done in time polynomial in the input size. For example, a natural order on the non-zero residues modulo p is the relation ≤ on the representatives {1, . . . , p – 1}. Note that k elements of G can be sorted (under the above order) using O(k log k) comparisons.

Let m := ⌈√n⌉. Then d := indg a is uniquely determined by two (non-negative) integers d0, d1 < m with d = d0 + d1m (the base-m representation of d). In Shanks’ baby-step–giant-step (BSGS) method, we compute d0 and d1 as follows. To start with, we compute a list of pairs (d0, g^{d0}) for d0 = 0, 1, . . . , m – 1 and store these pairs in a table sorted with respect to the second coordinate (the baby steps). Now, for each d1 = 0, 1, . . . , m – 1, we compute a(g^{–m})^{d1} (the giant steps) and search whether this element appears as the second coordinate g^{d0} of some entry in the table mentioned above. If so, we have found the desired d0 and d1; otherwise we try the next value of d1. An optimized implementation of this strategy is given as Algorithm 4.6.

The computation of all the elements of T and the sorting of T can be done in time O~(m). If we use a binary search algorithm (Exercise 4.15), then the search for h in T can be performed using O(lg m) comparisons in G. Therefore, the giant steps also take a total running time of O~(m). Since m ≈ √n, the BSGS method runs in time O~(√n). The memory requirement of the BSGS method (that is, of the table T) is O(√n) group elements. Thus, this method becomes impractical even when n contains as few as 30 decimal digits.

Pollard’s rho method

Pollard’s rho method for solving the DLP is similar in spirit to the method of the same name for solving the IFP. Let f be a random map on pairs of residues modulo n, and let us generate a sequence of tuples (ri, si), i = 1, 2, . . ., starting with a random (r1, s1) and subsequently computing (ri+1, si+1) = f(ri, si) for each i = 1, 2, . . . . The elements bi := a^{ri} g^{si} for i = 1, 2, . . . can then be thought of as randomly chosen elements of G. By the birthday paradox (Exercise 2.172), we expect to get a match bi = bj for some i ≠ j after O(√n) of the elements b1, b2, . . . are generated. But then we have a^{ri–rj} = g^{sj–si}, that is, indg a ≡ (ri – rj)^{–1}(sj – si) (mod n), provided that the inverse exists, that is, gcd(ri – rj, n) = 1. The expected running time of this algorithm is O~(√n), the same as that of the BSGS method, but the storage requirement drops to only O(1) elements of G.
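A minimal sketch of the rho method for the non-zero residues modulo a prime p (names are ours; the common three-way partition plays the role of the random map, and Floyd's cycle detection finds the match bi = bj without storing the sequence):

```python
import math
import random

def rho_step(b, r, s, g, a, p, n):
    """One step of the random-looking map; maintains b = a^r * g^s."""
    if b % 3 == 0:
        return b * b % p, 2 * r % n, 2 * s % n      # squaring
    if b % 3 == 1:
        return a * b % p, (r + 1) % n, s            # multiply by a
    return g * b % p, r, (s + 1) % n                # multiply by g

def pollard_rho_dlp(g, a, p, n):
    """ind_g(a) in the group of non-zero residues modulo p, n = ord(g).
    Restarts from a fresh random point when gcd(r - R, n) > 1."""
    while True:
        r, s = random.randrange(n), random.randrange(n)
        b = pow(a, r, p) * pow(g, s, p) % p
        B, R, S = rho_step(b, r, s, g, a, p, n)
        while b != B:                               # tortoise and hare
            b, r, s = rho_step(b, r, s, g, a, p, n)
            B, R, S = rho_step(*rho_step(B, R, S, g, a, p, n), g, a, p, n)
        if math.gcd(r - R, n) == 1:
            return (S - s) * pow((r - R) % n, -1, n) % n
```

For example, with the primitive root g = 2 modulo p = 1019 (so n = 1018), `pollard_rho_dlp(2, 5, 1019, 1018)` returns a d with 2^d ≡ 5 (mod 1019).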

Algorithm 4.6. Shanks’ baby-step–giant-step method

Input: G, g and a as described above.

Output: d = indg a.

Steps:

n := ord G, m := ⌈√n⌉.

/* Baby steps */

Initialize T to an empty table.

Insert the pairs (0, 1) and (1, g) in T.

h := g.
for d0 = 2, . . . , m – 1 {
    h := hg.
    Insert (d0hin T.
}
sort T with respect to the second coordinate.

/* Giant steps */
h := a. l := (g^{–1})^m.
for d1 = 0, . . . , m – 1 {
    if (T contains an entry (d0, h)) { Return d := d0 + d1m. }
    h := hl.
}
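Algorithm 4.6 can be sketched in a few lines (the function name is ours); a hash table replaces the sorted table T and its binary search, which does not change the O~(√n) running time.

```python
import math

def bsgs(g, a, p, n):
    """d = ind_g(a) in the group of non-zero residues modulo p, where
    n = ord(g); a dictionary plays the role of the sorted table T."""
    m = math.isqrt(n - 1) + 1          # m = ceil(sqrt(n))
    T = {}
    h = 1
    for d0 in range(m):                # baby steps: record g^d0 -> d0
        T.setdefault(h, d0)
        h = h * g % p
    l = pow(g, -m, p)                  # l = (g^-1)^m
    h = a % p
    for d1 in range(m):                # giant steps: a * (g^-m)^d1
        if h in T:
            return T[h] + d1 * m
        h = h * l % p
    return None                        # a does not lie in <g>
```

For instance, `bsgs(2, 5, 1019, 1018)` returns the index d of 5 to the base 2 modulo 1019, so that 2^d ≡ 5 (mod 1019).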

The Pohlig–Hellman method

The Pohlig–Hellman (PH) method assumes that the prime factorization n = ord G = p1^{α1} ··· pr^{αr} is known. Since d := indg a is unique modulo n, we can easily compute d using the CRT from a knowledge of d modulo pj^{αj}, j = 1, . . . , r. So assume that p is a prime dividing n and that p^α exactly divides n. Let d ≡ d0 + d1p + ··· + dα–1 p^{α–1} (mod p^α), 0 ≤ di < p, be the p-ary representation of d modulo p^α. The p-ary digits d0, d1, . . . , dα–1 can be successively computed as follows.

Let H be the subgroup of G generated by h := g^{n/p}. We have ord H = p (Exercise 2.44). For the computation of di, 0 ≤ i ≤ α – 1, from the knowledge of d0, . . . , di–1, consider the element

b := (a g^{–(d0 + d1p + ··· + di–1 p^{i–1})})^{n/p^{i+1}} = (g^{n/p^{i+1}})^{d – (d0 + d1p + ··· + di–1 p^{i–1})}.

But ord(g^{n/p^{i+1}}) = p^{i+1}, so that

b = (g^{n/p^{i+1}})^{di p^i} = (g^{n/p})^{di} = h^{di}.

Thus b ∈ H and di = indh b, that is, each di can be obtained by computing a discrete logarithm in the group H of order p (using the BSGS method or the rho method).

From the prime factorization of n, we see that the computations of d modulo pj^{αj} for all j = 1, . . . , r can be done in time O~(√q), q being the largest prime factor of n, since the αj and r are O(log n). Combining the values of d modulo pj^{αj} by the CRT can be done in polynomial time (in log n). In the worst case, q = O(n), and the PH method takes time O~(√n), which is fully exponential in the input size log n. But if q (or, equivalently, each of the prime divisors p1, . . . , pr of n) is small, then the PH method runs quite efficiently. In particular, if q = O((log n)^c) for some (small) constant c, then the PH method computes discrete logarithms in G in polynomial time. This fact has an important bearing on the selection of a group G for cryptographic applications: n = ord G is required to have a suitably large prime divisor, so that the PH method cannot compute discrete logarithms in G in feasible time.
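The digit-by-digit computation and the CRT combination can be sketched as follows (names are ours; for brevity, the small discrete logarithms in H are found by brute force where BSGS or the rho method would be used):

```python
def small_dlog(h, b, q, p):
    """ind_h(b) in the subgroup of order q, by brute force (adequate
    only for small q; use BSGS or the rho method otherwise)."""
    x, e = 1, 0
    while x != b:
        x, e = x * h % p, e + 1
    return e

def pohlig_hellman(g, a, p, n, factors):
    """d = ind_g(a) modulo p, where n = ord(g) and factors lists the
    pairs (q, alpha) with n = prod q^alpha; computes the q-ary digits
    of d mod q^alpha and then combines the residues by the CRT."""
    residues, moduli = [], []
    for q, alpha in factors:
        h = pow(g, n // q, p)              # generates the subgroup of order q
        d_mod = 0                          # d modulo q^i found so far
        for i in range(alpha):
            b = pow(a * pow(g, -d_mod, p) % p, n // q**(i + 1), p)
            d_mod += small_dlog(h, b, q, p) * q**i
        residues.append(d_mod)
        moduli.append(q**alpha)
    d = 0                                  # CRT combination
    for r_i, m_i in zip(residues, moduli):
        M_i = n // m_i
        d = (d + r_i * M_i * pow(M_i, -1, m_i)) % n
    return d
```

With p = 1019, n = 1018 = 2 · 509 and g = 2, `pohlig_hellman(2, 5, 1019, 1018, [(2, 1), (509, 1)])` returns a d with 2^d ≡ 5 (mod 1019); the largest subgroup has order only 509, which is the point of the method.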

4.4.2. The Index Calculus Method

The index calculus method (ICM) is not applicable to all (cyclic) groups. But whenever it applies, it usually leads to the fastest algorithms for solving the DLP. Several variants of the ICM are used for prime finite fields and also for finite fields of characteristic 2. On such a field of size q, they achieve subexponential running times of the order of L(q, 1/2, c) = L[c] or L(q, 1/3, c) for positive constants c. We start with a generic description of the ICM. We assume that g is a primitive element of the field and want to compute d := indg a for a non-zero element a.

To start with, we fix a suitable subset B = {b1, . . . , bt} of the field of small cardinality, so that a reasonably large fraction of the non-zero field elements can be expressed easily as products of elements of B. We call B a factor base. In the ICM, we search for relations of the form

Equation 4.5


for integers α, β, γi and δi. This gives us a linear congruence

Equation 4.6


The ICM proceeds in two[4] stages. In the first stage, we compute di := indg bi for each element bi in the factor base B. For that, we collect relations of the form (4.5) with β = 0. When sufficiently many relations are available, the corresponding system of linear congruences (4.6) is solved modulo q – 1 for the unknowns di. In the second stage, a single relation with gcd(β, q – 1) = 1 is found. Substituting the values of di available from the first stage yields indg a.

[4] Some authors prefer to say that the number of stages in the ICM is actually three, because they decouple the congruence-solving phase from the first stage. This is indeed justified, since implementations by several researchers reveal that for large fields this linear algebra part often demands running time comparable to that needed by the relation-collection part. Our philosophy is to call the entire precomputation work the first stage. Now, although it hardly matters, it is up to the reader which camp she wants to join.

Note that as long as q (and g) are fixed, we do not have to carry out the first stage every time the discrete logarithm of a field element is to be computed. If the values di, i = 1, . . . , t, are stored, then only the second stage needs to be carried out for computing the indices of any number of elements. This is why the first stage of the ICM is often called the precomputation stage.

In order to make the algorithm more concrete, we have to specify:

  1. how to choose a factor base B;

  2. how to find Relation (4.5);

  3. how to solve a linear system of congruences modulo q – 1 (in particular, when the system is sparse).

In the rest of this section, we describe variants of the ICM based on their strategies for selecting the factor base and for collecting relations. We discuss the third issue in Section 4.7.

4.4.3. Algorithms for Prime Fields

Consider a finite field of prime cardinality p. For cryptographic applications, p should be quite large, say, of length around a thousand bits or more, and so naturally p is odd. Elements of the field are canonically represented as integers between (and including) 0 and p – 1. The equality x = y in the field means equality of two integers in the range 0, . . . , p – 1, whereas x ≡ y (mod p) means that the two integers x and y may be different, but their equivalence classes modulo p are the same.

The basic ICM

In the basic version of the ICM, we choose the factor base B to comprise the first t primes q1, . . . , qt, where t = L[ζ]. (The optimal value of ζ is determined below.) In the first stage, we choose random values of α modulo p – 1 and compute g^α. Any integer representing g^α can be considered, but we think of g^α as an integer in {1, . . . , p – 1}. We then try to factorize g^α using trial divisions by elements of the factor base B. If g^α is found to be B-smooth, then we get a desired relation for the first stage, namely, g^α ≡ q1^{γ1} q2^{γ2} ··· qt^{γt} (mod p), that is, α ≡ γ1d1 + γ2d2 + ··· + γtdt (mod p – 1).

If gα is not B-smooth, we try another random α and proceed as above. After sufficiently many relations are available, we solve the resulting system of linear congruences modulo p – 1. This gives us di := indg qi for i = 1, . . . , t.

In the second stage, we again choose random integers α and try to factorize ag^α completely over B. Once the factorization is successful, that is, once we have ag^α ≡ q1^{δ1} ··· qt^{δt} (mod p), we compute d = indg a ≡ –α + δ1d1 + ··· + δtdt (mod p – 1).
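The second stage can be sketched as follows (names are ours; in this toy demonstration, the first-stage output di is obtained by brute force instead of by solving a system of congruences):

```python
import random

def smooth_exponents(x, primes):
    """Exponent vector of x over the factor base, or None if not smooth."""
    exps = [0] * len(primes)
    for i, q in enumerate(primes):
        while x % q == 0:
            x //= q
            exps[i] += 1
    return exps if x == 1 else None

def icm_second_stage(g, a, p, primes, d):
    """Given d[i] = ind_g(q_i) from the first stage, find a smooth
    a*g^alpha and read off ind_g(a) = -alpha + sum(delta_i * d_i)."""
    n = p - 1
    while True:
        alpha = random.randrange(1, n)
        exps = smooth_exponents(a * pow(g, alpha, p) % p, primes)
        if exps is not None:
            return (sum(e * di for e, di in zip(exps, d)) - alpha) % n

def brute_ind(g, x, p):
    """Stand-in for the first stage: ind_g(x) by exhaustive search."""
    e, y = 0, 1
    while y != x:
        y, e = y * g % p, e + 1
    return e

p, g = 1019, 2
primes = [2, 3, 5, 7, 11, 13]
d = [brute_ind(g, q, p) for q in primes]
```

With these stored di, `icm_second_stage(g, 5, p, primes, d)` returns an index of 5 modulo p – 1, and 2 raised to it is 5 modulo 1019.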

In order to optimize the running time, we note that the relation-collection phase of the first stage is usually the bottleneck of the algorithm. If ζ (or, equivalently, t) is chosen too small, then finding B-smooth integers is very difficult. On the other hand, if ζ is too large, then we have to collect too many relations to have a solvable linear system of congruences. More explicitly, since the integers g^α can be regarded as random integers of the order of p, the probability that g^α is B-smooth is L[–1/(2ζ)] (Corollary 4.1). Thus we expect to get each relation after L[1/(2ζ)] random values of α are tried. Since for each α we need to carry out L[ζ] divisions by elements of the factor base B (the exponentiation g^α can be done in polynomial time and hence can be neglected in this analysis), each relation can be found in expected time L[1/(2ζ) + ζ]. Now, in order to solve for di, i = 1, . . . , t, we must have (slightly more than) t = L[ζ] relations. Thus, the relation-collection phase takes a total time of L[1/(2ζ) + 2ζ]. It can be easily checked that 1/(2ζ) + 2ζ is minimized for ζ = 1/2. This gives a running time of L[2] for the relation-collection phase.

Since each g^α is a positive integer less than p, it is evident that it can have at most O(log p) prime divisors. In other words, the congruences collected are necessarily sparse. As we shall see later, such a system can be solved in time O~(t²), that is, in time L[1] for ζ = 1/2.

In the second stage, it is sufficient to have a single relation to compute d = indg a. As explained before, such a relation can be found in expected time L[1/(2ζ) + ζ] = L[3/2] for ζ = 1/2. Thus the total running time of the basic ICM is L[2].

The second stage of the basic ICM is much faster than the first stage. In fact, this is a typical phenomenon associated with most variants of the ICM. Speeding up the first stage is, therefore, our primary concern.

Each step in the search for relations consists of an exponentiation (g^α) modulo p followed by trial divisions by q1, . . . , qt. Now, g^α may be non-smooth, but g^α + kp (as an integer sum) may be smooth for some integer k. Once g^α is computed and found to be non-smooth, one can check the smoothness of g^α + kp for k = ±1, ±2, . . . before another α is tried. Since these integers are obtained by additions (or subtractions) only, which are much faster than exponentiation, this strategy tends to speed up the relation-collection phase. Moreover, information about the divisibility of g^α + kp by qi can be obtained from that of g^α + (k – 1)p by qi. So, using suitable tricks, one might reduce the cost of trial divisions. Two such possibilities are explored in Exercise 4.18. Though these modifications lead to some speed-up in practice, they have the disadvantage that as |k| increases, the size of |g^α + kp| also increases, so that the chance of getting smooth candidates decreases; therefore, using large values of |k| does not effectively help.

There are other heuristic modification schemes that help us gain some speed-up in practice. For example, the large prime variation as discussed in connection with the QSM applies equally well here. Another trick is to use the early abort strategy. A random B-smooth integer has higher probability of having many small prime factors rather than a few large prime factors. This observation can be incorporated in the smoothness tests as follows. Let us assume that we do trial divisions by the small primes in the order q1, q2, . . . , qt. After we do trial divisions of a candidate x by the first t′ < t primes (say, t′ ≈ t/2), we check how far we have been able to reduce x. If the reduction of x is already substantial, we continue with the trial divisions by the remaining primes qt′+1, . . . , qt. In the other case, we abort the smoothness test for x and try another candidate. Obviously, this strategy prematurely rejects some smooth candidates (which are anyway rather small in number), but since most candidates are expected to be non-smooth, it saves a lot of trial divisions in the long run. Determination of t′ and/or the quantification of “substantial” reduction actually depends on practical experience. With suitable choices one may expect to get a speed-up of about 2. The drawback with the early abort strategy is that it often does not go well with sieving. Sieving, whenever applicable, should be given higher preference.
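A minimal sketch of the early abort test follows; the abort threshold (measured here in bits removed, a choice made for this illustration) is exactly the experimentally tuned quantity the text refers to:

```python
def is_smooth_early_abort(x, factor_base, t_prime, threshold_bits):
    """Trial-divide x by the factor-base primes in order, but abort
    after the first t_prime primes unless x has already shrunk by at
    least threshold_bits bits."""
    original_bits = x.bit_length()
    for i, q in enumerate(factor_base):
        while x % q == 0:
            x //= q
        if i + 1 == t_prime and original_bits - x.bit_length() < threshold_bits:
            return False          # early abort: reduction not substantial
    return x == 1                 # smooth iff fully factored
```

Note that 1001 = 7 · 11 · 13 is smooth over {2, 3, 5, 7, 11, 13} yet gets rejected (no factor among 2, 3, 5), illustrating the premature rejections mentioned above.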

To sum up, the basic ICM and all its modifications can be used for computing discrete logarithms only in small fields, say, of size ≤ 80 bits. For bigger fields, we need newer ideas.

The linear sieve method

The linear sieve method (LSM) is a direct adaptation of the quadratic sieve method for factoring integers (Section 4.3.2). In the basic ICM just discussed, we try to find smooth integers from candidates that are on an average as large as O(p). The LSM, on the other hand, finds smooth ones from a pool of integers each of which is of the order of √p. As a result, we expect to have a higher density of smooth integers among the candidates tested in the LSM than those in the basic method. Furthermore, the LSM employs sieving techniques instead of trial divisions. All these help the LSM achieve a running time of L[1], a definite improvement over the L[2] performance of the basic method.

Let H := ⌈√p⌉ and J := H² – p. Then 0 < J ≤ 2H. Let’s consider the congruence

Equation 4.7

(H + c1)(H + c2) ≡ J + (c1 + c2)H + c1c2 (mod p).
For small integers c1, c2, the right side of the above congruence, henceforth denoted as

T(c1, c2) := J + (c1 + c2)H + c1c2,

is of the order of √p. If the integer T(c1, c2) is smooth with respect to the first t primes q1, q2, . . . , qt, that is, if we have a factorization like T(c1, c2) = q1^e1 · · · qt^et, then we have a relation
indg(H + c1) + indg(H + c2) ≡ e1 indg q1 + · · · + et indg qt (mod p – 1).
For the linear sieve method, the factor base comprises primes less than L[1/2] (so that t = L[1/2] by the prime number theorem) and integers H + c for –M ≤ c ≤ M. The bound M on c is chosen to be of the order of L[1/2]. Each T(c1, c2), being of the order of √p · L[1/2] in absolute value, has a probability of L[–1/2] for being qt-smooth. Thus once we check the factorization of T(c1, c2) for all (that is, for a total of L[1]) values of the pair (c1, c2) with –M ≤ c1 ≤ c2 ≤ M, we expect to get L[1/2] Relations (4.7) involving the unknown indices of the factor base elements. If we further assume that the primitive element g is a small prime which itself is in the factor base, then we get a free relation indg g = 1. The resulting system is then solved to compute the discrete logarithms of elements in the factor base. This is the basic principle for the first stage of the LSM.

If we compute all T(c1, c2) and use trial divisions by q1, . . . , qt to separate out the smooth ones, we achieve a running time of L[1.5], as can be easily seen. Sieving is employed to reduce the running time to L[1]. First one fixes a value of c1 and initializes to ln |T(c1, c2)| an array indexed by c2 in the range c1 ≤ c2 ≤ M. One then computes, for each prime power q^h (q being a small prime in the factor base and h a small positive exponent), a solution for c2 of the congruence (H + c1)c2 + (J + c1H) ≡ 0 (mod q^h).

If gcd(H + c1, q) = 1, that is, if H + c1 is not a multiple of q, then the solution is given by σ ≡ –(J + c1H)(H + c1)^{–1} (mod q^h). The inverse in the last congruence can be calculated by running the extended gcd algorithm (Algorithm 3.8) on H + c1 and q^h. Then for each value of c2 (in the range c1 ≤ c2 ≤ M) that is congruent to σ modulo q^h, the quantity ln q is subtracted from the corresponding array location.

If q | (H + c1), we find out h1 := vq(H + c1) > 0 and h2 := vq(J + c1H) ≥ 0. If h1 > h2, then for every value of c2 the expression T(c1, c2) is divisible by q^{h2} and by no higher power of q. So we subtract the quantity h2 ln q from the array locations for all c2. Finally, if h1 ≤ h2, then we subtract h1 ln q from the array locations for all c2 and, for h > h1, solve the congruence ((H + c1)/q^{h1})c2 + (J + c1H)/q^{h1} ≡ 0 (mod q^{h–h1}) as before.

Once the above procedure is carried out for each small prime q in the factor base and for each small exponent h, we check for which values of c2 the array value is equal (that is, sufficiently close) to 0. These are precisely the values of c2 such that, for the given c1, the integer T(c1, c2) factors smoothly over the small primes in the factor base.
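The sieving steps above (for the coprime case) can be sketched as follows; the parameters p = 1009, H = 32, J = 15 are a hypothetical toy instance, and primes dividing H + c1 are simply skipped rather than handled as in the text:

```python
import math

def sieve_fixed_c1(p, H, J, c1, M, primes, tol=1e-6):
    """For a fixed c1, sieve T(c1, c2) = J + (c1 + c2)H + c1*c2 over
    c2 in [c1, M] and return the c2 for which T(c1, c2) is smooth over
    `primes`.  Simplification: primes dividing H + c1 are skipped, so
    the q | (H + c1) case from the text is not handled here."""
    T = lambda c2: J + (c1 + c2) * H + c1 * c2
    values = {c2: T(c2) for c2 in range(c1, M + 1) if T(c2) != 0}
    bound = max(abs(v) for v in values.values())
    acc = {c2: math.log(abs(v)) for c2, v in values.items()}
    for q in primes:
        if (H + c1) % q == 0:
            continue                        # simplifying assumption
        qh = q
        while qh <= bound:
            # unique solution of (H + c1)*c2 + (J + c1*H) = 0 (mod q^h)
            sigma = (-(J + c1 * H) * pow(H + c1, -1, qh)) % qh
            for c2 in acc:
                if (c2 - sigma) % qh == 0:
                    acc[c2] -= math.log(q)  # q^h divides T(c1, c2)
            qh *= q
    return sorted(c2 for c2, v in acc.items() if abs(v) < tol)
```

For c1 = 5 this reports c2 = 5 (T(5, 5) = 360 = 2³ · 3² · 5), and indeed (H + 5)(H + 5) = 37² ≡ 360 (mod 1009), giving a Relation (4.7).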

As in the QSM for integer factorization, it is sufficient to have some approximate representations of the logarithms (like ln q). Incomplete sieving and large prime variation can also be adopted as in the QSM.

Finally, we change c1 and repeat the sieving process described above. It is easy to see that the sieving operations for all c1 in the range –M ≤ c1 ≤ M take time L[1] as announced earlier. Gaussian elimination involving sparse congruences in L[1/2] variables also meets the same running time bound.

The second stage of the LSM can be performed in L[1/2] time. Using a method similar to the second stage of the basic ICM leads to a huge running time (L[3/2]), because we have only L[1/2] small primes in the factor base. We instead do the following. We start with a random j and try to obtain a factorization of the form (g^j a) rem p = ∏q q^{eq} · ∏u u^{wu}, where q runs over the L[1/2] small primes in the factor base and u runs over medium-sized primes, that is, primes less than L[2]. One can use an integer factorization algorithm to this effect. Lenstra’s ECM is, in particular, recommended, since it can detect smooth integers fast. More specifically, about L[1/4] random values of j need to be tried before we expect to get an integer with the desired factorization. Each attempt of factorization using the ECM takes time less than L[1/4].

Now, we have indg a ≡ –j + Σq eq indg q + Σu wu indg u (mod p – 1). The indices indg q are available from the first stage, whereas for each u (with wu ≠ 0) the index indg u is calculated as follows. First we sieve in an interval of size L[1/2] around p/u and collect integers y in this interval which are smooth with respect to the L[1/2] primes in the factor base. A second sieve in an interval of size L[1/2] around H gives us a small integer c such that ((H + c)yu) rem p is smooth again with respect to the L[1/2] primes in the factor base. Since H + c is in the factor base, we get indg u. The reader can easily verify that computing individual logarithms indg a using this method takes time L[1/2] as claimed earlier.

There are some other L[1] methods (like the Gaussian integer method and the residue list sieve method) known for computing discrete logarithms in prime fields. We will not discuss these methods in this book. Interested readers may refer to Coppersmith et al. [59] to learn about these L[1] methods. A faster method (running time L[0.816]), namely the cubic sieve method, is covered in Exercise 4.21. Now, we turn our attention to the best method known to date.

** The number field sieve method

The number field sieve method (NFSM) for solving the DLP in a prime field is a direct adaptation of the NFSM used to factor integers (Section 4.3.4). As before, we let g be a generator of the multiplicative group modulo p and are interested in computing the index d = indg a for some element a of this group.

We choose an irreducible polynomial f with small integer coefficients and of small degree, and use the number field K = Q(α) for some root α of f. For the sake of simplicity, we consider the special case (SNFSM) in which f is monic, Z[α] is a PID, and Z[α] is the full ring of integers of K. We also choose an integer m such that f(m) ≡ 0 (mod p) and define the ring homomorphism Φ : Z[α] → Z/pZ that takes α to m (mod p).

Finally, we predetermine a smoothness bound and let q1, . . . , qt be the (rational) primes up to this bound, let p1, . . . , ps be the prime ideals of Z[α] of prime norms up to this bound, let γ1, . . . , γs be a set of generators of the (principal) ideals p1, . . . , ps, and let u1, . . . , ur be a set of generators of the group of units of Z[α].

We try to find coprime integers c, d of small absolute values such that both c + dα and Φ(c + dα) = c + dm are smooth with respect to these bases, that is, we have factorizations of the forms c + dm = q1^{e1} · · · qt^{et} and c + dα = u1^{a1} · · · ur^{ar} γ1^{b1} · · · γs^{bs} or, equivalently, c + dm ≡ Φ(u1)^{a1} · · · Φ(ur)^{ar} Φ(γ1)^{b1} · · · Φ(γs)^{bs} (mod p). But then the two representations of c + dm modulo p yield the same index, that is,

Equation 4.8

e1 indg q1 + · · · + et indg qt ≡ a1 indg Φ(u1) + · · · + ar indg Φ(ur) + b1 indg Φ(γ1) + · · · + bs indg Φ(γs) (mod p – 1).

This motivates us to define the factor base as

B := {q1, . . . , qt} ∪ {Φ(u1), . . . , Φ(ur)} ∪ {Φ(γ1), . . . , Φ(γs)}.

We assume that g ∈ B so that we have the free relation indg g ≡ 1 (mod p – 1).

Trying sufficiently many pairs (c, d) we generate many Relations (4.8). The resulting sparse linear system is solved for the unknown indices of the elements of B. This completes the first stage of the SNFSM.

In the second stage, we bring a to the scene in the following manner. First assume that a is small, such that either a is smooth as a rational integer, that is, a factors completely over the primes q1, . . . , qt, or, for some γ ∈ Z[α] with Φ(γ) = a, the ideal 〈γ〉 can be written as a product of the prime ideals p1, . . . , ps or, equivalently, γ is, up to a product of the unit generators ui, a product of the ideal generators γj. In both cases, taking logarithms and substituting the indices of the elements of the factor base (available from the first stage) yields d = indg a.

However, a is not small in general, and it is a non-trivial task to find a γ (with Φ(γ) = a) such that 〈γ〉 is smooth. We instead write a as a product

Equation 4.9


where each ai is small enough so that indg ai can be computed using the method described above. This gives indg a ≡ Σi indg ai (mod p – 1). In order to see how one can find a representation of a as a product of small integers as in Congruence (4.9), we refer the reader to Weber [300].

As in most variants of the ICM, the running time of the SNFSM is dominated by the first stage and under certain heuristic assumptions can be shown to be of the order of L(p, 1/3, (32/9)^{1/3}). Look at Section 4.3.4 to see how the different parameters can be set in order to achieve this running time. For the general NFS method (GNFSM), the running time is L(p, 1/3, (64/9)^{1/3}). The GNFSM has been implemented by Weber and Denny [301] for computing discrete logarithms modulo a particular prime having 129 decimal digits (see McCurley [189]).

4.4.4. Algorithms for Fields of Characteristic 2

We wish to compute the discrete logarithm indg a of an element a of the field Fq, q = 2^n, with respect to a primitive element g of Fq. We work with the representation F2[X]/〈f(X)〉 for some irreducible polynomial f(X) ∈ F2[X] with deg f = n. For certain algorithms, we require f to be of special forms. This does not create serious difficulties, since it is easy to compute isomorphisms between two polynomial basis representations of Fq (Exercise 3.38).

Recall that we have defined the smoothness of an integer x in terms of the magnitudes of the prime divisors of x. Now, we deal with polynomials (over F2) and extend the definition of smoothness in the obvious way: that is, a polynomial is called smooth if it factors into irreducible polynomials of low degrees. The next theorem is an analog of Theorem 2.21 for polynomials. By an abuse of notation, we use ψ(·, ·) here also. The context should make it clear what we are talking about – smoothness of integers or of polynomials.

Theorem 4.1.

Let r, m ∈ N with r^{1/100} ≤ m ≤ r^{99/100}, and let u := r/m. Then the number of polynomials f ∈ F2[X], deg f = r, such that all irreducible factors of f have degrees ≤ m equals 2^r u^{–u+o(u)} = 2^r e^{–(1+o(1))u ln u} as u → ∞. In particular, the probability that the degrees of all irreducible factors of a randomly chosen polynomial in F2[X] of degree r are ≤ m is asymptotically equal to

ψ(r, m) := u^{–u+o(u)} = e^{–(1+o(1))u ln u}.

The above expression for ψ(r, m), though valid asymptotically, gives good approximations for finite values of r and m. The condition r^{1/100} ≤ m ≤ r^{99/100} is met in most practical situations. The probability ψ(r, m) is a very sensitive function of u = r/m. For a fixed m, polynomials of smaller degrees have higher chances of being smooth (that is, of having all irreducible factors of degrees ≤ m).

Now, let us consider the field Fq with q = 2^n. The elements of Fq are represented as polynomials of degrees ≤ n – 1. For a given m, the probability that a randomly chosen element of Fq has all irreducible factors of degrees ≤ m is then approximately given by ψ(n – 1, m), as n, m → ∞ with n^{1/100} ≤ m ≤ n^{99/100}. We can, therefore, approximate ψ(n – 1, m) by ψ(n, m).

For many algorithms that we will come across shortly, we have r ≈ n/α and m ≈ β√(n lg n) for some positive α and β, so that u ln u ≈ (ln 2/(2αβ))√(n lg n) and, consequently, ψ(r, m) ≈ L[–1/(2αβ)].

The basic ICM

The idea of the basic ICM for Fq, q = 2^n, is analogous to that for prime fields. Now, the factor base B comprises all irreducible polynomials of F2[X] having degrees ≤ m. We choose m ≈ (1/√2)√(n lg n). (As in the case of the basic ICM for prime fields, this can be shown to be the optimal choice.) By Approximation (2.5) on p 84, we then have t := |B| ≈ 2^{m+1}/m = L[1/√2].

In the first stage, we choose random α, 1 ≤ α ≤ q – 2, compute gα and check if gα is B-smooth. If so, we get a relation. For a random α, the polynomial gα is a random polynomial of degree < n and hence has a probability of nearly ψ(n, m) ≈ L[–1/√2] of being smooth. Note that unlike integers a polynomial over F2 can be factored in probabilistic polynomial time (though for small m it may be preferable to do trial division by elements of B). Thus checking the smoothness of a random element of Fq can be done in (probabilistic) polynomial time, and each relation is available in expected time L[1/√2]. Since we need (slightly more than) t = L[1/√2] relations for setting up the linear system, the relation collection stage runs in expected time L[√2]. A sparse system with L[1/√2] unknowns can also be solved in time L[√2].

In the second stage, we need a single smooth polynomial of the form gαa. If α is randomly chosen, we expect to get this relation in time L[1/√2]. Therefore, the second stage is again faster than the first and the basic method takes a total expected running time of L[√2]. Recall that the basic method for prime fields requires time L[2]. The difference arises because polynomial factorization is much easier than integer factorization.

We now explain a modification of the basic method, proposed by Blake et al. [23]. Let h be a non-zero element of Fq: that is, a non-zero polynomial in F2[X] of degree < n. If h is randomly chosen (as in the case of gα or gαa for random α), then we expect the degree of h to be close to n. Let us write h ≡ h1/h2 (mod f) (f being the defining polynomial) with h1 and h2 each having degree ≈ n/2. Then the ratio of the probability that both h1 and h2 are smooth to the probability that h is smooth is ψ(n/2, m)²/ψ(n, m) ≈ 2^{n/m} (neglecting the o( ) terms). For practical values of n and m, this ratio of probabilities can be substantially large, implying that it is easier to get relations by trying to factor both h1 and h2 instead of trying to factor h. This is the key observation behind the modification due to Blake et al. [23]. Simple calculations show that this modification does not affect the asymptotic behaviour of the basic method, but it leads to considerable speed-up in practice.
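The ratio 2^{n/m} can be checked numerically with the crude approximation ψ(r, m) ≈ u^{–u}, u = r/m (o(u) terms dropped); the parameter values below are illustrative only:

```python
def psi_estimate(r, m):
    """Crude estimate psi(r, m) ~ u^(-u) with u = r/m (o(u) dropped)."""
    u = r / m
    return u ** (-u)

def split_gain(n, m):
    """psi(n/2, m)^2 / psi(n, m): how much likelier it is that two
    random halves of degree n/2 are both m-smooth than that one
    random polynomial of degree n is m-smooth."""
    return psi_estimate(n / 2, m) ** 2 / psi_estimate(n, m)
```

For example, with n = 128 and m = 16 the gain is 2^{128/16} = 256, and with n = 512, m = 32 it is already 2^{16} = 65536.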

In order to complete the description of the modification of Blake et al. [23], we mention an efficient way to write h as h1/h2 (mod f). Since 0 ≤ deg h < n and since f is irreducible of degree n, we must have gcd(h, f) = 1. During the iteration of the extended gcd algorithm we actually compute a sequence of polynomials uk, vk, xk such that ukh + vkf = xk for all k = 0, 1, 2, . . . . At the start of the algorithm we have u0 = 1, v0 = 0 and x0 = h. As the algorithm proceeds, the sequence deg uk changes non-decreasingly, whereas the sequence deg xk changes non-increasingly, and at the end of the extended gcd algorithm we have xk = 1 and the desired Bézout relation ukh + vkf = 1 with deg uk ≤ n – 1. Instead of proceeding till the end of the gcd loop, we stop at the value k = k′ for which deg xk′ is closest to n/2. We will then usually have deg uk′ ≈ n/2, so that taking h1 = xk′ and h2 = uk′ serves our purpose.
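A minimal sketch of this truncated extended gcd follows, with GF(2) polynomials encoded as Python integers (bit i is the coefficient of X^i); the defining polynomial used in the usage note is a hypothetical toy choice:

```python
def pdeg(a):
    return a.bit_length() - 1

def pmul(a, b):                      # carry-less multiplication over GF(2)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pdivmod(a, b):                   # quotient and remainder over GF(2)
    q, db = 0, pdeg(b)
    while a and pdeg(a) >= db:
        shift = pdeg(a) - db
        q ^= 1 << shift
        a ^= b << shift
    return q, a

def half_split(h, f, n):
    """Stop the extended gcd of f and h as soon as the remainder has
    degree <= n/2; the invariant u*h = x (mod f) then gives
    h = x/u (mod f) with both parts of degree about n/2."""
    x_prev, x = f, h                 # remainder sequence x_k
    u_prev, u = 0, 1                 # u_k satisfies u_k*h + v_k*f = x_k
    while pdeg(x) > n // 2:
        q, r = pdivmod(x_prev, x)
        x_prev, x = x, r
        u_prev, u = u, u_prev ^ pmul(q, u)
    return x, u                      # (h1, h2)
```

With the toy choice f = X^8 + X^4 + X^3 + X + 1 (irreducible, n = 8) and h of degree 7, both returned polynomials have degree ≤ 4 and satisfy h ≡ h1/h2 (mod f).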

The concept of large prime variation is applicable for the basic ICM. Moreover, if trial divisions are used for smoothness tests, one can employ the early abort strategy. Despite all these modifications the basic variant continues to be rather slow. Our hunt for faster algorithms continues.

The adaptation of the linear sieve method

The LSM for prime fields can be readily adapted to the fields Fq, q = 2^n. Let us assume that the defining polynomial f is of the special form f(X) = X^n + f1(X), where deg f1 is small. The total number of choices for such f with deg f1 < k is 2^k. Under the assumption that irreducible polynomials (over F2) of degree n are randomly distributed among the set of polynomials of degree n, we expect to find an irreducible polynomial f = X^n + f1 with deg f1 = O(lg n) (see Approximation (2.5) on p 84). In particular, we may assume that deg f1 ≪ n/2.

Let k := ⌈n/2⌉ and σ := 2k – n ∈ {0, 1}. For polynomials h1, h2 ∈ F2[X] of small degrees, we then have

(X^k + h1)(X^k + h2) ≡ X^σ f1 + (h1 + h2)X^k + h1h2 (mod f).

The right side of the congruence, namely,

T(h1, h2) := X^σ f1 + (h1 + h2)X^k + h1h2,

has degree slightly larger than n/2. This motivates the following algorithm.

We take m ≈ (1/2)√(n lg n) and let the factor base B be the (disjoint) union of B1 and B2, where B1 contains irreducible polynomials of degrees ≤ m, and where B2 contains polynomials of the form X^k + h, deg h ≤ m. Both B1 and B2 (and hence B) contain L[1/2] elements. For each pair X^k + h1, X^k + h2 ∈ B2, we then check the smoothness of T(h1, h2) over B1. Since deg T(h1, h2) ≈ n/2, the probability of finding a smooth candidate per trial is L[–1/2]. Therefore, trying L[1] values of the pair (h1, h2) is expected to give L[1/2] relations (in L[1/2] variables). Since factoring each T(h1, h2) can be performed in probabilistic polynomial time, the relation collection stage takes time L[1]. Gaussian elimination (with sparse congruences) can be done in the same time. As in the case of the LSM for prime fields, the second stage can be carried out in time L[1/2]. To sum up, the LSM for fields of characteristic 2 takes L[1] running time.

Note that the running time L[1] is achievable in this case without employing any sieving techniques. This is again because checking the smoothness of each T(h1, h2) can simply be performed in polynomial time. Application of polynomial sieving, though unable to improve upon the L[1] running time, often speeds up the method in practice. We will describe such a sieving procedure in connection with Coppersmith’s algorithm that we describe next.

Coppersmith’s algorithm

Coppersmith’s algorithm is the fastest algorithm known to compute discrete logarithms in finite fields of characteristic 2. Theoretically it achieves the (heuristic) running time L(q, 1/3, c) and is, therefore, subexponentially faster than the L[c′] = L(q, 1/2, c′) algorithms described so far. Gordon and McCurley have made aggressive attempts to compute discrete logarithms in fields as large as F_{2^503} using Coppersmith’s algorithm in tandem with a polynomial sieving procedure and, thereby, established the practicality of the algorithm.

In the basic method, each trial during the search for relations involves checking the smoothness of a polynomial of degree nearly n. The modification due to Blake et al. [23] replaces this by checking the smoothness of two polynomials of degree ≈ n/2. For the adaptation of the LSM, on the other hand, we check the smoothness of a single polynomial of degree ≈ n/2. In Coppersmith’s algorithm, each trial consists of checking the smoothness of two polynomials of degrees ≈ n^{2/3}. This is the basic reason behind the improved performance of Coppersmith’s algorithm.

To start with, we make the assumption that the defining polynomial f of Fq is of the form f(X) = X^n + f1(X) with deg f1 = O(lg n). We have argued earlier that an irreducible polynomial f of this special form is expected to be available. We now choose three integers m, M, k such that

m ≈ αn^{1/3}(ln n)^{2/3}, M ≈ βn^{1/3}(ln n)^{2/3} and 2^k ≈ γn^{1/3}(ln n)^{–1/3},

where the (positive real) constants α, β and γ are to be chosen appropriately to optimize the running time. The factor base B comprises irreducible polynomials (over F2) of degrees ≤ m. Let

l := ⌊n/2^k⌋ + 1,

so that l ≈ (1/γ)n^{2/3}(ln n)^{1/3}. Choose relatively prime polynomials u1(X) and u2(X) (in F2[X]) of degrees ≤ M and let

h1(X) := u1(X)X^l + u2(X) and h2(X) := (h1(X))^{2^k} rem f(X).

But then, since indg h2 ≡ 2^k indg h1 (mod q – 1), we get a relation if both h1 and h2 are smooth over B. By choice, deg h1 is clearly O~(n^{2/3}), whereas

h2(X) ≡ u1(X^{2^k})X^{l·2^k} + u2(X^{2^k}) ≡ u1(X^{2^k})X^{l·2^k–n}f1(X) + u2(X^{2^k}) (mod f)

and, therefore, deg h2 = O~(n^{2/3}) too.
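The congruence for h2 rests on the fact that raising to the power 2^k is linear in characteristic 2 and that X^n ≡ f1 (mod f). It can be verified on a toy instance (the parameters n = 15, f = X^15 + X + 1, u1, u2 below are hypothetical, far smaller than Coppersmith's actual sizes), with GF(2) polynomials encoded as ints:

```python
def pdeg(a):
    return a.bit_length() - 1

def pmul(a, b):                       # carry-less product over GF(2)
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def pmod(a, f):                       # remainder of a modulo f
    while pdeg(a) >= pdeg(f):
        a ^= f << (pdeg(a) - pdeg(f))
    return a

def frob(u, k):
    """u(X^(2^k)): move the bit at position j to position j * 2^k."""
    r, j = 0, 0
    while u >> j:
        if (u >> j) & 1:
            r |= 1 << (j << k)
        j += 1
    return r

n, f, f1 = 15, (1 << 15) | 0b11, 0b11     # f = X^15 + X + 1, f1 = X + 1
k = 2
l = n // (1 << k) + 1                     # l = floor(n/2^k) + 1 = 4
u1, u2 = 0b101, 0b111                     # relatively prime, degrees <= 2
h1 = pmul(u1, 1 << l) ^ u2                # h1 = u1*X^l + u2
h2 = h1
for _ in range(k):                        # h2 = h1^(2^k) rem f
    h2 = pmod(pmul(h2, h2), f)
# the claimed alternative expression for h2:
rhs = pmod(pmul(pmul(frob(u1, k), 1 << (l * (1 << k) - n)), f1)
           ^ frob(u2, k), f)
```

Here `h2 == rhs`, confirming the congruence for this instance.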

For each pair (u1, u2) of relatively prime polynomials of degrees ≤ M, we compute h1 and h2 as above and collect all the relations corresponding to the smooth values of both h1 and h2. This gives us the desired (sparse) system of linear congruences in the unknown indices of the elements of B, which is subsequently solved modulo q – 1.

An appropriate choice of α and β together with γ = α^{–1/2} gives the optimal running time of the first stage as

e^{(2α ln 2 + o(1))n^{1/3}(ln n)^{2/3}} = L(q, 1/3, 2α/(ln 2)^{1/3}) ≈ L(q, 1/3, 1.526).

The second stage of Coppersmith’s algorithm is somewhat involved. The factor base now contains only about L(q, 1/3, 0.763) elements. Therefore, finding a relation using a method similar to the second stage of the basic method requires time L(q, 2/3, c) for some c, which is much worse than even L[c′] = L(q, 1/2, c′). To work around this difficulty we start by finding a polynomial gαa all of whose irreducible factors have degrees ≤ n^{2/3}(ln n)^{1/3}. This takes time of the order of L(q, 1/3, c1) (where c1 ≈ 0.377) and gives us gαa = v1v2 · · · vs, where the vi have degrees ≤ n^{2/3}(ln n)^{1/3}. Note that the number of vi is less than n, since deg(gαa) < n. We then have
α + indg a ≡ Σi indg vi (mod q – 1).
All these vi need not belong to the factor base, so we cannot simply substitute the values of indg vi. We instead reduce the problem of computing each indg vi to that of computing indg vi′ for several polynomials vi′ with deg vi′ ≤ σ deg vi for some constant 0 < σ < 1. Subsequently, computing each indg vi′ is reduced to computing indg vi″ for several vi″ with deg vi″ < σ deg vi′. Repeating this process, we eventually end up with polynomials in the factor base. Because each reduction generates new polynomials with degrees reduced by at least the constant factor σ, it is clear that the recursion depth is O(ln n). Now, if for each vi the number of vi′ is ≤ n and for each vi′ the number of vi″ is ≤ n and so on, we have to carry out the reduction of ≤ n^{O(ln n)} = e^{O((ln n)²)} = L(q, 1/3, 0) polynomials. Therefore, if each reduction can be performed in time L(q, 1/3, c2), the second stage will run in time L(q, 1/3, max(c1, c2)).

In order to explain how a polynomial v of degree ≤ d ≈ n^{2/3}(ln n)^{1/3} can be reduced in the desired time, we choose a suitable k ∈ N (so as to balance the degrees of the two polynomials defined below), and let l := ⌊n/2^k⌋ + 1. As in the first stage, we fix a suitable bound M, choose relatively prime polynomials u1(X), u2(X) of degrees ≤ M and define

h1(X) := u1(X)X^l + u2(X)

and

h2(X) := (h1(X))^{2^k} rem f(X) = u1(X^{2^k})X^{l·2^k–n}f1(X) + u2(X^{2^k}).

The polynomials u1 and u2 should be so chosen that v | h1. We see that h1 and h2 have low degrees and we try to factor h1/v and h2. Once we get a factorization of the form
h1(X)/v(X) = ∏i vi(X) and h2(X) = ∏j wj(X)
with deg vi, deg wj < σ deg v, we have the desired reduction of v, namely,
2^k (indg v + Σi indg vi) ≡ Σj indg wj (mod q – 1),
that is, the reduction of the computation of indg v to that of all indg vi and indg wj. With the choice M ≈ (n^{1/3}(ln n)^{2/3}(ln 2)^{–1} + deg v)/2 and σ = 0.9, reduction of each polynomial can be shown to run in time L(q, 1/3, (ln 2)^{–1/3}) ≈ L(q, 1/3, 1.130). Thus the second stage of Coppersmith’s algorithm runs in time L(q, 1/3, 1.130) and is faster than the first stage.

Large prime variation is a useful strategy to speed up Coppersmith’s algorithm. In case of trial divisions for smoothness tests, early abort strategy can also be applied. However, a more efficient idea (though seemingly non-collaborative with the early abort strategy) is to use polynomial sieving as introduced by Gordon and McCurley.

Recall that in the first stage we take relatively prime polynomials u1 and u2 of degrees ≤ M and check the smoothness of both h1(X) = u1(X)X^l + u2(X) and h2(X) = h1(X)^{2^k} rem f(X). We now explain the (incomplete) sieving technique for filtering out the (non-)smooth values of h1 = (h1)u1,u2 for the different values of u1 and u2. To start with, we fix u1 and let u2 vary. We need an array A indexed by u2, a polynomial of degree ≤ M. Clearly, u2 can assume 2^{M+1} values and so A must contain 2^{M+1} elements. To be very concrete, we will denote the location of A corresponding to u2 by A[u2(2)], where u2(2) ≥ 0 is the integer obtained canonically by substituting 2 for X in u2(X), considered to be a polynomial with coefficients 0 and 1. We initialize all the locations of A to zero.

Let t = t(X) be a small irreducible polynomial in the factor base B (or a small power of such an irreducible polynomial) with δ := deg t. The values of u2 for which t divides (h1)u1,u2 satisfy the polynomial congruence u2(X) ≡ u1(X)X^l (mod t). Let σ = σ(X) be the solution of this congruence with δ* := deg σ < δ. If δ* > M, then no value of u2 corresponds to (h1)u1,u2 divisible by t. So assume that δ* ≤ M. If δ > M, then the only value of u2 for which t divides (h1)u1,u2 is u2 = σ. So we may also assume that δ ≤ M. Then the values of u2 for which t divides (h1)u1,u2 are given by u2 = σ + vt for all polynomials v(X) of degrees ≤ M – δ. For each of these 2^{M–δ+1} values of u2, we add δ = deg t to the corresponding array location.

When the process mentioned in the last paragraph is completed for all such t, we find out for which values of u2 the array locations contain values close to deg (h1)u1,u2. These values of u2 correspond to the smooth values of (h1)u1,u2 for the chosen u1. Finally, we vary u1 and repeat the sieving procedure.

In each sieving process described above, we have to find out all the values u2 = σ + vt as v runs through all polynomials of degrees ≤ M – δ. We may choose the different possibilities for v in any sequence, compute the products vt and then add these products to σ. While doing so serves our purpose, it is not very efficient, because computing each u2 involves performing a polynomial multiplication vt. Gordon and McCurley’s trick steps through the possibilities for v in a clever sequence that helps one get each value of u2 from the previous one with much reduced effort (compared to a polynomial multiplication). The 2^{M–δ+1} choices of v can be naturally mapped to the bit strings of length (exactly) M – δ + 1 (with the coefficients of lower powers of X appearing later in the string). This motivates using the following concept.

Definition 4.2.

Let d ∈ N. Then the (binary) gray code of dimension d is a sequence G_d = (g1, g2, . . . , g_{2^d}) of all (that is, 2^d) bit strings of length d defined inductively as follows. For d = 1, we define G_1 := (0, 1), whereas for d > 1 we define G_d := (0g1, 0g2, . . . , 0g_{2^{d–1}}, 1g_{2^{d–1}}, . . . , 1g2, 1g1), with G_{d–1} = (g1, g2, . . . , g_{2^{d–1}}),

where juxtaposition denotes string concatenation.

For example, the gray code of dimension 2 is 00, 01, 11, 10 and that of dimension 3 is 000, 001, 011, 010, 110, 111, 101, 100. Proposition 4.1 can be easily proved by induction on the dimension d.

Proposition 4.1.

Let d ∈ N and let G_d = (g1, g2, . . . , g_{2^d}) be the gray code of dimension d. For any i, 1 ≤ i < 2^d, the bit strings gi and g_{i+1} differ in exactly one bit position b(i). This position is given by b(i) = v2(i), where v2(i) denotes the multiplicity of 2 in i.

Back to our sieving business! Let us agree to step through the values of v in the sequence v1, v2, . . . , v_{2^{M–δ+1}}, where vi corresponds to the bit string gi of the (M – δ + 1)-dimensional gray code. Let us also call the corresponding values of u2 as (u2)1, (u2)2, . . . . Now, v1 is 0 and the corresponding (u2)1 = σ is available at the beginning. By Proposition 4.1 we have for 1 ≤ i < 2^{M–δ+1} the equality v_{i+1} = vi + X^{v2(i)}, so that (u2)_{i+1} = (u2)i + X^{v2(i)}t. Computing the product X^{v2(i)}t involves shifting the coefficients of t and is done efficiently using bit operations only (assuming the data structures introduced in Section 3.5). Thus (u2)_{i+1} is obtained from (u2)i by a shift followed by a polynomial addition. This is much faster than computing (u2)_{i+1} directly as σ + v_{i+1}t.
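The stepping can be sketched as follows, again with GF(2) polynomials encoded as ints so that adding X^j · t is an XOR with a shifted copy of t (the starting values are arbitrary toy choices):

```python
def v2(i):
    """Multiplicity of 2 in i (i > 0)."""
    return (i & -i).bit_length() - 1

def gray(i):
    """i-th element of the binary reflected gray code, as a bit mask."""
    return i ^ (i >> 1)

def sieve_values(sigma, t, dim):
    """Yield sigma + v*t for v running through the 2^dim gray-code
    strings: each value follows from the previous one by a single
    shift-and-XOR instead of a full polynomial multiplication."""
    u2 = sigma                       # v_1 = 0, so (u2)_1 = sigma
    yield u2
    for i in range(1, 2 ** dim):
        u2 ^= t << v2(i)             # v_{i+1} = v_i + X^{v2(i)}
        yield u2
```

Every yielded value equals sigma plus a carry-less product of a gray-code string with t, so all 2^dim values of u2 are visited exactly once.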

We mentioned earlier that efficient implementations of Coppersmith’s algorithm allow one to compute, in feasible time, discrete logarithms in fields as large as F_{2^503}. However, for much larger fields, say for n ≥ 1024, this algorithm is still not a practical breakthrough. The intractability of the DLP continues to remain cryptographically exploitable.

Exercise Set 4.4

4.15

Binary search Let ≤ be a total order on a set S (finite or infinite) and let a1 ≤ a2 ≤ ··· ≤ am be a given sequence of elements of S. Devise an algorithm that, given an arbitrary element a ∈ S, determines using only O(lg m) comparisons in S whether a = ai for some i = 1, . . . , m and, if so, returns i. [H]

4.16
  1. Show that any map of Fq to itself can be represented uniquely as a polynomial of degree < q. [H]

  2. The set S of all maps of Fq to itself is a ring under point-wise addition and multiplication. Prove the ring isomorphism S ≅ Fq[X]/〈X^q – X〉.

4.17Let p be a prime and g a primitive element of . For a , prove the explicit formula (mod p). What is the problem in using this formula for computing indices in ?
4.18 In the basic ICM for a prime field, we try to factor random powers gα over the factor base B = {q1, . . . , qt}. In addition to the canonical representative of gα in the set {1, . . . , p – 1}, one can also check for the smoothness of the integers gα + kp for –M ≤ k ≤ M, where M is a small positive integer (to be determined experimentally).
  1. Let ρk,i := (gα + kp) rem qi for i = 1, . . . , t and for –M ≤ k ≤ M. How can one compute these remainders ρk,i efficiently? Devise an algorithm that checks the smoothness of all gα + kp using the values ρk,i. [H]

  2. Devise an algorithm that uses a sieve over the interval –M ≤ k ≤ M.

  3. Explain how the above two strategies can be modified to work for the fields F_{2^n}.

4.19
  1. Show that for the LSM over a prime field the average and the maximum Tmax of |T(c1, c2)| over all values of c1, c2 (that is, for –M ≤ c1 ≤ c2 ≤ M) are approximately HM and 2HM, respectively. [H]

  2. For real 0 ≤ η ≤ 1, let N(η) := #{(c1, c2) : –M ≤ c1 ≤ c2 ≤ M, |T(c1, c2)| ≤ ηTmax} and let t(η) := N(η)/N(1). Show that t(η) ≈ η(2 – η). (This shows that the distribution of T(c1, c2) is not really random.)

4.20 Consider the following modification of the LSM for a prime field. Define, for 1 ≤ r ≤ s, the integers Hr := ⌈√(rp)⌉ and Jr := Hr² – rp. Choose a small s ∈ N and repeat the linear sieve method for each r, 1 ≤ r ≤ s, that is, check the smoothness (over the first t = L[1] primes) of the integers Tr(c1, c2) := Jr + (c1 + c2)Hr + c1c2 for all 1 ≤ r ≤ s, –μ ≤ c1 ≤ c2 ≤ μ. Let T̄ be the average of |Tr(c1, c2)| over all choices of r, c1 and c2. Show that T̄ is smaller than the corresponding average of Exercise 4.19. In particular, this holds for both the choices: (1) μ = ⌊M/√s⌋ and (2) μ = ⌊M/s⌋; that is, on an average we check smaller integers for smoothness under this modified strategy. Determine the size of the factor base and the total number of integers Tr(c1, c2) checked for smoothness for the two values of μ given above.
4.21

Cubic sieve method (CSM) for a prime field Let the integers x, y, z satisfy x³ ≡ y²z (mod p) with x³ ≠ y²z. Assume that each of x, y, z is O(p^ξ).

  1. Show that for integers a, b, c with a + b + c = 0 one has

    (x + ay)(x + by)(x + cy) ≡ y²T(a, b, c) (mod p),

    where T(a, b, c) := z + (ab + ac + bc)x + (abc)y = –b(b + c)(x + cy) + (z – c²x). Since x, y, z are O(p^ξ), we have T(a, b, c) = O(p^ξ) for small values of a, b, c.

  2. For the CSM, the factor base B comprises all the primes q1, . . . , qt together with the integers x + ay for –M ≤ a ≤ M. If T(a, b, c) factors completely over q1, . . . , qt, we get a relation. Show that if we check the smoothness of T(a, b, c) for all –M ≤ a ≤ b ≤ c ≤ M with a + b + c = 0, we expect to get enough relations to compute the discrete logarithms of elements of B.

  3. In order to carry out sieving, fix c and let b vary. Specify the details of the sieving process. [H]

  4. Specify an algorithm for the second stage of the CSM. [H]

  5. Show that the expected running time of the CSM is L[√(2ξ)]. Therefore, if ξ < 1/2, the CSM is asymptotically faster than the LSM, since the LSM runs in time L[1]. The best possible value ξ = 1/3 corresponds to a running time of L[√(2/3)] ≈ L[0.816] for the CSM.

4.22 The problem with the CSM is that it is not known how to efficiently compute a solution of the congruence

Equation 4.10


subject to the condition that x³ ≠ y²z and x, y, z = O(p^ξ) for 1/3 ≤ ξ < 1/2. In this exercise, we estimate the number of solutions of Congruence (4.10).

  1. Show that the total number of solutions of Congruence (4.10) modulo p with x, y, is (p − 1)², which is Θ(p²).

  2. Show that the total number of solutions of Congruence (4.10) modulo p with x, y, and x³ ≠ y²z is also Θ(p²).

  3. Under the heuristic assumption that the solutions (x, y, z) of Congruence (4.10) are randomly distributed in , deduce that the expected number of solutions of Congruence (4.10) modulo p with x, y, , x³ ≠ y²z, and 1 ≤ x, y, z ≤ p^ξ, 1/3 ≤ ξ ≤ 1, is nearly p^(3ξ−1). (Therefore, if ξ is slightly larger than 1/3, we expect to get a solution. It is not known how to compute such a solution in polynomial (or even subexponential) time. However, for certain values of p a solution is naturally available, for example, if p (or a small multiple of p) is close to an integer cube.)

4.23

Adaptation of CSM for . Let be represented as , where the defining polynomial f is of the form f(X) = X^n + f1(X) with deg f1 ≤ n/3. Let k := ⌈n/3⌉. Show that for polynomials h1, h2 of small degrees, (X^k + h1(X))(X^k + h2(X))(X^k + h1(X) + h2(X)) rem f(X) is of degree slightly larger than n/3. Devise an ICM for solving the DLP in based on this observation. What is the best running time of this method? [H]

*4.5. The Elliptic Curve Discrete Logarithm Problem (ECDLP)

Unlike the finite field DLP, there are no general-purpose subexponential algorithms to solve the ECDLP. Though good algorithms are known for certain specific types of elliptic curves, all known algorithms that apply to general curves take fully exponential time. The square root methods of Section 4.4 are the fastest known methods for solving the ECDLP over an arbitrary curve. As a result, elliptic curves are gaining popularity for building cryptosystems. The absence of subexponential algorithms implies that smaller fields can be chosen compared to those needed for cryptosystems based on the (finite field) DLP. This, in particular, results in smaller sizes of keys.

We start with Menezes, Okamoto and Vanstone’s (MOV) algorithm that reduces the ECDLP in a curve over to the DLP over the field for some suitable . Since the DLP can be solved in subexponential time, the ECDLP is also solved in that time, provided that the extension degree is small. For supersingular curves, one can choose k ≤ 6. For non-supersingular curves, this k is, in general, quite large, and the MOV reduction takes exponential time.

A linear-time algorithm is known to solve the ECDLP over anomalous curves (that is, curves with trace of Frobenius equal to 1). This algorithm is called the SmartASS method after its inventors Smart, Araki, Satoh and Semaev [257, 265, 282].

J. H. Silverman [277] has proposed an algorithm known as the xedni calculus method for solving the ECDLP over an arbitrary curve. Rigorous running times for this algorithm are not known; however, heuristic analysis and experiments suggest that the algorithm is not really practical.

Let E be an elliptic curve over a finite field and let be of order m. We want to compute indP Q (if it exists) for a point . Unless it is necessary, we will not assume any specific defining equation for E or a specific value of q.

**4.5.1. The MOV Reduction

Let us first look at the structure of the group of m-torsion points on an elliptic curve defined over K. Here is the algebraic closure of K.

Theorem 4.2.

Let K be a field of characteristic , and E an elliptic curve defined over K. We consider two separate cases:[5]

[5] For the MOV reduction, only the first case is important.

  1. If p = 0, or if p > 0 and p does not divide m, then . In particular, in this case.

  2. If p > 0, then either for all or for all .

Now, let E be an elliptic curve defined over a finite field K of characteristic p. Let with gcd(m, p) = 1. We use the shorthand notation E[m] for (and not for EK[m]). We want to define a function

em : E[m] × E[m] → μm,

where is the group of m-th roots of unity (Exercise 4.24). This function em, known as the Weil pairing, helps us reduce the ECDLP in to the DLP in a suitable field . Let P, . The definition of em(P, R) calls for using divisors on E. Recall from Exercise 2.125 that a divisor belongs to (that is, is the divisor of a rational function on E) if and only if and . Since , there is a rational function such that . Now, as well and p ∤ m². Hence, by Theorem 4.2 there exists a point R′ of order m² such that R = mR′. Since #E[m] = m², it follows that and, therefore, there exists a rational function with . The functions f and g as introduced above are unique up to multiplication by elements of . One can show that we can choose f and g in such a manner that f ∘ λm = g^m, where λm is the multiplication map Q ↦ mQ. Then for and we have g^m(P + U) = f(mP + mU) = f(mU) = g^m(U). Since g has only finitely many poles and zeros (whereas is infinite), we can choose U such that both g(U) and g(P + U) are defined and non-zero. For such a point U, we then have and define

em(P, R) := g(P + U)/g(U).

The right side can be shown to be independent of the choice of U. The relevant properties of the Weil pairing em are now listed.

Proposition 4.2.

Let P, P′, R, R′ ∈ E[m] and a, b ∈ ℤ. Then we have:

Identity: em(P, P) = 1.
Alternation: em(P, R) = em(R, P)⁻¹.
Bilinearity: em(P + P′, R) = em(P, R)em(P′, R),
 em(P, R + R′) = em(P, R)em(P, R′),
 em(aP, bR) = (em(P, R))^(ab).
Non-degeneracy: em(P, ) = 1.
 If em(P, T) = 1 for all , then .

The above definition of em is not computationally effective. We will see later how we can compute em(P, T) in probabilistic polynomial time using an alternative (but equivalent) definition.

Algorithm 4.7 shows how the MOV reduction algorithm makes use of Weil pairing. We now clarify the subtle details of this algorithm.

Algorithm 4.7. MOV reduction

Input: A point P of order m, gcd(m, q) = 1, and a multiple Q of P.

Output: The index indP Q, that is, an integer l with Q = lP.

Steps:

Choose the smallest k such that .
while (1) {
   Choose a random point .
   α := em(P, R),   β := em(Q, R).  /* α, β ∈ μm */
   l := indα β.   /* Discrete logarithm in  */
   if (Q = lP) { Return l. }
}

The correctness of the algorithm

From the bilinearity of the Weil pairing, it follows that if Q = lP, 0 ≤ l < m, then β = em(Q, R) = em(lP, R) = em(P, R)^l = α^l. Thus, treating indα β as the least non-negative residue modulo ord α, we conclude that l = indα β if and only if ord α = m, that is, α is a primitive m-th root of unity. That α is an m-th root of unity for any R is obvious from the definition of em. We now show that there exists some R for which α = em(P, R) is primitive.

Lemma 4.1.

Let be of order m (so that P generates the subgroup 〈P〉 of order m in E[m]). Then for any R1, R2, the cosets R1 + 〈P〉 and R2 + 〈P〉 are equal if and only if em(P, R1) = em(P, R2).

Proof

If R1 + 〈P〉 = R2 + 〈P〉, then R1 = R2 + rP for some integer r and so by bilinearity and identity of Weil pairing em(P, R1) = em(P, R2)em(P, P)r = em(P, R2).

Conversely, let em(P, R1) = em(P, R2). By Theorem 4.2, is generated by two elements of order m. We can take one of these elements to be P; let P′ be the other element and write R1 − R2 = aP + a′P′ for some a, a′. Then em(P, R1) = em(P, R2 + aP + a′P′) = em(P, R2)em(P, P)^a em(P, a′P′), whence it follows that em(P, a′P′) = 1. Finally, for an arbitrary T = bP + b′P′, b, b′, we have em(a′P′, T) = em(a′P′, bP + b′P′) = em(a′P′, P)^b em(P′, P′)^(a′b′) = em(P, a′P′)^(−b) = 1. By the non-degeneracy property of em, it then follows that a′P′ is the identity element, that is, R1 − R2 ∈ 〈P〉.

As an immediate corollary to Lemma 4.1, the desired result follows.

Proposition 4.3.

Let P be of order m and let

Then #S/#E[m] = φ(m)/m. In particular, S is non-empty.

Proof

There are m distinct cosets of 〈P〉 in E[m]. Now, as R ranges over all points of E[m], the coset R + 〈P〉 ranges over all of these m possibilities and, accordingly, by Lemma 4.1, the value em(P, R) ranges over m distinct values. Since μm is cyclic of order m and hence has φ(m) generators, the proposition follows.

By Theorem 3.1, one should try an expected number of O(ln ln m) random points before a primitive m-th root α = em(P, R) is found.
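As a small numerical illustration of this estimate, the success probability φ(m)/m per random point is easy to compute from the factorization of m. The following Python sketch is ours and not part of the algorithm; the sample value of m is arbitrary.

```python
def phi_over_m(m):
    """Return phi(m)/m via the product of (1 - 1/p) over the distinct primes p | m."""
    ratio, n, p = 1.0, m, 2
    while p * p <= n:
        if n % p == 0:
            ratio *= 1 - 1 / p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:                      # leftover prime factor
        ratio *= 1 - 1 / n
    return ratio

# Expected number of random points R tried before em(P, R) is primitive
# is m/phi(m), which is O(ln ln m):
m = 2 ** 31 - 1                    # a sample (prime) group order
print(1 / phi_over_m(m))           # ≈ 1, since phi(m)/m = 1 - 1/m for prime m
```

For highly composite m the expected number of trials grows, but only very slowly, in accordance with the O(ln ln m) bound quoted above.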

Choosing k

Since E[m] consists of finitely many (namely, m²) points, it is obvious that there exist finite values of k such that . It can also be shown that if , then , that is, for all P, . The computation of the discrete logarithm indα β is then carried out in . For Algorithm 4.7 to be efficient, one requires k to be rather small. However, for most curves, k is rather large, implying that the MOV reduction is impractical for these curves. For a specific class of curves, the so-called supersingular curves, one can choose k to be rather small, namely k ≤ 6. We do not go into the details of the choices of k for the various cases of supersingular curves, but refer the reader to Menezes [192].
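Assuming, as in the standard analysis of the MOV reduction, that the relevant condition on k is m | q^k − 1, the smallest such k (often called the embedding degree) can be found by a direct search. The following Python fragment is our own sketch; the function name and sample parameters are illustrative.

```python
def embedding_degree(q, m, k_max=10**6):
    """Smallest k >= 1 with q**k ≡ 1 (mod m), i.e. m | q^k - 1 (assumes gcd(q, m) = 1)."""
    t = q % m
    for k in range(1, k_max + 1):
        if t == 1:
            return k
        t = (t * q) % m
    raise ValueError("no k found up to k_max")

# A supersingular-style situation: m | q + 1 forces k = 2.
print(embedding_degree(1009, 101))  # 2, since 1009 ≡ -1 (mod 101)
```

For a random curve, m is typically of the same magnitude as q and k behaves like a random element of large order, which is why the reduction is impractical outside the supersingular case.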

Computing em(P, R)

We start with an alternative definition of the Weil pairing for P, . First note that if is a divisor and if is a rational function on E such that for every pole or zero T of f one has mT = 0 (that is, such that Div(f) and T have disjoint supports), then one can define

Choose points U, (where ) and consider the divisors DP := [P + U] – [U] and DR := [R + V] – [V]. Since is infinite, one can choose both P + U and U distinct from R + V and V. Since P, , it follows that mDP and mDR are principal, namely, there are rational functions fP and fR such that Div(fP) = mDP = m[P + U] – m[U] and Div(fR) = mDR = m[R + V] – m[V]. One can show that

Equation 4.11


independent of the choice of U and V as long as fP (DR) and fR(DP) are defined. Therefore, em(P, R) can be computed efficiently, if fP and fR can be computed efficiently. To this effect we now describe an algorithm for computing the rational function f of a principal divisor , where . Since deg , we can write . Suppose that we have an Algorithm A that, for a pair of reduced divisors

and

computes the sum (a reduced divisor)

Then, f can be computed by repeated application of Algorithm A as follows.

  1. Compute for each i = 1, . . . , r the reduced divisor . Let 1 = ai1, ai2, . . . , aiti = |mi| be an addition chain for |mi| (Exercise 3.18). Clearly, ti – 1 applications of Algorithm A compute Δi. Since we can choose ti ≤ 2 ⌈lg |mi|⌉, each Δi can be computed using O(log |mi|) applications of Algorithm A.

  2. Compute f by computing D = Div(f) = Δ1 + ··· + Δr. This can be done by applying Algorithm A a total of r – 1 times.

What remains is the description of Algorithm A that computes P3 and f3 from a knowledge of P1, P2, f1 and f2. Clearly, if , then we have P3 = P2 and f3 = f1f2. Similar is the case for . So assume and . Let l1 be the line passing through P1 and P2 and P′ := –(P1 + P2). First, assume that . By Exercise 2.125, we have . Let l2 be the (vertical) line passing through P′ and –P′. Again by Exercise 2.125, we have . But then , that is, we take P3 = –P′ = P1+P2 and f3 = f1f2l1/l2. Finally, if , then and, therefore, . Thus, in this case too, we take and f3 = f1f2l1/l2 with l2 := 1.

Before we finish the description of the MOV reduction, some comments are in order. First note that if f1, and P1, , then both l1 and l2 are in K(E) and the computation of f3 and P3 can be carried out by working in K only.

Second, consider the (general) case . Since , the rational function f3 has poles and is, therefore, undefined only at the points P3 and . f3 is certainly defined at –P3, but l2(–P3) = 0 and, therefore, evaluating f3(–P3) as (f1f2l1)(–P3)/l2(–P3) fails. Of course, there is a rational function g such that both f1f2l1g and l2g are defined and non-zero at –P3, but finding such a rational function is an added headache. So we choose to continue to have the representation f3 = f1f2l1/l2 and agree not to evaluate f3 at –P3. Recall from Equation (4.11) that we want to evaluate fP at DR (that is, at R + V and V) and also fR at DP (that is, at P + U and U). Let us assume that we use the addition chain 1 = a1, a2, . . . , at = m for m. This means that we cannot evaluate fP at the points ±ai(P + U) and ±aiU for all i = 1, . . . , t. Therefore, V should be chosen such that both R + V and V are not one of these points. Similar constraints dictate the choice of U. However, if m is sufficiently large (m ≥ 1024) and if we choose an addition chain of length t ≤ 2 ⌈lg m⌉, then it can be easily seen that for a random choice of (U, V) the evaluation of fP (DR) or fR(DP) fails with a probability of no more than 1/2. Therefore, a few random choices of (U, V) are expected to make the algorithm work. This is the only place where a probabilistic behaviour of the algorithm creeps in. In practice, however, this is not a serious problem, since we have much larger values of m (than 1024) and accordingly the above probability of failure becomes negligibly small.

Finally, note that if we multiply the factors f1, f2 and l1 in the numerator, then the coefficients of the numerator grow very rapidly, when the algorithm is applied repeatedly. Thus we prefer to keep the numerator in the factored form. The same applies to the denominator as well.
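The chord and vertical line functions l1 and l2 that Algorithm A accumulates can be sketched concretely. The following Python toy is ours, on a short Weierstrass model y² = x³ + Ax + B over a small prime field (curve parameters and point choices are illustrative); it verifies that l1 vanishes exactly at P1, P2 and −(P1 + P2), and that l2 vanishes at ±P3.

```python
p, A, B = 97, 2, 3   # illustrative toy curve y^2 = x^3 + 2x + 3 over F_97

# All affine points, found by brute force (fine for a toy field).
pts = [(x, y) for x in range(p) for y in range(p)
       if (y * y - x ** 3 - A * x - B) % p == 0]

def slope(P1, P2):
    """Slope of the chord through P1, P2 (tangent when P1 = P2)."""
    (x1, y1), (x2, y2) = P1, P2
    if P1 == P2:
        return (3 * x1 * x1 + A) * pow(2 * y1, -1, p) % p
    return (y2 - y1) * pow(x2 - x1, -1, p) % p

def ec_add(P1, P2):
    """Affine addition (both points finite, P2 != -P1)."""
    (x1, y1), (x2, y2) = P1, P2
    lam = slope(P1, P2)
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def l1(P1, P2, Q):
    """Line through P1 and P2, evaluated at Q; its zeros are P1, P2, -(P1+P2)."""
    lam = slope(P1, P2)
    return (Q[1] - P1[1] - lam * (Q[0] - P1[0])) % p

def l2(P3, Q):
    """Vertical line through P3 and -P3, evaluated at Q."""
    return (Q[0] - P3[0]) % p

P1, P2 = pts[0], pts[2]              # two points with distinct x-coordinates
P3 = ec_add(P1, P2)
negP3 = (P3[0], -P3[1] % p)
print(l1(P1, P2, P1), l1(P1, P2, P2), l1(P1, P2, negP3))  # 0 0 0
print(l2(P3, P3), l2(P3, negP3))                          # 0 0
```

Since l1/l2 has divisor [P1] + [P2] − [P1 + P2] − [ ], multiplying such factors along an addition chain builds exactly the functions f that the text describes, kept in factored form as recommended above.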

**4.5.2. The SmartASS Method

The SmartASS method, named after its inventors Smart [282], Satoh and Araki [257] and Semaev [265], is also called the anomalous attack on the ECDLP, since it applies to anomalous elliptic curves. Let be a finite field of odd prime cardinality p, and let E be an elliptic curve defined over . We assume that E is anomalous: that is, the trace of Frobenius of E at p is 1; that is, . Since p is prime, the group is cyclic and, in particular, isomorphic to the additive group (, +). This isomorphism is effectively exploited by the SmartASS method to give a polynomial-time algorithm for the ECDLP in the group .

Before proceeding further, we introduce some auxiliary results. Recall (Exercise 2.133) that a local PID is called a discrete valuation ring (DVR). We now give an equivalent definition of a DVR, one that justifies its name.

Definition 4.3.

A discrete valuation on a field K is a surjective group homomorphism

such that for every a, b we have v(a + b) ≥ min(v(a), v(b)). We extend the definition of v to a map by setting v(0) := +∞. The set

is a ring called the valuation ring of v.

A DVR can be characterized as follows:

Proposition 4.4.

Let R be an integral domain and let K := Q(R) be the field of fractions of R. Then R is a DVR if and only if there exists a discrete valuation of K such that R is the valuation ring of v.

Proof

[if] By definition, . We have v(1) = v(1 · 1) = v(1) + v(1), so that v(1) = 0. If ab = 1 for some a, , then 0 = v(1) = v(ab) = v(a) + v(b). Since v(a), v(b) ≥ 0, it follows that v(a) = v(b) = 0. Conversely, let v(a) = 0 for some , a ≠ 0. Now, and we have 0 = v(1) = v(aa⁻¹) = v(a) + v(a⁻¹) = v(a⁻¹): that is, . We conclude that is a unit if and only if v(a) = 0. Any proper ideal of R consists only of non-units and hence is contained in the set which is easily seen to be an ideal of R. Thus R is a local domain with maximal ideal .

Let and define . Clearly, each is an ideal of R. For an arbitrary non-zero ideal of R, consider . If i = 0, then contains a unit, that is, . So assume i > 0. Clearly, . Conversely, let , so that v(a) ≥ i. Choose with v(b) = i. But then i ≤ v(a) = v(ab⁻¹) + v(b) = v(ab⁻¹) + i: that is, v(ab⁻¹) ≥ 0; that is, ; that is, . Thus, . In other words, , , are the only non-zero ideals of R. These ideals form the (infinite) descending chain .

By definition, is surjective. Let be such that v(x) = 1. The principal ideal 〈x〉 is not the unit ideal, satisfies and hence equals . One can likewise show that for all . Thus R is a PID. [only if] See Exercise 2.133.

Recall that the ring of p-adic integers (Definition 2.111) is a DVR. The field of fractions of is called the field of p-adic numbers. We now explicitly describe a valuation v on of which is the valuation ring. Let the p-adic expansion (Exercises 2.144 and 2.145) of a p-adic integer α be

Equation 4.12


A rational integer can be naturally viewed as a p-adic integer with finitely many non-zero terms, that is, one for which ki = 0 for all but finitely many i. However, a p-adic integer with infinitely many non-zero ki does not correspond to a rational integer. If in Expansion (4.12) we have k0 = k1 = ··· = kr−1 = 0, we can write

α = p^r(kr + kr+1p + kr+2p² + ···).

A p-adic integer is, in general, an infinite series and a representation with finite precision looks like

k0 + k1p + k2p² + ··· + ksp^s + O(p^(s+1)).

Arithmetic on p-adic numbers is done like arithmetic on integers written in base p, but from left to right (that is, starting with the digit k0). Thus, for example, if one wants to add two p-adic integers k0 + k1p + k2p² + ··· and , one may add the base-p integers ··· k2k1k0 and in the usual manner up to the desired level of precision. A p-adic integer α = k0 + k1p + k2p² + ··· is invertible (in ) if and only if k0 ≠ 0 (Proposition 2.52).
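The digit-by-digit arithmetic just described can be sketched as follows. The helper names, the sample prime and the precision are ours; a p-adic integer is truncated to its first s digits (that is, it is known modulo p^s).

```python
def to_digits(n, p, s):
    """First s p-adic digits (ascending powers of p) of a non-negative integer n."""
    ds = []
    for _ in range(s):
        ds.append(n % p)
        n //= p
    return ds

def padic_add(a, b, p):
    """Add two truncated digit lists, carrying from the low digits upwards."""
    out, carry = [], 0
    for da, db in zip(a, b):
        t = da + db + carry
        out.append(t % p)
        carry = t // p           # the final carry falls beyond the precision
    return out

p, s = 7, 6
x, y = 123456, 654321
lhs = padic_add(to_digits(x, p, s), to_digits(y, p, s), p)
rhs = to_digits((x + y) % p ** s, p, s)
print(lhs == rhs)                # True: digit-wise addition agrees mod p^s
```

Dropping the final carry corresponds exactly to working modulo p^s, which is how finite-precision p-adic arithmetic is implemented in practice.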

An element also has a p-adic expansion, but in this case one has to allow a finite number of terms with negative exponents of p. That is to say, we have an expansion of the form

β = k−t p^(−t) + k−t+1 p^(−t+1) + ··· + k−1 p^(−1) + k0 + k1p + k2p² + ···

or

β = p^(−t)(k−t + k−t+1 p + ··· + k−1 p^(t−1) + k0 p^t + k1 p^(t+1) + k2 p^(t+2) + ···).

Of course, if k−t = k−t+1 = ··· = k−1 = 0, then β is already in .

From the arguments above, it follows that any non-zero can be written uniquely as γ = p^δ(γ0 + γ1p + γ2p² + ···) with and γ0 ≠ 0. We then set v(γ) := δ. It is easy to see that v defines a discrete valuation on of which is the valuation ring. Moreover, since γ0 + γ1p + γ2p² + ··· is a unit in , the element p = 0 + 1 · p + 0 · p² + ··· plays the role of a uniformizer of the DVR . As usual, we write v(0) = +∞.
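On rational numbers, the valuation v just defined restricts to the familiar p-adic valuation v(a/b) = v(a) − v(b). A minimal Python sketch (the function name is ours):

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation of a non-zero rational x (by convention v(0) = +infinity)."""
    x = Fraction(x)
    v, num, den = 0, x.numerator, x.denominator
    while num % p == 0:          # count powers of p in the numerator
        num //= p
        v += 1
    while den % p == 0:          # subtract powers of p in the denominator
        den //= p
        v -= 1
    return v

print(vp(12, 2))                 # 2, since 12 = 2^2 * 3
print(vp(Fraction(5, 9), 3))     # -2

# The ultrametric inequality v(a + b) >= min(v(a), v(b)):
a, b = Fraction(5, 9), Fraction(1, 3)
print(vp(a + b, 3) >= min(vp(a, 3), vp(b, 3)))  # True
```
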

Now, back to our ECDLP business. Let E be an elliptic curve defined over . Here we consider the case that E is anomalous. We can naturally think of E as a curve over the field as well and denote this curve by ε. The coordinate-wise application of the canonical surjection induces the reduction homomorphism . Now, we define the following subgroups of :

It can be shown that is a subgroup of and is a subgroup of . Furthermore, since E is anomalous, we have

Now, let and Q a point in the subgroup of generated by P. Our purpose is to find an integer l such that Q = lP. Let , be such that and . It is not difficult to find such points and . For example, if P = (a, b), we can take , where b0 = b and b1, b2, . . . are successively obtained by Hensel lifting.

Since , the point and, therefore, . Now, if we take the so-called p-adic elliptic logarithm ψp on both sides, we get (mod p²), whence it follows that

provided that is invertible modulo p. The function ψp can be computed easily. Therefore, this gives a very efficient probabilistic algorithm for computing discrete logarithms over anomalous elliptic curves. Here the most time-consuming step is the linear-time computation of the points and . For further details on the algorithm (like the computation of and from P and Q, and the definition of the p-adic elliptic logarithm), see Blake et al. [24] and Silverman [275].

**4.5.3. The Xedni Calculus Method

Joseph Silverman’s xedni calculus method (XCM) is a relatively recent algorithm for solving the ECDLP over an arbitrary elliptic curve over a finite field. The algorithm is based on some deep mathematical conjectures and heuristic ideas; however, its performance has been experimentally shown to be poor. Here we give a brief sketch of the XCM. For simplicity, we concentrate on elliptic curves over prime fields only.

The basic idea of the XCM is to lift an elliptic curve E over to a curve ε over . In view of this, we start with a couple of important results regarding elliptic curves over (or, more generally, over a number field). See Silverman [275], for example, for the proofs.

Let ε be an elliptic curve defined over a number field K.

Theorem 4.3. Mordell–Weil theorem

The group ε(K) is finitely generated.

The group structure of ε(K) is made explicit by the next theorem. Note that the elements of ε(K) of finite order form a subgroup εtors(K) of ε(K), called the torsion subgroup of ε(K) (Exercise 4.26).

Theorem 4.4.

for some .

The non-negative integer ρ of Theorem 4.4 is called the rank of ε(K).

Now, let E be an elliptic curve defined over a prime field , and Q a multiple of P. Our task is to compute an integer such that Q = lP. We assume that E is defined by a suitable Weierstrass equation. We consider the projective coordinates of points on . Let n denote the cardinality of .

The basic idea of the XCM is to select r points , compute an elliptic curve ε defined over and points such that modulo p the curve ε reduces to E and the points S1, . . . , Sr to Rp,1, . . . , Rp,r. If the rank of ε is small, then the points S2, . . . , Sr are expected to be linearly dependent. Computing a non-trivial linear dependency among S2, . . . , Sr gives a linear dependency among Rp,1, . . . , Rp,r, which in turn yields indP Q with high probability. The details are now explained. For r points Li := [hi, ki, li], i = 1, . . . , r, we use the notation:

We start by fixing an integer r, 4 ≤ r ≤ 9. We then choose r random pairs (si, ti) of integers and compute the points

We now apply a change of coordinates of the form

Equation 4.13


so that the first four of the points Rp,i become Rp,1 = [1, 0, 0], Rp,2 = [0, 1, 0], Rp,3 = [0, 0, 1] and Rp,4 = [1, 1, 1]. This change of coordinates fails if some three of the four points Rp,1, Rp,2, Rp,3 and Rp,4 sum to . But in that case the desired index indP Q can be computed with high probability. If, for example, , then we have (s1 + s2 + s3)P = (t1 + t2 + t3)Q and, therefore, if gcd(t1 + t2 + t3, n) = 1, then indP Q ≡ (t1 + t2 + t3)–1(s1 + s2 + s3) (mod n). On the other hand, if gcd(t1 + t2 + t3, n) ≠ 1, we repeat with a different set of pairs (si, ti).

Henceforth, we assume that the change of coordinates, as given in Equation (4.13), is successful. This transforms the equation for E to a general cubic equation:

Cp : up,1X³ + up,2X²Y + up,3XY² + up,4Y³ + up,5X²Z + up,6XYZ + up,7Y²Z + up,8XZ² + up,9YZ² + up,10Z³ = 0.

Now, we carry out a step that heuristically ensures that the curve ε over (that we are going to construct) has a small rank. We choose a product M of small primes with pM, a cubic curve

CM : uM,1X³ + uM,2X²Y + uM,3XY² + uM,4Y³ + uM,5X²Z + uM,6XYZ + uM,7Y²Z + uM,8XZ² + uM,9YZ² + uM,10Z³ ≡ 0 (mod M)

over and points RM,1, . . . , RM,r on CM with coordinates in . The first four points should be RM,1 = [1, 0, 0], RM,2 = [0, 1, 0], RM,3 = [0, 0, 1] and RM,4 = [1, 1, 1]. We have to ensure also that for every prime divisor q of M, the matrix B(RM,1, . . . , RM,r) has maximal rank modulo q. In practice, it is easier to choose the points RM,1, . . . , RM,r first and then compute a curve CM passing through these points by solving a set of linear equations in the coefficients uM,1, . . . , uM,10 of CM. The curve CM should be so chosen that it has the minimum possible number of solutions modulo M. This, in conjunction with some deep conjectures in the theory of elliptic curves, guarantees that the curve ε that we will construct shortly will have a rank less than the expected value.

We now combine the curves Cp and CM as follows. Using the Chinese remainder theorem, we compute integers such that (mod p) and (mod M) for each i = 1, . . . , 10. Similarly, we compute points R1, . . . , Rr with integer coefficients such that RiRp,i (mod p) and RiRM,i (mod M) for each i = 1, . . . , r, where congruence of points stands for coordinate-wise congruence. Here we have R1 = [1, 0, 0], R2 = [0, 1, 0], R3 = [0, 0, 1] and R4 = [1, 1, 1].
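The Chinese-remainder combination of the two sets of coefficients (and, coordinate-wise, of the two sets of points) is elementary. A minimal sketch, with names and sample moduli of our own choosing, assuming gcd(p, M) = 1:

```python
def crt(r1, m1, r2, m2):
    """Unique x mod m1*m2 with x ≡ r1 (mod m1) and x ≡ r2 (mod m2); gcd(m1, m2) = 1."""
    # pow(m1, -1, m2) is the modular inverse of m1 modulo m2 (Python 3.8+).
    t = ((r2 - r1) * pow(m1, -1, m2)) % m2
    return (r1 + m1 * t) % (m1 * m2)

p, M = 101, 2 * 3 * 5 * 7      # illustrative: p prime, M a product of small primes
u = crt(17, p, 41, M)          # a coefficient ≡ 17 (mod p) and ≡ 41 (mod M)
print(u % p, u % M)            # 17 41
```

Applying this to each of the ten coefficients and to each coordinate of the r points yields the lifted curve and points described above.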

Clearly, the points R1, . . . , Rr are lifts of the points Rp,1, . . . , Rp,r respectively, whereas the cubic curve

over is a lift of E. However, , treated as a curve over , need not pass through the points R1, . . . , Rr. In order to ensure this last condition, we modify the coefficients of to the (small integer) coefficients u1, . . . , u10 by solving the system of linear equations

subject to the condition that (mod pM) for each i = 1, . . . , 10. The resulting cubic curve

C : u1X³ + u2X²Y + u3XY² + u4Y³ + u5X²Z + u6XYZ + u7Y²Z + u8XZ² + u9YZ² + u10Z³ = 0

over evidently continues to be a lift of E.

Now, we apply a change of coordinates in order to transform C to the standard Weierstrass equation

ε : Y² + a1XY + a3Y = X³ + a2X² + a4X + a6

with integer coefficients ai. This transformation changes the points R1, . . . , Rr to the points S1, . . . , Sr. One should also ensure that .

Finally, we check if S2, . . . , Sr are linearly dependent. If so, we determine a (non-trivial) relation with . This corresponds to the relation , where n1 := –(n2 + ··· + nr), that is, sP = tQ with s := n1s1 + ··· + nrsr and t := n1t1 + ··· + nrtr. If gcd(t, n) = 1, we have indP Qt–1s (mod n).

On the other hand, if S2, . . . , Sr are linearly independent or if gcd(t, n) > 1, then the lifted data fail to compute indP Q. In that case, we repeat the entire process by selecting new pairs (si, ti) and/or new points RM,1, . . . , RM,r.

This completes our description of the XCM. See Silverman [277] for further details. No rigorous or heuristic analysis of the running time of the XCM is available in the literature. Practical experience (reported in Jacobson et al. [139]) shows that the algorithm is rather impractical. The predominant cause for failure of a trial of the XCM is that the probability that the points S2, . . . , Sr are linearly dependent is amazingly low. Suitable choices of the curve CM help us to construct curves ε of low rank, but not low enough, in general, to render S2, . . . , Sr linearly dependent. Larger values of r are expected to increase the probability of success in each trial, but it is not clear how to handle the values r > 9. Nevertheless, the XCM is a radically new idea to solve the ECDLP. As Joseph Silverman [277] says, “some of the ideas may prove useful in future work on ECDLP”.

Exercise Set 4.5

4.24Let K be a field, and . Elements of μm are called the m-th roots of unity. Prove the following assertions.
  1. μm is a subgroup of (, ·).

  2. If char K = 0, then #μm = m. [H]

  3. If p := char K > 0, then #μm = m/p^(vp(m)). [H]

  4. μm is cyclic. [H]

  5. The set is a subgroup of .

4.25 We use the notation of the last exercise and assume that #μm = m, that is, either char K = 0 or p := char K > 0 is coprime to m. In this case, a generator of μm is called a primitive m-th root of unity. If ω is a primitive m-th root of unity and ω^r = 1 for some integer r, then evidently m | r. In particular, m is the smallest positive exponent r such that ω^r = 1. The (monic) polynomial

where the product runs over all primitive m-th roots of unity, is called the m-th cyclotomic polynomial (over K). Clearly, deg Φm(X) = φ(m) (where φ is Euler’s totient function).

  1. Show that . [H] Use the Möbius inversion formula to deduce that , where μ is the Möbius function. Conclude that .

  2. If m is a prime, show that Φm(X) = X^(m−1) + ··· + X + 1.

  3. Let m ≠ 1 be odd and char K ≠ 2. Show that Φ2m(X) = Φm(–X). [H]

  4. Show that if K is the finite field with q elements, l is the (multiplicative) order of q modulo m, and ω is a primitive m-th root of unity, then [K(ω) : K] = l. [H] In particular, Φm is a product of φ(m)/l (distinct) irreducible polynomials, each of degree l.
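The identity of Part 1, namely that the product of Φd(X) over all divisors d of m equals X^m − 1, gives a direct way to compute cyclotomic polynomials by exact division. The following Python sketch is ours; polynomials are integer coefficient lists in ascending powers.

```python
def polydiv_exact(num, den):
    """Exact division of integer polynomials (coefficients ascending), den monic."""
    num = num[:]
    q = [0] * (len(num) - len(den) + 1)
    for i in range(len(q) - 1, -1, -1):
        q[i] = num[i + len(den) - 1]          # leading coefficient of den is 1
        for j, c in enumerate(den):
            num[i + j] -= q[i] * c
    return q

def cyclotomic(m):
    """Φ_m as a coefficient list, via Φ_m = (X^m - 1) / Π_{d|m, d<m} Φ_d."""
    num = [-1] + [0] * (m - 1) + [1]          # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            num = polydiv_exact(num, cyclotomic(d))
    return num

print(cyclotomic(1))   # [-1, 1]          : X - 1
print(cyclotomic(6))   # [1, -1, 1]       : X^2 - X + 1
print(cyclotomic(5))   # [1, 1, 1, 1, 1]  : X^4 + X^3 + X^2 + X + 1
```

The naive recursion recomputes small Φd repeatedly, which is harmless for the small values of m one meets in these exercises.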

4.26
  1. Let G be an (additive) Abelian group (not necessarily finite). Show that the subset

    is a subgroup of G. Gtors is called the torsion subgroup of G, and the elements of Gtors are called the torsion elements of G. An element a is a torsion element of G if and only if a is of finite order.

  2. Let ε be an elliptic curve defined over a number field K. Show that the torsion subgroup εtors(K) of ε(K) is finite. [H]

  3. Let ε and K be as in Part (b). Show that is not finite. [H]

**4.6. The Hyperelliptic Curve Discrete Logarithm Problem

The hyperelliptic curve discrete logarithm problem (HECDLP) has attracted less research attention than the ECDLP. Surprisingly, however, there exist subexponential (index calculus) algorithms for solving the HECDLP over curves of large genus. Adleman, DeMarrais and Huang first proposed such an algorithm [2] (which we will refer to as the ADH algorithm). Enge [86] suggested some modifications of the ADH algorithm and provided a rigorous analysis of its running time. Gaudry [105] simplified the ADH algorithm and even implemented it. Gaudry’s experiments suggest that it is feasible to compute discrete logarithms in Jacobians of almost cryptographic sizes, provided that the genus of the underlying curve is high (say, ≥ 6). Enge and Gaudry [87] proved rigorously that as long as the genus g is greater than ln q ( being the field over which the curve is defined), the ADH algorithm (and its improvements) runs in time L(qg, 1/2, ).

In what follows, we outline Gaudry’s version of the ADH algorithm and refer to it as the ADH–Gaudry algorithm. Let C : Y² + u(X)Y = v(X) be a hyperelliptic curve of genus g defined over a finite field . We assume that the cardinality of the Jacobian is known and has a suitably large prime divisor m. We assume further that a reduced divisor of order m is available, and we want to compute the discrete logarithm indα β of β with respect to α.

4.6.1. Choosing the Factor Base

Recall that every reduced divisor can be written uniquely as , l ≤ g, where for i ≠ j the points Pi and Pj are not opposite to each other. Only ordinary points (not special points) may appear more than once in the list P1, . . . , Pl. We also know that such a divisor can be represented by a unique pair of polynomials a, b satisfying deg b < deg a ≤ g and a | (b² + bu − v). In that case, we write D = Div(a, b). What interests us here is the fact that the roots of the polynomial a are precisely the X-coordinates of the points P1, . . . , Pl. This fact leads to the very useful concepts of prime divisors and smooth divisors.

Definition 4.4.

A divisor D = Div(a, b) is called prime if the polynomial a is irreducible (that is, prime) over .

For an arbitrary divisor , let a = a1 · · · ar be the factorization of a into irreducible polynomials ai over . There exist polynomials such that , where Di := Div(ai, bi). In that case, the (prime) divisors D1, . . . , Dr are called the prime divisors of D. Moreover, if deg ai ≤ δ for all i = 1, . . . , r for some , then D is called δ-smooth. In particular, D = Div(a, b) is 1-smooth if and only if a splits completely over .

In order to set up a factor base B, we predetermine a smoothness bound δ and let B consist of all the prime divisors Div(a, b) with deg a ≤ δ. For simplicity, we take δ = 1. This is indeed a practical choice when the genus g is not too large (say, g ≤ 9). Let a = X − h be an (irreducible) polynomial of degree 1. In order to find b such that Div(a, b) is a prime divisor, we first note that deg b < deg a, that is, b is a constant. Furthermore, a | (b² + bu − v): that is, b² + bu − v ≡ 0 (mod X − h); that is, b² + bu(h) − v(h) = 0. Thus, the desired values of b, if they exist, can be found by solving a quadratic equation over . There are q (monic) irreducible polynomials of degree 1, and for each such a there are either two or no solutions for b. Assuming that both these possibilities are equally likely, we conclude that the size of the factor base is ≈ q.
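For a toy curve with u = 0 (so that the quadratic for b reduces to b² = v(h)), the factor-base construction can be sketched as follows. The curve, prime and variable names below are ours, chosen only for illustration.

```python
p = 1009                       # illustrative odd prime field F_p
v = [1, 0, 0, 0, 0, 1]         # v(X) = X^5 + 1 (coefficients ascending), genus 2

def v_at(h):
    """Evaluate v at h over F_p."""
    return sum(c * pow(h, i, p) for i, c in enumerate(v)) % p

# Degree-1 prime divisors Div(X - h, b) with b^2 + b*u(h) - v(h) = 0, here u = 0.
factor_base = []
for h in range(p):             # candidate a(X) = X - h
    rhs = v_at(h)
    for b in range(p):         # brute-force the quadratic b^2 = v(h) over F_p
        if (b * b - rhs) % p == 0:
            factor_base.append((h, b))

print(len(factor_base))        # close to p, as argued above
```

In a real implementation one would use a square-root algorithm (Tonelli–Shanks) instead of the inner brute-force loop, but the count of roughly q factor-base elements is the same.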

4.6.2. Checking the Smoothness of a Divisor

In order to check the smoothness of a divisor over the factor base B, we first factor a over . Under the assumption that δ = 1, the divisor D is smooth if and only if a splits completely over . Let us write a(X) = (X − h1) ··· (X − hl), . Then for some we have , where Di := Div(X − hi, ki). We may use trial divisions (that is, trial subtractions in this additive setting) by elements of B in order to determine the prime divisors D1, . . . , Dl of D. Proposition 4.5 establishes the probability that a randomly chosen element of is smooth.
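The δ = 1 smoothness test, namely checking whether a(X) splits completely into linear factors over F_p, can be sketched by repeatedly finding and stripping roots. This brute-force Python fragment (names ours) is practical only for small p; real implementations use polynomial factorization algorithms instead.

```python
def strip_root(a, r, p):
    """Synthetic division of monic a(X) by (X - r) over F_p; returns (quotient, a(r))."""
    q, acc = [], 0
    for c in reversed(a):          # process coefficients from the leading one down
        acc = (acc * r + c) % p
        q.append(acc)
    return list(reversed(q[:-1])), q[-1]   # last accumulated value is a(r)

def splits_completely(a, p):
    """True iff the monic polynomial a (coefficients ascending) splits over F_p."""
    a = [c % p for c in a]
    while len(a) > 1:
        for r in range(p):
            quo, rem = strip_root(a, r, p)
            if rem == 0:
                a = quo            # strip the linear factor (X - r)
                break
        else:
            return False          # no root: an irreducible factor of degree > 1 remains
    return True

# a(X) = (X-1)(X-2)(X-3) = X^3 - 6X^2 + 11X - 6 splits over F_7:
print(splits_completely([-6, 11, -6, 1], 7))  # True
# X^2 + 1 is irreducible over F_7, since -1 is a non-residue mod 7:
print(splits_completely([1, 0, 1], 7))        # False
```
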

Proposition 4.5.

For q ≫ 4g², there are approximately q^g/g! smooth (that is, 1-smooth) divisors in . In particular, the probability that a randomly chosen divisor in is smooth is approximately 1/g!.

The assumption q ≫ 4g² is practical, since we usually employ curves of (fixed) small genus g over finite fields of medium size. For example, Koblitz [154] proposed the curve Y² + Y = X^13 of genus g = 6 over the prime field . An interesting consequence of the last proposition is that the proportion of smooth divisors in depends only on the genus g of C (and not on q).
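The count behind Proposition 4.5 can be checked exactly: a monic polynomial of degree g that splits completely over F_q corresponds to a multiset of g roots, and there are C(q + g − 1, g) ≈ q^g/g! such multisets. A quick Python check (ours):

```python
from math import comb, factorial

def split_count(q, g):
    """Number of monic degree-g polynomials over F_q that split completely."""
    return comb(q + g - 1, g)     # multisets of g roots drawn from F_q

q, g = 1009, 6
exact = split_count(q, g)
approx = q ** g / factorial(g)
print(exact / q ** g)             # proportion of completely split polynomials
print(approx / q ** g)            # ≈ 1/g!, the estimate of Proposition 4.5
print(abs(exact - approx) / approx < 0.02)  # True: the two agree for q >> g^2
```

This also makes the role of the assumption q ≫ 4g² visible: the relative error of the approximation q^g/g! is roughly g²/(2q).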

4.6.3. The Algorithm

Now, we have all the machinery required to describe the basic version of the index calculus method for computing indα β in the Jacobian. In the first stage, we choose a random j ∈ {1, . . . , m}, compute the (reduced) divisor jα, and check if jα is smooth over the factor base B. Every smooth jα gives a relation: that is, a linear congruence modulo m involving the (unknown) indices of the elements of B to the base α. After sufficiently many (say, ≥ 2(#B)) such relations are found, the system of linear congruences collected is expected to be of full rank and is solved modulo m. This gives us the indices of the elements of the factor base. Each congruence collected above contains at most g non-zero coefficients, so the system is necessarily sparse. In the second stage, we find a single random j for which β + jα is smooth. The database prepared in the first stage then immediately gives indα β.

The Hasse–Weil Bounds (3.8) on p 226 show that the cardinality of the Jacobian is approximately q^g. Thus O(g log q) bits are needed to represent an element of the Jacobian. This fact is consistent with the representation of reduced divisors by pairs of polynomials. Gaudry [105] calculates that this variant of the ICM does O(q² + g!q) operations, each of which takes polynomial time in the input size g log q. If g is considered to be constant, the running time becomes O(q² log^t q) (that is, O~(q²)) for some real t > 0. A square-root method on the Jacobian runs in (expected) time O~(q^(g/2)). Thus for g > 4 the index calculus method performs better than the square-root methods. Indeed, Gaudry’s implementation of this algorithm is capable of computing in a few days discrete logs in the Jacobian of the curve of genus 6 mentioned above. The Jacobian of this curve is of cardinality ≈ 10^40.

For cryptographic purposes, the Jacobian should be large enough to resist the square-root methods. If we want to take q small (so that multi-precision arithmetic can be avoided), we should choose large values of g. But this choice makes the ADH–Gaudry algorithm quite efficient. For achieving the desired level of security in cryptographic applications, only hyperelliptic curves of genus 2, 3 and 4 are recommended.

4.7. Solving Large Sparse Linear Systems over Finite Rings

So far we have seen many algorithms which require solving large systems of linear equations (or congruences). The number n of unknowns in such systems can be as large as several millions. Standard Gaussian elimination on such a system takes time O(n³) and space O(n²). There are asymptotically faster algorithms like Strassen’s method [292] that takes time O(n^2.807) and Coppersmith and Winograd’s method [60] having a running time of O(n^2.376). Unfortunately, these asymptotic estimates do not show up in the range of practical interest. Moreover, the space requirements of these asymptotically faster methods are prohibitively high (though still O(n²)).

Luckily enough, cryptanalytic algorithms usually deal with coefficient matrices that are sparse: that is, that have only a small number of non-zero entries in each row. For example, consider the system of linear congruences available from the relation collection stage of an ICM for solving the DLP over a finite field F_q. The factor base consists of a subexponential (in lg q) number of elements, whereas each relation involves at most O(lg q) non-zero coefficients. Furthermore, the sparsity of the resulting matrix A is somewhat structured in the sense that the columns of A corresponding to larger primes in the factor base tend to have fewer non-zero entries. In this regard, we refer to the interesting analysis by Odlyzko [225] in connection with the Coppersmith method (Section 4.4.4). Odlyzko took m = 2n equations in n unknown indices and showed that about n/4 columns of A are expected to contain only zero coefficients, implying that these variables never occurred in any relation collected. Moreover, about 0.346n columns of A are expected to have only a single non-zero coefficient.

The sparsity (as well as the structure of the sparsity) of the coefficient matrix A can be effectively exploited and the system can be solved in time O~(n²). In this section, we describe some special algorithms for large sparse linear systems. In what follows, we assume that we want to compute the unknown n-dimensional column vector x from the given system of equations

Ax = b,

where A is an m × n matrix, m ≥ n, and where b is a non-zero m-dimensional column vector. Though this is not the case in general, we will often assume for the sake of simplicity that A has full rank (that is, rank n). We write vectors as column vectors, that is, an l-dimensional vector v with elements v1, . . . , vl is written as v = (v1 v2 . . . vl)^t, where the superscript t denotes matrix transpose.

Before we proceed further, some comments are in order. First note that our system of equations is often one over the finite ring ℤ_r, which is not necessarily a field. Most of the methods we describe below assume that ℤ_r is a field, that is, r is a prime. If r is composite, we can do the following. First, assume that the prime factorization r = p1^α1 ··· ps^αs, αi > 0, of r is known. In that case, we first solve the system over the fields ℤ_pi for i = 1, . . . , s. Then for each i we lift the solution modulo pi to the solution modulo pi^αi. Finally, all these lifted solutions are combined using the CRT to get the solution modulo r.

Hensel lifting can be used to lift a solution of the system Ax ≡ b (mod p) to a solution of Ax ≡ b (mod p^α), where p is a prime and α ≥ 2. We proceed by induction on α. Let us denote the (or a) solution of Ax ≡ b (mod p) by x1, which can be computed by solving a system in the field ℤ_p. Now, assume that for some i ≥ 1 we know (integer) vectors x1, . . . , xi such that

Equation 4.14

A(x1 + px2 + ··· + p^(i–1)xi) ≡ b (mod p^i).
We then attempt to compute a vector xi+1 such that

Equation 4.15

A(x1 + px2 + ··· + p^(i–1)xi + p^i xi+1) ≡ b (mod p^(i+1)).
Congruence (4.14) shows that the elements of A, x1, . . . , xi, b can be so chosen (as integers) that for some vector yi we have the equality

A(x1 + px2 + ··· + p^(i–1)xi) = b – p^i yi

of integer vectors. Substituting this in Congruence (4.15) gives Axi+1 ≡ yi (mod p). Thus the (incremental) vector xi+1 can be obtained by solving a linear system in ℤ_p.

It, therefore, suffices to know how to solve linear congruences modulo a prime p. However, problems arise when we do not know the factorization of r (while solving Ax ≡ b (mod r)). If r is large, it would be a heavy investment to make attempts to factor r. What can be done instead is the following. First, we use trial divisions to extract the small prime factors of r. We may, therefore, assume that r has no small prime factors. We proceed to solve Ax ≡ b (mod r) assuming that r is a prime (that is, that ℤ_r is a field). In a field, every non-zero element is invertible. But if r is composite, there are non-zero elements a which are not invertible (that is, for which gcd(a, r) > 1). If, during the course of the computation, we never happen to meet (and try to invert) such non-zero non-invertible elements, then the computation terminates without any trouble. Otherwise, such an element a yields a non-trivial factor gcd(a, r) of r. In that case, we have a partial factorization of r and restart solving the system modulo each suitable factor of r.
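The invert-or-split idea can be captured in a few lines of Python (the function name inv_or_factor is ours):

```python
from math import gcd

def inv_or_factor(a, r):
    """Try to invert a modulo r.  Returns (inverse, None) on success;
    otherwise returns (None, g), where g = gcd(a, r) > 1 is a factor
    of r (non-trivial whenever 1 < g < r)."""
    g = gcd(a, r)
    if g == 1:
        return pow(a, -1, r), None
    return None, g
```

For example, inv_or_factor(3, 35) succeeds with the inverse 12, while inv_or_factor(10, 35) fails but exposes the factor 5 of 35.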

Some of the algorithms we discuss below assume that A is a symmetric matrix. For our systems, this is usually not so; indeed, we have matrices A which are not even square. Both these problems can be overcome by trying to solve the modified system A^t Ax = A^t b. If A has full rank, this leads to an equivalent system.

If r = 2 (as in the case of the QSM for factoring integers), using the special methods is often not recommended. In this case, the elements of A are bits and can be packed compactly in machine words, and addition of rows can be done word-wise (say, 32 bits at a time). This leads to an efficient implementation of ordinary Gaussian elimination, which usually runs faster than the more complicated special algorithms described below, at least for the sizes of practical systems.
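A minimal Python sketch of this word-packed elimination over GF(2), using arbitrary-precision integers as bit vectors so that adding two rows is a single XOR; the function name and input conventions are our own:

```python
def solve_gf2(rows, rhs, n):
    """Gauss-Jordan elimination over GF(2).  Each equation is one integer:
    bit j of rows[i] is the coefficient of x_j; rhs[i] is 0 or 1.
    Internally, bit 0 of the augmented row holds the right-hand side,
    so a row addition is a single XOR.  Returns one solution (free
    variables set to 0), or None if the system is inconsistent."""
    aug = [(row << 1) | b for row, b in zip(rows, rhs)]
    pivots = []                      # columns where a pivot was found
    r = 0
    for c in range(n):
        piv = next((i for i in range(r, len(aug)) if (aug[i] >> (c + 1)) & 1), None)
        if piv is None:
            continue                 # no pivot: x_c is a free variable
        aug[r], aug[piv] = aug[piv], aug[r]
        for i in range(len(aug)):
            if i != r and (aug[i] >> (c + 1)) & 1:
                aug[i] ^= aug[r]     # row addition over GF(2) = XOR
        pivots.append(c)
        r += 1
    if any(row & 1 for row in aug[r:]):   # leftover row says 0 = 1
        return None
    x = [0] * n
    for i, c in enumerate(pivots):
        x[c] = aug[i] & 1            # pivot variable = its rhs bit
    return x
```

A real QSM implementation would pack rows into fixed-width machine words, but the XOR-per-row-addition idea is the same.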

In what follows, we discuss some well-known methods for solving large sparse linear systems over finite fields (typically prime fields). In order to simplify notations, we will refrain from writing the matrix equalities as congruences, but treat them as equations over the underlying finite fields.

4.7.1. Structured Gaussian Elimination

Structured Gaussian elimination is applied to a sparse system before one of the next three methods is employed to solve the system. If the sparsity of A has some structures (as discussed earlier), then structured Gaussian elimination tends to reduce the size of the system considerably, while maintaining the sparsity of the system. We now describe the essential steps of structured Gaussian elimination. Let us define the weight of a row or column of a matrix to be the number of non-zero entries in that row or column.

First we delete all the columns (together with the corresponding variables) that have weight 0. These variables never occur in the system and need not be considered at all.

Next we delete all the columns that have weight 1 and the rows corresponding to the non-zero entries in these columns. Each such deleted column corresponds to a variable xi that appears in exactly one equation. After the rest of the system is solved, the value of xi is obtained by back substitution. Deleting some rows in this step may expose some new columns of weight 1. So this step should be repeated until all the columns have weight > 1.

Now, take each row of weight 1. Such a row gives a direct solution for the variable xi corresponding to its single non-zero entry. We then substitute this value of xi in all the equations where it occurs and subsequently delete the i-th column. We repeat this step until no row of weight 1 remains.

At this point, the system usually has many more equations than variables. We may make the system a square one by throwing away some rows. Since subtracting multiples of rows of higher weights tends to increase the number of non-zero elements in the matrix, we should throw away the rows with higher weights. While discarding the excess rows, we should be careful to ensure that we are not left with a matrix having columns of weight 0. Some columns in the reduced system may again happen to have weight 1. Thus, we have to repeat the above steps again and again, until we are left with a square matrix each row and column of which has weight ≥ 2.

This procedure leads to a system which is usually much smaller than the original system. In a typical example quoted in Odlyzko [225], structured Gaussian elimination reduces a system with 16,500 unknowns to one with less than 1,000 unknowns. The resulting reduced system may be solved using ordinary Gaussian elimination which, for smaller systems, appears to be much faster than the following sophisticated methods.
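A toy Python sketch of one part of this process — repeatedly solving rows of weight 1 and substituting — may clarify the bookkeeping. The name structured_elim and the dict-based sparse format are our choices, and a full implementation would also apply the column rules and the row-discarding step described above:

```python
def structured_elim(rows, rhs, p):
    """Repeatedly solve rows of weight 1 and substitute the solved
    variable into every equation containing it, shrinking the system.
    rows: list of sparse rows {column: coefficient}, over Z_p.
    Note: the input lists/dicts are consumed; use the return values."""
    solved = {}
    progress = True
    while progress:
        progress = False
        for i, row in enumerate(rows):
            if len(row) == 1:                     # weight-1 row: direct solution
                (j, c), = row.items()
                xj = rhs[i] * pow(c, -1, p) % p
                solved[j] = xj
                for k, r in enumerate(rows):      # substitute x_j everywhere
                    if j in r:
                        rhs[k] = (rhs[k] - r.pop(j) * xj) % p
                progress = True
                break
        kept = [(r, b) for r, b in zip(rows, rhs) if r]   # drop emptied rows
        rows = [r for r, _ in kept]
        rhs = [b for _, b in kept]
    return solved, rows, rhs
```

On the tiny system x0 + x1 ≡ 3, x1 ≡ 2 (mod 7) this solves x1 = 2 and then, after substitution, x0 = 1, leaving an empty residual system.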

4.7.2. The Conjugate Gradient Method

The conjugate gradient method was originally proposed to solve a linear system Ax = b over ℝ for an n × n (that is, square) symmetric positive definite matrix A and for a non-zero vector b, and is based on the idea of minimizing the quadratic function f(x) := (1/2)x^t Ax – b^t x. The minimum is attained when the gradient ∇f = Ax – b equals zero, which corresponds to the solution of the given system.

The conjugate gradient method is an iterative procedure. The iterations start with an initial minimizer x0, which can be any n-dimensional vector. As the iterations proceed, we obtain gradually improved minimizers x0, x1, x2, . . . , until we reach the solution. We also maintain and update two other sequences of vectors ei and di. The vector ei stands for the error b – Axi, whereas the vectors d0, d1, . . . constitute a set of mutually conjugate (that is, orthogonal) directions. We initialize e0 = d0 = b – Ax0 and for i = 0, 1, . . . repeat the steps of Algorithm 4.8, until ei = 0. We denote the inner product of two vectors v = (v1 v2 . . . vn)^t and w = (w1 w2 . . . wn)^t by 〈v, w〉 := v1w1 + ··· + vnwn.

Algorithm 4.8. An iteration in the conjugate gradient method

ai := 〈ei, ei〉/〈di, Adi〉.

xi+1 := xi + aidi.

ei+1 := ei – ai Adi.

bi := 〈ei+1, ei+1〉/〈ei, ei〉.

di+1 := ei+1 + bidi.

This method computes a set of mutually orthogonal directions d0, d1, . . . , and hence it has to stop after at most n – 1 iterations, when we run out of new orthogonal directions. Provided that we work with infinite precision, we must eventually obtain ei = 0 for some i, 0 ≤ i ≤ n – 1.

If A is sparse, that is, if each row of A has O(log^c n) non-zero entries, c being a positive constant, then the product Adi can be computed using O~(n) field operations. Other operations clearly meet this bound. Since at most n – 1 iterations are necessary, the conjugate gradient method terminates after performing O~(n²) field operations.

We face some potential problems when we want to apply this method to solve a system over a finite field F_q. First, the matrix A is usually not symmetric and need not even be square. This problem can be avoided by solving the system A^t Ax = A^t b. The new coefficient matrix A^t A may be non-sparse (that is, dense). So instead of computing and working with A^t A explicitly, we compute the product (A^t A)di as A^t (Adi); that is, we avoid multiplication by a (possibly) dense matrix at the cost of multiplications by two sparse matrices.

The second difficulty with a finite field is that the question of minimizing an F_q-valued function makes hardly any sense (and so does positive definiteness of a matrix over F_q). However, the conjugate gradient method is essentially based on the generation of a set of mutually orthogonal vectors d0, d1, . . . . This concept continues to make sense in the setting of a finite field.

If A is a real positive definite matrix, we cannot have 〈di, Adi〉 = 0 for a non-zero vector di. But this condition need not hold for a matrix A over F_q. Similarly, we may have a non-zero error vector ei over F_q for which 〈ei, ei〉 = 0. (Again this is not possible for real vectors.) So for the iterations over F_q (more precisely, the computations of ai and bi) to proceed gracefully, all that we can hope for is that before reaching the solution we never hit a non-zero direction vector di for which 〈di, Adi〉 = 0, nor a non-zero error vector ei for which 〈ei, ei〉 = 0. If q is sufficiently large and if the initial minimizer x0 is chosen sufficiently randomly, then the probability of encountering such a bad di or ei is rather low, and as a result the method is very likely to terminate without problems. If, by a terrible stroke of bad luck, we have to abort the computation prematurely, we should restart the procedure with a new random initial vector x0. If q is small (say, q = 2 as in the case of the QSM), it is a neater idea to select the entries of the initial vector x0 from a field extension F_(q^k) and work in this extension. The eventual solution we reach will be in F_q, but working in the larger field decreases the possibility of an attempt of division by 0.
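Algorithm 4.8 carries over to a prime field almost verbatim. The following Python sketch (our own naming and conventions) performs the iteration modulo p and simply lets the modular inversion raise an error when a bad inner product is hit, at which point one would restart as described above:

```python
def conjugate_gradient_mod_p(A, b, p):
    """Conjugate gradient over Z_p (Algorithm 4.8 with x0 = 0).
    A: symmetric n x n matrix, b: vector, entries taken mod p.
    Raises ValueError if a self-orthogonal vector is encountered,
    in which case one restarts (e.g. over an extension field)."""
    n = len(b)
    dot = lambda u, v: sum(x * y for x, y in zip(u, v)) % p
    mat = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) % p
                        for i in range(n)]
    x = [0] * n
    e = d = [bi % p for bi in b]               # e0 = d0 = b - A*x0
    while any(e):
        Ad = mat(A, d)
        a = dot(e, e) * pow(dot(d, Ad), -1, p) % p      # a_i
        x = [(xi + a * di) % p for xi, di in zip(x, d)]
        e_new = [(ei - a * adi) % p for ei, adi in zip(e, Ad)]
        if not any(e_new):
            break                              # error vector is zero: done
        bcoef = dot(e_new, e_new) * pow(dot(e, e), -1, p) % p   # b_i
        d = [(ei + bcoef * di) % p for ei, di in zip(e_new, d)]
        e = e_new
    return x
```

For the system with A = [[2, 1], [1, 3]] and b = [1, 2] over Z_7, the iteration terminates after two steps with the solution x = [3, 2].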

There is, however, a brighter side of using a finite field in place of ℝ: every calculation we perform in F_q is exact, and we do not have to bother about a criterion for determining whether an error vector ei is zero, or about the conditioning of the matrix A. One of the biggest headaches of numerical analysis is absent here.

4.7.3. The Lanczos Method

The Lanczos method is another iterative method quite similar to the conjugate gradient method. The basic difference between these methods lies in the way in which the mutually conjugate directions d0, d1, . . . are generated. For the Lanczos method, we start with the initializations d0 := b and x0 := a0d0, and then, for i = 1, 2, . . . , we repeat the steps in Algorithm 4.9 as long as 〈di, Adi〉 ≠ 0.

Algorithm 4.9. An iteration in the Lanczos method

vi+1 := Adi.

.

.

xi := xi–1 + aidi.

If A is a real positive definite matrix, the termination criterion is equivalent to the condition di = 0. When this is satisfied, the vector xi–1 equals the desired solution x of the system Ax = b. Since d0, d1, . . . are mutually orthogonal, the process must stop after at most n – 1 iterations. Therefore, for a sparse matrix A, the entire procedure performs O~(n²) field operations.

The problems we face with the Lanczos method applied to a system over F_q are essentially the same as those discussed in connection with the conjugate gradient method. The problem with a non-symmetric and/or non-square matrix A is solved by multiplying the system by A^t. Instead of working with A^t A explicitly, we prefer to multiply separately by A and A^t.

The more serious problem with a system over F_q is that of encountering a non-zero direction vector di with 〈di, Adi〉 = 0. If this happens, we have to abort the computation prematurely. In order to restart the procedure, we try to solve the system BAx = Bb, where B is a diagonal matrix whose diagonal elements are chosen randomly from the non-zero elements of the field, or of some suitable extension (if q is small).

4.7.4. The Wiedemann Method

The Wiedemann method for solving a sparse system Ax = b over F_q uses ideas different from those employed by the other methods discussed so far. For the sake of simplicity, we assume that A is a square non-singular matrix (not necessarily symmetric). The Wiedemann method tries to compute the minimal polynomial μA(X) = X^d + cd–1X^(d–1) + ··· + c1X + c0, d ≤ n, of A. To that end, one selects a small positive integer l in the range 10 ≤ l ≤ 20. For i = 0, 1, . . . , 2n, let vi denote the column vector of length l consisting of the first l entries of the vector A^i b. For the working of the Wiedemann method, we need to compute only the vectors v0, . . . , v2n. If A is a sparse matrix, this computation involves a total of O~(n²) operations in F_q.

Since μA(A) = 0, we have μA(A)A^i b = 0 for every i ≥ 0. Therefore, for each k = 1, . . . , l the sequence v0,k, v1,k, . . . of the k-th entries of v0, v1, . . . satisfies the linear recurrence

vi+d,k + cd–1vi+d–1,k + ··· + c0vi,k = 0 for all i ≥ 0.

But then the minimal polynomial μk(X) of the k-th such sequence is a factor of μA(X). There are methods that compute each μk(X) using O(n²) field operations. We then expect to obtain μA(X) = lcm(μk(X) | 1 ≤ k ≤ l).

The assumption that A is non-singular is equivalent to the condition that c0 ≠ 0. In that case, since μA(A)b = 0, the solution vector x = –c0^(–1)(c1b + c2Ab + ··· + A^(d–1)b) can be computed using O~(n²) arithmetic operations in the field F_q.
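The final back-substitution step can be sketched in Python. We assume the minimal polynomial has already been found (e.g., by the sequence methods mentioned above); the function name and the coefficient convention (low-degree first, leading coefficient 1) are ours:

```python
def solution_from_min_poly(A, b, mu, p):
    """Given the minimal polynomial mu = [c0, c1, ..., c_{d-1}, 1] of A
    over Z_p (with c0 != 0), recover x with Ax = b:
    since mu(A)b = 0, we get x = -c0^{-1} (c1 b + c2 Ab + ... + A^{d-1} b)."""
    n = len(b)
    mat = lambda v: [sum(A[i][j] * v[j] for j in range(n)) % p
                     for i in range(n)]
    acc = [0] * n
    w = [bi % p for bi in b]       # w runs through b, Ab, A^2 b, ...
    for c in mu[1:]:               # coefficients c1, ..., c_{d-1}, c_d = 1
        acc = [(ai + c * wi) % p for ai, wi in zip(acc, w)]
        w = mat(w)
    factor = (-pow(mu[0], -1, p)) % p
    return [factor * ai % p for ai in acc]
```

For the diagonal matrix A = diag(2, 3) over Z_7, whose minimal polynomial is X² + 2X + 6 (that is, (X – 2)(X – 3) mod 7), this recovers x = A^(–1)b directly.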

If A is singular, we may find linear dependencies among the rows of A and subsequently throw away suitable rows. Doing this repeatedly eventually gives us a non-singular matrix. For further details on the Wiedemann method, see [303].

4.8. The Subset Sum Problem

In this section, we assume that A = {a1, . . . , an} ⊆ ℕ is a knapsack set. For s ∈ ℕ, we are required to find ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s, provided that a solution exists. In general, finding such a solution ∊1, . . . , ∊n is a very difficult problem.[6] However, if the weights satisfy some specific bounds, there exist polynomial-time algorithms for solving the SSP.

[6] In the language of complexity theory, the decision problem of determining whether a solution of the SSP exists is NP-complete.

Let us first define an important quantity associated with a knapsack set:

Definition 4.5.

The density of the knapsack set A = {a1, . . . , an} is defined to be the real number d(A) := n / lg(max(a1, . . . , an)).

If d(A) > 1, then there is, in general, more than one solution for the SSP (provided that there exists one). This makes the corresponding knapsack set A unsuitable for cryptographic purposes. So we consider low densities: that is, the case d(A) ≤ 1.

There are certain algorithms that reduce in polynomial time the problem of finding a solution of the SSP to that of finding a shortest (non-zero) vector in a lattice. Assuming that such a vector is computable in polynomial time, Lagarias and Odlyzko’s reduction algorithm [157] solves the SSP in polynomial time with high probability, if d(A) ≤ 0.6463. An improved version of the algorithm adapts to densities d(A) ≤ 0.9408 (see Coster et al. [64] and Coster et al. [65]). The reduction algorithm is easy and will be described in Section 4.8.1. However, it is not known how to efficiently compute a shortest non-zero vector in a lattice. The Lenstra–Lenstra–Lovasz (L3) polynomial-time lattice-basis reduction algorithm [166] provably finds a non-zero vector whose length is at most the length of a shortest non-zero vector multiplied by a power of 2. In practice, however, the L3 algorithm tends to compute a shortest vector quite often. Section 4.8.2 deals with the L3 lattice-basis reduction algorithm.

Before providing a treatment on lattices, let us introduce a particular case of the SSP, which is easily (and uniquely) solvable.

Definition 4.6.

A knapsack set {a1, . . . , an} with a1 < ··· < an is said to be superincreasing, if aj > a1 + ··· + aj–1 for all j = 2, . . . , n.

Algorithm 4.10 solves the SSP for a superincreasing knapsack set in deterministic polynomial time. The proof for the correctness of this algorithm is easy and left to the reader.

Algorithm 4.10. Solving the superincreasing knapsack problem

Input: A superincreasing knapsack set {a1, . . . , an} with a1 < ··· < an, and a target sum s ∈ ℕ.

Output: The (unique) solution ∊1, . . . , ∊n ∈ {0, 1} of ∊1a1 + ··· + ∊nan = s, if it exists; “failure”, otherwise.

Steps:

for i = n, n – 1, . . . , 1 {
   if (s ≥ ai) { ∊i := 1, s := s – ai. } else { ∊i := 0. }
}
if (s = 0) { Return (∊1, . . . , ∊n). } else { Return “failure”. }
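A direct Python transcription of Algorithm 4.10 (0-based indexing; the function name is ours):

```python
def solve_superincreasing(a, s):
    """Solve the subset sum problem for a superincreasing knapsack set
    a[0] < a[1] < ... < a[n-1] (Algorithm 4.10).  Returns the 0/1 tuple
    (eps_1, ..., eps_n), or None if no solution exists."""
    eps = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):   # greedily, from the largest weight
        if s >= a[i]:
            eps[i] = 1
            s -= a[i]
    return tuple(eps) if s == 0 else None
```

For example, with the superincreasing set {1, 2, 4, 8, 17} and s = 22 = 1 + 4 + 17, the greedy scan returns (1, 0, 1, 0, 1), while s = 16 correctly yields failure.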

4.8.1. The Low-Density Subset Sum Problem

We start by defining a lattice.

Definition 4.7.

Let n, d ∈ ℕ, d ≤ n, and let v1, . . . , vd ∈ ℝ^n be d linearly independent (non-zero) vectors (that is, n-tuples). The lattice L of dimension d spanned by v1, . . . , vd is the set of all ℤ-linear combinations of v1, . . . , vd, that is,

L := {m1v1 + ··· + mdvd | m1, . . . , md ∈ ℤ}.

We say that v1, . . . , vd constitute a basis of L.

In general, a lattice may have more than one basis. We are interested in bases consisting of short vectors, where the concept of shortness is with respect to the following definition.

Definition 4.8.

Let v := (v1, . . . , vn)^t and w := (w1, . . . , wn)^t be two n-dimensional vectors in ℝ^n. The inner product of v and w is defined to be the real number

〈v, w〉 := v1w1 + ··· + vnwn,

and the length of v is defined as ‖v‖ := √〈v, v〉.

For the time being, let us assume the availability of a lattice oracle which, given a lattice, returns a shortest non-zero vector in the lattice. The possibilities for realizing such an oracle will be discussed in the next section.

Consider the subset sum problem with the knapsack set A := {a1, . . . , an} and let B be an upper bound on the weights (that is, each ai ≤ B). For s ∈ ℕ, we are supposed to find ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s. Let L be the (n + 1)-dimensional lattice in ℝ^(n+1) generated by the vectors

v1 := (1, 0, . . . , 0, Na1)^t,   . . . ,   vn := (0, 0, . . . , 1, Nan)^t,   vn+1 := (1/2, 1/2, . . . , 1/2, Ns)^t,

where N is an integer larger than (1/2)√n. The vector v := ∊1v1 + ··· + ∊nvn – vn+1 = (∊1 – 1/2, . . . , ∊n – 1/2, 0)^t is in the lattice L, with ‖v‖ = (1/2)√n. Involved calculations (carried out in Coster et al. [64, 65]) show that the probability P of the existence of a vector w ∈ L, w ≠ ±v, with ‖w‖ ≤ ‖v‖ satisfies a bound which tends to 0 for B ≥ 2^(cn), where c ≈ 1.0628. Now, if the density d(A) of A is less than 1/c ≈ 0.9408, then B = 2^(c′n) for some c′ > c and, therefore, P → 0 as n → ∞. In other words, if d(A) < 0.9408, then, with a high probability, ±v are the shortest non-zero vectors of L. The lattice oracle then returns such a vector, from which the solution ∊1, . . . , ∊n can be readily computed.

4.8.2. The Lattice-Basis Reduction Algorithm

Let L be a lattice in ℝ^n specified by a basis of n linearly independent vectors v1, . . . , vn. We now construct a basis v1*, . . . , vn* of ℝ^n such that 〈vi*, vj*〉 = 0 (that is, vi* and vj* are orthogonal to each other) for all i, j, i ≠ j. Note that v1*, . . . , vn* need not be a basis for L. Algorithm 4.11 is known as the Gram–Schmidt orthogonalization procedure.

Algorithm 4.11. Gram–Schmidt orthogonalization

Input: A basis v1, . . . , vn of ℝ^n.

Output: The Gram–Schmidt orthogonalization of v1, . . . , vn.

Steps:

v1* := v1.
for i = 2, . . . , n {
   vi* := vi – μi,1v1* – ··· – μi,i–1v*i–1, where μi,j := 〈vi, vj*〉/〈vj*, vj*〉.
}

One can easily verify that v1*, . . . , vn* constitute an orthogonal basis of ℝ^n. Using these notations, we introduce the following important concept:
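Algorithm 4.11 translates directly into Python. Using exact rational arithmetic keeps the orthogonality exact; the function name and return convention are our own:

```python
from fractions import Fraction

def gram_schmidt(v):
    """Gram-Schmidt orthogonalization (Algorithm 4.11) over the rationals.
    Returns (vstar, mu) with vstar[i] = v[i] - sum_{j<i} mu[i][j]*vstar[j],
    the vstar[i] being pairwise orthogonal."""
    n = len(v)
    v = [[Fraction(x) for x in row] for row in v]
    dot = lambda u, w: sum(a * b for a, b in zip(u, w))
    vstar = []
    mu = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        w = v[i][:]
        for j in range(i):
            mu[i][j] = dot(v[i], vstar[j]) / dot(vstar[j], vstar[j])
            w = [wi - mu[i][j] * sj for wi, sj in zip(w, vstar[j])]
        vstar.append(w)
    return vstar, mu
```

For the basis (1, 1, 1), (1, 0, 1), (0, 1, 1) this yields the orthogonal vectors (1, 1, 1), (1/3, –2/3, 1/3), (–1/2, 0, 1/2), with μ2,1 = 2/3.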

Definition 4.9.

The basis v1, . . . , vn is called a reduced basis of L, if

Equation 4.16

|μi,j| ≤ 1/2 for 1 ≤ j < i ≤ n,
and

Equation 4.17

‖vi* + μi,i–1v*i–1‖² ≥ (3/4)‖v*i–1‖² for 1 < i ≤ n.
A reduced basis v1, . . . , vn of L is termed so, because the vectors vi are somewhat short. More precisely, we have Theorem 4.5, the proof of which is not difficult, but is involved, and is omitted here.

Theorem 4.5.

Let v1, . . . , vn be a reduced basis of a lattice L, and let 1 ≤ m ≤ n. For any m linearly independent vectors w1, . . . , wm of L, we have

‖vi‖² ≤ 2^(n–1) max(‖w1‖², . . . , ‖wm‖²)

for all i = 1, . . . , m. In particular, for any non-zero vector w of L we have

‖v1‖² ≤ 2^(n–1)‖w‖².

That is, for a reduced basis v1, . . . , vn of L the length of v1 is at most 2^((n–1)/2) times that of the shortest non-zero vector in L.

Given an arbitrary basis v1, . . . , vn of a lattice L, the L3 basis reduction algorithm computes a reduced basis of L. The algorithm starts by computing the Gram–Schmidt orthogonalization v1*, . . . , vn* of v1, . . . , vn. The rational numbers μi,j are also available from this step. We also obtain as byproducts the numbers Vi := ‖vi*‖² for i = 1, . . . , n.

Algorithm 4.12 enforces Condition (4.16) |μk,l| ≤ 1/2 for a given pair of indices k and l. The essential work done by this routine is subtracting a suitable multiple of vl from vk and updating the values μk,1, . . . , μk,l accordingly.

Algorithm 4.12. Subroutine for basis reduction

Input: Two indices k and l.

Output: An update of the basis vectors to ensure |μk,l| ≤ 1/2.

Steps:

r := the integer nearest to μk,l.   vk := vk – rvl.

for h = 1, . . . , l – 1 {μk,h := μk,hrμl,h. }

μk,l := μk,lr.

If Condition (4.17) is not satisfied by some k, that is, if Vk < (3/4 – μ²k,k–1)Vk–1, then vk and vk–1 are swapped. The necessary changes in the values Vk, Vk–1 and certain μi,j’s should also be incorporated. This is explained in Algorithm 4.13.

Algorithm 4.13. Subroutine for basis reduction

Input: An index k.

Output: An update of the basis vectors to ensure Condition (4.17) for the index k.

Steps:

μ := μk,k–1.   V := Vk + μ²Vk–1.
μk,k–1 := μVk–1/V.   Vk := Vk–1Vk/V.   Vk–1 := V.
Swap (vkvk–1).
for h = 1, . . . , k – 2 { Swap (μk,h, μk–1,h). }
for h = k + 1, . . . , n {
   μ′ := μh,k–1 – μ·μh,k.   μh,k–1 := μh,k + μk,k–1·μ′.   μh,k := μ′.
}

The main basis reduction algorithm is described in Algorithm 4.14. It is not obvious that this algorithm should terminate at all. Consider the quantity D := d1 ··· dn–1, where di := |det(〈vk, vl〉)1≤k,l≤i| for each i = 1, . . . , n – 1. At the beginning of the basis reduction procedure one has di ≤ B^i for all i, where B := max(‖vi‖² | 1 ≤ i ≤ n). It can be shown that an invocation of Algorithm 4.12 does not alter the value of D, whereas interchanging vi and vi–1 in Algorithm 4.13 decreases D by a factor < 3/4. It can also be shown that for any basis of L the value D is bounded from below by a constant which depends only on the lattice. Thus, Algorithm 4.14 stops after finitely many steps.

Algorithm 4.14. Basis reduction in a lattice

Input: A basis v1, . . . , vn of a lattice L.

Output: v1, . . . , vn converted to a reduced basis.

Steps:

Compute the Gram–Schmidt orthogonalization of v1, . . . , vn (Algorithm 4.11).

/* The initial values of μi,j and Vi are available at this point */
i := 2.
while (i ≤ n) {
   if (|μi,i–1| > 1/2) { Call Algorithm 4.12 with k = i and l = i – 1. }
   if (Vi < (3/4 – μ²i,i–1)Vi–1) {
      Call Algorithm 4.13 with k = i.
      i := max(2, i – 1).
   }
   for j = i – 2, i – 3, . . . , 1 {
      if (|μi,j| > 1/2) { Call Algorithm 4.12 with k = i and l = j. }
   }
   i++.
}
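For concreteness, here is a compact (and deliberately inefficient) Python rendering of the reduction loop of Algorithm 4.14. Instead of the incremental updates of Algorithms 4.12 and 4.13, it simply recomputes the Gram–Schmidt data after every change; naming and conventions are our own:

```python
from fractions import Fraction

def lll_reduce(basis, delta=Fraction(3, 4)):
    """L3 lattice-basis reduction with exact rational arithmetic.
    Recomputes the Gram-Schmidt data (star, mu) after every change
    rather than updating it incrementally; correct but slow."""
    b = [[Fraction(x) for x in v] for v in basis]
    n = len(b)
    dot = lambda u, w: sum(p * q for p, q in zip(u, w))

    def gso():                      # Gram-Schmidt orthogonalization
        star, mu = [], [[Fraction(0)] * n for _ in range(n)]
        for i in range(n):
            w = b[i][:]
            for j in range(i):
                mu[i][j] = dot(b[i], star[j]) / dot(star[j], star[j])
                w = [wi - mu[i][j] * sj for wi, sj in zip(w, star[j])]
            star.append(w)
        return star, mu

    star, mu = gso()
    k = 1
    while k < n:
        for j in range(k - 1, -1, -1):        # size reduction (4.16)
            if abs(mu[k][j]) > Fraction(1, 2):
                r = round(mu[k][j])           # nearest integer
                b[k] = [bk - r * bj for bk, bj in zip(b[k], b[j])]
                star, mu = gso()
        if dot(star[k], star[k]) >= (delta - mu[k][k - 1] ** 2) * dot(star[k - 1], star[k - 1]):
            k += 1                            # Lovasz condition (4.17) holds
        else:
            b[k], b[k - 1] = b[k - 1], b[k]   # swap and step back
            star, mu = gso()
            k = max(k - 1, 1)
    return [[int(x) for x in v] for v in b]
```

Since the algorithm only swaps basis vectors and subtracts integer multiples of one from another, the lattice (and hence |det L|) is preserved, and the first output vector obeys the bound of Theorem 4.5.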

For a more complete treatment of the L3 basis reduction algorithm, we refer the reader to Lenstra et al. [166] (or Mignotte [203]). It is important to note here that the L3 basis reduction algorithm is at the heart of the Lenstra–Lenstra–Lovasz algorithm for factoring a polynomial in ℚ[X]. This factoring algorithm indeed runs in time polynomial in the degree of the polynomial to be factored and is one of the major breakthroughs in the history of symbolic computing.

Exercise Set 4.8

4.27 Let A = {a1, . . . , an} ⊆ ℕ be a knapsack set. Show that:
  1. If A is superincreasing with a1 < ··· < an, then ai ≥ 2^(i–1) for all i = 1, . . . , n and hence d(A) ≤ n/(n – 1).

  2. If d(A) > 1 (with n sufficiently large), then there exist two different tuples (∊1, . . . , ∊n) and (∊′1, . . . , ∊′n) in {0, 1}^n such that ∊1a1 + ··· + ∊nan = ∊′1a1 + ··· + ∊′nan.

4.28 Let L be a lattice in ℝ^n and let v1, . . . , vn constitute a basis of L. The determinant of L is defined by

det L := |det(v1, . . . , vn)|.

  1. Show that det L is an invariant of the lattice L (that is, independent of the basis v1, . . . , vn of L).

    Let v1*, . . . , vn* be the Gram–Schmidt orthogonalization of the basis v1, . . . , vn.

  2. Show that det L = ‖v1*‖ ··· ‖vn*‖.

  3. Prove the Hadamard inequality: det L ≤ ‖v1‖ · · · ‖vn‖.

Chapter Summary

This chapter introduces the most common computationally intractable mathematical problems on which the security of public-key cryptosystems rests. We also describe some of the algorithms known to date for solving these difficult computational problems.

To start with, we enumerate these computational problems. The first of these is the integer factorization problem (IFP) and its several variants. Some problems that are provably or believably equivalent to the IFP are the totient problem, problems associated with the RSA algorithm, and the modular square root problem. The next class of problems includes the discrete logarithm problem (DLP) and its variants on elliptic curves (ECDLP) and hyperelliptic curves (HECDLP). The Diffie–Hellman problem (DHP) and its variants (ECDHP, HECDHP) are believed to be equivalent to the respective variants of the DLP. Finally, the subset sum problem (SSP) and two related problems, namely the shortest vector problem (SVP) and the closest vector problem (CVP) on lattices, are introduced.

The subsequent sections are devoted to an algorithmic study of these difficult problems. We start with IFP. We first present some fully exponential algorithms like trial division, Pollard’s rho method, Pollard’s p – 1 method and Williams’ p + 1 method. Next we describe the modern genre of subexponential algorithms. The quadratic sieve method (QSM) is discussed at length together with its heuristic improvements like incomplete sieving, large prime variation and the multiple polynomial variant. We also describe TWINKLE, a hardware device that efficiently implements the sieving stage of the QSM. We then discuss the elliptic curve method (ECM) and the number field sieve method (NFSM) for factoring integers. The NFSM turns out to be the asymptotically fastest known algorithm for factoring integers.

The (finite field) DLP is discussed next. The older square-root methods, such as Shanks’ baby-step–giant-step method (BSGS), Pollard’s rho method and the Pohlig–Hellman method (PHM), take exponential running times in the worst case. The PHM for a field F_q is, however, efficient if q – 1 has only small prime factors. Next we discuss the modern family of algorithms collectively known as the index calculus method (ICM). For prime fields, we discuss three variants of the ICM, namely the basic method, the linear sieve method (LSM) and the number field sieve method (NFSM). We also discuss three variants of the ICM for fields of characteristic 2: the basic method, the linear sieve method and Coppersmith’s algorithm. Another interesting variant is the cubic sieve method (CSM) covered in the exercises. We explain Gordon and McCurley’s polynomial sieving in connection with Coppersmith’s algorithm.

The next section deals with algorithms for solving the ECDLP. For a general elliptic curve, the exponential square-root methods are the only known algorithms. For some special classes of curves, more efficient methods are proposed in the literature. The MOV reduction based on the Weil pairing reduces the ECDLP on a curve over F_q to the DLP in the finite field F_(q^k) for some suitable k ∈ ℕ. This k is small, and the reduction efficient, for supersingular curves. The SmartASS method (also called the anomalous method) reduces the ECDLP in an anomalous curve to the computation of p-adic discrete logarithms. This reduction solves the original DLP in polynomial time. In view of these algorithms, it is preferable to avoid supersingular and anomalous curves in cryptographic applications. The xedni calculus method (XCM) is discussed finally. This algorithm works by lifting a curve over F_p to a curve over ℚ. Experimental and theoretical evidence suggests that the XCM is not an efficient solution to the ECDLP.

We then devote a section to the study of an index calculus method to solve the HECDLP. For hyperelliptic curves of small genus, this method leads to a subexponential algorithm (the ADH–Gaudry algorithm).

Many of the above subexponential methods require solving a system of linear congruences over finite rings. This (inherently sequential) linear algebra part often turns out to be the bottleneck of the algorithms. However, the fact that these equations are necessarily sparse can be effectively exploited, and some faster algorithms can be used to solve these systems. We study four such algorithms: structured Gaussian elimination, the conjugate gradient method, the Lanczos method and the Wiedemann method.

In the last section, we study the subset sum problem. We first reduce the SSP to problems associated with lattices. We finally present the lattice-basis reduction algorithm due to Lenstra, Lenstra and Lovasz.

Several other computationally intractable problems have been proposed in the literature for building cryptographic systems. Some of these problems are mentioned in the annotated references of Chapter 5. Due to space and time limitations, we will not discuss these problems in this book.

Suggestions for Further Reading

The integer factorization problem is one of the oldest computational problems. Though the exact notion of computational complexity took shape only after the advent of computers, the apparent difficulty of solving the factorization problem was noticed centuries ago. Crandall and Pomerance [69] call it the fundamental computational problem of arithmetic. Numerous books and articles provide discussions on this subject at varying levels of coverage. Crandall and Pomerance [69] is perhaps the most extensive in this regard. The reader can also take a look at Bressoud’s (much simpler) book [36] or the (compact, yet reasonably detailed) Chapter 10 of Henri Cohen’s book [56]. The articles by Lenstra et al. [164] and by Montgomery [211] are also worth reading.

John M. Pollard has his name attached to three modern inventions in the arena of integer factorization. In [238, 239], he introduces the rho and the p – 1 methods. (He was later part of the team that designed the number field sieve factoring algorithm.) Williams’ p + 1 method appears in 1982 in [305].

The continued fraction method (CFRAC) is apparently the first known subexponential-time integer factoring algorithm. It is based on the work of Lehmer and Powers [162] and first appears in its currently used form in Morrison and Brillhart’s paper [213]. CFRAC was the most widely used integer factoring algorithm during the late 1970s and early 1980s.

The quadratic sieve method, invented by Carl Pomerance [241] in 1984, supersedes the CFRAC method. The multiple-polynomial QSM appears in Silverman [279]. Hendrik Lenstra’s elliptic curve method [174] is proposed almost concurrently with the QSM. Nowadays, the QSM and the ECM are the most commonly used factoring methods. Reyneri’s cubic sieve method is described in Lenstra and Lenstra [165].

The theoretically superior number field sieve method follows from Pollard’s factoring method using cubic integers [240]. The initial proposal for the NFS method is that of the simple NFS and appears in Lenstra et al. [167]. It is later modified to the general NFS method in Buhler et al. [41]. Lenstra and Lenstra [165] is a compilation of papers on the NFS method. Though the NFS method is the asymptotically fastest factoring method, its fairly complicated implementation makes it superior to the QSM or the ECM only when the bit size of the integer to be factored is sufficiently large.

Shamir’s factoring engine TWINKLE is proposed in [269]. A. K. Lenstra and Shamir analyse and optimize its design in [168]. Shamir and Tromer [270] have proposed a device called TWIRL (The Weizmann Institute Relation Locator) that is geared to the NFS factoring method. It is estimated that a TWIRL implementation costing US$10K can complete the sieving for a 512-bit RSA modulus in less than 10 minutes, whereas one that does the same for a 1024-bit RSA modulus costs US$10–50M and takes about one year. Lenstra et al. [163] provide a more detailed analysis of these estimates. See Lenstra et al. [169] for Bernstein’s factorization circuit, another proposed implementation of the NFS factoring method.

The (finite field) discrete logarithm problem has also attracted much research over the last few decades. The older square-root methods are described well in the book [191] by Menezes. Donald Knuth attributes the baby-step–giant-step method to Daniel Shanks. See Stein and Teske [290] for various optimizations of the baby-step–giant-step method. Pollard’s rho method is an adaptation of his rho method for integer factorization. See Pohlig and Hellman [234] for the Pohlig–Hellman method.

The first idea of the index calculus method appears in Western and Miller [302]. Coppersmith et al. [59] describe three variants of the index calculus method: the linear sieve method, the residue list sieve method and the Gaussian integer method. The same paper also proposes the cubic sieve method (CSM). LaMacchia and Odlyzko [158] describe an implementation of the linear sieve and the Gaussian integer methods. Das and Veni Madhavan [73] make an implementation study of the CSM. Also look at the survey [189] by McCurley.

Gordon [119] uses number field sieves for computing discrete logarithms over prime fields. Weber et al. [261, 299, 300, 301] have implemented the number field sieve method and demonstrated its practicality. Also see Schirokauer’s paper [260].

Odlyzko [225] surveys the algorithms for computing discrete logs in the fields F2^n. The best algorithm for these fields is Coppersmith’s algorithm [57]. No analog of this algorithm is known for prime fields. Gordon and McCurley [120] use Coppersmith’s algorithm for the computation of discrete logarithms in such fields.

The article [226] by Odlyzko and the one [242] by Pomerance are two recent surveys on the finite field discrete logarithm problem. Also see Buchmann and Weber [40].

The elliptic curve discrete logarithm problem seems to be a very difficult computational problem. A direct adaptation of the index calculus method is expected to lead to a running time worse than that of brute-force search (Silverman and Suzuki [278] and Blake et al. [24]). Menezes et al. [193] reduce the problem of computing discrete logs in an elliptic curve over Fq to computing discrete logs in the field Fq^k for some k. For supersingular elliptic curves, this k can be chosen to be small. For a general curve, the MOV reduction takes exponential time (Balasubramanian and Koblitz [16]). The SmartASS method is due to Smart [282], Satoh and Araki [257] and Semaev [265]. Joseph H. Silverman proposes the xedni calculus method in [277]. This method has been experimentally and heuristically shown to be impractical by Jacobson et al. [139].

Adleman et al. [2] propose the first subexponential algorithm for the hyperelliptic curve discrete log problem. This algorithm is applicable for curves of high genus over prime fields. The analysis of its running time is based on certain heuristic assumptions. Enge [86] provides a subexponential algorithm which has a rigorously provable running time and which works for curves over an arbitrary finite field. Again, the algorithm demands curves of high genus. An implementation of the Adleman–DeMarrais–Huang algorithm is given by Gaudry [105]. Also see Enge and Gaudry [87].

Gaudry et al. [107] propose a Weil-descent attack for the hyperelliptic curve discrete log problem. This is modified in Galbraith [100] and Galbraith et al. [101].

Coppersmith et al. [59] describe sparse system solvers. LaMacchia and Odlyzko [159] implement these methods. For further details, see Montgomery [212], Coppersmith [58], Wiedemann [303], and Yang and Brent [306].

That public-key cryptosystems can be based on the subset-sum problem (or the knapsack problem) was considered at the beginning of the era of public-key cryptography. Historically, the first realization of a public-key system is along these lines and is due to Merkle and Hellman [196]. But the Merkle–Hellman system and several of its variants have been broken; see Shamir [266], for example. At present, most public-key systems based on the subset-sum problem are known to be insecure.

The lattice-basis reduction algorithm and the associated L3 algorithm for factoring polynomials appear in the celebrated work [166] of Lenstra, Lenstra and Lovász. Mignotte’s book [203] also describes these topics in good detail.

5. Cryptographic Algorithms

5.1 Introduction
5.2 Secure Transmission of Messages
5.3 Key Exchange
5.4 Digital Signatures
5.5 Entity Authentication
 Chapter Summary
 Suggestions for Further Reading

An essential element of freedom is the right to privacy, a right that cannot be expected to stand against an unremitting technological attack.

—Whitfield Diffie

Mary had a little key (It’s all she could export), and all the email that she sent was opened at the Fort.

—Ronald L. Rivest

Treat your password like your toothbrush. Don’t let anybody else use it, and get a new one every six months.

—Clifford Stoll

5.1. Introduction

As we pointed out in Chapter 1, cryptography aims to guard sensitive data from unauthorized access. We now describe some algorithms that achieve this goal, restricting ourselves to public-key algorithms. In practice, however, public-key algorithms are used in tandem with secret-key algorithms. In this chapter, we describe only the basic routines, whose inputs are mathematical entities like integers, elements of finite fields, or points on curves. Message encoding will be dealt with in Chapter 6.

5.2. Secure Transmission of Messages

Consider the standard scenario: a party named Alice, called the sender, wishes to send a secret message m to a party named Bob, called the receiver or recipient, over a public communication channel. A third party Carol may intercept and read the message. In order to maintain the secrecy of the message, Alice uses a well-defined transform fe to convert the plaintext message m to the ciphertext message c, and sends c to Bob. Bob possesses some secret information with the help of which he applies the reverse transform fd to get back m. Carol, who is expected not to know the secret information, cannot retrieve m from c by applying fd.

In a public-key system, the realization of the transforms fe and fd is based on a key pair (e, d) predetermined by Bob. The public key e is made public, whereas the private key d is kept secret. The encryption transform generates c = fe(m, e). Since e is public knowledge, anybody can generate c from a given m, whereas the decryption transform m = fd(c, d) can be performed only by Bob, who possesses the knowledge of d. The key pair has to be chosen so that the knowledge of e does not allow Carol to compute d in feasible time. The intractability of the computational problems discussed in Chapter 4 can be exploited to design such key pairs. The exact realization of the keys e, d and the transforms fe, fd depends on the choice of the underlying intractable problem and also on the way to make use of the problem. Since there are several intractable problems suitable for cryptography, there are several encryption schemes varying widely in algorithmic and mathematical details.

5.2.1. The RSA Public-key Encryption Algorithm

RSA has been the most popular encryption algorithm. Historically, too, it is the first public-key encryption algorithm published in the literature (see Rivest et al. [252]). Its security is based on the intractability of the RSAP (or the RSAKIP) discussed in Exercise 4.2. Since both these problems are polynomial-time reducible to the IFP, we often say that the RSA algorithm derives its security from the intractability of the IFP. It may, however, be the case that breaking RSA is easier than factoring integers, though no concrete evidence seems to be available.

RSA key pair

Algorithm 5.1 generates a key pair for RSA.

Algorithm 5.1. RSA key generation

Input: A bit length l.

Output: A random RSA key pair.

Steps:

Generate two different random primes p and q each of bit length l.

n := pq.

Choose an integer e coprime to φ(n) = (p – 1)(q – 1).

d := e^(–1) (mod φ(n)).

Return the pair (n, e) as the public key and the pair (n, d) as the private key.

The length l of the primes p and q should be chosen large enough so as to make the factorization of n infeasible. For short-term security, values of l between 256 and 512 suffice. For long-term security, one may choose l as large as 2,048.

The random primes p and q can be generated using a probabilistic algorithm like those described in Section 3.4.2. Naive primes are normally considered to be sufficiently secure in this respect, since p ± 1 and q ± 1 are expected to have large prime factors in general. Gordon’s algorithm (Algorithm 3.14) can also be used for generating strong primes p and q. Since Gordon’s algorithm runs only nominally slower than the algorithm for generating naive primes, there is no harm in using strong primes. Safe primes, on the other hand, are difficult to generate and may be avoided.

The RSA modulus n is public knowledge. Determining d from n and e is easily doable, given the value of φ(n) = (p – 1)(q – 1) which, in turn, is readily computable, if p and q are known. If an adversary can compute φ(n) (with or without factoring n), the security of the RSA protocol based on the modulus n is compromised. However, computing φ(n) without the knowledge of p and q is (at least historically) a very difficult computational problem, and so, if n is reasonably large, RSA encryption is assumed to be sufficiently secure.

RSA encryption is done by raising the plaintext message m to the power e modulo n. In order to speed up this (modular) exponentiation, it is often expedient to take a small value for e (like 3, 257 and 65,537). However, in that case one should adopt certain precautions, as Exercise 5.2 suggests. More specifically, if e entities share a common (small) encryption exponent e but different (pairwise coprime) moduli, and if the same message m is encrypted using all these public keys, then an eavesdropper can reconstruct m easily from a knowledge of the e ciphertext messages. Another potential problem with using a small e is that if m is small, that is, if m < n^(1/e), then m can be retrieved by taking the integer e-th root of the ciphertext message.

Although the pair (n, d) is sufficient for carrying out RSA decryption, maintaining some additional (secret) information significantly speeds up decryption. To this end, it is often recommended that some or all of the values n, e, d, p, q, d1, d2, h be stored, where d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p).

If n can be factored, then d can be easily computed from the public key (n, e). Conversely, if n, e, d are all known, there is an efficient probabilistic algorithm which factors n. This algorithm is based on the fact that if ed – 1 = 2^s · t with t odd, then for at least half of the integers a coprime to n, there exists a σ with 0 ≤ σ < s such that a^(2^σ · t) ≢ ±1 (mod n), whereas a^(2^(σ+1) · t) ≡ 1 (mod n). But then the gcd of n and a^(2^σ · t) – 1 is a non-trivial factor of n. For the details, solve Exercise 7.9.
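The reduction just described can be sketched in Python as follows. The helper name is ours, and the sketch relies on the fact from the text that ed – 1 is a multiple of φ(n), so that a^(ed–1) ≡ 1 (mod n) for every a coprime to n.

```python
import math
import random

def factor_with_private_key(n, e, d):
    """Recover a prime factor of n = pq from a full key pair (n, e, d).

    Write e*d - 1 = 2^s * t with t odd; at least half of the bases a
    lead to a non-trivial square root of 1 modulo n, whose gcd with n
    reveals a factor."""
    k = e * d - 1
    s, t = 0, k
    while t % 2 == 0:
        s, t = s + 1, t // 2
    while True:
        a = random.randrange(2, n - 1)
        g = math.gcd(a, n)
        if g > 1:
            return g                   # unlikely luck: a shares a factor
        b = pow(a, t, n)
        if b in (1, n - 1):
            continue                   # this base reveals nothing; retry
        for _ in range(s):
            c = b * b % n
            if c == 1:                 # b^2 = 1 but b != +-1 (mod n)
                return math.gcd(b - 1, n)
            if c == n - 1:
                break                  # dead end for this base; retry
            b = c
```

For the toy key n = 3233, e = 17, d = 2753, the routine returns 53 or 61, the two prime factors of 3233.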

Different entities in a given network should use different values of n. If two or more entities share a common n but different exponent pairs (ei, di), then each of these entities can first factor n (using its own pair, as described above) and then use this factorization to compute the private keys of the other entities. Primes are quite abundant, and so finding pairwise coprime RSA moduli for all entities poses no problem at all. A common value of the encryption exponent e (for example, a small value of e) can, however, be shared by all entities. In that case, for pairwise different moduli ni, the corresponding decryption exponents di will also be pairwise different.

RSA encryption

RSA encryption is rather simple, as Algorithm 5.2 shows.

Algorithm 5.2. RSA encryption

Input: The RSA public key (n, e) of the recipient and the plaintext message m ∈ Zn.

Output: The ciphertext message c ∈ Zn.

Steps:

c := m^e (mod n).

By Exercise 4.1, the exponentiation function m ↦ m^e is bijective; so m can be uniquely recovered from c. It is clear why small encryption exponents e speed up RSA encryption. For a general exponent e, the routine takes time O(log^3 n), whereas for a small e (that is, e = O(1)) the running time drops to O(log^2 n).

RSA decryption

RSA decryption (Algorithm 5.3) is analogous to RSA encryption.

Algorithm 5.3. RSA decryption

Input: The RSA private key (n, d) of the recipient and the ciphertext message c ∈ Zn.

Output: The recovered plaintext message m ∈ Zn.

Steps:

m := c^d (mod n).

The correctness of this decryption procedure follows from Exercise 4.1. As in the case of encryption, one might go for small decryption exponents d. In general, both e and d cannot be small simultaneously. If e is small, the security of the RSA scheme is expected not to be affected, whereas small values of d are undesirable for several reasons. First, if d is very small, the adversary chooses some m, computes the corresponding ciphertext c (using public knowledge) and then keeps on computing c^x (mod n) for x = 1, 2, . . ., until x = d is reached, that is, until the original message m is recovered.

Even when d is not very small so that the possibility of exhaustive search with x = 1, 2, . . . can be precluded, there are several attacks known for small private exponents. Wiener [304] proposes an efficient algorithm in this respect. Boneh and Durfee [32] improve Wiener’s algorithm. Sun et al. [294] propose three variants of the RSA scheme that are resistant to these attacks. Durfee and Nguyen [82] extend the Boneh–Durfee attack to break two of these three variants. To sum up, it is advisable not to use small secret exponents d, that is, the bit length of d should be close to that of n in order to achieve the desired level of security.
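Algorithms 5.1–5.3 can be sketched together in Python. This is a toy illustration: the Miller–Rabin test stands in for the primality tests of Section 3.4.2, the fixed exponent e = 65537 and all function names are our choices, the modular inverse via `pow` needs Python 3.8+, and real deployments require padding and far larger parameters.

```python
import random
from math import gcd

def probably_prime(n, rounds=40):
    """Miller-Rabin test, standing in for the tests of Section 3.4.2."""
    if n < 4:
        return n in (2, 3)
    if n % 2 == 0:
        return False
    s, t = 0, n - 1
    while t % 2 == 0:
        s, t = s + 1, t // 2
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        b = pow(a, t, n)
        if b in (1, n - 1):
            continue
        for _ in range(s - 1):
            b = pow(b, 2, n)
            if b == n - 1:
                break
        else:
            return False               # a witnesses compositeness
    return True

def random_prime(l):
    """A random l-bit (naive) prime."""
    while True:
        p = random.getrandbits(l) | (1 << (l - 1)) | 1   # force l bits, odd
        if probably_prime(p):
            return p

def rsa_keygen(l, e=65537):
    """Algorithm 5.1, with a popular fixed encryption exponent."""
    while True:
        p, q = random_prime(l), random_prime(l)
        phi = (p - 1) * (q - 1)
        if p != q and gcd(e, phi) == 1:
            d = pow(e, -1, phi)            # e^(-1) mod phi(n); Python 3.8+
            return (p * q, e), (p * q, d)  # public key, private key

def rsa_encrypt(pub, m):
    n, e = pub
    return pow(m, e, n)                    # Algorithm 5.2: c = m^e mod n

def rsa_decrypt(prv, c):
    n, d = prv
    return pow(c, d, n)                    # Algorithm 5.3: m = c^d mod n
```

Note that Python's `random` module is not a cryptographic source of randomness; a real implementation would draw from the `secrets` module or an equivalent.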

There are alternative ways to speed up RSA decryption. If the values p, q, d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p) are all available to the recipient, he can use Algorithm 5.4 for RSA decryption.

Algorithm 5.4. RSA decryption using CRT

Input: The RSA extended private key (p, q, d1, d2, h) of the recipient and the ciphertext message .

Output: The recovered plaintext message .

Steps:

m1 := c^d1 (mod p).

m2 := c^d2 (mod q).

t := h(m1 – m2) (mod p).

m := m2 + tq.

In this modified routine, m1 := m rem p and m2 := m rem q are first computed and then combined using the CRT to get m modulo n = pq. Algorithm 5.3 performs a single modular exponentiation modulo n, whereas in Algorithm 5.4 two exponentiations modulo p and q respectively take the major portion of the running time. Since an exponentiation modulo N with an exponent of size O(N) runs in time O(log^3 N), and since each of p and q has bit length (about) half that of n, Algorithm 5.4 runs about four times as fast as Algorithm 5.3.
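A Python sketch of Algorithm 5.4 follows. For brevity, d1, d2 and h are derived on the fly from (p, q, d) rather than stored; the function name is ours, and the modular inverse via `pow` needs Python 3.8+.

```python
def rsa_decrypt_crt(p, q, d, c):
    """Algorithm 5.4: RSA decryption via the CRT."""
    d1, d2 = d % (p - 1), d % (q - 1)      # d rem (p-1), d rem (q-1)
    h = pow(q, -1, p)                      # h = q^(-1) mod p (Python 3.8+)
    m1 = pow(c, d1, p)                     # m rem p
    m2 = pow(c, d2, q)                     # m rem q
    t = h * (m1 - m2) % p
    return m2 + t * q                      # CRT combination: m mod pq
```

With the toy parameters p = 61, q = 53, e = 17 and m = 65, the routine recovers 65 from the ciphertext 65^17 mod 3233, matching plain decryption.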

If only the values p, q, d are stored, then d1, d2 and h can be computed on the fly using relatively inexpensive operations and subsequently Algorithm 5.4 can be used. This leads to a decryption routine almost as fast as Algorithm 5.4, with somewhat smaller memory requirements for the storage of the private key.

5.2.2. The Rabin Public-key Encryption Algorithm

The Rabin public-key encryption algorithm is based on the intractability of computing square roots modulo a composite integer (SQRTP). By Exercise 4.10, the SQRTP is probabilistically polynomial-time equivalent to the IFP, that is, breaking the Rabin scheme is provably as hard as factoring integers. Breaking RSA, on the other hand, is only believed to be equivalent to factoring integers. Moreover, Rabin encryption is faster than RSA encryption (for moduli of the same size).

Rabin key pair

Like RSA, Rabin encryption requires a modulus of the form n = pq.

Algorithm 5.5. Rabin key generation

Input: A bit length l.

Output: A random Rabin key pair.

Steps:

Generate two different random primes p and q each of bit length l.

n := pq.

Return n as the public key and the pair (p, q) as the private key.

Here, the choice of the bit length l and the generation of the primes p and q follow the same guidelines as discussed in connection with RSA key generation.

Rabin encryption

Encryption in the Rabin scheme involves a single modular squaring.

Algorithm 5.6. Rabin encryption

Input: The Rabin public key n of the recipient and the plaintext message m ∈ Zn.

Output: The ciphertext message c ∈ Zn.

Steps:

c := m^2 (mod n).

Unfortunately, the Rabin encryption map m ↦ m^2 (mod n) is not injective. In general, a ciphertext c has four square roots modulo n.[1] This poses an ambiguity during decryption. In order to work around this difficulty, one adds some distinguishing feature or redundancy to the message m before encryption. One possibility is to duplicate a predetermined number of bits at the least significant end of m. This reduces the message space somewhat, but is rarely a serious issue. Only one of the (four) square roots of the ciphertext c is expected to have the desired redundancy. If none or more than one square root possesses the redundancy, decryption fails. However, this is a very rare phenomenon and can be ignored for all practical purposes.

[1] More specifically, if an element c ∈ Zn is a square modulo both p and q, then the number of square roots of c modulo n equals 1 if c = 0; it is 2 if exactly one of c ≡ 0 (mod p) and c ≡ 0 (mod q) holds; and it is 4 if c ≢ 0 (mod p) and c ≢ 0 (mod q). If c is not a square modulo p or modulo q, then c does not possess a square root modulo n. These assertions can be readily proved using the Chinese remainder theorem.

Rabin decryption

Rabin decryption (Algorithm 5.7) involves computing square roots modulo n. Since n is composite, this is a very difficult problem (for the eavesdropper). But the knowledge of the prime factors p and q of n allows the recipient to decrypt.

Algorithm 5.7. Rabin decryption

Input: The Rabin private key (p, q) of the recipient and the ciphertext message c ∈ Zn.

Output: The recovered plaintext message m ∈ Zn.

Steps:

if (c is not a square modulo p or modulo q) { Return “c is not a ciphertext message”. }

Compute the square roots of c mod p./* Algorithm 3.17 */
Compute the square roots of c mod q./* Algorithm 3.17 */
Compute the square roots of c mod n from those mod p and q./* Use CRT */

if (c has exactly one distinguished square root m mod n) { Return m. }

else { Return “failure”. }
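A Python sketch of Algorithms 5.6 and 5.7 with bit-duplication redundancy follows. Two simplifications are ours: we take p ≡ q ≡ 3 (mod 4), so that the square roots produced by Algorithm 3.17 reduce to single exponentiations, and the redundancy width is an illustrative parameter.

```python
def rabin_encrypt(n, m, red_bits=16):
    """Algorithm 5.6 with redundancy: the low red_bits bits of m are
    duplicated before squaring, so that decryption can identify m."""
    m_red = (m << red_bits) | (m & ((1 << red_bits) - 1))
    assert m_red < n, "message too long for this modulus"
    return pow(m_red, 2, n)

def rabin_decrypt(p, q, c, red_bits=16):
    """Algorithm 5.7, specialized to p = q = 3 (mod 4): the square roots
    modulo p and q are single exponentiations, combined by the CRT."""
    n = p * q
    rp = pow(c, (p + 1) // 4, p)           # square root of c mod p
    rq = pow(c, (q + 1) // 4, q)           # square root of c mod q
    if (rp * rp - c) % p or (rq * rq - c) % q:
        return None                        # c is not a ciphertext message
    h = pow(q, -1, p)                      # q^(-1) mod p, for the CRT
    roots = set()
    for a in (rp, p - rp):                 # all four sign combinations
        for b in (rq, q - rq):
            roots.add((b + (h * (a - b) % p) * q) % n)
    mask = (1 << red_bits) - 1
    good = [r for r in roots if (r >> red_bits) & mask == r & mask]
    if len(good) != 1:
        return None                        # redundancy fails to disambiguate
    return good[0] >> red_bits
```

A wrong square root passes the redundancy check only with probability about 2^(-red_bits), which is the “very rare phenomenon” mentioned above.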

5.2.3. The Goldwasser–Micali Encryption Algorithm

So far, we have encountered encryption algorithms that are deterministic in the sense that for a given public key of the recipient the same plaintext message encrypts to the same ciphertext message. In a probabilistic encryption algorithm, different calls of the encryption routine produce different ciphertext messages for the same plaintext message and public key.

The Goldwasser–Micali encryption algorithm is probabilistic and is based on the intractability of the quadratic residuosity problem (QRP) described in Exercise 4.2. If n is a composite integer and a an integer coprime to n, then the Jacobi symbol value (a | n) = –1 implies that a is a quadratic non-residue modulo n. The converse does not hold, that is, one may have (a | n) = +1 even when a is a quadratic non-residue modulo n. For example, if n is the product of two distinct odd primes p and q, then a is a quadratic residue modulo n if and only if a is a quadratic residue modulo both p and q. However, if (a | p) = (a | q) = –1, we continue to have (a | n) = +1. There is no easy way to find out if a is a quadratic residue modulo n for an integer a with (a | n) = +1. If the factorization of n is available, the QRP is solvable in polynomial time. These observations lead to the design of the Goldwasser–Micali scheme.

Goldwasser–Micali key pair

The Goldwasser–Micali scheme works in the ring Zn, where n is the product of two distinct sufficiently large primes p and q. The integer a (resp. b) in Algorithm 5.8 can be found by randomly choosing elements of Zp* (resp. Zq*) and computing the Legendre symbol (a | p) (resp. (b | q)). Under the assumption that quadratic non-residues are randomly located in Zp* and Zq*, a and b can be found after only a few trials. The integer x is a quadratic non-residue modulo n with (x | n) = +1.

Goldwasser–Micali encryption

Goldwasser–Micali encryption (Algorithm 5.9) is probabilistic, since its output depends on a sequence of random elements ai of Zn*. It generates a tuple (c1, . . . , cr) of elements of Zn* such that each (ci | n) = +1. If mi = 0, then ci is a quadratic residue modulo n, whereas if mi = 1, ci is a quadratic non-residue modulo n. Therefore, if the quadratic residuosity of ci modulo n can be computed, the bit mi can be determined. If one (for example, the recipient) knows the factorization of n or equivalently the prime factor p of n, one can perform decryption easily. An eavesdropper, on the other hand, must solve the QRP (or the IFP) in order to find out the bits m1, . . . , mr. This is how Goldwasser–Micali encryption derives its security.

Algorithm 5.8. Goldwasser–Micali key generation

Input: A bit length l.

Output: A random Goldwasser–Micali key pair.

Steps:

Generate two (different) random primes p and q each of bit length l.

n := pq.

Find out integers a and b such that (a | p) = –1 and (b | q) = –1.

Compute an integer x with x ≡ a (mod p) and x ≡ b (mod q).          /* Use CRT */

Return the pair (n, x) as the public key and the prime p as the private key.

Algorithm 5.9. Goldwasser–Micali encryption

Input: The Goldwasser–Micali public key (n, x) of the recipient and the plaintext message m = m1 . . . mr, which is a bit string of length r.

Output: The ciphertext message (c1, . . . , cr), where each ci ∈ Zn*.

Steps:

for i = 1, . . . , r {
   Select a random element ai of Zn*.
   ci := ai^2 · x^mi (mod n).
}

Since randomly chosen non-zero elements of Zn are with high probability coprime to n, it is sufficient to draw ai from Zn \ {0} and skip the check whether gcd(ai, n) = 1. In fact, if an ai with gcd(ai, n) > 1 is somehow located, this gcd equals a non-trivial factor of n, and the security of the scheme is broken.

The Goldwasser–Micali scheme has the drawback that the length of the ciphertext message is much bigger than that of the plaintext message. Thus, for example, for a 1024-bit modulus n and a message m of bit length 64, the output requires a huge 65,536-bit space. This phenomenon is called message expansion and can be a serious limitation in certain circumstances.

Goldwasser–Micali decryption

Goldwasser–Micali decryption (Algorithm 5.10) recovers the bits of the plaintext message by computing Legendre symbols modulo the prime divisor p of n. The correctness of this decryption algorithm is evident from the discussion immediately following Algorithm 5.9.

Algorithm 5.10. Goldwasser–Micali decryption

Input: The Goldwasser–Micali private key p of the recipient and the ciphertext message (c1, . . . , cr).

Output: The recovered plaintext message m = m1 . . . mr, a bit string of length r.

Steps:

for i = 1, . . . , r {
   if ((ci | p) = 1) { mi := 0 } else { mi := 1 }
}
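A Python sketch of Algorithms 5.8–5.10 follows. One simplification is ours: key generation searches directly for an x that is a non-residue modulo both p and q, instead of combining a and b by the CRT as in Algorithm 5.8; the resulting x has the same properties. Function names are illustrative.

```python
import random

def legendre(a, p):
    """Legendre symbol (a | p) by Euler's criterion: 1, -1 or 0."""
    s = pow(a % p, (p - 1) // 2, p)
    return -1 if s == p - 1 else s

def gm_keygen(p, q):
    """Algorithm 5.8 for given primes p and q: x is a non-residue
    modulo both p and q, so (x | n) = +1 yet x is a non-residue mod n."""
    n = p * q
    while True:
        x = random.randrange(2, n)
        if legendre(x, p) == -1 and legendre(x, q) == -1:
            return (n, x), p           # public key (n, x), private key p

def gm_encrypt(pub, bits):
    """Algorithm 5.9: ci = ai^2 * x^mi (mod n) for random ai."""
    n, x = pub
    cipher = []
    for m in bits:
        a = random.randrange(1, n)     # coprime to n with high probability
        cipher.append(a * a * pow(x, m, n) % n)
    return cipher

def gm_decrypt(p, cipher):
    """Algorithm 5.10: mi = 0 exactly when ci is a square modulo p."""
    return [0 if legendre(c, p) == 1 else 1 for c in cipher]
```

Running the encryption twice on the same bit string produces different ciphertexts, illustrating the probabilistic nature of the scheme; each plaintext bit costs a full element of Zn*, which is the message expansion discussed above.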

5.2.4. The Blum–Goldwasser Encryption Algorithm

The Blum–Goldwasser algorithm is another probabilistic encryption algorithm and is better than the Goldwasser–Micali algorithm in the sense that in this case the message expansion is by only a constant number of bits irrespective of the length of the plaintext message. The Blum–Goldwasser scheme is based on the intractability of the SQRTP (modulo a composite integer).

Blum–Goldwasser key pair

As in the case of the encryption algorithms discussed so far, the Blum–Goldwasser algorithm works in the ring Zn, where n = pq is the product of two distinct primes p and q. Now, we additionally demand that p and q both be congruent to 3 modulo 4.

Algorithm 5.11. Blum–Goldwasser key generation

Input: A bit length l.

Output: A random Blum–Goldwasser key pair.

Steps:

Generate two (different) random primes p and q each of bit length l and each congruent to 3 mod 4.

n := pq.

Return n as the public key and the pair (p, q) as the private key.

Since p and q are two different primes, there exist integers u and v such that up + vq = 1. In order to speed up decryption, it is often expedient to store u and v along with p and q in the private key. Recall that the solution of the congruences xa (mod p) and xb (mod q) is given by xvqa + upb (mod n).

Blum–Goldwasser encryption

The Blum–Goldwasser encryption algorithm assumes that the input plaintext message m is in the form of a bit string, and breaks m into substrings of a fixed length t. A typical choice for t is t = ⌊lg lg n⌋, where n is the public key of the recipient. Write m = m1 . . . mr, where each mi is a bit string of length t. The ciphertext consists of r bit strings c1, . . . , cr, each of bit length t, and an element of Zn*.

Algorithm 5.12. Blum–Goldwasser encryption

Input: The Blum–Goldwasser public key n of the recipient and the plaintext message m = m1 . . .mr, where each mi is a bit string of length t.

Output: The ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ Zn*.

Steps:

Choose a random element d ∈ Zn*.

d := d^2 (mod n).
for i = 1, . . . , r {
   d := d^2 (mod n).
   δ := the t least significant bits of d.
   ci := mi ⊕ δ.                                            /* Here ⊕ denotes bit-wise XOR of t-bit strings */
}
d := d^2 (mod n).

Blum–Goldwasser encryption involves the computation of r + 2 modular squarings in Zn and is quite fast (for example, faster than RSA encryption with a general encryption exponent). It makes sense to assume that the initial choice of d is from Zn*, since finding a non-zero non-invertible element of Zn is as difficult as factoring n.

For an intruder to determine the plaintext message m from the corresponding ciphertext message, the values of d inside the for loop are necessary. These can be obtained by taking repeated square roots modulo n. Since n is composite, this is a difficult problem. On the other hand, since the recipient knows the prime divisors p and q of n, taking square roots modulo n requires only polynomial-time effort.

Blum–Goldwasser decryption

Recall from Exercise 3.43 that a quadratic residue d modulo n (where n is the public key of the recipient) has four distinct square roots, of which exactly one is again a quadratic residue modulo n. This distinguished square root y of d satisfies the congruences y ≡ d^((p+1)/4) (mod p) and y ≡ d^((q+1)/4) (mod q). In the decryption Algorithm 5.13, we assume that the received d is a quadratic residue modulo n.

Algorithm 5.13 assumes that each value of d is a quadratic residue modulo n. This can be verified by inserting in the for loop a check whether d is a quadratic residue modulo both p and q, before an attempt is made to compute the square root of d modulo n. If (c1, . . . , cr, d) is a valid ciphertext message, this condition necessarily holds, and there is no point wasting time checking obvious things. However, if there is a possibility that d is altered by an (active) adversary (or corrupted during transmission), one may insert this check. In that case, the routine should report failure when the square root of a quadratic non-residue modulo n is to be computed.

Algorithm 5.13. Blum–Goldwasser decryption

Input: The Blum–Goldwasser private key (p, q) of the recipient and the ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ Zn*.

Output: The recovered plaintext message m = m1 . . . mr, where each mi is a bit string of length t.

Steps:

for i = r, r – 1, . . . , 1 {
   a := d^((p+1)/4) (mod p) and b := d^((q+1)/4) (mod q).
   Compute d ∈ Zn with d ≡ a (mod p) and d ≡ b (mod q).  /* Use CRT */
   δ := the t least significant bits of d.
   mi := ci ⊕ δ.  /* XOR of t-bit strings */
}
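Algorithms 5.12 and 5.13 can be sketched in Python as follows. Messages are bit strings, the distinguished square root is computed by the (p + 1)/4-exponentiations and CRT combination described above, and the function names are ours.

```python
import random

def bg_encrypt(n, m_bits, t):
    """Algorithm 5.12: mask successive t-bit chunks of the message with
    the t low bits of repeated squarings; send the final square openly."""
    assert len(m_bits) % t == 0
    d = random.randrange(2, n)
    d = d * d % n                      # x_0: make the seed a square
    cipher = []
    for i in range(0, len(m_bits), t):
        d = d * d % n                  # x_1, ..., x_r
        delta = d & ((1 << t) - 1)     # t least significant bits
        cipher.append(int(m_bits[i:i + t], 2) ^ delta)
    return cipher, d * d % n           # (c_1, ..., c_r) and x_{r+1}

def bg_decrypt(p, q, cipher, d, t):
    """Algorithm 5.13: undo the squarings by taking, at each step, the
    distinguished (quadratic-residue) square root modulo n = pq."""
    n = p * q
    h = pow(q, -1, p)                  # q^(-1) mod p, for the CRT
    chunks = [0] * len(cipher)
    for i in range(len(cipher) - 1, -1, -1):
        a = pow(d, (p + 1) // 4, p)    # QR square root mod p
        b = pow(d, (q + 1) // 4, q)    # QR square root mod q
        d = (b + (h * (a - b) % p) * q) % n
        chunks[i] = cipher[i] ^ (d & ((1 << t) - 1))
    return ''.join(format(c, '0%db' % t) for c in chunks)
```

Note that only the final square d travels with the ciphertext, so the expansion is a constant number of bits regardless of the message length, in contrast with the Goldwasser–Micali scheme.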

5.2.5. The ElGamal Public-key Encryption Algorithm

The ElGamal encryption algorithm works in a group G in which it is difficult to solve the Diffie–Hellman problem (DHP). Typical candidates for G include the multiplicative group Fq* of a finite field Fq (usually q is a prime or a power of 2), the (additive) group of points on an elliptic curve over a finite field, and the (additive) group (called the Jacobian) of reduced divisors on a hyperelliptic curve over a finite field. Here we assume that G is multiplicatively written and has order n. It is not necessary for G to be cyclic, but we should have at our disposal an element g ∈ G with a suitably large (preferably prime) order k. We essentially work in the cyclic subgroup H of G generated by g (but using the arithmetic of G). For the ElGamal scheme, G (together with its representation), g, n and k are made public and can be shared by different entities on a network.

ElGamal key pair

Generating a key pair for the ElGamal scheme (Algorithm 5.14) involves an exponentiation in G. In order to make the exponentiation efficient, the exponent (the private key) is often chosen to have a small number of 1 bits. However, if this number is too small, exhaustive search by an adversary may become feasible.

If the DLP can be solved in G, the private key d can be computed from the public key g^d. This amounts to breaking a system based on this key pair. This is why we often say that the security of the ElGamal encryption scheme banks on the intractability of the DLP. But, as we see shortly, the DHP is the more fundamental computational problem that dictates the security of ElGamal encryption.

Algorithm 5.14. ElGamal key generation

Input: G, g and k as defined above.

Output: A random ElGamal key pair.

Steps:

Generate a random integer d, 2 ≤ d ≤ k – 1.

Return g^d as the public key and d as the private key.

ElGamal encryption

Given a message m ∈ G, the ElGamal encryption procedure (Algorithm 5.15) generates a pair (r, s) of elements of G as the ciphertext message and thus corresponds to message expansion by a factor of 2. Clearly, the sender has all the relevant information for computing (r, s). The need for using a different session key for each encryption is explained in Exercise 5.6.

Algorithm 5.15. ElGamal encryption

Input: (G, g, k and) the ElGamal public key g^d of the recipient and the plaintext message m ∈ G.

Output: The ciphertext message (r, s) ∈ H × G (where H = 〈g〉).

Steps:

Generate a (random) session key d′, 2 ≤ d′ ≤ k – 1.

r := g^{d′}.

s := mg^{dd′} = m(g^d)^{d′}.

Notice that ElGamal encryption uses two exponentiations in G to exponents which are O(k). Therefore, the running time of Algorithm 5.15 decreases if smaller values of k are selected. On the other hand, if k is too small, the square-root methods in H = 〈g〉 may become efficient (see Section 4.4.1). In practice, it is recommended that k be taken to be a prime of length 160 bits or more.

ElGamal decryption

ElGamal decryption involves an exponentiation in G to an exponent which is O(k). It is easy to verify that Algorithm 5.16 performs decryption correctly and that the recipient has the necessary information to carry out decryption.

Algorithm 5.16. ElGamal decryption

Input: (G, g, k and) the ElGamal private key d of the recipient and the ciphertext message (r, s) ∈ H × G (where H = 〈g〉).

Output: The recovered plaintext message m ∈ G.

Steps:

m := sr^{–d} = sr^{k–d}.
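For concreteness, here is a toy run of Algorithms 5.14–5.16 with G = Z*_p for a hypothetical (far too small) prime p = 2579; g = 4 generates the subgroup H of prime order k = 1289, since p – 1 = 2 × 1289.

```python
import random

p = 2579                  # toy prime; p - 1 = 2 * 1289 with 1289 prime
g, k = 4, 1289            # g = 2^((p-1)/1289) has order k = 1289

# Key generation (Algorithm 5.14)
d = random.randrange(2, k)            # private key
pub = pow(g, d, p)                    # public key g^d

# Encryption (Algorithm 5.15): fresh session key d' for every message
def encrypt(m, pub):
    d1 = random.randrange(2, k)
    return pow(g, d1, p), m * pow(pub, d1, p) % p    # (r, s)

# Decryption (Algorithm 5.16): r^(-d) = r^(k-d), since ord(r) divides k
def decrypt(r, s, d):
    return s * pow(r, k - d, p) % p

r, s = encrypt(1357, pub)
assert decrypt(r, s, d) == 1357
```

The blinding factor pub^{d′} = g^{dd′} introduced during encryption is cancelled exactly by r^{k–d} during decryption.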

An eavesdropper Carol knows the domain parameters G, g, k and n and also the recipient’s public key g^d. Determining the message m from a knowledge of the corresponding ciphertext (r, s) is then equivalent to computing the element g^{dd′}. This implies that a (quick) solution of the DHP permits Carol to decrypt a ciphertext. If a (quick) solution of the DLP is available, then the element g^{dd′} is computable fast. The reverse implication is, however, not clear: it may be easier to solve the DHP than the DLP, though no concrete evidence is available to corroborate this.

5.2.6. The Chor–Rivest Public-key Encryption Algorithm

The Chor–Rivest encryption algorithm is based on a variant of the subset sum problem. It selects a prime p and an integer h ≥ 2, uses a knapsack set A = {a0, . . . , ap–1} with 1 ≤ ai ≤ p^h – 2 for each i, and considers sums of the form s = ∊0a0 + · · · + ∊p–1ap–1 with each ∊i ∈ {0, 1}. In order to construct the set A for which the h-fold sum s is uniquely determined by the binary vector (∊0, . . . , ∊p–1) of weight h (that is, with exactly h bits equal to 1), we take the help of the finite field F_{p^h}. We represent F_{p^h} as F_p[X]/〈f(X)〉, where f(X) ∈ F_p[X] is irreducible of degree h and where x is the residue class of X in F_p[X]/〈f(X)〉. The parameters p and h must be so chosen that p^h – 1 is reasonably smooth, so that the integer factorization of p^h – 1 can be easily computed. This helps us in two ways. First, a generator g(x) of the multiplicative group F*_{p^h} can be made available quickly using Algorithm 3.25. Second, the Pohlig–Hellman method of Section 4.4.1 becomes efficient for computing discrete logarithms in F*_{p^h}. We can then take ai := ind_{g(x)}(x + i), i = 0, 1, . . . , p – 1. If (∊0, . . . , ∊p–1) and (∊′0, . . . , ∊′p–1) are two binary vectors of weight h, then ∊0a0 + · · · + ∊p–1ap–1 = ∊′0a0 + · · · + ∊′p–1ap–1 implies g(x)^{∊0a0 + · · · + ∊p–1ap–1} = g(x)^{∊′0a0 + · · · + ∊′p–1ap–1}, that is, (x + 0)^{∊0} · · · (x + p – 1)^{∊p–1} = (x + 0)^{∊′0} · · · (x + p – 1)^{∊′p–1}, that is, ∊i = ∊′i for all i = 0, . . . , p – 1, since otherwise x would satisfy a non-zero polynomial of degree < h.

Chor–Rivest key pair

A randomly permuted version of a0, . . . , ap–1 shifted by a noise (that is, a random bias) d together with p and h constitute the public key of the Chor–Rivest scheme. The private key, on the other hand, comprises the polynomials f(X) and g(x), the permutation just mentioned and the noise d. Algorithm 5.17 elaborates the generation of such a key pair. The same values of p and h can be used by different entities on a network. So we assume that p and h are provided instead of generated by the recipient as part of his public key. For brevity, we use the notation q := p^h.

Key generation may be a long process in the Chor–Rivest scheme, depending on how difficult it is to compute all the indexes ind_{g(x)}(x + i). Furthermore, the size of the public key is quite large, namely O(ph log p). Typically one may take p ≈ 200 and h ≈ 25. The original paper of Chor and Rivest [54] recommends the possibilities (197, 24), (211, 24), (243, 24) and (256, 25) for (p, h). Note that 256 is not a prime, but the Chor–Rivest algorithm works even when p is a power of a prime. For the sake of simplicity, we stick here to the case that p is a prime.

Algorithm 5.17. Chor–Rivest key generation

Input: A prime p and an integer h ≥ 2 such that ph – 1 is smooth.

Output: A Chor–Rivest key pair.

Steps:

Choose an irreducible polynomial f(X) ∈ F_p[X] of degree h.

Use the representation F_{p^h} = F_p[X]/〈f(X)〉, where x := X + 〈f(X)〉.

Choose a random generator g(x) of F*_{p^h}.

Compute the indexes ai := ind_{g(x)}(x + i) for i = 0, 1, . . . , p – 1.

Select a random permutation π of {0, 1, . . . , p – 1}.

Select a random noise d in the range 0 ≤ dq – 2.

Compute αi := aπ(i) + d (mod q – 1) for i = 0, 1, . . . , p – 1.

Return (α0, α1, . . . , αp–1) as the public key and (f, g, π, d) as the private key.

Chor–Rivest encryption

The Chor–Rivest encryption procedure (Algorithm 5.18) assumes that the input plaintext message is represented as a binary vector (m0, . . . , mp–1) of weight (that is, number of one-bits) equal to h. Since there are C(p, h) = p!/(h!(p – h)!) such binary vectors, arbitrary binary strings of bit length ⌊lg C(p, h)⌋ can be encoded into binary vectors of the above special form. See Chor and Rivest [54] for an algorithm that describes how such an encoding can be done. Chor–Rivest encryption is quite fast, since it computes only h integer additions modulo q – 1.

Algorithm 5.18. Chor–Rivest encryption

Input: The Chor–Rivest public key (α0, . . . , αp–1) (together with p and h) and the plaintext message (m0, . . . , mp–1) which is a binary vector of weight h.

Output: The ciphertext message c ∈ {0, 1, . . . , q – 2}.

Steps:

c := m0α0 + m1α1 + · · · + mp–1αp–1 (mod q – 1).
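The single step above is just a subset sum: add up the weights αi at the h one-bit positions and reduce modulo q – 1. A minimal sketch, using a made-up toy key rather than one produced by Algorithm 5.17:

```python
def cr_encrypt(m_bits, alpha, q):
    """Chor-Rivest encryption: sum the public weights at the one-bits, mod q-1."""
    return sum(a for mi, a in zip(m_bits, alpha) if mi) % (q - 1)

# e.g. cr_encrypt([1, 0, 1, 1, 0], [5, 11, 2, 8, 17], 32)
# adds the weights 5, 2 and 8 and reduces modulo 31
```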

Chor–Rivest decryption

The Chor–Rivest decryption procedure (Algorithm 5.19) generates a monic polynomial of degree h, the h (distinct) roots of which give the non-zero bits mi in the original plaintext message.

In order to prove that the decryption works correctly, note that s = c – hd ≡ m0aπ(0) + · · · + mp–1aπ(p–1) (mod q – 1), so that g(X)^s is congruent modulo f(X) to the product of the factors (X + π(i)) over those i with mi = 1. The polynomial u(X) is computed as one of degree < h. Adding f(X) to u(X) gives a monic polynomial v(X) of degree h, which is congruent modulo f(X) to this product; both being monic of degree h, they are equal. The roots of v(X) can be obtained either by a root-finding algorithm or by trial divisions of v(X) by X + i, i = 0, 1, . . . , p – 1. Applying the inverse of π on these roots then reconstructs the plaintext message.

Algorithm 5.19. Chor–Rivest decryption

Input: The Chor–Rivest private key (f, g, π, d) (together with p and h) and the ciphertext message c ∈ {0, 1, . . . , q – 2}.

Output: The recovered plaintext message (m0, . . . , mp–1) which is a binary vector of weight h.

Steps:

s := c – hd (mod q – 1).

u(X) := g(X)^s (mod f(X)).

v(X) := f(X) + u(X).

Factorize v(X) as v(X) = (X + i1)· · ·(X + ih), where i1, . . . , ih are distinct elements of {0, 1, . . . , p – 1}.

For i = 0, 1, . . . , p – 1 set mi := 1 if π(i) ∈ {i1, . . . , ih}, and mi := 0 otherwise.
An eavesdropper sees only the sum c = m0α0 + · · · + mp–1αp–1 (mod q – 1) of the (known) knapsack weights α0, . . . , αp–1. In order to recover m0, . . . , mp–1, she has to solve the SSP. By choosing p and h carefully, the density of the knapsack set can be adjusted to be high, that is, larger than what the cryptanalytic routines described in Section 4.8 can handle. Thus, the Chor–Rivest scheme is assumed to be secure. However, as discussed in Chor and Rivest [54], the security of the system breaks down when certain partial information on the private key is available.

*5.2.7. The XTR Public-key Encryption Algorithm

XTR, a phonetic abbreviation of efficient and compact subgroup trace representation, was designed by Arjen Lenstra and Eric Verheul as an attractive alternative to RSA (and similar cryptosystems including the ElGamal scheme over finite fields) and elliptic curve cryptosystems (ECC). The attractiveness of XTR arises from its compact key sizes and its fast arithmetic, both discussed below.

XTR, though not a fundamental breakthrough, deserves treatment in this chapter. The working of XTR is somewhat involved and we plan to present only a conceptual description of the algorithm, hiding the mathematical details.

XTR considers the following tower of field extensions:

F_p ⊆ F_{p^2} ⊆ F_{p^6},

where p ≡ 2 (mod 3) is a prime, sufficiently large so that computing discrete logs in F*_{p^6} using known algorithms is infeasible. We have p^6 – 1 = (p – 1)(p + 1)(p^2 – p + 1)(p^2 + p + 1). Let q be a prime divisor of p^2 – p + 1 of bit length 160 or more. There is a unique subgroup G of F*_{p^6} with #G = q. G is called the XTR (sub)group, whereas the entire group F*_{p^6} is called the XTR supergroup. The XTR group G is cyclic (Lemma 2.1, p 27). Let g be a generator of G, that is, G = 〈g〉 = {1, g, g^2, . . . , g^{q–1}}.

The working of XTR is based on the discrete log problem in G. Since p^2 – p + 1 and hence q are relatively prime to the orders of the multiplicative groups of all proper subfields of F_{p^6}, computing discrete logs in G is (seemingly) as difficult as that in F*_{p^6}, that is, one gets the same level of security by the use of G instead of the full XTR supergroup.

The main technical innovation of XTR is the proposal of a compact representation of the elements of G in place of the obvious representation using ⌈6 lg p⌉ bits inherited from that of F_{p^6}. This is precisely where the intermediate field F_{p^2} comes into the picture. We require a map G → F_{p^2}, so that we can represent elements of G by those of F_{p^2}. This map offers two benefits. First, the elements of G can now be represented using ⌈2 lg p⌉ bits, leading to a three-fold reduction in the key size. Second, the arithmetic of F_{p^2} can be exploited to implement the arithmetic in G, thereby improving the efficiency of encryption and decryption routines (compared to those over the full XTR supergroup).

The map uses the traces of elements of F_{p^6} over F_{p^2} (Definition 2.59). In this section, we use the shorthand notation Tr to stand for Tr_{F_{p^6}/F_{p^2}}. The conjugates of an element h ∈ F_{p^6} over F_{p^2} are h, h^{p^2}, h^{p^4}, and so Tr(h) = h + h^{p^2} + h^{p^4}.

Let us now specialize to h = g^n ∈ G. Since p^2 ≡ p – 1 (mod p^2 – p + 1) and p^4 ≡ –p (mod p^2 – p + 1), the conjugates of h are g^n, g^{(p–1)n}, g^{–pn}. Thus, Tr(g^n) = g^n + g^{(p–1)n} + g^{–pn}. Moreover,

g^n · g^{(p–1)n} · g^{–pn} = g^{n(1 + (p – 1) – p)} = 1 and g^n g^{(p–1)n} + g^n g^{–pn} + g^{(p–1)n} g^{–pn} = Tr(g^n)^p,

so the minimal polynomial of h = g^n over F_{p^2} is

X^3 – Tr(g^n) X^2 + Tr(g^n)^p X – 1.

This minimal polynomial is determined uniquely by Tr(g^n) and so we can represent g^n by Tr(g^n) ∈ F_{p^2}. Note, however, that this representation is not unique, that is, the map G → F_{p^2}, g^n ↦ Tr(g^n), is not injective. More precisely, the only elements of G that map to Tr(g^n) are the conjugates g^n, g^{(p–1)n}, g^{–pn} of g^n. This is often not a serious problem, as we see below.

In order to complete the description of the implementation of the arithmetic of the group G, we need to address two further issues, since the trace representation defined above is not a homomorphism of groups. First, we specify how one can implement the arithmetic of F_{p^2}. Since p ≡ 2 (mod 3), X^2 + X + 1 is irreducible over F_p. If α is a root of X^2 + X + 1, we have the standard representation F_{p^2} = {y0 + y1α | y0, y1 ∈ F_p}. Since 1 + α + α^2 = 0, we have y0 + y1α = (–α – α^2)y0 + y1α = (y1 – y0)α + (–y0)α^2. This leads to the non-standard representation

F_{p^2} = {x1α + x2α^2 | x1, x2 ∈ F_p}.

Since p ≡ 2 (mod 3) and α^3 = 1 + (α – 1)(α^2 + α + 1) = 1, the F_p-basis {α, α^2} of F_{p^2} is the same as the normal basis {α, α^p}. Under this basis, the basic arithmetic operations in F_{p^2} can be implemented using only a few multiplications (and some additions/subtractions) in F_p, as described in Table 5.1. Here, the operands are x = x1α + x2α^2, y = y1α + y2α^2 and z = z1α + z2α^2.

Table 5.1. Basic operations in F_{p^2}

Operation    Number of multiplications in F_p
x^p          0  (since x^p = x2α + x1α^2)
x^2          2  (since x^2 = x2(x2 – 2x1)α + x1(x1 – 2x2)α^2)
xy           3  (since xy = (x2y2 – x1y2 – x2y1)α + (x1y1 – x1y2 – x2y1)α^2, so it suffices to compute x1y1, x2y2 and (x1 + x2)(y1 + y2))
xz – yz^p    4  (since xz – yz^p = (z1(y1 – x2 – y2) + z2(x2 – x1 + y2))α + (z1(x1 – x2 + y1) + z2(y2 – x1 – y1))α^2)
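The formulas of Table 5.1 are easy to check by machine. In the sketch below, an element x1α + x2α² of F_{p²} is a pair (x1, x2), and p = 11 is a hypothetical toy prime ≡ 2 (mod 3); frob, sqr and mul use exactly the expressions (and multiplication counts) from the table.

```python
p = 11                       # any prime p = 2 (mod 3); toy size only

def frob(x):                 # x^p: zero multiplications (swap the coordinates)
    return (x[1], x[0])

def sqr(x):                  # x^2: two multiplications in F_p
    x1, x2 = x
    return (x2 * (x2 - 2 * x1) % p, x1 * (x1 - 2 * x2) % p)

def mul(x, y):               # xy: three multiplications in F_p
    x1, x2 = x
    y1, y2 = y
    a, b = x1 * y1 % p, x2 * y2 % p
    c = (x1 + x2) * (y1 + y2) % p
    s = (c - a - b) % p      # s = x1*y2 + x2*y1, at no extra multiplication
    return ((b - s) % p, (a - s) % p)
```

As a sanity check, mul(x, x) agrees with sqr(x), and computing x^p by square-and-multiply reproduces the free Frobenius map frob(x).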

Now, we explain how arithmetic operations in G translate to those in F_{p^2} under the representation of g^n by Tr(g^n). To start with, we show how the knowledge of Tr(h) and n allows one to compute Tr(h^n) for h ∈ G. For c ∈ F_{p^2}, define the polynomial

Fc(X) := X^3 – cX^2 + c^pX – 1 ∈ F_{p^2}[X],

and let h1, h2, h3 be the three roots (not necessarily distinct) of Fc(X). For n ∈ Z, we use the notation

cn := h1^n + h2^n + h3^n.
Putting c = Tr(g) yields cn = Tr(g^n), or, more generally, for c = Tr(g^k) we have cn = Tr(g^{kn}). Algorithm 5.20 computes

Sn(c) := (cn–1, cn, cn+1)

given c ∈ F_{p^2} (for example, c = Tr(g^k)) and n ∈ Z. The correctness of the algorithm is based on the following identities, the derivations of which are left to the reader (alternatively, see Lenstra and Verheul [170]).

Equation 5.1


Equation 5.2


Equation 5.3

c–n = cn^p


Equation 5.4


Equation 5.5

cn+2 = c·cn+1 – c^p·cn + cn–1


Equation 5.6

c2n = cn^2 – 2cn^p


Equation 5.7

c2n–1 = cn–1·cn – c^p·cn^p + cn+1^p


Equation 5.8

c2n+1 = cn+1·cn – c·cn^p + cn–1^p


Algorithm 5.20. XTR exponentiation

Input: and .

Output:.

Steps:

if (n < 0) {
   Compute S–n(c).
   Use Equation (5.3) to compute and return Sn(c).
}
if (n = 0) { Return (c^p, 3, c). }
if (n = 1) { Return (3, c, c^2 – 2c^p). }
if (n = 2) {
   Compute S1(c) and hence c3 using Equation (5.5).
   Return (c1, c2, c3).
}
/* Now n ≥ 3. Take m := n if n is odd and m := n – 1 otherwise, and let (m – 1)/2 = (1 ml–1 . . . m0)2 be its binary expansion. */

/* Initialize */
k := 1.
Compute S2k+1(c) = S3(c) = (c2, c3, c4) from S2(c) using Equation (5.5).
/* Exponentiation loop */
for j = l – 1, l – 2, . . . , 0 {
   if (mj = 0) {
      Compute S4k+1(c) = (c4k, c4k+1, c4k+2) from S2k+1(c) = (c2k, c2k+1, c2k+2).
      /* Use Equation (5.6) for c4k and c4k+2 and Equation (5.7) for c4k+1 */
   } else {       /* mj = 1 */
      Compute S4k+3(c) = (c4k+2, c4k+3, c4k+4) from S2k+1(c) = (c2k, c2k+1, c2k+2).
      /* Use Equation (5.6) for c4k+2 and c4k+4 and Equation (5.8) for c4k+3 */
   }
   }
   k := 2k + mj.
}

/* We have now computed S2k+1(c) = (c2k, c2k+1, c2k+2), where 2k + 1 equals n or n – 1 according as n is odd or even */

if (n is even) {
   Compute Sn(c) = (cn–1, cn, cn+1) from Sn–1(c) = (cn–2, cn–1, cn).
   /* Use Equation (5.5) to compute cn+1 from Sn–1 */
}

A careful analysis suggests that the computation of cn from c requires 8 lg n multiplications in F_p. An exponentiation in F_{p^6}, on the other hand, requires an expected number of 23.4 lg n multiplications in F_p (assuming that the time for a squaring is 80 per cent of that for a multiplication). Thus, the XTR representation provides a speed-up of about 3.

XTR key pair

The domain parameters for an XTR cryptosystem include primes p and q satisfying the requirements stated above: p ≡ 2 (mod 3), and q a divisor of p^2 – p + 1 of bit length 160 or more.

We require a generator g of the XTR group G. Since we planned to replace working in G by working in F_{p^2}, the element g is not needed explicitly. The trace Tr(g) suffices for our purpose. Lenstra and Verheul [170, 172] describe several methods for obtaining the domain parameters p, q, Tr(g). We describe here the simplest strategies. Algorithm 5.21 outputs the primes p, q with |p| = lp and |q| = lq for some given lengths lp, lq.

Algorithm 5.21. Generation of XTR primes

Randomly choose r ∈ N such that q := r^2 – r + 1 is a prime of size |q| = lq.

Randomly choose k ∈ N such that p := r + kq is a prime with |p| = lp and p ≡ 2 (mod 3).

Determination of Tr(g) for a suitable g requires some mathematics. First, notice that if the polynomial Fc(X) is irreducible (over F_{p^2}) for some c ∈ F_{p^2}, then c = Tr(h) for some h ∈ F_{p^6} with ord h | (p^2 – p + 1). Moreover, c_{(p^2–p+1)/q}, if not equal to 3, is the trace of an element (for example, h^{(p^2–p+1)/q}) of order q. Thus, we may take Tr(g) = c_{(p^2–p+1)/q}. Although we do not need it explicitly, the corresponding g can be taken to be any root of the polynomial F_{Tr(g)}(X).

What remains to explain is how one can find an irreducible Fc(X). A randomized algorithm results from the fact that for a randomly chosen c ∈ F_{p^2} the polynomial Fc(X) is irreducible with probability ≈ 1/3.

Once the domain parameters of an XTR system are set, the recipient chooses a random d, 2 ≤ d ≤ q – 2, and computes Tr(g^d) using Algorithm 5.20. The tuple (p, q, Tr(g), Tr(g^d)) is the public key and d the private key of the recipient.

XTR encryption

XTR encryption (Algorithm 5.22) is very similar to ElGamal encryption. The only difference is that now we work in under the trace representation of the elements of G, that is, one uses Algorithm 5.20 for computing exponentiations in G.

Algorithm 5.22. XTR encryption

Input: The public key (p, q, Tr(g), Tr(g^d)) of the recipient and the message m ∈ F_{p^2} to be encrypted.

Output: The ciphertext message (r, s) ∈ F_{p^2} × F_{p^2}.

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ q – 2.

Compute r := Tr(g^{d′}) using Algorithm 5.20 with c := Tr(g) and n := d′.

Compute Tr(g^{dd′}) using Algorithm 5.20 with c := Tr(g^d) and n := d′.

Set s := m Tr(g^{dd′}).

XTR decryption

XTR decryption (Algorithm 5.23) is again analogous to ElGamal decryption except that we have to incorporate the XTR representation of elements of G.

Algorithm 5.23. XTR decryption

Input: The private key d of the recipient and the ciphertext (r, s).

Output: The recovered plaintext message m.

Steps:

Compute Tr(g^{dd′}) using Algorithm 5.20 with c := r = Tr(g^{d′}) and n := d.

Set m := s Tr(g^{dd′})^{–1}.

Note that XTR encryption and decryption use Algorithm 5.20 for performing exponentiations. Therefore, these routines run about three times faster than the corresponding ElGamal routines based on the standard arithmetic.

*5.2.8. The NTRU Public-key Encryption Algorithm

Hoffstein et al. [130] have proposed the NTRU encryption scheme in which encryption involves a mixing system using the polynomial algebra and reductions modulo two relatively prime integers α and β. The decryption involves an unmixing system and can be proved to be correct with high probability. The security of this scheme banks on the interaction of the mixing system with the independence of the reductions modulo α and β. Attacks against NTRU based on the determination of short vectors in certain lattices are known. However, suitable choices of the parameters make NTRU resistant to these attacks. The most attractive feature of the NTRU scheme is that encryption and decryption in this case are much faster than those in other known schemes (like RSA, ECC and even XTR).

NTRU key pair

NTRU parameters include three positive integers n, α and β with gcd(α, β) = 1 and with β considerably larger than α (see Table 5.2). Consider the polynomial algebra R := Z[X]/〈X^n – 1〉. An element of R is represented as a polynomial f = f0 + f1X + · · · + fn–1X^{n–1} or, equivalently, as a vector (f0, f1, . . . , fn–1) of the coefficients. Note that X^n – 1 is not irreducible in Z[X] (for n ≥ 2) and so R is not a field, but that does not matter for the NTRU scheme. For two polynomials f, g of degree < n and with integer coefficients, we denote by f g the product of f and g in Z[X], whereas f and g as elements of R multiply to fg = h with

hk = Σ_{i+j ≡ k (mod n)} fi gj for k = 0, 1, . . . , n – 1.

Table 5.2. Recommended NTRU parameters

Security       n     α    β     νf     νg    νu
short-term     107   3    64    15     12    5
moderate       167   3    128   61     20    18
standard[*]    263   3    128   50     24    16
high           503   3    256   216    72    55

[*] Assumed to be equivalent to 1024-bit RSA
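The product in R is the cyclic convolution of the coefficient vectors: the coefficient of X^k in fg collects all products fi gj with i + j ≡ k (mod n). A direct sketch:

```python
def conv(f, g, n):
    """Multiply f and g in Z[X]/(X^n - 1), as coefficient vectors of length n."""
    h = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[(i + j) % n] += fi * gj     # X^(i+j) wraps around modulo X^n - 1
    return h

# e.g. (1 + X)(X + X^2) = X + 2X^2 + X^3 = 1 + X + 2X^2 in Z[X]/(X^3 - 1)
```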

NTRU works with polynomials having small coefficients. More specifically, we define the following subsets of R. The message space M (that is, the set of plaintext messages) consists of all polynomials of R with coefficients reduced modulo α. Unlike our representation of Z_α so far, we use the integers between –α/2 and +α/2 to represent the coefficients of polynomials in M, that is,

M = {f ∈ R | every coefficient of f lies between –α/2 and +α/2}.

For ν1, ν2 ∈ N, we also define the subset

L(ν1, ν2) := {f ∈ R | f has ν1 coefficients equal to 1, ν2 coefficients equal to –1, and all other coefficients equal to 0}

of R. For suitably chosen parameters νf, νg and νu (see Table 5.2), we use the special notations:

Lf := L(νf, νf – 1),  Lg := L(νg, νg),  Lu := L(νu, νu).

With these notations we are now ready to describe the NTRU key generation routine. The subsets M, Lf, Lg and Lu are assumed to be public knowledge (along with the parameters n, α and β).

Algorithm 5.24. NTRU key generation

Input: n, α, β and Lf, Lg, as defined above.

Output: A random NTRU key pair.

Steps:

Choose f ∈ Lf and g ∈ Lg randomly.

/* f must be invertible modulo both α and β */

Compute fα and fβ satisfying fαf ≡ 1 (mod α) and fβf ≡ 1 (mod β).

h := fβg (mod β).

Return h as the public key and f (along with fα) as the private key.

The polynomial fα can be computed from f during decryption. However, for the sake of efficiency, it is recommended that fα be stored along with f.

The integers α and β are either small primes or small powers of small primes (Table 5.2). The most time-consuming step in the NTRU key generation procedure is the computation of the inverses fα and fβ. Suppose we want to compute the inverse of f in Z_{p^e}[X]/〈X^n – 1〉, where p is a small prime and e is a small exponent (we may have e = 1). We first compute f(X)^{–1} in the ring Z_p[X]/〈X^n – 1〉. Since p is a prime, Z_p is a field, that is, Z_p[X] is a Euclidean domain (Exercise 2.31). We compute the extended Euclidean gcd of f(X) with X^n – 1. If f(X) and X^n – 1 are not coprime modulo p, then f(X) is not invertible in Z_p[X]/〈X^n – 1〉, else we get s(X)f(X) + t(X)(X^n – 1) ≡ 1 (mod p), and s(X) is the inverse of f(X) in Z_p[X]/〈X^n – 1〉. A randomly chosen f(X) with gcd(f(1), p) = 1 has high probability of being invertible modulo p. Recall that we have chosen f ∈ Lf, so that f(1) = 1.

If e = 1, we have already computed the desired inverse of f(X). If e > 1, we have to lift the inverse fp(X) = s(X) of f(X) modulo p to the inverse fp2(X) of f(X) modulo p^2, and then to the inverse fp3(X) of f(X) modulo p^3, and so on. Eventually, we get the inverse fpe(X) of f(X) modulo p^e. Here we describe the generic lift procedure of fpk(X) to fpk+1(X). In the ring Z[X]/〈X^n – 1〉, we have fpk f ≡ 1 (mod p^k). We can write fpk+1(X) = fpk(X) + p^k a(X) for some a(X) with coefficients reduced modulo p. Substituting this value in fpk+1 f ≡ 1 (mod p^{k+1}) gives the unknown polynomial a(X) as

a(X) ≡ –s(X) × ((fpk(X)f(X) – 1)/p^k) (mod p),

where s(X) = fp(X) is the inverse of f modulo p.
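The lift can be checked on a toy instance. The sketch below implements exactly this step: divide fpk·f – 1 by p^k, multiply by –s(X) modulo p, and add p^k·a(X); the values n = 3, p = 3 and f = 1 + X are hypothetical toy choices.

```python
def conv(f, g, n, mod):
    """Product in Z_mod[X]/(X^n - 1), coefficient vectors of length n."""
    h = [0] * n
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[(i + j) % n] = (h[(i + j) % n] + fi * gj) % mod
    return h

def lift(f, s, p, e, n):
    """Given s = f^(-1) (mod p), return f^(-1) (mod p^e), one power at a time."""
    fk = s[:]
    for k in range(1, e):
        pk = p ** k
        t = conv(f, fk, n, pk * p)           # f * f_{p^k} (mod p^{k+1})
        t[0] = (t[0] - 1) % (pk * p)         # f * f_{p^k} - 1, divisible by p^k
        a = [(-c) % p for c in conv(s, [x // pk for x in t], n, p)]
        fk = [(u + pk * v) % (pk * p) for u, v in zip(fk, a)]
    return fk
```

For instance, with f = 1 + X in Z[X]/(X^3 – 1), the inverse 2 + X + 2X^2 modulo 3 lifts to 5 + 4X + 5X^2 modulo 9.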

It is often recommended that f(X) be taken of the form f(X) = 1 + αf1(X) for some f1(X) ∈ R with small coefficients. In this case, fα(X) = 1 is trivially available and need not be computed as mentioned above. Such a choice of f also speeds up NTRU decryption (see Algorithm 5.26) by reducing the number of polynomial multiplications from two to one. The inverse fβ, however, has to be computed (but need not be stored).

NTRU encryption

For NTRU encryption (Algorithm 5.25), the message is encoded as a polynomial m ∈ M. The costliest step in this algorithm is computing the product uh, which can be done in time O(n^2). Asymptotically better running time (O(n log n)) is achievable by Algorithm 5.25, if one uses faster polynomial multiplication routines (like those based on fast Fourier transforms). However, for the cryptographic range of values of n, straightforward quadratic multiplication gives better performance. Most other encryption schemes (like RSA) take time O(n^3), where n is the size of the modulus. This explains why NTRU encryption is much faster than conventional encryption routines.

Algorithm 5.25. NTRU encryption

Input: (n, α, β and) the NTRU public key h of the recipient and the plaintext message m ∈ M.

Output: The ciphertext c which is a polynomial in R, reduced modulo β.

Steps:

Randomly select u ∈ Lu.

c := αuh + m (mod β).

NTRU decryption

NTRU decryption (Algorithm 5.26) involves two multiplications in R and runs in time O(n^2). In order to prove the correctness of Algorithm 5.26, one needs to verify that v ≡ αug + fm (mod β). With an appropriate choice of the parameters, it can be ensured that almost always the polynomial αug + fm has coefficients in the interval between –β/2 and +β/2. In that case, we have the equality v = αug + fm in R. Multiplication of v by fα and reduction modulo α now clearly retrieves m.

Algorithm 5.26. NTRU decryption

Input: The NTRU private key f (and fα) of the recipient and the ciphertext message c.

Output: The recovered plaintext message m ∈ M.

Steps:

v := fc (mod β).

/* The coefficients of v are chosen to lie between –β/2 and +β/2 */

m := fαv (mod α).

If f is chosen to be of the special form f = 1 + αf1 (for some polynomial f1), then v = αug + αf1m + m. Thus, reduction of v modulo α straightaway gives m, that is, there is no need to multiply v by fα. Also fα (having the trivial value 1) need not be stored in the private key. To sum up, taking f to be of the above special form increases the efficiency of the NTRU scheme without (seemingly) affecting its security. But now f is no longer an element of Lf, and some care should be taken to choose suitable values of f1.
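A complete toy roundtrip ties Algorithms 5.24–5.26 together. All values below are hypothetical samples (n = 7 is hopelessly insecure); f has the special form 1 + αf1 just discussed, so fα = 1, and fβ is found by a brute-force inverse modulo 2 followed by lifting, here in a modulus-squaring variant of the lift described earlier.

```python
import itertools

n, alpha, beta = 7, 3, 32        # toy parameters; see Table 5.2 for real ones

def conv(f, g, mod):
    """Product in Z_mod[X]/(X^n - 1)."""
    h = [0] * n
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] += f[i] * g[j]
    return [c % mod for c in h]

def centre(v, q):
    """Represent residues modulo q by integers between -q/2 and +q/2."""
    return [c % q - q if c % q > q // 2 else c % q for c in v]

def invert(f, q):
    """f^(-1) mod q (q a power of 2): brute force mod 2, then Newton lifting."""
    one = [1] + [0] * (n - 1)
    s = next(list(b) for b in itertools.product(range(2), repeat=n)
             if conv(f, list(b), 2) == one)
    m = 2
    while m < q:
        m *= m                                   # double the 2-adic precision
        t = conv(f, s, m)
        t[0] -= 2                                # t = f*s - 2
        s = [(-c) % m for c in conv(s, t, m)]    # s <- s*(2 - f*s)
    return [c % q for c in s]

# key generation: f = 1 + alpha*(X - X^4); g has coefficients in {-1, 0, 1}
f = [1, alpha, 0, 0, -alpha, 0, 0]
g = [0, -1, 0, 1, 0, 0, 0]
h = conv(invert(f, beta), g, beta)               # public key h = f_beta * g

# encryption with a random-looking blinding polynomial u
m = [1, -1, 0, 0, 0, 1, 0]
u = [0, 0, 1, 0, 0, -1, 0]
c = [(alpha * x + y) % beta for x, y in zip(conv(u, h, beta), m)]

# decryption: centre f*c mod beta, then reduce modulo alpha and centre again
v = centre(conv(f, c, beta), beta)               # equals alpha*u*g + f*m in R
recovered = centre(v, alpha)
assert recovered == m
```

Because every coefficient of αug + fm stays well inside the interval between –β/2 and +β/2 for these small polynomials, the centred reduction recovers the product exactly, and reducing modulo α strips away the αug and αf1m terms.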

NTRU decryption fails, usually when m is not properly centred (around 0). In that case, representing v as a polynomial with coefficients in the range –β/2 + x and +β/2 + x for a small positive or negative value of x may result in correct decryption. If, on the other hand, no values of x work, NTRU decryption cannot recover m easily and is said to suffer from a gap failure. For suitable parameter values, gap failures are very unlikely and can be ignored for all practical purposes.

Now, let us see how the NTRU system can be broken. In order to find out the private key f from the public key h = fβg, one may keep on searching exhaustively for f′ ∈ Lf, until f′h (mod β) lies in Lg. Alternatively, one may carry out a similar exhaustive search over all g′ ∈ Lg. In a similar manner, m can be retrieved from c by trying all u′ ∈ Lu, until c – αu′h (mod β) lies in M. Clearly, such an attack takes expected time proportional to the size of Lf or Lg or Lu.

A baby-step–giant-step strategy reduces the running times to the square roots of the sizes of the above sets. For example, suppose we want to compute f from h. We split f = f1 + f2 into two nearly equal pieces f1 and f2. If n is odd, f1 may contain the (n + 1)/2 most significant terms and f2 the (n – 1)/2 least significant terms of f. Now, we compute (f2, –f2h (mod β)) for all possibilities of f2 and store the pairs sorted by the second component. Next, for each possibility of f1 (baby step) we compute f1h (mod β) and see if there is any f2 (giant step) for which f1h (mod β) and –f2h (mod β) have nearly equal values. If a matching pair (f1, f2) is located, we take f = f1 + f2. A similar method works for guessing m from c.

It is necessary to take the sets Lf, Lg and Lu big enough, so that exhaustive or square-root attacks are not feasible. Typically, choosing the sizes of these sets to be ≥ 2^{160} is deemed sufficiently secure.

Another relevant attack is discussed in Exercise 5.11. By far the most sophisticated attack on the NTRU encryption scheme is based on finding short vectors in a lattice. We describe this attack in connection with the computation of the private key f from a knowledge of the public key h. Let L denote the lattice in Z^{2n} generated by the rows of the 2n × 2n matrix

( λI_n    H  )
(  0    βI_n ),

where h = h0 + h1X + · · · + hn–1X^{n–1} = (h0, h1, . . . , hn–1), H is the n × n matrix whose i-th row is the coefficient vector of X^{i–1}h (mod X^n – 1), I_n is the n × n identity matrix, and λ is a parameter whose choice is discussed below. Since h ≡ gf^{–1} (mod β), multiplying the i-th row by fi–1 (i = 1, . . . , n), adding, and subtracting suitable multiples of the last n rows, we conclude that the vector v := (λf0, λf1, . . . , λfn–1, g0, g1, . . . , gn–1) is in L. By tuning the value λ, the attacker maximizes the chance for v to be a short vector in L. However, if the system parameters are appropriately selected, lattice reduction algorithms become rather ineffective in finding v. Heuristic evidence suggests that this attack runs in time exponential in n.

Exercise Set 5.2

5.1Establish the correctness of Algorithm 5.4.
5.2
  1. Assume that the same message m is encrypted using the RSA algorithm and using the public keys (n1, e), . . . , (ne, e) of e entities each of which has the same encryption exponent e. Assume further that the moduli n1, . . . , ne are pairwise coprime. Specify a method by which an adversary can reconstruct the message m from a knowledge of the ciphertext messages c1, . . . , ce. [H]

  2. How can such an attack be prevented? [H]

5.3
  1. Let n, e ∈ N. How many solutions does the polynomial X^e – X have in Z_n? [H]

  2. In particular, conclude that if n = pq is an RSA modulus and e is the encryption exponent, there exist gcd(e – 1, p – 1) × gcd(e – 1, q – 1) messages m for which m^e ≡ m (mod n). Such messages are often called unconcealed. The number of unconcealed messages for random parameters n and e is, in general, vanishingly small compared to n.

5.4Assume that two parties Bob and Barbara share a common RSA modulus n but relatively prime encryption exponents e1 and e2. Alice encrypts the same message by (n, e1) and (n, e2) and sends the ciphertext messages to Bob and Barbara respectively. Suppose also that Carol intercepts both the ciphertexts. Describe a method by which Carol retrieves the (common) plaintext. [H]
5.5Let n = pq be a Rabin public key and let c ∈ Z*_n be a quadratic residue modulo n. Show that the knowledge of the four square roots of c modulo n breaks the Rabin system.
5.6What is the disadvantage of using the same session key in the ElGamal encryption scheme for encrypting two different messages (for the same recipient)? [H]
5.7Let p be an odd prime and g a generator of Z*_p.
  1. Show that the set S := {g^{2i} | i = 0, 1, . . . , (p – 3)/2} is precisely the set of all quadratic residues modulo p. Show also that S is a subgroup of Z*_p.

  2. Assume that y ≡ g^x (mod p) for some x. Show that the least significant bit of x is 0 or 1 according as y^{(p–1)/2} is congruent to 1 or –1 modulo p respectively. Thus, it is easy to determine from y the least significant bit of the discrete logarithm x = indg y.

  3. Assume that p ≡ 3 (mod 4) and that only p, g, y are known (but x is not known). Suppose further that there is an oracle (a black box) that, given z ∈ Z*_p, returns the second least significant bit of indg z. Show that x = indg y can be easily computed by making a polynomial (in log p) number of calls to this oracle. [H]

5.8Show that if the private-key parameters f(X) and d are known to a cryptanalyst of the Chor–Rivest scheme, she can recover the other parts of the private key and thus break the system completely. [H]
5.9Show that if f(X) is only known to a cryptanalyst of the Chor–Rivest scheme, then also she can recover the full private key. [H]
5.10
  1. Derive the identities of Equations (5.1) through (5.8) (p 325).

  2. With the notations of Section 5.2.7 deduce that:

    c3 = c^3 – 3c^{p+1} + 3.
    c4 = c^4 – 4c^{p+2} + 2c^{2p} + 4c.

5.11In this exercise, we use the notations of Section 5.2.8. Assume that Alice encrypts the same message m several times using the NTRU public key h of Bob, but with different random polynomials ui ∈ Lu, i = 1, . . . , r, and sends the corresponding ciphertext messages c1, . . . , cr. Describe a strategy by which an eavesdropper Carol can recover a considerable part of u1. [H] Trying all the possibilities for the (relatively small) unknown part of u1 allows Carol to retrieve m with little effort.

5.3. Key Exchange

Consider the scenario wherein two parties Alice and Bob want to share a piece of secret information (say, a DES key for future correspondence), but it is not possible to communicate this secret by personal contact or by conversing over a secure channel. In other words, Alice and Bob want to arrive at a common secret value by communicating over a public (and hence insecure) channel. A key-exchange or key-agreement protocol allows Alice and Bob to do so. The protocol should be such that an eavesdropper listening to the conversation between Alice and Bob cannot compute the secret value in feasible time.

Public-key technology is used to design a key-exchange protocol in the following way. Alice generates a key pair (eA, dA) and sends the public key eA to Bob. Similarly, Bob generates a random key pair (eB, dB) and sends the public key eB to Alice. Now, Alice and Bob respectively compute the values sA = f(eB, dA) and sB = f(eA, dB) using their respective knowledge, where f is a suitably chosen function. If sA = sB, then this value can be used as the shared secret between Alice and Bob. The intruder Carol can intercept eA and eB, but f should be such that a knowledge of eA and eB alone does not allow Carol to compute sA = sB. She needs dA or dB for this computation. Since (eA, dA) and (eB, dB) are key pairs, we assume that it is infeasible to compute dA from eA or dB from eB.

In what follows, we describe some key-exchange protocols. The security of these protocols depends on the intractability of the DHP (or the DLP). We provide a generic description, where we work in a finite Abelian multiplicative group G of order n. We write the identity of G as 1. G need not be cyclic, but we assume that an element g having suitably large (and preferably prime) multiplicative order m is provided. G, g, n and m may be made publicly available, but G should be a group in which one cannot compute discrete logarithms in feasible time. Typical examples of G are given in Section 5.2.5.

5.3.1. Basic Key-Exchange Protocols

Basic key-exchange protocols provide provable security against passive attacks under the intractability of the DHP. However, several models of active attacks are known for the basic protocols. One requires authentication (validation of the public keys) to eliminate these attacks.

The Diffie–Hellman key-exchange protocol

The Diffie–Hellman (DH) key-exchange algorithm [78] is one of the pioneering discoveries leading to the birth of public-key cryptography.

Algorithm 5.27. Diffie–Hellman key exchange

Input: G, g, n and m as defined above.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice generates a random dA, 2 ≤ dA ≤ m – 1, and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random dB, 2 ≤ dB ≤ m – 1, and computes eB := g^dB.

Bob sends eB to Alice.

Alice computes s := (eB)^dA = g^(dA dB).

Bob computes s := (eA)^dB = g^(dA dB).

if (s = 1) { Return “failure”. }

The DH scheme fails, if the shared secret turns out to be a trivial element (like the identity) of G. In that case, Alice and Bob should re-execute the protocol with different key pairs. The probability of such an incident is, however, extremely low.

The intruder Carol learns the group elements g^dA and g^dB by listening to the conversation between Alice and Bob and intends to compute s = g^(dA dB). Thus, she has to solve an instance of the DHP in the group G. By assumption, this is computationally infeasible. This is how the DH scheme derives its security.
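The protocol can be sketched concretely in the group G = Z_p* for a toy safe prime p = 2q + 1. The specific parameters below (and the choice g = 4, an element of prime order q) are assumptions for illustration only; real deployments use far larger groups.

```python
import secrets

# A minimal sketch of Algorithm 5.27 with G = Z_p* for a toy safe prime
# p = 2q + 1.  g = 4 is a quadratic residue, so ord g = q, a large prime
# divisor of n = p - 1.  (Toy parameters; not secure.)
p, q, g = 2039, 1019, 4          # q = (p - 1) / 2 is prime

def keypair():
    d = 2 + secrets.randbelow(q - 2)   # private key d in [2, q - 1]
    return d, pow(g, d, p)             # (d, e = g^d mod p)

dA, eA = keypair()                     # Alice; she sends eA to Bob
dB, eB = keypair()                     # Bob;   he sends eB to Alice
sA = pow(eB, dA, p)                    # Alice: (eB)^dA = g^(dA dB)
sB = pow(eA, dB, p)                    # Bob:   (eA)^dB = g^(dA dB)
assert sA == sB and sA != 1            # shared secret agreed, non-trivial
```

Since q is prime and 2 ≤ dA, dB ≤ q – 1, the product dA·dB is never divisible by q, so the failure case s = 1 cannot occur here.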

Small-subgroup attacks

A small-subgroup attack on the DH protocol can be mounted by an active adversary. Assume that the order m of g in G is composite and has known factorization m = uv with u small. Carol intercepts the messages between Alice and Bob, replaces them by their respective v-th powers and retransmits the modified messages.

Algorithm 5.28. A small-subgroup attack by an active eavesdropper

Alice generates a random dA, 2 ≤ dA ≤ m – 1, and computes eA := g^dA.

Alice transmits eA to Bob.

Carol intercepts eA, computes (eA)^v and sends it to Bob.

Bob generates a random dB, 2 ≤ dB ≤ m – 1, and computes eB := g^dB.

Bob transmits eB to Alice.

Carol intercepts eB, computes (eB)^v and sends it to Alice.

Alice computes s′ := ((eB)^v)^dA = g^(v dA dB).

Bob computes s′ := ((eA)^v)^dB = g^(v dA dB).

if (s′ = 1) { Return “failure”. }

But ord g = uv, so (s′)^u = 1, that is, s′ can take at most u – 1 non-trivial values. Since u is small, Carol can exhaustively search the possibilities for s′. The best countermeasure against this attack is to take m to be a prime (of bit length ≥ 160).
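Algorithm 5.28 can be demonstrated concretely. The toy parameters below (p = 1913, with ord g = m = 8 · 239) are assumptions chosen so that m has the small factor u = 8; after Carol's substitution, the agreed secret lies in a subgroup of only u elements.

```python
import secrets

# Sketch of Algorithm 5.28 in G = Z_p*.  Here ord g = m = u*v with a
# small factor u, so forwarding v-th powers forces the shared secret
# into the subgroup of order u.  (Toy parameters, for clarity only.)
p = 1913                      # prime; p - 1 = 1912 = 8 * 239
g, u, v = 3, 8, 239           # ord g = m = u * v = 1912

dA = 2 + secrets.randbelow(u * v - 2)
dB = 2 + secrets.randbelow(u * v - 2)
eA, eB = pow(g, dA, p), pow(g, dB, p)

# Carol, in the middle, forwards the v-th powers instead:
eA_, eB_ = pow(eA, v, p), pow(eB, v, p)
s_alice = pow(eB_, dA, p)     # = g^(v dA dB), of order dividing u
s_bob = pow(eA_, dB, p)
assert s_alice == s_bob

# Carol's exhaustive search over the u possible values:
candidates = {pow(g, v * j, p) for j in range(u)}
assert s_alice in candidates
```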

Even when m is prime, it may be the case that the cofactor k := n/m has a small divisor u, and it is possible that an active attacker intervenes in such a way that Alice and Bob agree upon a secret value of order (equal to or dividing) u. For example, Carol may replace both the transmitted public keys by an element h of order u. If dA and dB are congruent modulo u, the shared secret has only a few possible values and Carol can obtain the correct value by exhaustive search. On the other hand, if dA ≢ dB (mod u), Alice and Bob do not come up with the same secret. However, if Alice uses her secret to encrypt a message for Bob, it remains easy for Carol to decrypt the intercepted ciphertext by trying only a few choices for Alice’s key. Alice and Bob can prevent this attack by refusing to accept as the shared secret not only the trivial value s = 1 but also elements of small orders.

A small-subgroup attack can also be mounted by one of the communicating parties (say, Bob) in an attempt to gain information about the other’s (Alice’s) secret dA. Let us continue to assume that the cofactor k := n/m has a small divisor u. Bob finds an element h in G of order u. Instead of eB = g^dB, Bob now sends h·g^dB to Alice. Alice computes the shared secret as sA := (h·g^dB)^dA = h^dA·g^(dA dB). Bob, on the other hand, can normally compute sB := (eA)^dB = g^(dA dB). Now, suppose that Alice uses a symmetric cipher with the key sA (or some part of it) and sends the ciphertext to Bob. In order to decrypt, Bob tries all of the u possible keys sB·h^j for j = 0, 1, . . . , u – 1. The value of j for which decryption succeeds equals dA modulo u. A similar attack can be mounted by Bob, when eB is chosen to be an element (like h itself) of order u.

If G is cyclic and H is the subgroup generated by g, then an element a ∈ G is in H if and only if a^m = 1 (Proposition 2.5, p 27). Moreover, if gcd(k, m) = 1, each communicating party can check the validity of the other party’s public key by an m-th power exponentiation. An element like h or h·g^dB of the last paragraph does not pass this test, in which case Alice should abandon the protocol. However, the validation of the public key requires a modular exponentiation and thereby slows down the protocol.

Cofactor exponentiation

We now present an efficient modification of the basic Diffie–Hellman scheme that prevents small-subgroup attacks (by a communicating party or an eavesdropper) without an extra exponentiation. We continue with the notation k := n/m and assume that k is coprime to m. Now, the shared secret is computed as g^(dA dB) or g^(k dA dB), depending on whether compatibility with the original DH scheme is desired or not. Algorithm 5.29 describes the modified DH algorithm. Solve Exercise 5.12 in order to establish the effectiveness of this algorithm against small-subgroup attacks.
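The cofactor variant (Algorithm 5.29) can be sketched as follows, again in a toy group Z_p* with p = 2q + 1, so that the cofactor is k = 2. The parameters are assumptions for illustration.

```python
import secrets

# Sketch of Algorithm 5.29 (cofactor DH) in G = Z_p* with p = 2q + 1:
# n = p - 1 = 2q, m = ord g = q, cofactor k = n/m = 2, gcd(k, m) = 1.
p, m, g, k = 2039, 1019, 4, 2

dA = 2 + secrets.randbelow(m - 2); eA = pow(g, dA, p)
dB = 2 + secrets.randbelow(m - 2); eB = pow(g, dB, p)

k_inv = pow(k, -1, m)                  # compatibility with plain DH
dA_, dB_ = (k_inv * dA) % m, (k_inv * dB) % m   # deltaA, deltaB
sA = pow(eB, k * dA_, p)
sB = pow(eA, k * dB_, p)
assert sA == sB == pow(g, dA * dB, p)  # agrees with Algorithm 5.27

# If an attacker substitutes an element of order dividing k (here p - 1,
# which has order 2), cofactor exponentiation collapses it to 1, so the
# parties detect failure instead of agreeing on a low-entropy secret:
h = p - 1                              # ord h = 2 = k
assert pow(h, k * dA_, p) == 1
```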

5.3.2. Authenticated Key-Exchange Protocols

Other active attack models on the (basic or modified) DH protocol can be conceived of. One important class of attacks is now described.

Unknown key-share attacks

An unknown key-share attack on a key-exchange protocol makes a party believe that (s)he shares a secret with another party, whereas the secret is actually shared by a third party. Assume that Carol can monitor and modify every message between Alice and Bob. When Alice and Bob execute Algorithm 5.27 or 5.29, Carol can intervene and pretend to Alice that she is Bob and to Bob that she is Alice. At the end of the protocol, Alice and Carol come up with a shared secret sAC, and Bob and Carol with another shared secret sBC. Alice believes that she shares sAC with Bob, and Bob believes that he shares sBC with Alice.

Algorithm 5.29. Diffie–Hellman key exchange with cofactor exponentiation

Input: G, g, n, m and k as defined above and a flag indicating compatibility with the original DH scheme.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice generates a random dA, 2 ≤ dA ≤ m – 1, and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random dB, 2 ≤ dB ≤ m – 1, and computes eB := g^dB.

Bob sends eB to Alice.

if (compatibility with the original DH algorithm is desired) {
   Alice assigns δA := k^(–1) dA (mod m).
   Bob assigns δB := k^(–1) dB (mod m).
} else {
   Alice assigns δA := dA (mod m).
   Bob assigns δB := dB (mod m).
}
Alice computes s := (eB)^(kδA).
Bob computes s := (eA)^(kδB).
if (s = 1) { Return “failure”. }

Now, when Alice wants to send a secret message m to Bob, she encrypts m by sAC and transmits the ciphertext c. Carol intercepts c, decrypts it by sAC to retrieve m, encrypts m by sBC and sends the new ciphertext c′ to Bob. Bob retrieves m by decrypting c′ with his key sBC. The process raises hardly any suspicion in Alice or Bob about the existence of the mediating third party.

In order to avoid this attack, Alice and Bob should each validate the authenticity of the public key of the other party. Public-key certificates can be used to this effect. Unfortunately, using certificates alone may fail to eliminate unknown key-share attacks, as Algorithm 5.30 shows. At the end of this protocol Alice and Bob share a secret s, but Bob believes that he shares it with (the intruder) Carol. Here Carol herself cannot compute the shared secret s (provided that computing discrete logs in G is infeasible). Still there may be situations where this attack can be exploited (see Law et al. [161] for a hypothetical example).

Two objections may be raised against this attack. Under the assumption of intractability of the DLP in G, Carol cannot compute the private key corresponding to the public key eC, so her obtaining the certificate CertC from a knowledge of eC alone may be questioned. Furthermore, replacing (eB, CertB) by ((eB)^d, CertB) may make the certificate invalid. If we assume that a certificate authenticates only the entity and not the public key, then these objections can be overruled. In practice, however, a public-key certificate should bind the public key to an entity (who can prove knowledge of the corresponding private key), and so the above attack cannot be easily mounted. Nonetheless, the attack highlights the need for stronger authenticated key-exchange protocols.

Algorithm 5.30. An unknown key-share attack

Alice generates a random dA, 2 ≤ dA ≤ m – 1, and computes eA := g^dA.

Alice gets the certificate CertA on eA from the certifying authority.

Alice transmits (eA, CertA) to Bob.

Carol intercepts (eA, CertA).

Carol chooses a random d, 2 ≤ d ≤ m – 1.

Carol gets the certificate CertC on eC := (eA)^d from the certifying authority.

Carol sends (eC, CertC) to Bob.

Bob generates a random dB, 2 ≤ dB ≤ m – 1, and computes eB := g^dB.

Bob gets the certificate CertB on eB from the certifying authority.

Bob sends (eB, CertB) to Carol.

Carol transmits ((eB)^d, CertB) to Alice.

Alice computes s = ((eB)^d)^dA = g^(d dA dB).

Bob computes s = (eC)^dB = ((eA)^d)^dB = g^(d dA dB).

The Menezes–Qu–Vanstone key-exchange protocol

The Menezes–Qu–Vanstone (MQV) key-exchange protocol is an improved extension of the basic DH scheme that incorporates public-key authentication. Though the MQV protocol is not known to provably achieve the desired security goals, heuristic arguments suggest its effectiveness against active adversaries.

Once again, let Alice and Bob be the two parties who plan to agree on a secret element s ∈ G, where the domain parameters G, g, n and m are chosen as in the basic DH scheme. In the MQV scheme, each entity uses two key pairs, one of which ((EA, DA) for Alice and (EB, DB) for Bob) is called the static or long-term key pair, whereas the other ((eA, dA) for Alice and (eB, dB) for Bob) is called the ephemeral or short-term key pair. The static key is bound to an entity for a certain period of time and is used in every invocation of the MQV protocol during that period. On the other hand, each entity generates and uses a new ephemeral key pair during each invocation of the protocol. The static key of an entity is assumed to be authentic, say, certified by a trusted authority. The ephemeral key, on the other hand, is validated using the static private key.

Assume that there is a (publicly known) function F that converts group elements to integers (for example, by interpreting the bit representation of an element as an integer). Let l := ⌊lg m⌋ + 1 denote the bit length of m = ord g. For a ∈ G, let ā denote the integer (F(a) rem 2^⌈l/2⌉) + 2^⌈l/2⌉. The bit size of ā is about half of that of m. In particular, ā ≢ 0 (mod m) for all a ∈ G.

In the MQV protocol, Alice and Bob each computes the shared secret s = g^(σA σB), where σA := dA + ēA·DA (mod m) and σB := dB + ēB·DB (mod m). Here the exponents σA and σB bear the implicit signatures of Alice and Bob, impressed by their respective static private keys. Alice can compute g^σB = eB·(EB)^ēB, since she knows the static public key EB and the ephemeral public key eB of Bob. Similarly, Bob can compute g^σA = eA·(EA)^ēA from a knowledge of the public keys EA and eA of Alice. We summarize the steps in Algorithm 5.31.

Algorithm 5.31. MQV key exchange

Input: G, g, n and m as defined above.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice obtains Bob’s static public key EB.

Bob obtains Alice’s static public key EA.

Alice generates a random integer dA, 2 ≤ dA ≤ m – 1, and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random integer dB, 2 ≤ dB ≤ m – 1, and computes eB := g^dB.

Bob sends eB to Alice.

Alice computes σA := dA + ēA·DA (mod m).

Alice computes s := (eB·(EB)^ēB)^σA.

Bob computes σB := dB + ēB·DB (mod m).

Bob computes s := (eA·(EA)^ēA)^σB.

if (s = 1) { Return “failure”. }

Each participating entity using the MQV protocol performs three exponentiations in G. Alice computes g^dA, (EB)^ēB and (eB·(EB)^ēB)^σA, of which the first and the last ones have exponents O(m). On the other hand, ēB is O(√m), so that the middle exponentiation is about twice as fast as a full exponentiation. This performance benefit justifies the use of ēA and ēB instead of eA and eB themselves. It appears that using these half-sized exponents does not affect security. Also note that ēA ≢ 0 (mod m), which implies a non-zero contribution of the static key DA to the exponent σA. Similarly for σB.
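Algorithm 5.31 can be sketched in a toy group. Here group elements are mapped to integers simply by their residue value, one assumed instance of the conversion function F; the small parameters are for illustration only.

```python
import secrets

# Sketch of Algorithm 5.31 (MQV) in G = Z_p* with toy parameters
# p = 2039, ord g = m = 1019.  F(a) is taken to be the residue a itself
# (an assumed encoding of group elements as integers).
p, m, g = 2039, 1019, 4
L = m.bit_length()                      # l = 10
HALF = 1 << ((L + 1) // 2)              # 2^ceil(l/2)

def bar(a):                             # the half-size exponent a-bar
    return (a % HALF) + HALF            # never 0 (mod m)

def keypair():
    d = 2 + secrets.randbelow(m - 2)
    return d, pow(g, d, p)

DA, EA = keypair()                      # Alice's static pair
DB, EB = keypair()                      # Bob's static pair
dA, eA = keypair()                      # Alice's ephemeral pair
dB, eB = keypair()                      # Bob's ephemeral pair

sigmaA = (dA + bar(eA) * DA) % m
sA = pow(eB * pow(EB, bar(eB), p) % p, sigmaA, p)   # Alice's view
sigmaB = (dB + bar(eB) * DB) % m
sB = pow(eA * pow(EA, bar(eA), p) % p, sigmaB, p)   # Bob's view
assert sA == sB                          # both equal g^(sigmaA sigmaB)
```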

In order to guard against small-subgroup attacks, the MQV algorithm can incorporate the cofactor k := n/m, that is, assuming gcd(k, m) = 1, the shared secret would now be g^(σA σB) or g^(k σA σB), depending on whether compatibility with the original MQV method is desired or not.

The MQV algorithm can be used in a situation when only one party, say, Alice, is capable of initiating a transmission to the other party (Bob). In that case, Bob’s static key pair is used also as his ephemeral key pair, that is, eB = EB and dB = DB, so that the secret element shared between Alice and Bob is s = (EB·(EB)^ĒB)^σA.

See Raymond and Stiglic [250] to know more about the security issues for the DH key agreement protocol and its variants.

Exercise Set 5.3

5.12Let G be a multiplicative Abelian group of order n and with identity 1, H the subgroup of G generated by an element g of order m, k := n/m and gcd(k, m) = 1. Further let a be a non-identity element of G.
  1. Prove that if a^k = 1, then a ∉ H. (The converse of this statement is not true in general, even when G is cyclic. However, if a is an element of small order dividing k, we obviously have a^k = 1.)

  2. Explain how the modified Diffie–Hellman protocol (Algorithm 5.29) prevents an active attack by Bob described in connection with small-subgroup attacks.

5.13Write the MQV key-exchange protocol with cofactor exponentiation.
5.14Provide the details of the Diffie–Hellman key-exchange algorithm based on the XTR representation (Section 5.2.7).

5.4. Digital Signatures

Suppose an entity (Alice) is required to be bound to some electronic data (like messages or documents or keys). This binding is achieved by Alice digitally signing the data in such a way that no party other than Alice would be able to generate the signature. The signature should also be such that any entity can easily verify that it was Alice who generated the signature. Digital signatures can be realized using public-key techniques. The entity (Alice) generating a digital signature is called the signer, whereas anybody who wants to verify a signature is called a verifier.

We have seen in Section 5.2 how the encryption and decryption transforms fe, fd achieve confidentiality of sensitive data. If the set of all possible plaintext messages is the same as the set of all ciphertext messages and if fe and fd are bijective maps on that set, then the sequence of encryption and decryption can be reversed in order to realize a digital signature scheme. In order to sign m, Alice uses her private key d and the transform fd to generate s = fd(m, d). Any party who knows the corresponding public key e can recover m as m = fe(s, e). This is broadly how a signature scheme works. Depending on how the representative m is generated from the message M that Alice wants to sign, signature schemes can be classified in two categories.

Signature scheme with message recovery

In this case, one takes m = M. Verification involves getting back the message M. If M is assumed to be (the encoded version of) some human-readable text, then the recovered M = fe(s, e) will also be human-readable. If s is forged, that is, if a private key d′ ≠ d has been used to generate s′ = fd(m, d′), then verification using Alice’s public key yields m′ = fe(s′, e), and typically m′ ≠ m, since d′ and e are not matching keys. The resulting message m′ will, in general, make little or no sense to a human reader. If m is not a human-readable text, one adds some redundancy to it before signing. A forged signature yields m′ during verification, which, with high probability, is expected not to have this redundancy.

Attractive as it looks, this scheme is not suitable if M is a long message. In that case, it is customary to break M into smaller pieces and sign each piece separately. Since public-key operations are slow, signature generation (and also verification) will be time-consuming if there are too many pieces to sign (and verify). This difficulty is overcome by the second scheme described now.

Signature scheme with appendix

In this scheme, a short representative m = H(M) of M is first computed.[2] The function H is usually chosen to be a hash function, that is, one which converts bit strings of arbitrary length to bit strings of a fixed length. H is assumed to be public knowledge, that is, anybody who knows M can compute m. We also assume that H(M) can be computed fast for messages M of practical sizes. Alice uses the decryption transform on m to generate s = fd(m, d). The signature now becomes the pair (M, s). A verifier obtains Alice’s public key e and checks if H(M) = fe(s, e). The signature is taken to be valid if and only if equality holds. If a forger uses a private key d′ ≠ d, she generates a signature (M, s′), s′ = fd(m, d′), on M, and a verifier expects, with high probability, the inequality H(M) ≠ fe(s′, e).

[2] If M is already a short message, one may simply take m = M. In order to promote uniform treatment, we assume that the function H is always applied for the generation of m. Use of H is also desirable from the standpoint of security (Exercise 5.15).

A kind of forgery is possible on signature schemes with appendix. Assume that Alice creates a valid signature (M, s), s = fd(H(M), d), on a message M. The function H is certainly not injective, since its input space is much bigger (infinite) than its output space (finite). Suppose that Carol finds a message M′ ≠ M with H(M′) = H(M). In that case, the pair (M′, s) is a valid signature of Alice on the message M′, though it is not Alice who has generated it. (Indeed it has been generated without the knowledge of the private key d of Alice.) In order to foil such attacks, the function H should have second pre-image resistance. The pre-image resistance and collision resistance properties of a hash function also turn out to be important in the context of digital signatures. See Sections 1.2.6 and A.4 for more about hash functions.

We now describe some specific algorithms for (generating and verifying) digital signatures. Key pairs used for these algorithms are usually identical to those used for encryption algorithms of Section 5.2 and, therefore, we refrain from a duplicate description of the key-generation procedures. We focus our discussion only on signature schemes with appendix.

5.4.1. The RSA Digital Signature Algorithm

As in the RSA encryption scheme of Section 5.2.1, each entity generates an RSA modulus n = pq, which is the product of two distinct large primes p and q. A key pair consists of an encryption exponent e (the public key) and a decryption exponent d (the private key) satisfying ed ≡ 1 (mod φ(n)).

RSA signature generation involves a modular exponentiation in the ring Zn.

Algorithm 5.32. RSA signature generation

Input: A message M to be signed and the signer’s private key (n, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).   /* m ∈ Zn is the short representative of M */
s := m^d (mod n).

Signature generation can be speeded up if the parameters p, q, d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q–1 (mod p) are stored (secretly) in the private key. Now, one can use Algorithm 5.4 for signature generation.

The verification routine also involves a modular exponentiation in Zn.

Algorithm 5.33. RSA signature verification

Input: A signature (M, s) and the signer’s public key (n, e).

Output: Verification status of the signature.

Steps:

m := H(M).   /* m ∈ Zn is the short representative of M */
m′ := s^e (mod n).
if (m = m′) { Return “Signature verified”. }
else { Return “Signature not verified”. }

Small values of e speed up RSA signature verification and are not known to expose the scheme to any special attacks. So values of e like 3, 257 and 65,537 are recommended.
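Algorithms 5.32 and 5.33, together with the CRT speed-up of signature generation, can be sketched as follows. The toy primes and the choice of SHA-256 reduced modulo n as the short representative are assumptions for illustration.

```python
import hashlib

# Sketch of RSA signing/verification (Algorithms 5.32/5.33) with a toy
# modulus; real RSA moduli are >= 2048 bits.  H is SHA-256 reduced
# modulo n, one possible choice of short representative.
p, q = 61, 53
n = p * q                          # 3233
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # ed = 1 (mod phi(n))

def H(M, n):
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % n

def sign(M):
    m = H(M, n)
    # CRT speed-up: exponentiate separately modulo p and modulo q
    s1 = pow(m, d % (p - 1), p)
    s2 = pow(m, d % (q - 1), q)
    h = pow(q, -1, p)              # h := q^(-1) (mod p)
    t = (h * (s1 - s2)) % p        # Garner recombination
    return s2 + q * t              # s = m^d (mod n)

def verify(M, s):
    return H(M, n) == pow(s, e, n)

s = sign(b"attack at dawn")
assert s == pow(H(b"attack at dawn", n), d, n)  # CRT agrees with direct
assert verify(b"attack at dawn", s)
```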

5.4.2. The Rabin Digital Signature Algorithm

As in the Rabin encryption algorithm, we choose two distinct large primes p and q of nearly equal sizes and take n = pq. The public key is n, whereas the private key is the pair (p, q). The Rabin signature scheme is based on the intractability of computing square roots modulo n in absence of the knowledge of the prime factors p and q of n.

Rabin signature generation involves finding a quadratic residue m modulo n as a representative of the message M and computing a square root of m modulo n.

Algorithm 5.34. Rabin signature generation

Input: A message M to be signed and the signer’s private key (p, q).

Output: The signature (M, s) on M.

Steps:

m := H(M).          /* m is assumed to be a quadratic residue modulo n */

Compute a square root s1 of m modulo p.    /* Algorithm 3.17 */
Compute a square root s2 of m modulo q.    /* Algorithm 3.17 */
Compute s satisfying s ≡ s1 (mod p) and s ≡ s2 (mod q).    /* CRT */

Verification (Algorithm 5.35) involves a squaring operation in Zn.

Algorithm 5.35. Rabin signature verification

Input: A signature (M, s) and the signer’s public key n.

Output: Verification status of the signature.

Steps:

m := H(M).    /* m is a quadratic residue modulo n */

m′ := s^2 (mod n).

if (m = m′) { Return “Signature verified”. }

else { Return “Signature not verified”. }
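A sketch of the Rabin scheme follows, with toy primes p, q ≡ 3 (mod 4), for which a square root of a quadratic residue m is simply m^((p+1)/4) (a special case of Algorithm 3.17). Since H(M) need not be a quadratic residue modulo n, the sketch appends a counter to M until it is; this workaround is an assumption of the sketch, not taken from the text.

```python
import hashlib

# Sketch of Algorithms 5.34/5.35.  Toy primes, both = 3 (mod 4).
p, q = 499, 547
n = p * q

def representative(M):
    # Hash M || counter until the value is a QR modulo both p and q.
    c = 0
    while True:
        m = int.from_bytes(hashlib.sha256(M + bytes([c])).digest(), "big") % n
        if pow(m, (p - 1) // 2, p) == 1 and pow(m, (q - 1) // 2, q) == 1:
            return m, c
        c += 1

def sign(M):
    m, c = representative(M)
    s1 = pow(m, (p + 1) // 4, p)   # square root of m modulo p
    s2 = pow(m, (q + 1) // 4, q)   # square root of m modulo q
    # CRT: s = s1 (mod p), s = s2 (mod q)
    s = (s1 * q * pow(q, -1, p) + s2 * p * pow(p, -1, q)) % n
    return s, c

def verify(M, s, c):
    m = int.from_bytes(hashlib.sha256(M + bytes([c])).digest(), "big") % n
    return pow(s, 2, n) == m

s, c = sign(b"message")
assert verify(b"message", s, c)
```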

5.4.3. The ElGamal Digital Signature Algorithm

The ElGamal signature algorithm is based on the intractability of computing discrete logarithms in certain groups G. For a general description, we consider an arbitrary (finite Abelian multiplicative) group G of order n. We assume that G is cyclic and that a generator g of G is provided. A key pair is obtained by selecting a random integer (the private key) d, 2 ≤ d ≤ n – 1, and then computing g^d (the public key). The hash function H is assumed to convert arbitrary bit strings to elements of Zn. We further assume that the elements of G can be identified as bit strings (on which the hash function H can be directly applied). G (together with its representation), g and n are considered to be public knowledge and are not input to the signature generation and verification routines.

ElGamal signatures are generated as in Algorithm 5.36. The appendix consists of the pair (s, t).

Algorithm 5.36. ElGamal signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ n – 1, with gcd(d′, n) = 1.

s := g^d′.

t := d′^(–1) (H(M) – dH(s)) (mod n).

The costliest step in the ElGamal signature generation algorithm is the exponentiation g^d′. Here, G is assumed to be cyclic and the exponent d′ to be O(n). We will shortly see modifications of the ElGamal scheme in which the exponent can be chosen to be much smaller, namely O(r), where r is a suitably large (prime) divisor of n.

In order to forge a signature, Carol can generate a random session key pair (d′, g^d′) and obtain s. For the computation of t, she requires the private key d of the signer. Conversely, if t (and d′) are available to Carol, she can easily compute the private key d. Thus, forging an ElGamal signature in this way is equivalent to solving the DLP in G.

Each invocation of the ElGamal signature generation algorithm must use a new session key (d′, g^d′). If the same session key (d′, g^d′) is used to generate the signatures (M1, s1, t1) and (M2, s2, t2) on two different messages M1 and M2, then we have (t1 – t2)d′ ≡ H(M1) – H(M2) (mod n), whence d′ can be computed, provided that gcd(t1 – t2, n) = 1. If d′ is known, the private key d can be easily computed (see Exercise 5.6 for a similar situation).

ElGamal signature verification is described in Algorithm 5.37. This is based on the observation that for a (valid) ElGamal signature (M, s, t) on a message M we have g^H(M) = (g^d)^H(s)·s^t. This verification calls for three exponentiations in G to full-size exponents. Working in a suitable (cyclic) subgroup of G makes the algorithm more efficient.

Algorithm 5.37. ElGamal signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

a1 := g^H(M).

a2 := (g^d)^H(s)·s^t.

if (a1 = a2) { Return “Signature verified”. }

else { Return “Signature not verified”. }
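Algorithms 5.36 and 5.37 can be sketched as follows. The toy prime and generator, and the hashing of a group element via its decimal representation, are assumptions for illustration.

```python
import hashlib, math, secrets

# Sketch of Algorithms 5.36/5.37 with G = Z_p*, cyclic of order
# n = p - 1; g = 7 generates the full group for this toy prime.
p = 2039
n = p - 1                 # 2038 = 2 * 1019
g = 7

def H(x):
    # Hash bytes directly; hash a group element via its decimal string.
    data = str(x).encode() if isinstance(x, int) else x
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

d = 2 + secrets.randbelow(n - 2)          # private key
gd = pow(g, d, p)                         # public key

def sign(M):
    while True:
        dp = 2 + secrets.randbelow(n - 2)       # session key d'
        if math.gcd(dp, n) == 1:                # d'^(-1) mod n must exist
            break
    s = pow(g, dp, p)
    t = (pow(dp, -1, n) * (H(M) - d * H(s))) % n
    return s, t

def verify(M, s, t):
    return pow(g, H(M), p) == pow(gd, H(s), p) * pow(s, t, p) % p

s, t = sign(b"hello")
assert verify(b"hello", s, t)
```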

ElGamal signatures use a congruence of the form A ≡ dB + d′C (mod n), and verification is done by checking the equality g^A = (g^d)^B·s^C. Our choice for A, B and C was A = H(M), B = H(s) and C = t. Indeed, any permutation of H(M), H(s) and t is acceptable as A, B, C. These give rise to several variants of the ElGamal scheme. It is also allowed to take as A, B, C any permutation of H(M)H(s), t, 1 or of H(M)H(s), H(M)t, 1 or of H(M)H(s), H(s)t, 1 or of H(M)t, H(s)t, 1. Permutations of H(M)t, H(s), 1 or of H(M), H(s)t, 1, on the other hand, are known to have security flaws. For any allowed combination of A, B, C, the choices ±A, ±B, ±C are also valid. For some other variants, see Horster et al. [132].

5.4.4. The Schnorr Digital Signature Algorithm

The Schnorr signature scheme is a modification of the ElGamal scheme and is faster, since it works in the (small) subgroup of G generated by g. We assume that r := ord g is a prime (though it suffices for ord g to have a suitably large prime divisor). We suppose further that the elements of G are represented as bit strings and that we have a hash function H that maps bit strings to elements of Zr. A key pair now consists of an integer d (the private key), 2 ≤ d ≤ r – 1, and the element g^d (the public key).

Schnorr signature generation is described in Algorithm 5.38.

Algorithm 5.38. Schnorr signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, g^d′), 2 ≤ d′ ≤ r – 1.

s := H(M ‖ g^d′).    /* Here ‖ denotes string concatenation */
t := d′ – ds (mod r).

Similar to the ElGamal scheme, the most time-consuming step in this routine is the computation of the session public key g^d′. But now d′ < r and, therefore, Algorithm 5.38 runs faster than Algorithm 5.36. One can easily check that forging a signature of Alice is computationally equivalent to determining Alice’s private key d from her public key g^d. The importance of using a new session key pair in each run of Algorithm 5.38 is exactly the same as in the case of ElGamal signatures.

The verification of Schnorr signatures (Algorithm 5.39) is based upon the fact that g^d′ = g^t·(g^d)^s. Thus, the knowledge of g, s, t and g^d allows one to compute g^d′ and subsequently H(M ‖ g^d′). The algorithm involves two exponentiations with both the exponents (t and s) being < r. Thus, signature verification is also faster in the Schnorr scheme than in the ElGamal scheme.

Algorithm 5.39. Schnorr signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

u := g^t·(g^d)^s.

s′ := H(M ‖ u).

if (s = s′) { Return “Signature verified”. }

else { Return “Signature not verified”. }
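Algorithms 5.38 and 5.39 can be sketched in the toy subgroup of Z_p* generated by g = 4, of prime order r; encoding the group element by its decimal representation before hashing is an assumption of the sketch.

```python
import hashlib, secrets

# Sketch of Algorithms 5.38/5.39 in the order-r subgroup of Z_p*
# generated by g = 4.  (Toy parameters.)
p, r, g = 2039, 1019, 4

def H(M, a):
    # H(M || a), with the group element a encoded as a decimal string.
    return int.from_bytes(hashlib.sha256(M + str(a).encode()).digest(), "big") % r

d = 2 + secrets.randbelow(r - 2)
gd = pow(g, d, p)

def sign(M):
    dp = 2 + secrets.randbelow(r - 2)           # session key d'
    s = H(M, pow(g, dp, p))                     # s := H(M || g^d')
    t = (dp - d * s) % r                        # t := d' - ds (mod r)
    return s, t

def verify(M, s, t):
    u = pow(g, t, p) * pow(gd, s, p) % p        # u = g^t (g^d)^s = g^d'
    return s == H(M, u)

s, t = sign(b"hello")
assert verify(b"hello", s, t)
```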

5.4.5. The Nyberg–Rueppel Digital Signature Algorithm

The Nyberg–Rueppel (NR) signature algorithm is another adaptation of the ElGamal signature scheme and is based on the intractability of solving the DLP in a group G. We assume that ord G = n has a large prime divisor r and that an element g of order r is available. Here, a key pair is of the form (d, g^d), where the private key d is an integer between 2 and r – 1 (both inclusive) and the public key g^d is an element of 〈g〉. The hash function H converts bit strings to elements of Zr. We also assume the existence of a (publicly known) function F : G → Zr.

NR signature generation can be performed as in Algorithm 5.40.

Algorithm 5.40. Nyberg–Rueppel signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, g^d′), 2 ≤ d′ ≤ r – 1.

s := H(M) + F(g^d′) (mod r).

t := d′ – ds (mod r).

The only difference between NR signature generation and Schnorr signature generation is the way s is computed. Therefore, our remarks on the security and the efficiency of the Schnorr scheme apply equally well to the NR scheme. Signature verification is also analogous, as Algorithm 5.41 shows.

Algorithm 5.41. Nyberg–Rueppel signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

u := g^t·(g^d)^s.

s′ := H(M) + F(u) (mod r).

if (s = s′) { Return “Signature verified”. }

else { Return “Signature not verified”. }
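A sketch of Algorithms 5.40 and 5.41 follows; it differs from the Schnorr sketch only in how s is formed. Taking F(a) to be the residue a reduced modulo r is one possible (assumed) choice of the conversion function.

```python
import hashlib, secrets

# Sketch of Algorithms 5.40/5.41 (Nyberg-Rueppel) with toy parameters.
p, r, g = 2039, 1019, 4

def H(M):
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % r

F = lambda a: a % r                             # assumed choice of F

d = 2 + secrets.randbelow(r - 2)
gd = pow(g, d, p)

def sign(M):
    dp = 2 + secrets.randbelow(r - 2)           # session key d'
    s = (H(M) + F(pow(g, dp, p))) % r           # s := H(M) + F(g^d') (mod r)
    t = (dp - d * s) % r                        # t := d' - ds (mod r)
    return s, t

def verify(M, s, t):
    u = pow(g, t, p) * pow(gd, s, p) % p        # u = g^t (g^d)^s = g^d'
    return s == (H(M) + F(u)) % r

s, t = sign(b"hello")
assert verify(b"hello", s, t)
```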

5.4.6. The Digital Signature Algorithm (DSA)

The digital signature algorithm (DSA) has been proposed as a standard by the US National Institute of Standards and Technology (NIST) and later accepted as a Federal Information Processing Standard (FIPS) by the US government. This standard is also known as the digital signature standard (DSS). See the NIST document [220] for a complete description of this standard.

Algorithm 5.42. Generation of DSA primes

Input: An integer λ, 0 ≤ λ ≤ 8.

Output: A prime p of bit length l := 512+64λ such that p – 1 has a prime divisor r of length 160 bits.

Steps:

Let l – 1 = 160n + b, 0 ≤ b < 160.     /* n = (l–1) quot 160, b = (l–1) rem 160. */
while (1) {
   do {
       Choose a random seed σ which is a bit string of length k ≥ 160.
       Compute the bit string u := H(σ) ⊕ H((σ + 1) rem 2^k).
       r := u OR 2^159 OR 1.    /* Set the most and least significant bits of u */
   } while (r is not a prime).
   i := 0, f := 2.
   while (i < 4096) {
       for j = 0, 1, . . . , n { vj := H((σ + f + j) rem 2^k). }
       v := v0 + v1·2^160 + · · · + vn–1·2^(160(n–1)) + (vn rem 2^b)·2^(160n) + 2^(l–1).
                                                     /* v is an integer of bit length exactly l */
       p := v – (v rem 2r) + 1.   /* p – 1 is a multiple of 2r */
       if (p is prime) { Return (p, r). }
       i++, f := f + n + 1.
   }
}

DSA is based on the intractability of the DLP in the multiplicative group of the finite field Fp, where p is a prime of bit length 512 + 64λ with 0 ≤ λ ≤ 8. The cardinality p – 1 of Fp* is required to have a prime divisor r of length (exactly) 160 bits. The NIST document [220] specifies a standard method for obtaining such a prime p, which we describe in Algorithm 5.42. We denote by H the SHA-1 hash function that converts bit strings of arbitrary length to bit strings of length 160. We will identify (often without explicit mention) the bit string a1a2 . . . ak of length k with the integer a1·2^(k–1) + a2·2^(k–2) + · · · + ak–1·2 + ak.

The DSA prime generation procedure (Algorithm 5.42) starts by selecting the prime divisor r and then tries to find a prime p such that r|(p–1). The outputs of H are utilized as pseudorandomly generated bit strings of length 160.
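Algorithm 5.42 can be turned into runnable form; the sketch below fixes λ = 0 (l = 512), uses SHA-1 for H, and uses Miller-Rabin as the (probabilistic) primality test, an assumed stand-in for whatever primality test an implementation chooses.

```python
import hashlib, secrets

# Sketch of Algorithm 5.42 for lambda = 0, i.e. l = 512.
L_BITS, K = 512, 160
N, B = divmod(L_BITS - 1, 160)            # l - 1 = 160n + b

def sha1_int(x):                          # H(.), output as a 160-bit integer
    return int.from_bytes(hashlib.sha1(x.to_bytes(K // 8, "big")).digest(), "big")

def is_prime(x, rounds=32):               # Miller-Rabin primality test
    if x < 2 or x % 2 == 0:
        return x == 2
    d, s = x - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = 2 + secrets.randbelow(x - 3)
        y = pow(a, d, x)
        if y in (1, x - 1):
            continue
        for _ in range(s - 1):
            y = pow(y, 2, x)
            if y == x - 1:
                break
        else:
            return False
    return True

def dsa_primes():
    while True:
        while True:                       # find the 160-bit prime r
            sigma = secrets.randbits(K)
            u = sha1_int(sigma) ^ sha1_int((sigma + 1) % 2**K)
            r = u | (1 << 159) | 1        # set top and bottom bits
            if is_prime(r):
                break
        f = 2
        for _ in range(4096):             # search for p with 2r | p - 1
            vj = [sha1_int((sigma + f + j) % 2**K) for j in range(N + 1)]
            v = sum(vj[j] << (160 * j) for j in range(N))
            v += (vj[N] % 2**B) << (160 * N)
            v += 1 << (L_BITS - 1)        # force bit length exactly l
            p = v - (v % (2 * r)) + 1     # p - 1 is a multiple of 2r
            if is_prime(p):
                return p, r
            f += N + 1

p, r = dsa_primes()
assert p.bit_length() == 512 and r.bit_length() == 160
assert (p - 1) % r == 0
```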

Once the DSA parameters p and r are available, an element g of multiplicative order r can be computed by Algorithm 3.26. Henceforth we assume that p, r and g are public knowledge and need not be supplied as inputs to the signature generation and verification routines. A DSA key pair consists of an integer (the private key) d, 2 ≤ d ≤ r – 1, and the element g^d (the public key) of Fp*.

The DSA signature-generation procedure is given as Algorithm 5.43. One may additionally include a check whether s = 0 or t = 0, and, if so, one should repeat signature generation with another session key. But this, being an extremely rare phenomenon, can be ignored for all practical purposes. Both s and t are elements of Zr and hence are represented as integers between 0 and r – 1.

Algorithm 5.43. DSA signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ r − 1.

s := (g^{d′} (mod p)) (mod r).

t := d′^{−1}(H(M) + ds) (mod r).

DSA signature verification is described in Algorithm 5.44. For a valid signature (M, s, t) on a message M, the algorithm computes w ≡ d′(H(M) + ds)^{−1} (mod r), w1 ≡ H(M)w (mod r) and w2 ≡ sw (mod r). Therefore, g^{w1}(g^d)^{w2} ≡ g^{w1 + d·w2} ≡ g^{w(H(M) + ds)} ≡ g^{d′(H(M) + ds)^{−1}(H(M) + ds)} ≡ g^{d′} (mod p). Reduction modulo r now gives (g^{w1}(g^d)^{w2} (mod p)) (mod r) = (g^{d′} (mod p)) (mod r) = s.

Algorithm 5.44. DSA signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , r − 1} or t ∉ {1, 2, . . . , r − 1}) { Return “Signature not verified”. }

w := t–1 (mod r).

w1 := H(M)w (mod r).

w2 := sw (mod r).

if ((g^{w1}(g^d)^{w2} (mod p)) (mod r) = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }

DSA signature generation performs a single exponentiation and DSA verification does two exponentiations modulo p. All the exponents are positive and ≤ r. Thus, DSA is essentially as fast as the Schnorr scheme or the NR scheme.
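Algorithms 5.43 and 5.44 can be exercised end to end on a toy group. In the Python sketch below, the miniature parameters p = 23, r = 11, g = 2 (an element of order 11 modulo 23) and the use of SHA-256 in place of SHA-1 are illustrative assumptions, not real DSA parameters.

```python
# Toy DSA signing and verification (Algorithms 5.43 and 5.44) with the
# miniature parameters p = 23, r = 11, g = 2 (ord(2) = 11 in Z_23^*).
# SHA-256 stands in for SHA-1; real DSA uses a 512-1024 bit prime p.
import hashlib, random

p, r, g = 23, 11, 2
d = random.randrange(2, r)          # private key
y = pow(g, d, p)                    # public key g^d

def H(M):
    return int(hashlib.sha256(M.encode()).hexdigest(), 16) % r

def sign(M):
    while True:
        k = random.randrange(2, r)              # session key d'
        s = pow(g, k, p) % r                    # s := (g^d' mod p) mod r
        t = pow(k, -1, r) * (H(M) + d * s) % r  # t := d'^(-1)(H(M) + ds)
        if s != 0 and t != 0:
            return (s, t)

def verify(M, s, t):
    if not (0 < s < r and 0 < t < r):
        return False
    w = pow(t, -1, r)
    w1, w2 = H(M) * w % r, s * w % r
    # check (g^w1 (g^d)^w2 mod p) mod r = s
    return pow(g, w1, p) * pow(y, w2, p) % p % r == s

s, t = sign("attack at dawn")
assert verify("attack at dawn", s, t)
```

With real parameters, the only change is the size of p, r and the use of SHA-1; the arithmetic is identical.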

*5.4.7. The Elliptic Curve Digital Signature Algorithm (ECDSA)

The ECDSA is the elliptic curve analog of the DSA. Algorithm 5.45 describes the generation of the domain parameters necessary to set up an ECDSA system. One first selects a suitable finite field F_q and takes a random elliptic curve E over F_q. E must be such that the cardinality n of the group E(F_q) has a suitably large prime divisor r. One generates a random point P ∈ E(F_q) of order r and works in the subgroup 〈P〉 of E(F_q) generated by P. It is assumed that q is either a prime p or a power 2^m of 2.

Algorithm 5.45. Generation of ECDSA parameters

Input: A finite field F_q, where q is a prime p or a power 2^m of 2.

Output: A set of parameters E, n, r, P for the ECDSA.

Steps:

while (1) {
  Choose a, b ∈ F_q randomly.
  Consider the curve E : Y^2 = X^3 + aX + b (for q = p) or E : Y^2 + XY = X^3 + aX^2 + b (for q = 2^m).
  Compute n := ord E(F_q).
  if (n has a prime divisor r > max(2^160, 4√q)) {
     if (n ∤ (q^k − 1) for k = 1, . . . , 20) and (n ≠ q) {
        do {
          Select P′ ∈ E(F_q) randomly.
          P := (n/r)P′.
        } while (P = O).
        Return (E, n, r, P).
     }
  }
}

The order n = ord E(F_q) can be computed using the SEA algorithm (for q = p) or the Satoh–FGH algorithm (for q = 2^m) described in Section 3.6. The integer n should be factored to check if it has a prime divisor r > max(2^160, 4√q). The condition n ∤ (q^k − 1) for small values of k is necessary to avoid the MOV attack, whereas the condition n ≠ q ensures that the SmartASS attack cannot be mounted. E(F_q) is not necessarily a cyclic group. But, r being a prime, a point P = (n/r)P′ ≠ O must be one of order r.
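The two arithmetic conditions on the order n can be tested directly; a small Python sketch (the numeric examples are illustrative, not real curve orders):

```python
# Check the embedding-degree (MOV) and anomalous-curve (n = q) conditions
# of Algorithm 5.45 for a candidate group order n over F_q.
def order_is_safe(n, q, kmax=20):
    if n == q:                      # anomalous curve
        return False
    for k in range(1, kmax + 1):    # small embedding degree: MOV attack
        if (q ** k - 1) % n == 0:
            return False
    return True

# 7 divides 2^3 - 1 = 7, so a subgroup of order 7 over F_2 embeds into
# F_{2^3}^* and fails the MOV condition:
assert not order_is_safe(7, 2)
assert not order_is_safe(13, 13)    # anomalous case n = q
```

For any n larger than q^20 − 1 (as a 160-bit r always is for practical q^20), the divisibility can fail only through a genuinely small embedding degree, which is what the loop detects.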

An ECDSA key pair consists of a private key d (an integer in the range 2 ≤ d ≤ r − 1) and the corresponding public key dP ∈ E(F_q). H denotes the hash function SHA-1 that converts bit strings of arbitrary length to bit strings of length 160. As discussed in connection with DSA, we identify bit strings with integers. We also make an association of elements of F_q with integers in the set {0, 1, . . . , q − 1}. ECDSA signatures can be generated as in Algorithm 5.46. It is necessary to check the conditions s ≠ 0 and t ≠ 0. If these conditions are not both satisfied, one should re-run the procedure with a new session key pair.

Algorithm 5.46. ECDSA signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, d′P), 2 ≤ d′ ≤ r − 1.

/* Let us denote d′P = (h, k), where h and k are identified with integers */

s := h (mod r).

t := d′^{−1}(H(M) + ds) (mod r).

ECDSA signature verification is explained in Algorithm 5.47. The correctness of this algorithm can be proved like that of Algorithm 5.44.

Algorithm 5.47. ECDSA signature verification

Input: A signature (M, s, t) and the signer’s public key dP.

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , r − 1} or t ∉ {1, 2, . . . , r − 1}) { Return “Signature not verified”. }

w := t–1 (mod r).

w1 := H(M)w (mod r).

w2 := sw (mod r).

Q := w1P + w2(dP).

if (Q = O) { Return “Signature not verified”. }

/* Otherwise denote Q = (h, k) */

s′ := h (mod r).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }
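A miniature end-to-end illustration of Algorithms 5.45-5.47 is possible in Python. In the sketch below, the toy field F_97, the brute-force point-order computation (standing in for SEA) and SHA-256 (standing in for SHA-1) are all simplifying assumptions; the parameter search loosely mirrors Algorithm 5.45.

```python
# Toy ECDSA over curves y^2 = x^3 + 2x + b (mod 97): search for a point
# of prime order r >= 11, then run Algorithms 5.46/5.47.  Brute force
# replaces SEA point counting; SHA-256 replaces SHA-1.  Illustration only.
import hashlib, random

q, A = 97, 2
INF = None                               # point at infinity O

def add(P, Q):
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % q == 0:
        return INF
    if P == Q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, q) % q
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, q) % q
    x3 = (lam * lam - x1 - x2) % q
    return (x3, (lam * (x1 - x3) - y1) % q)

def mul(k, P):
    R = INF
    while k:
        if k & 1:
            R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

def point_order(P):
    m, Q = 1, P
    while Q is not None:
        Q, m = add(Q, P), m + 1
    return m

def find_base_point():
    for b in range(1, 50):               # vary the curve, as in Algorithm 5.45
        for x in range(q):
            rhs = (x ** 3 + A * x + b) % q
            ys = [y for y in range(q) if y * y % q == rhs]
            if not ys:
                continue
            P1 = (x, ys[0])
            n = point_order(P1)
            r, m, f = 1, n, 2            # largest prime factor r of n
            while f * f <= m:
                while m % f == 0:
                    r, m = f, m // f
                f += 1
            if m > 1:
                r = m
            if r >= 11 and r != q:       # big enough and non-anomalous
                return mul(n // r, P1), r
    raise RuntimeError("no suitable toy curve found")

P, r = find_base_point()
d = random.randrange(2, r)               # private key
Y = mul(d, P)                            # public key dP

def H(M):
    return int(hashlib.sha256(M.encode()).hexdigest(), 16) % r

def sign(M):
    while True:
        k = random.randrange(2, r)
        h = mul(k, P)[0]                 # x-coordinate of d'P
        s = h % r
        t = pow(k, -1, r) * (H(M) + d * s) % r
        if s and t:
            return (s, t)

def verify(M, s, t):
    if not (0 < s < r and 0 < t < r):
        return False
    w = pow(t, -1, r)
    Q = add(mul(H(M) * w % r, P), mul(s * w % r, Y))
    return Q is not None and Q[0] % r == s

s, t = sign("hello")
assert verify("hello", s, t)
```

The structure mirrors DSA exactly: the modular exponentiations g^{w1}(g^d)^{w2} become the point combination w1·P + w2·(dP), and the reduction modulo r acts on the x-coordinate.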

*5.4.8. The XTR Signature Algorithm

As discussed in Section 5.2.7, the XTR family of algorithms is an adaptation of other conventional algorithms over finite fields. XTR achieves a speed-up of about three using a clever way of representing elements in certain finite fields. It is no surprise that the DLP-based signature algorithms, described so far, can be given efficient XTR renderings. We explain here XTR–DSA, the XTR version of the digital signature algorithm.

In order to set up an XTR system, we need a prime p ≡ 2 (mod 3). The XTR group G is a subgroup of the multiplicative group F_{p^6}^* and has a prime order q dividing p^2 − p + 1. For compliance with the original version of DSA, one requires q to be of bit length 160. The trace map Tr : F_{p^6} → F_{p^2} taking x ↦ x + x^{p^2} + x^{p^4} is used to represent an element a ∈ G by the element Tr(a) ∈ F_{p^2}. Under this representation, arithmetic in G translates to that in F_{p^2}. For example, we have seen how exponentiation in G can be efficiently implemented using F_{p^2} arithmetic (Algorithm 5.20). The trace Tr(g) of a generator g of G should also be made available for setting up the XTR domain parameters. In Section 5.2.7, we have discussed how a random set of XTR parameters (p, q, Tr(g)) can be computed.

An XTR key pair comprises a random integer d (the private key) and the trace Tr(g^d) ∈ F_{p^2} (the public key). Algorithm 5.20 is used to compute Tr(g^d) from Tr(g) and d. This algorithm gives Tr(g^{d−1}) and Tr(g^{d+1}) as by-products. For an implementation of XTR–DSA, we require these two elements of F_{p^2}. So we assume that the public key consists of the three traces S_d(Tr(g)) = (Tr(g^{d−1}), Tr(g^d), Tr(g^{d+1})). As explained in Lenstra and Verheul [172], the values Tr(g^{d−1}) and Tr(g^{d+1}) can be computed easily from Tr(g^d) even when d is unknown, so it suffices to store only Tr(g^d) as the public key. But we avoid the details of this computation here and assume that all the three traces are available to the signature verifier.

Algorithm 5.20 provides an efficient way of computing exponentiations in G. For DSA-like signature verification (cf. Algorithm 5.44), one computes products of the form ga(gd)b with d unknown. In the XTR world, this amounts to computing the trace Tr(ga(gd)b) from the knowledge of a, b, Tr(g) and Tr(gd) (or Sd(Tr(g))) but without the knowledge of d. The XTR exponentiation algorithm is as such not applicable in such a situation. We should, therefore, prescribe a method to compute traces of products in G. Doing that requires some mathematics that we mention now without proofs. See Lenstra and Verheul [170] for the missing details.

Let e := ab^{−1} (mod q). Then, a + bd ≡ b(e + d) (mod q), that is, Tr(g^a(g^d)^b) = Tr(g^{b(e+d)}); it is thus sufficient to compute Tr(g^{e+d}) from the knowledge of e, Tr(g) and Tr(g^d). We treat the 3-tuple S_k(Tr(g)) as a row vector (over F_{p^2}). For c ∈ F_{p^2}, let M_c denote the matrix

Equation 5.9


We take c := Tr(g). It can be shown that det M_{Tr(g)} ≠ 0, that is, the matrix M_{Tr(g)} is invertible, and we have:

Equation 5.10


Here the superscript t denotes the transpose of a matrix. With these observations, one can write the procedure for computing Tr(ga(gd)b) as in Algorithm 5.48.

Algorithm 5.48. XTR multiplication

Input: a, b, Tr(g) and Sd(Tr(g)) for some unknown d.

Output: Tr(ga(gd)b).

Steps:

Compute e := ab^{−1} (mod q).
Compute S_e(Tr(g)) using Algorithm 5.20 with c := Tr(g) and n := e.
Use Equation (5.10) to compute Tr(g^{e+d}).
Use Algorithm 5.20 with c := Tr(g^{e+d}) and n := b to compute
    S_b(Tr(g^{e+d})) = (Tr(g^{(b−1)(e+d)}), Tr(g^{b(e+d)}), Tr(g^{(b+1)(e+d)})).
Return Tr(g^{b(e+d)}).
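The exponent manipulation underlying Algorithm 5.48, namely a + bd ≡ b(e + d) (mod q) for e := ab^{−1} (mod q), is elementary modular arithmetic and can be checked numerically (the toy prime and random values below are illustrative only):

```python
# Check that e := a*b^(-1) (mod q) turns the exponent a + b*d into
# b*(e + d) modulo q, so Tr(g^a (g^d)^b) = Tr(g^(b(e+d))).
import random

q = 1009                      # toy stand-in for the 160-bit prime q
a, b, d = (random.randrange(1, q) for _ in range(3))
e = a * pow(b, -1, q) % q
assert (a + b * d) % q == b * (e + d) % q
```

This is exactly why a trace of a product can be obtained from a single "shifted" exponentiation by e + d.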

XTR–DSA signature generation (Algorithm 5.49) is an obvious adaptation of Algorithm 5.43.

Algorithm 5.49. XTR signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M with s, t ∈ {1, 2, . . . , q − 1}.

Steps:

do {
  Generate a random d′ ∈ {2, . . . , q − 1}.
  Compute Tr(g^{d′}).          /* Use Algorithm 5.20 with c := Tr(g) and n := d′ */
  Let Tr(g^{d′}) = x1 α + x2 α^2.     /* α is defined in Section 5.2.7 to represent F_{p^2} */
  s := x1 + p·x2 (mod q).
} while (s = 0).
t := d′^{−1}(H(M) + ds) (mod q).         /* Here H is the hash function SHA-1 */

The bulk of the time taken by Algorithm 5.49 goes to the computation of Tr(g^{d′}). Since the trace representation of XTR makes this exponentiation three times as efficient as the corresponding DSA exponentiation, XTR–DSA signature generation runs nearly three times as fast as DSA signature generation.

XTR–DSA signature verification can be easily translated from Algorithm 5.44 and is shown in Algorithm 5.50. The most costly step in the XTR–DSA verification routine is the computation of Tr(g^{w1}(g^d)^{w2}). One uses Algorithm 5.48 for this purpose. This algorithm, in turn, invokes the exponentiation Algorithm 5.20 twice. For the original DSA signature verification (Algorithm 5.44), the costliest step is the computation of g^{w1}(g^d)^{w2}, which involves two exponentiations and a (cheap) multiplication. A careful analysis shows that XTR–DSA signature verification runs nearly 1.75 times faster than DSA verification.

Algorithm 5.50. XTR signature verification

Input: XTR–DSA signature (M, s, t) on a message M and the signer’s public key (Tr(gd–1), Tr(gd), Tr(gd+1)).

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , q − 1} or t ∉ {1, 2, . . . , q − 1}) { Return “Signature not verified”. }

w := t–1 (mod q).

w1 := H(M)w (mod q).

w2 := sw (mod q).

Compute Tr(g^{w1}(g^d)^{w2}).     /* Use Algorithm 5.48 */
Write this trace value as x1 α + x2 α^2.     /* See Section 5.2.7 */

s′ := x1 + p·x2 (mod q).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }

*5.4.9. The NTRUSign Algorithm

The NTRU Signature Scheme (NSS) (Hoffstein et al. [131]) is an adaptation of the NTRU encryption algorithm discussed in Section 5.2.8. Cryptanalytic studies (Gentry et al. [110]) show that the NSS has security flaws. A newer version of the NSS, referred to as NTRUSign and resistant to these attacks, has been proposed by Hoffstein et al. [128]. In this section, we provide a brief overview of NTRUSign.

In order to set up the domain parameters for NTRUSign, we start with a positive integer n and consider the ring R := Z[X]/(X^n − 1). Elements of R are polynomials with integer coefficients and of degrees ≤ n − 1. The multiplication of R is denoted by ⊛, which is essentially the multiplication of two polynomials of Z[X] followed by setting X^n = 1. We also fix a positive integer β to be used as a modulus for the coefficients of the polynomials in R. Two subsets F_f and F_g of R, determined by suitably chosen parameters ν_f and ν_g, are of importance for the NTRUSign algorithm. The message space is assumed to consist of pairs of polynomials of R with coefficients reduced modulo β. We further assume that we have at our disposal a hash function H that maps messages (that is, binary strings) to elements of the message space.

Let a = a_0 + a_1 X + · · · + a_{n−1} X^{n−1} ∈ R. The average of the coefficients of a is denoted by ā := (1/n)(a_0 + a_1 + · · · + a_{n−1}). The centred norm ‖a‖ of a is defined by

‖a‖^2 := (a_0 − ā)^2 + (a_1 − ā)^2 + · · · + (a_{n−1} − ā)^2.

For two polynomials a, b ∈ R, one also defines

‖(a, b)‖^2 := ‖a‖^2 + ‖b‖^2.

The parameters ν_f and ν_g should be so chosen that any polynomial f ∈ F_f and any polynomial g ∈ F_g have (centred) norms of the order O(n). An upper bound B on the norms (of pairs of polynomials) should also be predetermined.
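The centred norm is immediate to compute from a coefficient vector; a small Python sketch (the representation of ring elements as plain coefficient lists is a choice made here for illustration):

```python
# Centred norm of a polynomial in R, represented by its coefficient list,
# and the combined norm of a pair (a, b).
def centred_norm_sq(a):
    mu = sum(a) / len(a)                      # average of the coefficients
    return sum((c - mu) ** 2 for c in a)

def pair_norm_sq(a, b):
    return centred_norm_sq(a) + centred_norm_sq(b)

assert centred_norm_sq([5, 5, 5, 5]) == 0     # constant vector: norm 0
assert centred_norm_sq([1, -1, 0, 0]) == 2.0
```

Subtracting the average makes the norm insensitive to adding a constant polynomial, which is why it is called "centred".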

Typical values for NTRUSign parameters are

(n, β, νf, νg, B) = (251, 128, 73, 71, 300).

It is estimated that these choices lead to a security level at least as high as in an RSA scheme with a 1024-bit modulus. For very long-term security, one may go for (n, β) = (503, 256).

In order to set up a key pair, the signer first chooses two random polynomials f ∈ F_f and g ∈ F_g. The polynomial f should be invertible modulo β, and the signer computes f_β ∈ R with the property that f ⊛ f_β ≡ 1 (mod β). The public key of the signer is the polynomial h ≡ f_β ⊛ g (mod β), whereas the private key is the tuple (f, g, F, G), where F and G are two polynomials in R satisfying

f ⊛ G − g ⊛ F = β   and   ‖F‖, ‖G‖ = O(n).

Hoffstein et al. [128] present an algorithm to compute such polynomials F and G from f and g; the norms involved are controlled by a given constant c.

Algorithm 5.51. NTRU signature generation

Input: A message M to be signed and the signer’s private key (f, g, F, G).

Output: The signature (M, s) on M.

Steps:

Compute (m1, m2) := H(M).

Compute polynomials A, B, a, b ∈ R satisfying

G ⊛ m1 − F ⊛ m2 = A + βB,
−g ⊛ m1 + f ⊛ m2 = a + βb,

where a and A have coefficients in the range between −β/2 and +β/2.

Compute s ≡ f ⊛ B + F ⊛ b (mod β).

NTRUSign signature generation is described in Algorithm 5.51. It is apparent that the NTRUSign algorithm derives its security from the difficulty in computing a vector v in a certain lattice, close to the vector defined by the hashed message (m1, m2). For defining the lattice, we first note that a polynomial u = u_0 + u_1 X + · · · + u_{n−1} X^{n−1} ∈ R can be identified with the vector (u_0, u_1, . . . , u_{n−1}) of dimension n defined by its coefficients. Similarly, two polynomials u, v ∈ R define a vector, denoted by (u, v), of dimension 2n. To the public key h we associate the 2n-dimensional lattice

L_h := {(u, v) | u, v ∈ R with v ≡ h ⊛ u (mod β)}.
It is clear from the definitions that both (f, g) and (F, G) are in Lh.

If h = (h_0, h_1, . . . , h_{n−1}), then for each i = 0, 1, . . . , n − 1 we have

X^i ⊛ h(X) ≡ (h_{n−i}, . . . , h_{n−1}, h_0, . . . , h_{n−i−1}) (mod β) and
0 ⊛ h(X) ≡ 0 ≡ βX^i (mod β).

It follows immediately that L_h is generated by the rows of the matrix

( I   H )
( 0  βI )

where I is the n × n identity matrix, 0 is the n × n zero matrix, and H is the n × n matrix whose i-th row consists of the coefficients of X^i ⊛ h(X).
Now, consider the signature generation routine (Algorithm 5.51). The hash function H generates from the message M a random 2n-dimensional vector m := (m1, m2) not necessarily on Lh. We then look at the vector v := (s, t) defined as:

s ≡ f ⊛ B + F ⊛ b (mod β), and
t ≡ g ⊛ B + G ⊛ b (mod β).

The lattice L_h has the rotational invariance property, namely, if (u, v) ∈ L_h, then (X^i ⊛ u, X^i ⊛ v) is also in L_h for all i = 0, 1, . . . , n − 1. More generally, if (u, v) ∈ L_h, then (w ⊛ u, w ⊛ v) ∈ L_h for any polynomial w ∈ R. In particular, since v = (s, t) ≡ B ⊛ (f, g) + b ⊛ (F, G) (mod β) and since (f, g), (F, G) ∈ L_h, it follows that v ∈ L_h. Of these two polynomials, only s is needed for the generation of NTRUSign signatures. The other is needed during signature verification and can be computed easily from s using the formula t ≡ h ⊛ s (mod β), the validity of which is established from the definition of the lattice L_h.
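The ⊛ product, the rotation property and the membership congruence t ≡ h ⊛ s (mod β) can all be illustrated in a few lines of Python (the toy n, β and random polynomials below are illustrative choices):

```python
# Cyclic convolution u ⊛ v in R = Z[X]/(X^n - 1), the rotation property
# of multiplication by X, and reduction of a lattice partner modulo beta.
import random

def star(u, v):
    n = len(u)
    w = [0] * n
    for i in range(n):
        for j in range(n):
            w[(i + j) % n] += u[i] * v[j]   # X^i * X^j = X^((i+j) mod n)
    return w

n, beta = 8, 32                             # toy parameters
h = [random.randrange(beta) for _ in range(n)]

# multiplying by X rotates the coefficient vector by one position
X = [0, 1] + [0] * (n - 2)
assert star(X, h) == [h[-1]] + h[:-1]

# the lattice partner of u is t := h ⊛ u reduced modulo beta
u = [random.randrange(-1, 2) for _ in range(n)]
t = [c % beta for c in star(h, u)]
assert t == [c % beta for c in star(u, h)]  # ⊛ is commutative
```

The rotation check is exactly the statement that X^i ⊛ h cyclically shifts the coefficients of h, which is why the lattice basis has the circulant block H.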

The vector v = (s, t) is close to the message vector m in the sense that the norm ‖(m1 − s, m2 − t)‖ is small, bounded in terms of the constant c chosen earlier (see Hoffstein et al. [128] for a proof of this relation). The verification routine can, therefore, be designed as in Algorithm 5.52.

Algorithm 5.52. NTRU signature verification

Input: A signature (M, s) and the signer’s public key h.

Output: Verification status of the signature.

Steps:

Compute (m1, m2) := H(M).

Compute t ≡ h ⊛ s (mod β).

if (‖(m1s, m2t)‖ ≤ B) { Return “Signature verified”. }

else { Return “Signature not verified”. }

For the choice (n, β, c) = (251, 128, 0.45), we have ‖(m1 − s, m2 − t)‖ ≈ 216. Therefore, choosing the norm bound B slightly larger than this value (say, B = 300) allows the verification scheme to work correctly most of the time. The knowledge of the private key (f, g, F, G) allows the legitimate signer to compute the close vector (s, t) easily. On the other hand, for a forger (who is lacking the private information) fast computation of a vector v′ = (s′, t′) with small norm ‖(m1 − s′, m2 − t′)‖ (say, ≤ 400 for the above parameter values) seems to be an intractable task. This is precisely why forging an NTRUSign signature is considered infeasible.

An exhaustive search can be mounted for generating a valid signature (s′, t′) on a message M with H(M) = (m1, m2). More precisely, a forger fixes half of the 2n coefficients of the polynomials s′ and t′ and then tries to solve t′ ≡ h ⊛ s′ (mod β) for the remaining half such that the norm ‖(m1 − s′, m2 − t′)‖ is small. It is estimated (see Hoffstein et al. [128] for the details) that the probability that a random guess for the unknown half succeeds is very low (≤ 2^{−178.44} for the given parameter values).

Another attack on the NTRUSign scheme is to determine the polynomials f, g from a knowledge of h. Since (f, g) is a short non-zero vector in the lattice Lh, an algorithm that can find such vectors can determine (f, g) (or a rotated version of it). However, for a proper choice of the parameters such an algorithm is deemed infeasible. (Also see the NTRU encryption scheme in Section 5.2.8.)

Like the NTRU encryption scheme, the NTRUSign scheme is fast: both signature generation and verification can be carried out in time O(n^2). This efficiency is one of the main attractions of the NTRUSign scheme. Indeed, it may be adopted as an IEEE standard. Unfortunately, however, several attacks on NTRUSign are known. Gentry and Szydlo [111] indicate the possibility of extending the attacks of Gentry et al. [110]. Nguyen [217] proposes a more concrete attack on NTRUSign that is capable of recovering the private key from only 400 signatures. The future of NTRUSign and its modifications remains uncertain.

5.4.10. Blind Signature Schemes

Suppose that an entity (Alice), referred to as the sender or the user, wants to get a message M signed by a second entity (Bob), called the signer, without revealing M to Bob. This can be achieved as follows. First, Alice transforms the message M to f(M) and sends f(M) to Bob. Bob generates the signature (f(M), σ) on f(M) and sends this pair back to Alice. Finally, Alice applies a second transform g to generate the signature (M, s) of Bob on M. The transform f hides the actual message M from Bob and, thereby, prevents Bob from associating Alice with the signed message (M, s). Such a signature scheme is called a blind signature scheme.

Blind signatures are widely used in electronic payment systems in which Alice (a customer) wants the signature of Bob (the bank) on an electronic coin, but does not want the bank to be capable of associating Alice with the coin. In this way, Alice achieves anonymity while spending an electronic coin.

In a blind signature scheme, Bob does not know M, but his signature on the transformed message f(M) is essential for Alice to reconstruct the signature on M. Furthermore, the blind signature on M should not allow Alice to compute the blind signature on another message M′. More generally, Alice should not be able to generate l + 1 (or more) blind signatures with only l (or fewer) interactions with Bob. A forgery of this kind is often called an (l, l + 1) forgery, or a one-more forgery (in case l is bounded above by a polynomial in the security parameter), or a strong one-more forgery (in case l is bounded above only poly-logarithmically in the security parameter). An (l, l + 1) forgery is mountable on a scheme which is not existentially unforgeable (Exercises 5.15 and 5.19). Usually, existential forgery gives forged signatures on messages over which the forger has no (or little) control (that is, on messages which are likely to be meaningless).

Now, we describe some common blind signature schemes. We provide a brief overview of the algorithms. Detailed analysis of the security of these schemes can be found in the references cited at the end of this chapter.

Chaum’s RSA blind signature protocol

Chaum’s blind signature protocol is based on the intractability of the RSAP (or the IFP). The signer generates two (distinct) large random primes p and q and computes n := pq. He then chooses a random integer e with gcd(e, φ(n)) = 1 and computes an integer d such that ed ≡ 1 (mod φ(n)). The public key (of the signer) is the pair (n, e), whereas the private key is d. Chaum’s protocol works as in Algorithm 5.53.

Algorithm 5.53. Chaum’s RSA blind signature

Input: A message M generated by Alice.

Output: Bob’s blind RSA signature (M, s) on M.

Steps:

Alice hashes the message M to m ∈ Z_n.

Alice chooses a random ρ ∈ Z_n^* and computes m̃ := ρ^e m (mod n).

Alice sends m̃ to Bob.

Bob generates the signature σ := m̃^d (mod n) on m̃.

Bob sends σ to Alice.

Alice computes Bob’s (blind) signature s := ρ^{−1} σ (mod n) on M.

Since σ ≡ (ρ^e m)^d ≡ ρ m^d (mod n), we have s ≡ ρ^{−1} σ ≡ m^d (mod n), that is, s is indeed the RSA signature of Bob on M. Bob receives only the blinded value ρ^e m and gains no idea about m, since ρ is randomly and secretly chosen by Alice.
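Algorithm 5.53 runs end to end in a few lines. In the Python sketch below, the toy modulus n = 3233 (p = 61, q = 53, e = 17, d = 2753), the fixed blinding factor and the integer stand-in for the hashed message are illustrative assumptions:

```python
# Chaum's RSA blind signature on a toy modulus: blind, sign, unblind,
# and check that the result is Bob's ordinary RSA signature m^d mod n.
n, e, d = 3233, 17, 2753        # toy RSA key: n = 61 * 53, ed = 1 mod phi(n)

m = 1234                        # stand-in for the hashed message H(M)
rho = 7                         # Alice's random blinding factor, gcd(rho, n) = 1

blinded = pow(rho, e, n) * m % n       # Alice -> Bob
sigma = pow(blinded, d, n)             # Bob signs the blinded message
s = pow(rho, -1, n) * sigma % n        # Alice unblinds

assert s == pow(m, d, n)               # s is the RSA signature on m
```

Bob only ever sees `blinded`, which is uniformly distributed for random ρ; this is the blindness of the scheme.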

The Schnorr blind signature protocol

Let G be a finite multiplicative Abelian group and let g ∈ G be of order r (a large prime). We assume that computing discrete logarithms in G is an infeasible task. The key pair of the signer is denoted by (d, g^d), where the integer d, 2 ≤ d ≤ r − 1, is the private key and g^d the public key. The Schnorr blind signature protocol is described in Algorithm 5.54.

Algorithm 5.54. Schnorr blind signature

Input: A message M generated by Alice.

Output: Bob’s blind Schnorr signature (M, s, t) on M.

Steps:

Alice asks Bob to initiate a communication.

Bob chooses a random and computes .

Bob sends to Alice.

Alice selects α, randomly.

Alice computes .

Alice computes and .

Alice sends to Bob.

Bob computes .

Bob sends to Alice.

Alice computes .

It is easy to check that the output (M, s, t) of Algorithm 5.54 is a valid Schnorr signature of Bob on the message M. The session key d′ (Algorithm 5.38) for this signature involves the random quantity chosen by Bob in the first step. Since d and this quantity are secrets known only to Bob, Alice must depend on Bob for the computation of the response. The message M is never sent to Bob. Moreover, its hash is masked by β. This is how this protocol achieves blindness.
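Several maskings of the Schnorr interaction appear in the literature. The Python sketch below fixes one standard formulation (signature (s, t) with s = H(M ‖ g^{d′}); the exact roles assigned to α and β here are an assumption, chosen so that the algebra closes) over the toy subgroup of order 11 in Z_23^*:

```python
# One standard blinding of the Schnorr scheme over the order-11 subgroup
# of Z_23^* generated by g = 2.  Bob never sees M; Alice never sees k or d.
import hashlib, random

p, r, g = 23, 11, 2
d = random.randrange(2, r)                  # Bob's private key
y = pow(g, d, p)                            # Bob's public key g^d

def H(M, R):
    return int(hashlib.sha256(f"{M}|{R}".encode()).hexdigest(), 16) % r

M = "pay 10 coins"

k = random.randrange(1, r)                  # Bob: session secret
R_bar = pow(g, k, p)                        # Bob -> Alice: commitment

alpha, beta = random.randrange(r), random.randrange(r)
R = R_bar * pow(g, alpha, p) * pow(y, beta, p) % p   # Alice: masked commitment
s = H(M, R)                                 # challenge on the real message
s_bar = (s + beta) % r                      # Alice -> Bob: blinded challenge

t_bar = (k + s_bar * d) % r                 # Bob -> Alice: blinded response
t = (t_bar + alpha) % r                     # Alice: unblinded response

# standard Schnorr check: recompute g^d' as g^t * y^(-s) and re-hash
R_check = pow(g, t, p) * pow(y, (r - s) % r, p) % p
assert H(M, R_check) == s
```

Bob sees only (s_bar, t_bar), both offset by Alice's random α and β, so he cannot later link the published signature (s, t) to this session.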

The Okamoto–Schnorr blind signature protocol

Okamoto’s adaptation of the Schnorr scheme is proved to be resistant to an attack by a third entity (Pointcheval and Stern [237]). As in the Schnorr scheme, we fix a (finite multiplicative Abelian) group G (in which it is difficult to compute discrete logarithms). We then choose two elements g1, g2 ∈ G of (large prime) order r. The private key of the signer now comprises a pair (d1, d2) of integers in {2, . . . , r − 1}, whereas the public key y is the group element g1^{d1} g2^{d2}. We assume that there is a hash function H whose outputs are in Z_r. We identify elements of G with bit strings. The Okamoto–Schnorr blind signature protocol is explained in Algorithm 5.55.

Algorithm 5.55. Okamoto–Schnorr blind signature

Input: A message M generated by Alice.

Output: Bob’s blind signature (M, s1, s2, s3) on M.

Steps:

Alice asks Bob to initiate a communication.

Bob chooses random and computes .

Bob sends to Alice.

Alice selects α, β, randomly.

Alice computes .

Alice computes and .

Alice sends to Bob.

Bob computes and .

Bob sends and to Alice.

Alice computes and .

An Okamoto–Schnorr signature (M, s1, s2, s3) on a message M can be verified by checking the equality s1 = H(M ‖ u), where u := g1^{s2} g2^{s3} y^{s1}. Each invocation of the protocol uses a session private key chosen by Bob. Alice must depend on Bob for generating s2 and s3, because she is unaware of the private values d1, d2 and of Bob’s session secrets. Alice, in an attempt to forge Bob’s blind signature, may start with random quantities of her own choice. But she still needs the integers d1 and d2 in order to complete the protocol. The blindness of Algorithm 5.55 stems from the fact that the message M is never sent to Bob and its hash is masked by γ.

5.4.11. Undeniable Signature Schemes

So far we have seen signature schemes for which any entity with a knowledge of the signer’s public key can verify the authenticity of a signature. There are, however, situations where an active participation of the signer is necessary for the verification of a signature. Moreover, during a verification interaction a signer should not be allowed to deny a legitimate signature made by him. A signature meeting these requirements is called an undeniable signature.

Undeniable signatures are typically used for messages that are too confidential or private to be given unlimited verification facility. In case of a dispute, an entity should be capable of proving a forged signature to be so and at the same time must accept the binding to his own valid signatures. So in addition to the signature generation and verification protocols, an undeniable signature scheme comes with a denial or disavowal protocol to guard against a cheating signer that is unwilling to accept his valid signature either by not taking part in the verification interaction or by responding incorrectly or by claiming a valid signature to be forged.

There are applications where undeniable signatures are useful. For example, a software vendor can use undeniable signatures to prove the authenticity of its products only to its (paying) customers (and not to everybody).

Chaum and van Antwerpen gave the first concrete realization of an undeniable signature scheme [52, 51]. It is based on the intractability of computing discrete logarithms in the group Z_p^*, p a prime. Gennaro et al. [109] later adapted the algorithm to design an RSA-based undeniable signature scheme. We now describe these two schemes. Rigorous studies of these schemes can be found in the original papers. See also [53, 186, 187, 102, 202, 230].

The Chaum–Van Antwerpen undeniable signature scheme

For setting up the domain parameters for Chaum–Van Antwerpen (CvA) signatures, Bob chooses a (large) prime p of the form p = 2r + 1, where r is also a prime. (Such a prime p is called a safe prime (Definition 3.5).) Bob finds a random element g ∈ Z_p^* of multiplicative order r, selects a random integer d ∈ {2, . . . , r − 1} and computes y := g^d (mod p). Bob publishes (p, g, y) as his public key and keeps the integer d secret as his private key. The value d^{−1} (mod r) is needed during verification and can be precomputed and stored (secretly) along with d. We assume that we have a hash function H that maps messages (that is, bit strings) to elements of the subgroup of order r in Z_p^*. In order to generate a CvA signature on a message M, Bob carries out the steps given in Algorithm 5.56. Verification of Bob’s CvA signature by Alice involves the interaction given in Algorithm 5.57.

Algorithm 5.56. Chaum–Van Antwerpen undeniable signature generation

Input: The message M to be signed and the signer’s private key (p, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).

s := m^d (mod p).

If (M, s) is a valid CvA signature, then

v ≡ (s^i y^j)^{d^{−1} (mod r)} ≡ ((m^d)^i (g^d)^j)^{d^{−1} (mod r)} ≡ m^i g^j ≡ v′ (mod p).

On the other hand, if smd (mod p), Bob can guess the element v′ with a probability of only 1/r, even under the assumption that Bob has unbounded computing resources. This means that unless the signature (M, s) is valid, it is extremely unlikely that Bob can make Alice accept the signature.

The denial protocol for the CvA scheme involves an interaction between the prover Bob and the verifier Alice, as given in Algorithm 5.58. In order to see how this denial protocol works, we note that Algorithm 5.58 essentially makes two calls of the verification protocol. First assume that Bob executes the protocol honestly, that is, Bob follows the steps as indicated. If the signature (M, s) is a valid one, the check v1 ≡ m^{i1} g^{j1} (mod p) (as well as the check v2 ≡ m^{i2} g^{j2} (mod p)) should succeed, and Alice’s decision to accept the signature as valid is justified. On the other hand, if (M, s) is a forged signature, that is, if s ≢ m^d (mod p), then the probability that each of these checks succeeds is 1/r, as discussed before. Thus, it is extremely unlikely that a forged signature is accepted as valid by Alice. So Alice eventually computes both w1 and w2 equal to s^{i1 i2 d^{−1} (mod r)} (mod p) and concludes the signature to be forged. Finally, suppose that Bob is intending to deny the (purported) signature (M, s). If Bob does not fully take part in the interaction, then his intention becomes clear. Otherwise, he sends v1 and/or v2 not computed according to the formulas specified. In that case, Bob succeeds in making Alice compute w1 = w2 with a probability of only 1/r. Thus, it is extremely unlikely that Bob, executing this protocol dishonestly, can successfully disavow a valid signature.

Algorithm 5.57. Chaum–Van Antwerpen undeniable signature verification

Input: A CvA signature (M, s) on a message M.

Output: Verification status of the signature.

Steps:

Alice computes m := H(M).

Alice chooses two secret random integers i, j ∈ {1, 2, . . . , r − 1}.

Alice computes u := s^i y^j (mod p).

Alice sends u to Bob.

Bob computes v := u^{d^{−1} (mod r)} (mod p).

Bob sends v to Alice.

Alice computes v′ := m^i g^j (mod p).

Alice accepts the signature (M, s) if and only if v = v′.
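The verification interaction of Algorithm 5.57 can be simulated within a single program. In the Python sketch below, the toy safe prime p = 23 (so r = 11), the generator g = 2 of the order-r subgroup and the subgroup element standing in for H(M) are illustrative assumptions:

```python
# Chaum-Van Antwerpen verification with toy parameters: p = 2r + 1 = 23,
# r = 11, g = 2 of order r.  m stands in for the hashed message H(M),
# taken inside the order-r subgroup as the scheme requires.
import random

p, r, g = 23, 11, 2
d = random.randrange(2, r)                 # Bob's private key
y = pow(g, d, p)                           # public key component g^d
d_inv = pow(d, -1, r)                      # precomputed d^(-1) mod r

m = pow(g, 3, p)                           # H(M): an element of the subgroup
s = pow(m, d, p)                           # Bob's undeniable signature on M

# interactive verification
i, j = random.randrange(1, r), random.randrange(1, r)   # Alice's secrets
u = pow(s, i, p) * pow(y, j, p) % p        # Alice -> Bob: challenge
v = pow(u, d_inv, p)                       # Bob -> Alice: response
assert v == pow(m, i, p) * pow(g, j, p) % p   # Alice's check v = m^i g^j
```

Because u is an element of the order-r subgroup, raising it to d^{−1} (mod r) exactly cancels the exponent d hidden in s and y, which is what Alice's final comparison exploits.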

Algorithm 5.58. Chaum–Van Antwerpen undeniable signature: denial protocol

Input: A (purported) CvA signature (M, s) of Bob on a message M.

Output: One of the following decisions by Alice:

  1. The signature is valid.

  2. The signature is forged.

  3. Bob is trying to deny the signature.

Steps:

Alice computes m := H(M).

Alice chooses two secret random integers i1, j1 ∈ {1, 2, . . . , r − 1}.

Alice computes u1 := s^{i1} y^{j1} (mod p) and sends u1 to Bob.

Bob computes v1 := u1^{d^{−1} (mod r)} (mod p) and sends v1 to Alice.

if (v1 ≡ m^{i1} g^{j1} (mod p)) {
   Alice accepts the signature (M, s) to be valid and quits the protocol.
}

Alice chooses two other secret random integers i2, j2 ∈ {1, 2, . . . , r − 1}.

Alice computes u2 := s^{i2} y^{j2} (mod p) and sends u2 to Bob.

Bob computes v2 := u2^{d^{−1} (mod r)} (mod p) and sends v2 to Alice.

if (v2 ≡ m^{i2} g^{j2} (mod p)) {
   Alice concludes the signature (M, s) to be valid and quits the protocol.
}

Alice computes w1 := (v1 g^{−j1})^{i2} (mod p) and w2 := (v2 g^{−j2})^{i1} (mod p).

if (w1 = w2) {
   Alice concludes that the signature is forged.
} else {
   Alice concludes that Bob is trying to deny the signature.
}

RSA-based undeniable signature scheme

Gennaro, Krawczyk and Rabin’s undeniable signature scheme (the GKR scheme) is based on the (intractability of the) RSA problem.

A GKR key pair differs from a usual RSA key pair. The signer chooses two (large) random primes p and q such that both p′ := (p − 1)/2 and q′ := (q − 1)/2 are also prime, and sets n := pq. Two integers e and d satisfying ed ≡ 1 (mod φ(n)) are then selected. Finally, one requires an element g ∈ Z_n^*, g ≠ 1, and y := g^d (mod n). The public key of the signer is the tuple (n, g, y), whereas the private key is the pair (e, d). It can be shown that g need not be a random element of Z_n^*. Choosing a (fixed) small value of g (for example, g = 2) does not affect the security of the GKR protocol, but makes certain operations (computing powers of g) efficient.

Algorithm 5.59. GKR RSA undeniable signature generation

Input: The message M to be signed and the signer’s private key (e, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).     /* Hash the message M to an element m of Z_n^* */
s := m^d (mod n).

GKR signature generation (Algorithm 5.59) is the same as in RSA. The verification protocol described in Algorithm 5.60 accepts, in addition to a valid GKR signature (M, s), the signatures (M, αs), where α ∈ Z_n^* has multiplicative order 1 or 2 (there are four such values of α). In view of this, we define the subset

Sig_M := {α H(M)^d (mod n) | α ∈ Z_n^*, α^2 ≡ 1 (mod n)}

of Z_n^*. Any element of Sig_M is considered to be a valid signature on M. Since Bob knows p and q, he can easily find out all the elements α of Z_n^* of order ≤ 2 and can choose to output (M, αH(M)^d) as the GKR signature for any such α. Taking α = 1 (as in Algorithm 5.59) is the canonical choice, but during the execution of the denial protocol Bob will not be allowed to disavow other valid choices.

The interaction between the prover Bob and the verifier Alice during GKR signature verification is given in Algorithm 5.60. It is easy to see that if (M, s) is a valid GKR signature, then v = v′. On the other hand, if (M, s) is a forged signature, that is, if s ∉ Sig_M, then the equality v = v′ occurs with only a negligibly small probability, even in the case that the forger has unbounded computational resources.

Algorithm 5.60. GKR RSA undeniable signature verification

Input: A GKR signature (M, s) on a message M.

Output: Verification status of the signature.

Steps:

Alice computes m := H(M).

Alice chooses two random integers i and j.

Alice computes u := s^{2i} y^j (mod n).

Alice sends u to Bob.

Bob computes v := ue (mod n).

Bob sends v to Alice.

Alice computes v′ := m^{2i} g^j (mod n).

Alice accepts the signature (M, s) if and only if v = v′.
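The interaction of Algorithm 5.60 can likewise be simulated with toy parameters. Below, p = 23 and q = 47 (so that p′ = 11 and q′ = 23 are prime), n = 1081, e = 3 and d = 675 with ed ≡ 1 (mod φ(n)), and the small stand-in for H(M) are all illustrative choices:

```python
# GKR RSA undeniable signature verification with toy parameters:
# n = 23 * 47 = 1081, phi(n) = 1012, e = 3, d = 675 (3*675 = 2*1012 + 1).
import random

n, e, d = 1081, 3, 675
g = 2
y = pow(g, d, n)                           # public key component g^d

m = 5                                      # stand-in for H(M) in Z_n^*
s = pow(m, d, n)                           # Bob's signature m^d mod n

i, j = random.randrange(1, 50), random.randrange(1, 50)  # Alice's secrets
u = pow(s, 2 * i, n) * pow(y, j, n) % n    # Alice -> Bob: challenge
v = pow(u, e, n)                           # Bob -> Alice: response u^e
assert v == pow(m, 2 * i, n) * pow(g, j, n) % n   # Alice checks v = m^(2i) g^j
```

Note that replacing s by (n − 1)·s, i.e., by αs with α = −1 of order 2, leaves the challenge u unchanged because the exponent 2i kills the sign; this is why every element of Sig_M passes verification.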

Algorithm 5.61. GKR RSA undeniable signature: denial protocol

Input: A (purported) GKR signature (M, s) of Bob on a message M.

Output: One of the following decisions by Alice:

  1. The signature is forged.

  2. Bob is trying to deny the signature.

Steps:

Alice computes m := H(M).

Alice chooses a random i ∈ {1, 2, . . . , k} and a random j.

Alice computes w1 := m^i g^j (mod n) and w2 := s^i y^j (mod n).

Alice sends (w1, w2) to Bob.

Bob computes m := H(M).

Bob determines i′ ∈ {1, 2, . . . , k} such that the following congruence holds:

Equation 5.11

w2^e ≡ (s^e m^{−1})^{i′} w1 (mod n)
if (no such i′ is found) {    /* This may happen, if Alice has cheated */
   Bob aborts the protocol.
}
Bob sends i′ to Alice.
if (i = i′) {
   Alice concludes that the signature is forged.
}
else {
   Alice concludes that Bob is trying to deny the signature.
}

The denial protocol for the GKR scheme is described in Algorithm 5.61. This protocol is executed after verification by Algorithm 5.60 fails. In that case, Alice wants to ascertain whether the signature is actually invalid or Bob has denied his valid signature by incorrectly executing the verification protocol. A small integer k is predetermined for the denial protocol. The prover needs a running time proportional to k, whereas the probability of a successful denial of a valid signature decreases with k. Taking k = O(lg n) gives optimal performance.

In order to see how this protocol prevents Bob from denying a valid signature, first consider the case that (M, s) is a valid GKR signature of Bob. In that case, s ≡ αm^d (mod n) with ord_n α ≤ 2. On the other hand, s^e ≡ α^e m^(de) ≡ α^e m (mod n). Therefore, for every i′ ∈ {1, 2, . . . , k}, Congruence (5.11) is satisfied. Thus, Bob can only guess the secret value of i chosen by Alice and the guess is correct with a probability of 1/k. On the other hand, if (M, s) is a forged signature, Congruence (5.11) holds only for a single i′, that is, for i′ = i (Exercise 5.23). Sending i′ will then convince Alice that the signature is really forged. In both these cases, Congruence (5.11) holds for at least one i′. Failure to detect such an i′ implies that the value(s) of w1 and/or w2 have not been correctly sent by Alice. The protocol should then be aborted.

In order to reduce the probability of successful cheating, it is convenient to repeat the protocol a few times instead of increasing k. If k = 1024, Bob can successfully cheat in eight executions of the denial protocol with a probability of only 2^(−80).
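The figure 2^(−80) can be checked directly: each execution independently lets Bob guess i correctly with probability 1/k, so eight executions succeed together with probability (1/k)^8. A quick arithmetic check:

```python
from math import log2

# With k = 1024 = 2^10, eight independent executions of the denial
# protocol give Bob a cheating probability of (1/1024)^8 = 2^(-80).
k, executions = 1024, 8
cheat_prob = (1.0 / k) ** executions
print(log2(cheat_prob))   # prints -80.0
```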

5.4.12. Signcryption

The conventional way to ensure both authentication and confidentiality of a message is to sign the message first and then encrypt the signed message. Now that we have many signature and encryption algorithms in our bag, there is hardly any problem in achieving both goals simultaneously. Zheng proposes signcryption schemes that combine these two operations. A signcryption scheme is better than a sign-and-encrypt scheme in two respects. First, the combined primitive takes less running time than the composite primitive comprising signature generation followed by encryption. Second, a signcrypted message is of smaller size than a signed-and-encrypted message. When communication overheads need to be minimized, signcryption proves to be useful.

Before describing the signcryption primitive, let us first review the composite sign-and-encrypt scheme. Let M be the message to be sent. Alice the sender generates the signature appendix s on M using one of the signature schemes described earlier. This step can be described as s = fs(M, da), where da is the private key of Alice. Next a symmetric key k is generated by Alice. The message M is encrypted by a symmetric cipher (like DES) under the key k, that is, C := E(M, k). The key k is then encrypted using an asymmetric routine under the public-key eb of Bob the recipient, that is, c = fe(k, eb). The triple (C, c, s) is then transmitted to Bob.

Upon reception of (C, c, s) Bob first retrieves k using his private key db, that is, k = fd(c, db). The message M is then recovered by symmetric decryption: M = D(C, k). Finally, the authenticity of M is verified from the signature using the verification operation: fv(M, s, ea), where ea is the public key of Alice. Algorithm 5.62 describes the sign-and-encrypt operation and its inverse.

Algorithm 5.62. Sign-and-encrypt

s := fs(M, da).

Generate a random symmetric key k.

c := fe(k, eb).

C := E(M, k).

Send (C, c, s) to the recipient.

Decrypt-and-verify

k := fd(c, db).

M := D(C, k).

Verify the signature: fv(M, s, ea).
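The sign-and-encrypt flow of Algorithm 5.62 can be sketched as follows; textbook RSA stands in for the signature pair (fs, fv) and the asymmetric pair (fe, fd), and an XOR keystream stands in for the symmetric cipher E/D. All moduli and exponents are illustrative toy values.

```python
import hashlib, random

# Toy sketch of Algorithm 5.62 (sign-and-encrypt / decrypt-and-verify).
na, ea_, da = 3233, 17, 2753      # Alice's RSA signing keys (n = 61*53)
nb, eb_, db = 2537, 13, 937       # Bob's RSA encryption keys (n = 43*59)

def H(M):                          # hash value reduced into Z_na
    return int.from_bytes(hashlib.sha256(M).digest(), 'big') % na

def E(M, k):                       # toy symmetric cipher: XOR with keystream
    stream = hashlib.sha256(k.to_bytes(2, 'big')).digest()
    return bytes(a ^ b for a, b in zip(M, stream))

D = E                              # XOR is its own inverse

def sign_and_encrypt(M):
    s = pow(H(M), da, na)          # s := fs(M, da)
    k = random.randrange(2, nb)    # random symmetric key
    c = pow(k, eb_, nb)            # c := fe(k, eb)
    return E(M, k), c, s           # C := E(M, k)

def decrypt_and_verify(C, c, s):
    k = pow(c, db, nb)             # k := fd(c, db)
    M = D(C, k)                    # M := D(C, k)
    return M, pow(s, ea_, na) == H(M)   # verify fv(M, s, ea)

C, c, s = sign_and_encrypt(b"hi Bob")
M, ok = decrypt_and_verify(C, c, s)
print(M, ok)                       # prints b'hi Bob' True
```

Note that the triple (C, c, s) is what travels on the wire; signcryption, described next, removes the separate key ciphertext c.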

Zheng’s signcryption scheme combines fs and fe into a single operation fse and also fd and fv into another single operation fdv. Each of these combined operations essentially takes the time of a single public- or private-key operation and hence leads to a performance enhancement by a factor of nearly two. Moreover, the encrypted key c need not be sent with the message, that is, C and s are sufficient for both authentication and confidentiality. This reduces communication overhead.

Signcryption is based on shortened digital signature schemes. Table 5.3 describes the shortened versions of DSA (Section 5.4.6). We use the notations of Algorithms 5.43 and 5.44. Also, ‖ denotes concatenation of strings, and H is a hash function (like SHA-1). The shortened schemes have two advantages over the original DSA. First, a DSA signature is of length 2|r|, whereas an SDSA1 or SDSA2 signature has length |r| + |H(·)|. For the current version of the standard, both r and H(·) are of size 160 bits. However, one may use a potentially bigger r, and in that case the shortened schemes give smaller signatures with equivalent security. Second, DSA requires computing a modular inverse during verification, whereas SDSA does not. So verification is more efficient in the shortened schemes.

Table 5.3. Shortened digital signature algorithms

Name    Signature generation                 Signature verification
SDSA1   s := H(g^d′ (mod p) ‖ M).            w := (ea g^s)^t (mod p).
        t := d′ (s + d)^(−1) (mod r).        Verify if s = H(w ‖ M).
SDSA2   s := H(g^d′ (mod p) ‖ M).            w := (g ea^s)^t (mod p).
        t := d′ (1 + ds)^(−1) (mod r).       Verify if s = H(w ‖ M).
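A minimal SDSA1 sketch in Python may make the table concrete; the group parameters below (g = 2 of prime order r = 11 modulo p = 23) and the use of SHA-256 for H are illustrative assumptions only.

```python
import hashlib, random

# Toy sketch of SDSA1 from Table 5.3.
p, r, g = 23, 11, 2        # g has prime order r modulo p

def H(data):
    return int.from_bytes(hashlib.sha256(data).digest(), 'big')

d = 3                      # Alice's private key
ea = pow(g, d, p)          # Alice's public key

def sdsa1_sign(M):
    while True:
        d_eph = random.randrange(1, r)            # ephemeral key d'
        s = H(str(pow(g, d_eph, p)).encode() + M)
        if (s + d) % r == 0:                      # s + d must be invertible
            continue
        t = d_eph * pow(s + d, -1, r) % r
        return s, t

def sdsa1_verify(M, s, t):
    # w = (ea * g^s)^t = g^((d + s) * t) = g^d'  (exponents taken mod r)
    w = pow(ea * pow(g, s, p) % p, t, p)
    return s == H(str(w).encode() + M)

s, t = sdsa1_sign(b"msg")
print(sdsa1_verify(b"msg", s, t))   # prints True
```

Observe that verification recovers g^d′ without any modular inversion, which is exactly the efficiency advantage noted above.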

Algorithms 5.63 and 5.64 provide the details of the signcryption algorithm and its inverse, called unsigncryption. The algorithms use a keyed hash function KH. One may implement KH(x, k) as H(x ‖ k) using an unkeyed hash function H.

Signcryption differs from the shortened scheme in that eb^d′ (mod p) is used instead of g^d′ for the computation of s. The running time of the signcryption algorithm is dominated by this modular exponentiation. When signature and encryption are used separately, the encryption operation uses one (or more) additional exponentiations. So signcryption significantly improves upon the sign-and-encrypt scheme of Algorithm 5.62.

Algorithm 5.63. Signcryption

Input: Plaintext message M, the sender’s private key da, the recipient’s public key

eb = gdb (mod p).

Output: The signcrypted message (C, s, t).

Steps:

Select a random d′ ∈ {1, 2, . . . , r − 1}.
k := H(eb^d′ (mod p)).                /* Generate keys for both signing and encrypting. */
Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.
s := KH(M ‖ N, k1).
                 /* Here N is the public key or the public key certificate of the sender. */
t := d′ (s + da)^(−1) (mod r).

C := E(M, k2).                                                          /* Symmetric encryption */

Algorithm 5.64. Unsigncryption

Input: The signcrypted message (C, s, t), the sender’s public key ea = gda (mod p) and the recipient’s private key db.

Output: The plaintext message M and the verification status of the signature.

Steps:

k := H((ea g^s)^(t·db) (mod p)).                /* Key recovery */

Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.

M := D(C, k2)./* Symmetric decryption */

if (KH(M ‖ N, k1) = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }
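The two algorithms can be sketched end to end in Python; the group parameters, the SHA-256-based KH, the XOR “cipher” and the use of Alice’s public key as the stand-in for her certificate N are all illustrative assumptions.

```python
import hashlib, random

# Toy sketch of Zheng-style signcryption (Algorithms 5.63 and 5.64),
# built on the shortened scheme SDSA1.
p, r, g = 23, 11, 2                # g has prime order r modulo p
da = 3;  ea = pow(g, da, p)        # Alice (sender)
db = 4;  eb = pow(g, db, p)        # Bob (recipient)
N = str(ea).encode()               # stand-in for Alice's certificate

def KH(x, key):                    # keyed hash: KH(x, k) = H(x || k)
    return int.from_bytes(hashlib.sha256(x + key).digest(), 'big')

def E(M, k2):                      # toy XOR "cipher"
    return bytes(a ^ b for a, b in zip(M, k2))
D = E

def signcrypt(M):
    while True:
        d_eph = random.randrange(1, r)                       # d'
        k = hashlib.sha256(str(pow(eb, d_eph, p)).encode()).digest()
        k1, k2 = k[:16], k[16:]
        s = KH(M + N, k1)
        if (s + da) % r == 0:
            continue
        t = d_eph * pow(s + da, -1, r) % r
        return E(M, k2), s, t

def unsigncrypt(C, s, t):
    w = pow(ea * pow(g, s, p) % p, t, p)    # recovers g^d'
    k = hashlib.sha256(str(pow(w, db, p)).encode()).digest()  # = eb^d'
    k1, k2 = k[:16], k[16:]
    M = D(C, k2)
    return M, KH(M + N, k1) == s

C, s, t = signcrypt(b"hello")
M, ok = unsigncrypt(C, s, t)
print(M, ok)                       # prints b'hello' True
```

The key point is the recovery step: (ea g^s)^(t·db) ≡ eb^d′ (mod p), so Bob re-derives k without ever seeing an encrypted key c.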

The most time-consuming part of unsigncryption is the computation of two modular exponentiations. DSA verification too has this property. However, an additional decryption in the decrypt-and-verify scheme of Algorithm 5.62 calls for one (or more) exponentiations, making it slower than unsigncryption.

Exercise Set 5.4

5.15
  1. Show how first pre-image resistance of the hash function H plays an important role for RSA signatures (with appendix) described in Section 5.4.1. More precisely, show that if it is easy to find a pre-image of any hash value, it is easy to generate a valid signature (M, s) from two valid signatures (M1, s1) and (M2, s2) with M ∉ {M1, M2}. This is often referred to as existential forgery of a signature. [H]

  2. Describe how existential forgery is possible for the Rabin signature scheme. [H]

  3. Describe how existential forgery is possible for the ElGamal signature scheme. [H]

5.16Assume that Bob uses the same RSA key pair ((n, e), d) for receiving encrypted messages and for signing. Suppose that Carol intercepts the ciphertext c ≡ m^e (mod n) sent by Alice. Also suppose that Bob is willing to sign any random message presented by Carol. Explain how Carol can choose a message to be signed by Bob in order to retrieve the secret m. [H]
5.17Let G be a finite cyclic group of order n, and g a generator of G. Suppose that Alice’s private and public keys are respectively d and gd.
  1. Consider a variant of the ElGamal signature scheme, in which s is computed as in Algorithm 5.36, but the roles of d and d′ are interchanged in the generation of t, that is, the modified signature (s, t′) on M is generated as:

    s := g^d′,
    t′ := d^(−1)[H(M) − d′H(s)] (mod n).

    Write the verification routine for the modified scheme.

  2. Show that forging modified ElGamal signatures is as difficult as computing discrete logarithms in G. You may assume that a forger can arrange d′ of her choice.

  3. Explain why signature generation is (a bit) more efficient in the modified scheme. Suppose that because of this enhanced performance Alice decided to switch to the modified scheme, but for backward compatibility she maintained both the original signature (s, t) and the modified signature (s, t′) on a message M. What went wrong?

5.18Show that:
  1. There are two valid ECDSA signatures on each message.

  2. There are three valid XTR–DSA signatures on each message.

(Here we call a signature valid if it passes the verification routine.)

5.19
  1. Write the versions with message recovery of the RSA, Rabin, Schnorr and Nyberg–Rueppel signature schemes.

  2. Describe the possibilities of existential forgery for these versions. (Since hash functions cannot be inverted, they are not used for signature schemes with message recovery, and so the problem of existential forgery is more acute in this case. To avoid such forgeries the signer should add some redundancy to each message block before signing the same. An existentially forged signature is likely to correspond to a message not containing the redundancy.)

5.20Design the XTR version of the Nyberg–Rueppel signature scheme with appendix (Section 5.4.5). What are the speed-ups achieved by the signature generation and verification routines of the XTR version over the original NR routines?
5.21Repeat Exercise 5.20 with the Schnorr digital signature scheme (Section 5.4.4).
5.22
  1. Deduce that the determinant of the matrix Mc of Equation (5.9) is

  2. Demonstrate that

5.23Let p, q, p′, q′ be distinct odd primes with p = 2p′ + 1 and q = 2q′ + 1, and let n := pq (as in the RSA-based undeniable signature scheme).
  1. Let α ∈ Z_n^*. Show that ord_n α divides 2p′q′. [H]

  2. Argue that there are exactly four elements in Z_n^* of order ≤ 2.

  3. Let α ∈ Z_n^* with α ≢ ±1 (mod n) and ord_n α < p′q′. Show that gcd(α – 1, n) or gcd(α + 1, n) is a non-trivial divisor of n. How many such elements α does Z_n^* contain?

  4. Let α ∈ Z_n^* have order p′q′ or 2p′q′. Show that α^i ≢ 1 (mod n) for every i with 1 ≤ i < p′q′.

  5. Look at the denial protocol for the GKR RSA signature scheme (Algorithm 5.61) and assume that p′ < q′. Suppose that (M, s) is a forged signature (that is, s ∉ Sig M) on some message M with m := H(M) ∈ Z_n^*. Show that s ≡ αm^d (mod n) for some α ∈ Z_n^* with ord_n α ≥ p′. Deduce that ord_n(m^(−1) s^e) ≥ p′. Conclude that if 4k < p′, then there exists a unique i′ ∈ {1, 2, . . . , k} (namely, i′ = i) for which Congruence (5.11) holds.

5.24
  1. Write the shortened versions of ECDSA signature generation and verification.

  2. Write the signcryption and unsigncryption algorithms based on shortened ECDSA.

5.5. Entity Authentication

Entity authentication (also called identification) is a process by means of which an entity Alice, called the claimant, proves her identity to another entity Bob, called the verifier. Alice is assumed to possess some secret piece(s) of information that no intruder is expected to know. During the execution of the identification protocol, an interaction takes place between Alice and Bob. If the interaction allows Bob to conclude (deterministically or with high probability) that the claimant possesses the secret knowledge, he accepts the claimant as Alice. An intruder Carol lacking the secret information is expected (with high probability) to fail to convince Bob of her identity as Alice. This is how entity authentication schemes tend to prevent impersonation attacks by intruders. Typically, identification schemes are used to protect access to some sensitive piece(s) of data, like a user’s (or a group’s) private files in a computer or an account in a bank. Both secret-key and public-key techniques are used for the realization of entity authentication protocols.

5.5.1. Passwords

A password is a small string to be remembered by an entity and produced verbatim to the verifier at the time of identification. The most common example is a computer password used to protect access to a user’s private working area in a file system. In this case, an alphanumeric string (or a string that can be input using a computer keyboard) of length between 4 and 20 characters is normally used as the secret information associated with an entity. Passwords are also used to prevent misuse of certain physical objects (like an ATM card for withdrawing cash from one’s bank account, a prepaid telephone card) by anybody other than the legitimate owners of the objects. In this case, a password usually consists of a sequence of four to ten digits and is also called a personal identification number or a PIN.

In order that Bob can recognize an entity from her password, a possibility for Bob is to store the (entity, password) pairs corresponding to all the entities that are expected to participate in identification interactions with Bob. When Alice enters her password, Bob checks if Alice’s input is the same as what he stores in the pair for Alice. The file(s) storing these private records should be preserved with high secrecy, and neither read nor write access should be granted to any user. But a privileged user (the superuser) is usually given the capability to inspect any file (even read-protected ones) and can, therefore, misuse the passwords.

This problem can be avoided by storing, instead of the passwords themselves, a one-way transform of the passwords.[3] When Alice enters a password P, Bob computes the transform f(P) and compares f(P) with the record stored for Alice. The identity of Alice is accepted if and only if a match occurs. The password file now need not be read-protected, since any intruder (even the superuser) knowing the value f(P) cannot easily compute P.

[3] Informally speaking, a one-way function is one which is computationally infeasible to invert.

Passwords should be chosen from a space large enough to preclude exhaustive search by an intruder in feasible time. Unfortunately, however, it is a common tendency for human users to choose passwords from limited subsets of the allowed space. For example, use of lower case characters, dictionary words, popular names, birth dates and so on in passwords makes attacks on passwords much easier. A strategy to foil such dictionary-based attacks is to use a pseudorandom bit sequence S known as the salt and apply the one-way function f to a combination of the password P and the salt S. That is, a function f(P, S) is now stored against an entity Alice having a password P. The combination (P, S) is often referred to as a key for the password scheme. Since a password now corresponds to many possible keys, the search space for an intruder increases dramatically. For instance, if S is a pseudorandomly chosen bit string of length 64, the intruder has to compute f(P, S) up to 2^64 times in order to guess the correct candidate for S for each P under trial. It is also necessary that the same key is not chosen for two different entities. If the salt S is a 64-bit string, then by the birthday paradox a collision between two keys is expected to occur only after (at least) 2^32 keys are generated.

A second strategy to strengthen the protection of passwords is to increase the so-called iteration count n, that is, instead of storing f(P, S) for each password P, Bob now stores the n-fold iterate f^n(P, S). An n-fold application of the function f increases by a factor of n both the time for password verification and the time for exhaustive search by an intruder. For a legitimate user, this is not really a nuisance, since computing f^n(P, S) only once during identification is tolerable (and may even be unnoticeable), whereas to an intruder breaking a password simply becomes n times as difficult. In typical applications, values of n ≥ 1000 are recommended.
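Both strategies, salting and iteration, are available in Python’s standard library via PBKDF2; the sketch below is illustrative (the salt length and iteration count are example choices, not recommendations).

```python
import hashlib, os

# Salted, iterated password hashing: pbkdf2_hmac applies the underlying
# hash n times to the (password, salt) combination, in the spirit of
# storing f^n(P, S) instead of P.
def make_record(password: bytes, n: int = 100_000):
    salt = os.urandom(8)                       # 64-bit random salt S
    key = hashlib.pbkdf2_hmac('sha256', password, salt, n)
    return salt, key                           # store (S, f^n(P, S))

def check(password: bytes, salt: bytes, key: bytes, n: int = 100_000):
    return hashlib.pbkdf2_hmac('sha256', password, salt, n) == key

salt, key = make_record(b"correct horse")
print(check(b"correct horse", salt, key))      # prints True
print(check(b"wrong guess", salt, key))        # prints False
```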

In some situations, it is advisable to lock access to a password-protected area after a predetermined number of (say, three) wrong passwords have been input in succession. This is typically the case with PINs for which the search space is rather small. For unlocking the access (to the legitimate user Alice), a second longer key (again known only to Alice) is used or human intervention is called for.

As a case study, let us briefly describe the password scheme used by the UNIX operating system. During the creation of a password, a user supplies a string P of eight 7-bit ASCII characters as the password. (Longer strings are truncated to the first eight characters.) A 56-bit DES[4] key K is constructed from P. A 12-bit random salt S is obtained from the system clock at the time of the creation of the password. The zero message (that is, a block of 64 zero bits) is then iteratively encrypted n = 25 times using K as the key. The encryption algorithm is a variant of DES that depends on the salt S. The output ciphertext and the salt (which account for a total of 64 + 12 = 76 bits) are then packed into eleven 7-bit ASCII characters and stored in the password file (usually /etc/passwd). When UNIX was designed (in 1970), this algorithm, often referred to as the UNIX crypt password algorithm, was considered to be reasonably safe under the assumption of the difficulty of finding a DES key from a plaintext–ciphertext pair. With today’s hardware and software speed, a motivated attacker can break UNIX passwords in very little time.

[4] The data encryption standard (DES) is a well-known symmetric-key cipher (Section A.2.1).

Password-based authentication schemes suffer from the disadvantage that the user has to disclose her secret P to the verifier. The verifier may misuse the knowledge of P by storing it secretly and deploying it afterwards. During the computation of f^n(P, S), the string P resides in the machine’s memory. An eavesdropper capable of monitoring the temporary storage holding the string P easily gets its value. In view of these shortcomings, password schemes are referred to as weak authentication schemes.

5.5.2. Challenge–Response Algorithms

In a strong authentication scheme, the claimant proves the possession of a secret knowledge to a verifier without disclosing the secret to the verifier. One of the communicating entities generates a random bit string c known as the challenge and sends c (or a function of c) to the other. The latter then reacts to the challenge appropriately, for example, by sending a response string r to the former. Strong authentication schemes are, therefore, also called challenge–response authentication schemes. The communication between the entities depends both on the random challenge and on the secret knowledge of the claimant. An intruder lacking the secret knowledge of a valid claimant cannot take part properly in the interaction. Furthermore, since a random challenge is used during each invocation of the identification protocol, an eavesdropper cannot use the intercepted transcripts of a particular session for a future invocation of the protocol.

Public-key protocols can be used to realize challenge–response schemes. We assume that Alice is the claimant and Bob is the verifier. Without committing to specific algorithms, we denote the public and private keys of Alice by e and d, and the encryption and decryption transforms by fe and fd respectively. Alice proves her identity by demonstrating her knowledge of d (but without revealing d) to Bob. Bob uses the transform fe and Alice the transform fd under the respective keys e and d. If a key d′ other than d is used by Carol in conjunction with e, some step of the interaction detects this and the protocol rejects Carol’s claim to be Alice. We describe two challenge–response schemes that differ in the sequence of applying the transforms fe and fd.

A challenge–response scheme based on encryption–decryption

In this scheme, Bob (the verifier) first generates a random string r, encrypts the same by the public key of Alice (the claimant) and sends the ciphertext c (the challenge) to Alice. Alice uses her private key to decrypt c to the message r′ and sends r′ (the response) back to Bob. Identification of Alice succeeds if and only if r = r′. Algorithm 5.65 illustrates the details of this scheme. It employs a one-way function H (like a hash function) for a reason explained later. This scheme checks whether the claimant can recover the random string r correctly. A knowledge of the decryption key d is needed for that.

Algorithm 5.65. Challenge–response authentication based on encryption

Bob generates a random bit string r and computes w := H(r).

Bob reads Alice’s (authentic) public key e and computes c := fe(r, e).

Bob sends (w, c) to Alice.

Alice computes r′ := fd(c, d).

if (H(r′) ≠ w) { Alice quits the protocol. }

Alice sends r′ to Bob.

Bob identifies Alice if and only if r′ = r.
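A single run of Algorithm 5.65 can be sketched with textbook RSA standing in for (fe, fd) and SHA-256 as the one-way function H; the key pair below is an illustrative toy.

```python
import hashlib, random

# Toy sketch of challenge-response authentication based on encryption
# (Algorithm 5.65).
n, e, d = 3233, 17, 2753          # Alice's toy RSA key pair (n = 61*53)

def H(x: int) -> bytes:
    return hashlib.sha256(str(x).encode()).digest()

# Bob's side: pick r, form witness w = H(r) and challenge c = fe(r, e).
r = random.randrange(2, n)
w, c = H(r), pow(r, e, n)

# Alice's side: decrypt the challenge and check Bob's witness first.
r_prime = pow(c, d, n)
assert H(r_prime) == w            # otherwise Alice quits the protocol

# Alice sends r_prime; Bob identifies Alice iff r_prime == r.
print(r_prime == r)               # prints True
```

The witness check is the step that protects Alice from decrypting an arbitrary ciphertext on a cheater’s behalf.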

The string H(r) = w is called the witness. By sending w to Alice, Bob convinces her of his knowledge about the secret r without disclosing r itself. If Bob (or a third party pretending to be Bob) tries to cheat, Alice has the option to abort the protocol prematurely. In other words, Alice does not have to decrypt an arbitrary ciphertext presented by Bob without confirming that Bob knows the corresponding plaintext.

A challenge–response scheme based on digital signatures

In the scheme explained in Algorithm 5.66, Alice (the claimant) first does the private key operation, that is, Alice sends her digital signature on a message to Bob (the verifier). Bob then verifies the signature of Alice by employing the encryption transform with Alice’s public key.

Algorithm 5.66. Challenge–response authentication based on signature

Bob selects a random string rB.

Bob sends rB to Alice.

Alice selects a random string rA.

Alice generates the signature s := fd(rA ‖ rB, d).

Alice sends (rA, s) to Bob.

Bob reads Alice’s (authentic) public key e.

Bob retrieves the strings r′A and r′B satisfying fe(s, e) = r′A ‖ r′B.

Bob identifies Alice if and only if r′A = rA and r′B = rB.

This authentication scheme is based on the assumption that only a person knowing Alice’s private key d can generate a signature s that leads to the equalities r′A = rA and r′B = rB. Using only rA and the signature s = fd(rA, d) would demonstrate to Bob that Alice possesses the requisite knowledge of d. The random string rB is used to prevent the so-called replay attack. If rB were not used, an eavesdropper Carol intercepting the transcripts of a session can later claim her identity as Alice by simply supplying rA and Alice’s signature on rA to Bob. Using a new rB in every session (and incorporating it in the signature) guarantees that the signature varies in different sessions, even when rA remains the same.

There is an alternative strategy by which the use of the random string rB can be avoided. All we have to ensure is that a value of rA used once cannot be reused in a subsequent session. This can be achieved by using a timestamp, which is a string reflecting the time when a certain event occurs (in our case, when Alice generates the signature). Thus, if Alice gets the local time tA, computes the signature s := fd(tA, d) and sends (tA, s) to Bob, it is sufficient for Bob to check that the timestamp tA is valid. A possible criterion for the validity of Alice’s timestamp tA is that the difference between tA and the time when Bob is verifying the signature is within an allowed bound (predetermined, based on the approximate time for the communication). But it may be possible for an adversary to provide to Bob the timestamp tA and Alice’s signature on tA, before tA expires. Therefore, Bob should additionally ensure that timestamps from Alice come in a strictly ascending order. Maintaining the timestamp for the last interaction with Alice takes care of this requirement. Algorithm 5.67 describes the modified version of Algorithm 5.66, based on timestamps. A problem with timestamps is that (local) clocks across a network have to be properly synchronized.

Algorithm 5.67. Using timestamp in challenge–response authentication

Alice reads the local time tA.

Alice generates the signature s := fd(tA, d).

Alice sends (tA, s) to Bob.

Bob reads Alice’s (authentic) public key e.

Bob retrieves the timestamp t′A := fe(s, e).

Bob identifies Alice if and only if t′A = tA and this timestamp is valid.

Mutual authentication

So far, we have described identification schemes that are unidirectional or unilateral in the sense that only Alice tries to prove her identity to Bob. For mutual authentication between Alice and Bob, the above schemes can be used a second time by reversing the roles of Alice and Bob. Algorithm 5.68 describes an alternative strategy that achieves mutual authentication with reduced communication overhead (compared to two invocations of the unidirectional scheme). Now, the key pairs (eA, dA) and (eB, dB) and the transforms fe,A, fd,A and fe,B, fd,B of both Alice and Bob should be used.

5.5.3. Zero-Knowledge Protocols

The challenge–response schemes described above ensure that the claimant’s secret is not made available to the verifier (or a listener to the communication between the verifier and the claimant). But the claimant uses her private key for generating the response and, therefore, it continues to remain possible that a verifier extracts some partial information on the secret by choosing challenges strategically.

Algorithm 5.68. Mutual authentication

Bob selects a random string rB.

Bob sends rB to Alice.

Alice selects a random string rA.

Alice generates the signature sA := fd,A(rA ‖ rB, dA).

Alice sends (rA, sA) to Bob.

Bob reads Alice’s (authentic) public key eA.

Bob retrieves the strings r′A and r′B satisfying fe,A(sA, eA) = r′A ‖ r′B.

Bob identifies Alice if and only if r′A = rA and r′B = rB.

Bob generates the signature sB := fd,B(rB ‖ rA, dB).

Bob sends sB to Alice.

Alice reads Bob’s (authentic) public key eB.

Alice retrieves the strings r′B and r′A satisfying fe,B(sB, eB) = r′B ‖ r′A.

Alice identifies Bob if and only if r′B = rB and r′A = rA.

Using a zero-knowledge (ZK) protocol overcomes this difficulty in the sense that (absolutely) no information on the claimant’s secret is leaked out during the conversation between the claimant and the verifier. The verifier (or a listener) continues to remain as much ignorant of the secret as he was before the invocation of the protocol. In other words, the verifier (or a listener) does not learn anything from the conversation that he could not learn by himself in the absence of the claimant. The only thing the verifier gains is the confidence whether the claimant actually knows the secret or not. This is intuitively the defining feature of a ZK protocol.

Similar to other public-key techniques, the security of the ZK protocols is based on the intractability of some difficult computational problems. A repeated use of a public-key scheme with a given set of parameters may degrade the security of the scheme under those parameters. For example, each encryption of a message (or each generation of a signature) makes available a plaintext–ciphertext pair which may eventually help a cryptanalyst. A ZK protocol, on the other hand, does not lead to such a degradation of the security of the protocol, irrespective of how many times it is invoked.

We stick to the usual scenario: Alice is the claimant, Bob is the verifier and Carol is an eavesdropper trying to impersonate Alice. In the jargon of ZK protocols, Alice (and not Bob) is called the prover. In order to avoid confusion, we continue to use the terms claimant and verifier. A ZK protocol is usually a three-pass interactive protocol. To start with, Alice chooses a random commitment and sends a witness of the commitment to Bob. A new commitment should be selected by Alice during each invocation of the protocol in order to guard against an adversarial verifier. Upon receiving the witness, Bob chooses and sends a random challenge to Alice. Finally, Alice replies by sending a response to the challenge. If Alice knows the secret (and performs the protocol steps correctly), her response can be easily proved by Bob to be valid. Carol, in an attempt to impersonate Alice without knowing the secret, can produce the valid response only with a probability P bounded away from 1. If P happens not to be negligibly small, then the protocol can be repeated a sufficient number of times, so that Carol’s probability of giving the correct response on all occasions becomes extremely low.

The parameters and the secrets for a ZK protocol can be set privately by each claimant. Another alternative is that a trusted third party (TTP) generates a set of parameters and makes these parameters available for use by every claimant over a network. A second duty of the TTP is to register a secret against each entity. The secret may be generated either by the TTP or by the respective entity. The knowledge of this (registered) secret by an entity is equivalent to her identity in the network. Finally, the authenticity of the public key of an entity is ensured by the digital signature of the TTP on the public key. For simplicity, however, we will not bother about the existence of the TTP and the way in which the secret (the possession of which by Alice is to be proved) has been created and/or handed over to Alice. We will also assume that each entity’s public key is authentic.

The Feige–Fiat–Shamir (FFS) protocol

The FFS protocol (Algorithm 5.69) is based on the intractability of computing square roots modulo a composite integer n. We take n = pq with two distinct primes p and q each congruent to 3 modulo 4.

Algorithm 5.69. Feige–Fiat–Shamir zero-knowledge protocol

Selection of domain parameters:

Select two large distinct primes p and q each congruent to 3 modulo 4.

n := pq.

Select a small integer t./* The probability of a successful cheat is 2^(−t) */

Selection of Alice’s secret:

Alice selects t random integers x1, . . . , xt ∈ Z_n^*.

Alice selects t random bits b1, . . . , bt.

Alice computes yi := (−1)^bi xi^(−2) (mod n) for i = 1, . . . , t.

Alice makes (y1, . . . , yt) public and keeps (x1, . . . , xt) secret.

The protocol:

Alice randomly chooses c ∈ Z_n^* and γ ∈ {0, 1}./* Commitment */
Alice computes and sends to Bob w := (−1)^γ c^2 (mod n)./* Witness */
Bob randomly chooses and sends to Alice bits ∊1, . . . , ∊t ∈ {0, 1}./* Challenge */
Alice computes and sends to Bob r := c x1^∊1 · · · xt^∊t (mod n)./* Response */

Bob computes w′ := r^2 y1^∊1 · · · yt^∊t (mod n).

Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ ≡ ±w (mod n).

It is clear from Algorithm 5.69 that knowing the secret (x1, . . . , xt) allows Alice to let Bob accept her identity (as Alice). The check w′ ≠ 0 in the last line is necessary to preclude the commitment c = 0, that makes any claimant succeed irrespective of the availability of the knowledge of the secret.

Now, let us see how an opponent (Carol), without knowing the secret, can succeed in impersonating Alice by taking part in this protocol. To start with, we consider the simple case t = 1 (which corresponds to Fiat and Shamir’s original scheme). Carol can start the process by generating a random c and γ and computing w = (−1)^γ c^2. Now, Carol should send the response c or cx1 depending on whether Bob sends ∊1 = 0 or 1. Her capability of sending both correctly is equivalent to her knowledge of x1. If Bob sends ∊1 = 0, then she can provide the correct response c. Otherwise, Carol can at best select a random response from Z_n^*, and the probability that this is correct is overwhelmingly low. On the other hand, let Carol choose a random c and γ and send the (improper) witness w := (−1)^γ c^2 y1 (mod n). In that case, Carol can answer the valid response r = c, if Bob’s challenge is ∊1 = 1. Sending the correct response to the challenge ∊1 = 0 now requires knowledge of x1. Therefore, if ∊1 is randomly chosen by Bob (without the prior knowledge of Carol), Carol can successfully respond with probability (very close to) 1/2. For t ≥ 1, this probability of a cheat by Carol can be easily shown to be (very close to) 1/2^t, which is negligibly small for t ≥ 80.

In practice, however, t is chosen to be O(ln ln n). It is, therefore, necessary to repeat the protocol t′ times, so that the probability of a successful cheat becomes (nearly) 1/2^tt′. Taking t′ = Θ(ln n) is recommended. It can be shown that these choices for t and t′ offer the FFS protocol the desired ZK property. Without going into a proof of this assertion, let us informally explain the ZK property of the FFS protocol. Neither Bob nor a listener to the conversation between Alice and Bob can get any idea of the secret (x1, . . . , xt). Bob gets as a response the product of c and those xi’s for which ∊i = 1. Since c is randomly chosen by Alice and is not available to Bob, there is no way to choose a strategic challenge. However, if the square root of w (or −w) can be computed by Bob, then the interaction may give away partial information on the secret. For example, if Bob chooses the challenge (∊1, ∊2, . . . , ∊t) = (1, 0, . . . , 0), then Alice’s response would be cx1, from which x1 can be computed by Bob, if he knows c. Thus, the security and the ZK property of the FFS protocol are based on the assumption that computing square roots modulo n is an infeasible computational problem.
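
To make the round structure concrete, here is a minimal Python sketch of one round of the FFS protocol with toy parameters. The tiny modulus, the seeds, the function names and the key setup yi = (−1)^bi xi^−2 (mod n) are our illustrative choices, a sketch rather than a normative implementation:

```python
import random
from math import gcd

# Toy sketch of one round of Feige-Fiat-Shamir identification.
# Assumes the key setup y_i = (-1)^b_i * x_i^(-2) (mod n); all
# parameters below are far too small for real use.

def ffs_keygen(n, t, rng):
    xs, ys = [], []
    for _ in range(t):
        x = rng.randrange(2, n)
        while gcd(x, n) != 1:           # secrets must be units mod n
            x = rng.randrange(2, n)
        b = rng.randrange(2)
        xs.append(x)
        ys.append((-1) ** b * pow(x, -2, n) % n)
    return xs, ys

def ffs_round(n, xs, ys, rng):
    t = len(xs)
    c = rng.randrange(2, n)                      # commitment c
    gamma = rng.randrange(2)
    w = (-1) ** gamma * pow(c, 2, n) % n         # witness w = (-1)^gamma c^2
    eps = [rng.randrange(2) for _ in range(t)]   # Bob's challenge bits
    r = c
    for i in range(t):                           # response r = c * prod x_i^eps_i
        if eps[i]:
            r = r * xs[i] % n
    wp = pow(r, 2, n)                            # Bob: w' = r^2 * prod y_i^eps_i
    for i in range(t):
        if eps[i]:
            wp = wp * ys[i] % n
    return wp != 0 and (wp == w or wp == (n - w) % n)
```

An honest prover always passes, while a claimant with wrong secrets survives a round essentially only when all challenge bits are zero.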

The Guillou–Quisquater (GQ) protocol

The GQ identification protocol is based on the intractability of the RSA problem. The correctness of Algorithm 5.70 (for a legitimate claimant) is easy to establish. The check w′ ≠ 0 is necessary to avoid the commitment c = 0, which makes any claimant always succeed.

A TTP typically selects the domain parameters p, q, n, e and d. It also selects m and gives s to Alice without revealing d. The execution of the protocol does not require the use of the decryption exponent d. In fact, d is a global secret, whereas s is Alice’s personal secret. Alice tries to prove the knowledge of s (and not of d).

In the GQ algorithm, the power s is blinded by multiplying it with the random commitment c. As a witness for c, Alice presents its encrypted version w. Under the assumption that RSA decryption without the knowledge of the decryption exponent d is infeasible, Bob (or an eavesdropper) cannot compute c and hence cannot separate out the value of s. Thus, no partial information on s is provided. Furthermore, each invocation requires a random ∊. In order to compute a strategic witness, Carol can at best have a guess of ∊. The guess is correct with a probability of 1/e. If e is reasonably large, the probability of a successful cheat is low. However, larger values of e lead to more expensive generation of the witness from the commitment (and also of the response). So small values of e (say, 2^16 + 1 = 65,537) are usually recommended. In that case, repeating the protocol a suitable number of times makes Carol’s chance of cheating as small as one desires. Taking t′e (where t′ is the number of iterations of the protocol) of the order of (log n)^α for some constant α gives the GQ protocol the desired zero-knowledge property.

Algorithm 5.70. Guillou–Quisquater zero-knowledge protocol

Selection of domain parameters:

Select two distinct large primes p and q and set the modulus n := pq.

Select an exponent e with gcd(e, φ(n)) = 1 and compute d := e^−1 (mod φ(n)).

The pair (n, e) is made public and d is kept secret.

Selection of Alice’s secret:

Alice selects a random m ∈ ℤ_n* and computes s := m^d (mod n).

Alice makes m public and keeps s secret.

The protocol:

Alice selects a random c ∈ ℤ_n*./* Commitment */
Alice computes and sends to Bob w := c^e (mod n)./* Witness */
Bob selects and sends to Alice a random ∊ ∈ {0, 1, . . . , e − 1}./* Challenge */
Alice computes and sends to Bob r := cs^∊ (mod n)./* Response */

Bob computes w′ := m^−∊ r^e (mod n).

Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ = w.
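
The algebra behind the GQ round can be checked with a small Python sketch. The tiny parameters and function names are our own illustrative choices, and the verification step assumes the reading w′ = m^−∊ r^e (mod n):

```python
import random
from math import gcd

# Toy sketch of one round of the Guillou-Quisquater protocol.
# Parameters are tiny and purely illustrative.

def gq_setup(p, q, e):
    n, phi = p * q, (p - 1) * (q - 1)
    assert gcd(e, phi) == 1
    return n, pow(e, -1, phi)        # d = e^(-1) mod phi(n)

def gq_round(n, e, m, s, rng):
    c = rng.randrange(2, n)          # Alice's commitment
    w = pow(c, e, n)                 # witness w = c^e
    eps = rng.randrange(e)           # Bob's challenge in {0, ..., e-1}
    r = c * pow(s, eps, n) % n       # response r = c * s^eps
    wp = pow(m, -eps, n) * pow(r, e, n) % n   # Bob: w' = m^(-eps) * r^e
    return wp != 0 and wp == w
```

With s = m^d (mod n), an honest run always verifies, since m^−∊ r^e = m^−∊ c^e m^d∊e = c^e = w; a claimant with a wrong s survives essentially only when the challenge is ∊ = 0.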

The Schnorr protocol

The Schnorr protocol is based on the intractability of computing discrete logarithms in a large prime field ℤ_p. We assume that a suitably large prime divisor q of p − 1 and an element g ∈ ℤ_p* of multiplicative order q are known. The algorithm works in the subgroup of ℤ_p* generated by g. In order to make the known algorithms for solving the DLP infeasible for the field ℤ_p, one should have q > 2^160.

Algorithm 5.71. Schnorr zero-knowledge protocol

Selection of domain parameters:

Select a large prime p such that p – 1 has a large prime divisor q.

Select an element g ∈ ℤ_p* having multiplicative order q modulo p.

Publish (p, q, g).


Select a small integer t < lg q.           /* The probability of a successful cheat is 2^−t */

Selection of Alice’s secret:

Alice chooses a random secret integer d ∈ {0, 1, . . . , q − 1}.

Alice computes and makes public the integer y := g^d (mod p).

The protocol:

Alice chooses a random c ∈ {0, 1, . . . , q − 1}./* Commitment */
Alice computes and sends to Bob w := g^c (mod p)./* Witness */
Bob selects and sends to Alice a random ∊ ∈ {0, 1, . . . , 2^t − 1}./* Challenge */
Alice computes and sends to Bob r := d∊ + c (mod q)./* Response */

Bob computes w′ := g^r y^−∊ (mod p).

Bob accepts Alice’s identity if and only if w′ = w.

We leave the analysis of correctness and security of this protocol to the reader. The secret d is masked from Bob and other eavesdroppers by introducing the random additive bias c modulo q. The probability of a successful cheat by an adversary is 2^−t, since ∊ is chosen randomly from a set of cardinality 2^t. Usually the Schnorr protocol is not used iteratively. Therefore, t ≥ 40 is recommended for making the probability of cheating negligible. On the other hand, if t is too large, then the protocol can be shown to lose the ZK property. For the generation of the witness from the commitment, Alice computes a modular exponentiation with an exponent which is O(q). Generating the response, on the other hand, involves a single multiplication (and a single addition) modulo q and hence is very fast.
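
A toy Python sketch of one Schnorr round, under the reading that y = g^d (mod p) is public and Bob verifies w′ = g^r y^−∊ (mod p); the tiny group and the function name are illustrative assumptions of ours:

```python
import random

# Toy sketch of one round of the Schnorr protocol. Below, p = 23,
# q = 11 divides p - 1, and g = 2 has order 11 modulo 23; real
# parameters take q > 2^160.

def schnorr_round(p, q, g, d, y, t, rng):
    c = rng.randrange(q)              # Alice's commitment
    w = pow(g, c, p)                  # witness w = g^c
    eps = rng.randrange(2 ** t)       # Bob's challenge in {0, ..., 2^t - 1}
    r = (d * eps + c) % q             # response r = d*eps + c (mod q)
    wp = pow(g, r, p) * pow(y, -eps, p) % p   # Bob: w' = g^r * y^(-eps)
    return wp == w
```

An honest run always verifies, since g^r y^−∊ = g^(d∊ + c − d∊) = g^c = w.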

Exercise Set 5.5

5.25
  1. Describe how a zero-knowledge witness–challenge–response identification scheme can be converted to a signature scheme. [H]

  2. Write the Feige–Fiat–Shamir, Guillou–Quisquater and Schnorr signature schemes based on the corresponding identification schemes.

5.26 Let n := pq with distinct primes p and q, each congruent to 3 modulo 4.
  1. Show that –1 is a quadratic non-residue modulo p and modulo q.

  2. If is a quadratic residue modulo n, prove that a has exactly four square roots modulo n, of which exactly one is a quadratic residue modulo n.

  3. Consider the following identification protocol in which Alice wants to prove to Bob her knowledge of the factorization of n = pq. Assume that p and q are sufficiently large so that computing square roots modulo n is infeasible without the knowledge of the factorization of n. Argue that Alice can prove her identity to Bob if and only if she knows the factorization of n.

    A bad zero-knowledge protocol

    Bob chooses a random x ∈ ℤ_n* and computes a := x^4 (mod n).

    Bob sends a to Alice.

    Alice computes four square roots of a modulo n and picks up the unique
           square root b which is a quadratic residue modulo n.

    Alice sends b to Bob.

    Bob accepts Alice’s claim if and only if b ≡ x^2 (mod n).

  4. Conclude that this is not a good zero-knowledge protocol, by demonstrating that Bob can maliciously send a bad a to Alice so that during the execution of the protocol he gathers enough information to factor n. [H]

Chapter Summary

All the material studied in earlier chapters culminates in this relatively short chapter which describes some popular cryptographic algorithms. We address most of the problems relevant in cryptography, namely, encryption, key agreement, digital signatures and entity authentication. Against each algorithm we mention the (provable or alleged) source of security of the algorithm.

Encryption algorithms are treated first. We start with the seemingly most popular RSA algorithm. This algorithm derives its security from the RSA key inversion problem and the RSA problem. The key inversion problem is probabilistic polynomial-time equivalent to the integer factorization problem. The intractability of the RSA problem is unknown. At present no algorithm other than factoring the RSA modulus is known for solving the RSA problem. We subsequently describe Rabin encryption (based on the square root problem), Goldwasser–Micali encryption (based on the quadratic residuosity problem), Blum–Goldwasser encryption (based on the square root problem), ElGamal encryption (based on the Diffie–Hellman problem) and Chor–Rivest encryption (based on a variant of the subset sum problem). The XTR encryption algorithm is essentially an efficient implementation of ElGamal encryption and is based on a tricky representation of elements in certain finite fields. The last encryption algorithm we discuss is the NTRU algorithm. It derives its security from a mixing system that uses the algebra ℤ[X]/〈X^N − 1〉. Attacks on NTRU based on the shortest vector problem are also known.

The basic key-agreement scheme is the Diffie–Hellman scheme. In order to prevent small-subgroup attacks on this scheme, one employs a technique known as cofactor expansion. We then explain unknown key-share attacks against key-agreement schemes. These attacks necessitate the use of authenticated key agreement schemes. The MQV algorithm is presented as an example of an authenticated key-agreement scheme.

Next come digital signature algorithms. Digital signatures may be classified in two broad categories: signature schemes with appendix and signature schemes with message recovery. In this book, we study only the signature schemes with appendix. As specific examples of signature schemes, we first explain RSA and Rabin signatures. Then, we present several variants of discrete-log-based signature schemes: ElGamal signatures, Schnorr signatures, Nyberg–Rueppel signatures, the digital signature algorithm (DSA) and its elliptic curve variant ECDSA. All the discrete-log (over finite fields)-based signature schemes have efficient XTR implementations. The NTRUSign algorithm is the last general-purpose signature scheme discussed in this section.

We then present a treatment of some special signature schemes. Blind signatures are created on messages unknown to the signer. Three blind signature schemes are described: Chaum, Schnorr and Okamoto–Schnorr schemes. An undeniable signature, on the other hand, requires an active participation of the signer at the time of verification and comes with a denial protocol that prevents a signer from denying a valid signature at a later time. The Chaum–Van Antwerpen undeniable signature scheme is based on the discrete-log problem, whereas the GKR scheme is based on the RSA problem.

A way to guarantee both authentication and confidentiality of a message is to sign the message and then encrypt the signed message. This involves two basic operations (signature generation and encryption). Zheng’s signcryption scheme combines these two primitives with a view to reducing both running time and message expansion.

The final topic we discuss in this chapter is entity authentication, a mechanism by means of which an entity can prove its identity to another. Here identity of an entity is considered synonymous with the possession of some secret information by the entity. Passwords are called weak authentication schemes, since the claimant has to disclose the secret straightaway to the verifier. A strong authentication scheme (also called a challenge–response scheme) does not reveal the secret to the verifier. We describe two strong authentication schemes; the first is based on encryption and the second on digital signatures. A way to establish mutual authentication between two entities is also presented. Challenge–response algorithms may be vulnerable to some attacks mounted by the verifier. A zero-knowledge protocol comes with a proof that during the authentication conversation no information is leaked to the verifier. Three zero-knowledge protocols are discussed: the Feige–Fiat–Shamir protocol, the Guillou–Quisquater protocol, and the Schnorr protocol.

Suggestions for Further Reading

Public-key cryptography was born from the seminal works of Diffie and Hellman [78] and Rivest, Shamir and Adleman [252]. Though still young, this area has induced much research in the last three decades. In this chapter, we have made an attempt to summarize some important cryptographic algorithms proposed in the literature. The original papers where these techniques have been introduced are listed below. We don’t plan to be exhaustive, but mention only the most relevant resources.

Algorithm | Reference(s)
RSA encryption | [252]
Rabin encryption | [246]
Goldwasser–Micali encryption | [117]
Blum–Goldwasser encryption | [27]
ElGamal encryption | [84]
Chor–Rivest encryption | [54]
XTR encryption | [170, 172, 171, 173, 289, 297]
NTRU encryption | [130]
Identity-based encryption | [267, 34, 35]
Diffie–Hellman key exchange | [78]
Menezes–Qu–Vanstone key exchange | [161]
RSA signature | [252]
Rabin signature | [246]
ElGamal signature | [84]
Schnorr signature | [263]
Nyberg–Rueppel signature | [223, 224]
DSA | [220]
ECDSA | [141]
XTR signature | [170, 172, 171, 173, 289, 297]
NTRUSign | [110, 111, 128, 129, 131, 217]
Chaum blind signature | [48, 49, 50]
Schnorr blind signature | [263, 202]
Okamoto–Schnorr blind signature | [227, 236]
Chaum–Van Antwerpen undeniable signature | [51, 52, 53]
RSA undeniable signature | [109, 187, 102, 186]
Signcryption | [310, 311, 312]
Signcryption based on elliptic curves | [313, 314]
Identity-based signcryption | [178, 185]
Feige–Fiat–Shamir ZK protocol | [90, 91]
Guillou–Quisquater ZK protocol | [122]
Schnorr ZK protocol | [263]

The Handbook of Applied Cryptography [194] is a single resource where most of the above algorithms are discussed in good detail. See Chapter 8 of that handbook for encryption algorithms, Chapter 11 for digital signatures and Chapter 10 for identification schemes.

There are several other (allegedly) intractable mathematical problems based on which cryptographic protocols can be built. Some of the promising candidates that we left out in the text are summarized below:

Algorithm | Intractable problem
LUC [284, 285, 286] | RSA and ElGamal-like problems based on Lucas sequences
Goldreich–Goldwasser–Halevi [115] | lattice-basis reduction
Patarin’s hidden field equations (HFE) [232] | solving multivariate polynomial equations
EPOC/ESIGN [97, 228] | factorization of integers of the form p^2 q
McEliece encryption [190] | decoding of error-correcting codes
Number field cryptography [38, 39] | discrete-log problem in class groups of quadratic fields
KLCHKP (braid group cryptosystem) [148] | braid conjugacy problem

The Internet site http://www.tcs.hut.fi/~helger/crypto/link/public/index.html is a good place to start, for more information on these (and some other) cryptosystems. Also visit http://www.kisa.or.kr/technology/sub1/index-PKC.htm.

The obvious question that crops up now is, given so many different cryptographic schemes, which one should a user go for?[5] There is no clear-cut answer to this question. One has to study the relative merits and demerits of the systems. If computational efficiency is what matters, we advocate the NTRU schemes. Having said that, we must also add that the NTRU scheme is relatively new and has not yet withstood sufficient cryptanalytic attacks. Various attacks on NSS and NTRUSign cast doubt on the practical safety of applying such young schemes in serious applications.

[5] It is worthwhile to issue a warning to the readers. Many cryptographic algorithms (and also the idea of public-key cryptography) are/were patented. In order to implement these algorithms (in particular, for commercial purposes), one should take care of the relevant legal issues. We summarize here some of the important patents in this area. The list is far from exhaustive.

Patent No. | Covers | Patent holder | Date of issue
US 4,200,770 | Diffie–Hellman key exchange (includes ElGamal encryption) | Stanford University | Apr 29, 1980
US 4,218,582 | Public-key cryptography | Stanford University | Aug 19, 1980
US 4,405,829 | RSA | MIT | Sep 20, 1983
US 5,231,668 | DSA | USA, Secretary of Commerce | Jul 27, 1993
US 5,351,298 | LUC | P. J. Smith | Sep 27, 1994
US 5,790,675 | HFE | CP8 Transac (France) | Aug 4, 1998
EP 0963635A1 / WO 09836526 | XTR | Citibank (North America) | Dec 15, 1999 / Aug 20, 1998
US 6,081,597 | NTRU | NTRU Cryptosystems, Inc. | Jun 27, 2000
| EPOC/ESIGN | Nippon Telegraph and Telephone Corporation | Apr 17, 2001


Our mathematical trapdoors are not provably secure, and this is where the problems begin. We have to rely on historical evidence that should not be collected too hastily. Slow as it is, RSA has stood the test of time, and has successfully survived more than twenty years of cryptanalytic attacks [29]. The risk that an unforeseen attack will break the system tomorrow appears much smaller with RSA than with newer schemes that have enjoyed only a little cryptanalytic study. The hidden monomial system proposed by Imai and Matsumoto [188] was broken by Patarin [231]. As a by-product, Patarin came up with the idea of cryptosystems based on hidden field equations (HFE) [232]. No serious attacks on HFE are known to date, but as we mentioned earlier, only time will show whether HFE is going to survive.

Bruce Schneier asserts in his Crypto-Gram newsletter (15 March 1999, http://www.counterpane.com/crypto-gram.html): “No one can duplicate the confidence that RSA offers after 20 years of cryptanalytic review. A standard security review, even by competent cryptographers, can only prove insecurity; it can never prove security. By following the pack you can leverage the cryptanalytic expertise of the worldwide community, not just a handful of hours of a consultant’s time.”

Twenty-odd years is definitely not a wide span of time in the history of evolution of our knowledge, but public-key cryptography is only as old as RSA is!

6. Standards

6.1Introduction
6.2IEEE Standards
6.3RSA Standards
 Chapter Summary
 Suggestions for Further Reading

In theory, there is no difference between theory and practice. But, in practice, there is.

—Jan L. A. van de Snepscheut

ECC curves are divided into three groups, weak curves, inefficient curves, and curves patented by Certicom.

—Peter Gutmann

Acceptance of prevailing standards often means we have no standards of our own.

—Jean Toomer (1894 – 1967)

6.1. Introduction

Public-key cryptographic protocols deal with sets like the ring of integers modulo n, the multiplicative group of units in a finite field or the group of points on an elliptic curve over a finite field. Messages that need to be encrypted or signed are, on the other hand, usually human-readable text or numbers or keys of secret-key cryptographic protocols, which are typically represented in computers in the form of sequences of bits (or bytes). It is necessary to convert such bit strings (or byte strings) to mathematical elements before the cryptographic algorithms are applied. This conversion is referred to as encoding. The reverse transition, that is, converting mathematical entities back to bit strings, is called decoding.

If Alice and Bob were the only two parties involved in deploying public-key protocols, they could have agreed upon a set of private (not necessarily secret) encoding and decoding rules. In practice, however, when many entities interact over a public network, it is impractical, if not impossible, to have an individual encoding scheme for every pair of communicating parties. This is also unnecessary, because the security of the protocols comes from the encryption process and not from encoding. On the contrary, poorly designed encoding schemes may endanger the security of the underlying protocols.

We, therefore, need a set of standard ways of converting data between various logical formats. This promotes interoperability, removes ambiguities, facilitates simplicity in handling cryptographic data and thereby enhances the applicability and acceptability of public-key algorithms. IEEE (The Institute of Electrical and Electronics Engineers, Inc., pronounced eye-triple-e) and the RSA laboratories have published extensive documents standardizing data conversion and encoding for many popular public-key cryptosystems. Here we summarize the contents of some of these documents. This exposition is meant mostly for software engineers intending to develop cryptographic tool-kits that conform to the accepted standards.

6.2. IEEE Standards

In this section, we outline the first three of the drafts from IEEE, shown in Table 6.1. At the time of writing this book, these are the latest versions of the drafts available from IEEE. In future, these may be superseded by newer documents. We urge the reader to visit the web-site http://grouper.ieee.org/groups/1363/ for more up-to-date information. Also see the standard IEEE 1363–2000: Standards Specifications for Public-key Cryptography [134].

Table 6.1. IEEE drafts on public-key cryptography
Draft | Date | Description
P1363 / D13 | 12 November 1999 | Traditional public-key cryptography based on IFP, DLP and ECDLP
P1363a/D12 | 16 July 2003 | Additional techniques on traditional public-key cryptography
P1363.1/D4 | 7 March 2002 | Lattice-based cryptography
P1363.2/D15 | 25 May 2004 | Password-based authentication
P1363.3/D1 | May 2008 | Identity-based public-key cryptography

6.2.1. The Data Types

Public-key protocols operate on data of various types. The IEEE drafts specify only the logical descriptions of these data types. The realizations of these data types should be taken care of by individual implementations and are left unspecified.

Bit strings

A bit string is a finite ordered sequence a0a1 . . . al–1 of bits, where each bit ai can assume the value 0 or 1. The length of the bit string a0a1 . . . al–1 is l. The bit a0 in the bit string a0a1 . . . al–1 is called the leftmost or the first or the leading or the most significant bit, whereas the bit al–1 is called the rightmost or the last or the trailing or the least significant bit.

The order of appearance of the bits in a bit string is important, rather than the way the bits are indexed or named. That is to say, the most and least significant bits in a given bit string are uniquely determined by their positions of occurrences in the string, and not by the way the individual bits in the string are numbered. Thus, for example, if we call the bit string 01101 as a0a1a2a3a4, then the leading and trailing bits are a0 and a4 respectively. If we index the bits in the same bit string as a2a3a5a7a11, the first bit is a2 and the last bit is a11. Finally, for the indexing a5a4a3a2a1, the leftmost and rightmost bits are a5 and a1 respectively.

Octet strings

Though bits are the basic building blocks in computer memory, programs typically access memory in groups of 8 bits, known as octets. Thus, an octet is a bit string of length 8 and can have one of the 256 values 0000 0000 through 1111 1111. It is convenient to write an octet as a concatenation of two hexadecimal digits, the first (resp. second) digit corresponding to the first (resp. last) 4 bits of the octet. For example, the octet 0010 1011 is represented by 2b. It is also often customary to treat an octet a0a1 . . . a7 as the integer (between 0 and 255, both inclusive) whose binary representation is a0a1 . . . a7.

An octet string is a finite ordered sequence of octets. The length of an octet string is the number of octets in the string. The leftmost (or first or leading or most significant) and the rightmost (or last or trailing or least significant) octets in an octet string are defined analogously as in the case of bit strings. These octets are dependent solely on their positions in the octet string and are independent of how the individual octets in the octet string are numbered.

Integers

Integers are the whole numbers 0, ±1, ±2, . . . . For cryptographic applications, one typically considers only non-negative integers. Integers used in cryptography may have binary representations requiring as many as several thousand bits.

Prime finite fields

Let p be a prime (typically, odd). The elements of ℤ_p are represented as the integers 0, 1, . . . , p – 1 under the standard way of associating the integer a with the congruence class [a]p in ℤ_p. Arithmetic operations in ℤ_p are the corresponding integer operations modulo the prime p.

Finite fields of characteristic 2

The elements of the field GF(2^m) are represented as bit strings of length m. In order to provide the mathematical interpretation of these bit strings, we recall that GF(2^m) is an m-dimensional vector space over GF(2). Let β0, . . . , βm–1 be an ordered basis of GF(2^m) over GF(2). The bit string a0 . . . am–1 is to be identified with the element a0β0 + · · · + am–1βm–1, where the bit ai represents the element [ai]2 of GF(2). Selection of the basis β0, . . . , βm–1 renders a complete meaning to this representation and determines how arithmetic operations on these elements are to be performed. The following two cases are recommended.

For the polynomial-basis representation, one chooses an irreducible polynomial f(X) ∈ GF(2)[X] of degree m and represents GF(2^m) as GF(2)[X]/〈f(X)〉. Letting x denote the canonical image of X in GF(2)[X]/〈f(X)〉, one chooses the ordered basis β0 = x^{m–1}, β1 = x^{m–2}, . . . , βm–1 = 1. Arithmetic operations in GF(2^m) under this representation are those of GF(2)[X] followed by reduction modulo the defining polynomial f(X). Choice of the irreducible polynomial f(X) is left unspecified in the IEEE drafts.

For the normal-basis representation, one selects an element θ ∈ GF(2^m) which is normal over GF(2) (see Definition 2.60, p 86), and takes the ordered basis β0 = θ = θ^{2^0}, β1 = θ^{2^1}, β2 = θ^{2^2}, . . . , βm–1 = θ^{2^{m–1}}. Arithmetic in GF(2^m) is carried out as explained in Section 2.9.3.

The IEEE draft P1363a also specifies a composite-basis representation of elements of GF(2^m), provided that m is composite. Let m = ds with 1 < d < m. One chooses an (ordered) polynomial or normal basis γ0, γ1, . . . , γs–1 of GF(2^m) over GF(2^d). An element of GF(2^m) is of the form a0γ0 + a1γ1 + · · · + as–1γs–1 and is represented by a0a1 . . . as–1, where each ai, being an element of GF(2^d), is represented by a bit string of length d. The interpretation of the representation of ai is dependent on how GF(2^d) is represented. One can use a polynomial- or normal-basis representation of GF(2^d) (over GF(2)), or even a composite-basis representation of GF(2^d) over GF(2^{d′}), if d happens to be composite with a non-trivial divisor d′.

Extension fields of odd characteristics

A non-prime finite field of odd characteristic is one with cardinality p^m for some odd prime p and some integer m > 1. The field GF(p^m) is represented as GF(p)[X]/〈f(X)〉, where f(X) ∈ GF(p)[X] is an irreducible polynomial of degree m. An element of GF(p^m) is then of the form α = am–1x^{m–1} + · · · + a1x + a0, where x := X + 〈f(X)〉 and where each ai is an element of GF(p), that is, an integer in the range 0, 1, . . . , p – 1. The element α is represented as an integer by substituting p for x, that is, as the integer am–1p^{m–1} + · · · + a1p + a0 (see the packed representation of Exercise 3.39). In order to interpret an integer between 0 and p^m – 1 as an element of GF(p^m), one has to expand the integer in base p.
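
The packed representation and its base-p expansion can be sketched in a few lines of Python (the function names are our own illustrative choices):

```python
# Sketch of the packed representation of GF(p^m): the coefficient tuple
# (a_0, ..., a_{m-1}) maps to the integer a_0 + a_1*p + ... + a_{m-1}*p^(m-1),
# and decoding is base-p expansion.

def fe2int(coeffs, p):
    """Pack field-element coefficients into a single integer."""
    return sum(a * p ** i for i, a in enumerate(coeffs))

def int2fe(v, p, m):
    """Expand an integer in 0..p^m - 1 back into m base-p digits."""
    out = []
    for _ in range(m):
        v, a = divmod(v, p)
        out.append(a)
    return out
```

The two functions are mutual inverses on their respective domains.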

* Elliptic curves

An elliptic curve defined over a finite field GF(q) is specified by two elements a, b ∈ GF(q). Depending on the characteristic of GF(q), this pair defines the following curves.

If char GF(q) ≠ 2, 3, then 4a^3 + 27b^2 must be non-zero in GF(q) and the equation of the elliptic curve is taken to be Y^2 = X^3 + aX + b.

For char GF(q) = 2, we must have b ≠ 0 in GF(q) and we use the non-supersingular curve Y^2 + XY = X^3 + aX^2 + b. Because of the MOV attack (Section 4.5.1), supersingular curves are not recommended for cryptographic applications.

Finally, if GF(q) has characteristic 3, then both a and b must be non-zero in GF(q) and the elliptic curve Y^2 = X^3 + aX^2 + b is specified by (a, b).

* Elliptic curve points

A point P on an elliptic curve defined over GF(q) can be represented either in compressed or in uncompressed form. In the uncompressed form, one represents P as the pair (h, k) of elements of GF(q). The compressed form can be either lossy or lossless. In the lossy compressed form, P is represented by its X-coordinate h only. Such a representation is not unique in the sense that there can be two points on the elliptic curve with the same X-coordinate h. In applications where Y-coordinates of elliptic curve points are not utilized, such a representation can be used. In the lossless compressed form, one represents P as (h, ỹ), where ỹ is a single bit. There are two solutions (perhaps repeated) for Y for a given value h of X. The bit ỹ specifies which of these two values is represented. Depending on how the bit ỹ is computed, we have two different lossless compressed forms.

The LSB compressed form is applicable for odd prime fields or fields of even characteristic. For GF(p) with p an odd prime, the bit ỹ is taken to be the least significant (that is, rightmost) bit of k (treated as an integer). For GF(2^m), we have ỹ = 0 if h = 0, whereas if h ≠ 0, then ỹ is the least significant bit of the element kh^−1 treated as an integer via the FE2I conversion primitive described in Section 6.2.2.
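
For an odd prime field with p ≡ 3 (mod 4), the LSB form can be sketched as follows. The toy curve and function names are our own assumptions, and a general implementation would need a full modular square-root routine (e.g., Tonelli–Shanks) rather than the p ≡ 3 (mod 4) shortcut used here:

```python
# Sketch of LSB point compression over GF(p), p = 3 (mod 4),
# for the curve Y^2 = X^3 + aX + b.

def compress(h, k):
    return h, k & 1                       # X-coordinate plus the LSB of Y

def decompress(h, ybit, a, b, p):
    rhs = (pow(h, 3, p) + a * h + b) % p  # right-hand side of the curve equation
    k = pow(rhs, (p + 1) // 4, p)         # a square root when p = 3 (mod 4)
    if k * k % p != rhs:
        raise ValueError("h is not the X-coordinate of a curve point")
    # Pick whichever of the two roots k, p - k has the stored LSB.
    return (h, k) if k & 1 == ybit else (h, (p - k) % p)
```

On the toy curve Y^2 = X^3 + 2X + 3 over GF(11), the point (2, 2) compresses to (2, 0) and decompresses back correctly.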

The SORT compressed form is used for q = p^m, m > 1. Let P′ = (h, k′) be the opposite of P = (h, k), that is, P′ = −P. One converts k and k′ to integers k̄ and k̄′ using the FE2I primitive and sets ỹ := 1 if k̄ > k̄′, and ỹ := 0 otherwise.

One may also go for a hybrid representation of the elliptic curve point P = (h, k), in which information for both the compressed and the uncompressed representations of P is stored, that is, P is stored as (h, k, ỹ) with ỹ computed by one of the methods (LSB or SORT) described above.

* Convolution polynomial rings

For NTRU public-key cryptosystems, we work in the ring R := ℤ[x]/〈x^n − 1〉, with x denoting the image of the indeterminate as usual. An element of R is a polynomial a(x) = a0 + a1x + a2x^2 + · · · + an–1x^{n–1} with integer coefficients, and is represented by the ordered n-tuple of integers (a0, a1, . . . , an–1). Addition (resp. subtraction) in R is simply component-wise addition (resp. subtraction), whereas multiplication of a(x) = a0 + a1x + · · · + an–1x^{n–1} and b(x) = b0 + b1x + · · · + bn–1x^{n–1} gives c(x) = c0 + c1x + · · · + cn–1x^{n–1}, where ck = Σ aibj, the sum being over all pairs (i, j) with i + j ≡ k (mod n) (see Section 5.2.8). The IEEE draft P1363.1 designates elements of R as ring elements.

It is customary to deal with polynomials in R with small coefficients. If all the coefficients of a(x) ∈ R are known to be from {0, 1}, it is convenient to represent a(x) as the bit string a0a1 . . . an–1 instead of as an n-tuple of integers. In this case, a(x) is called a binary ring element or simply a binary element.
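
The cyclic-convolution product described above can be sketched in a few lines of Python (a naive O(n^2) loop on coefficient tuples; the function name is ours):

```python
# Sketch of multiplication in R = Z[x]/<x^n - 1> on n-tuples of
# coefficients: a cyclic convolution, with exponents reduced mod n.

def ring_mul(a, b):
    n = len(a)
    assert len(b) == n
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]   # x^i * x^j = x^((i+j) mod n)
    return c
```

For example, with n = 3, multiplying x^2 by x^2 gives x^4 = x, that is, the tuple (0, 1, 0).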

6.2.2. Conversion Among Data Types

The IEEE drafts P1363 and P1363.1 specify algorithms for converting data among the formats discussed above. The standardized data conversion primitives are summarized in Figure 6.1. Though these drafts support elliptic curve cryptography, it is not specified how data representing elliptic curves can be converted to data of other types (like octet strings and bit strings).

Figure 6.1. IEEE P1363 data types and conversions


We now provide a brief description of the data conversion primitives at a logical level. The implementation details depend on the representations of the data types and are left out here.

Converting bit strings to octet strings (BS2OS)

A bit string a0a1 . . . al–1 can be broken up in groups of eight bits and packed into octets. But we run into difficulty if the length of the input bit string is not an integral multiple of 8. We have to add extra bits in order to make the length of the augmented bit string an integral multiple of 8. This can be done in several ways, and in this context a standard convention needs to be adopted. The IEEE drafts prescribe the following rules:

  1. Every extra bit added must be the zero bit.

  2. Add the minimal number of extra bits.

  3. Add the extra bits, if any, to the left.[1]

    [1] At the time of writing this book there is a serious conflict between the latest drafts of P1363 and P1363.1 from IEEE. The former asks to add extra bits to the left, the latter to the right. One of the authors of this book raised this issue in the discussion group stds-p1363-discuss maintained by IEEE and was notified that in the next version of the P1363.1 document this conflict would be resolved in favour of P1363.

In order to see what these rules mean, let a0a1 . . . al–1 be a bit string of length l to be converted to the octet string A0A1 . . . Ad–1. The length of the output octet string must be d = ⌈l/8⌉. 8d – l zero bits should be added to the left of the input bit string in order to create the augmented bit string 0 . . . 0a0a1 . . . al–1, whose length is 8d. Now, we start from the left and pack blocks of eight consecutive bits into A0, A1, . . . , Ad–1. Thus, we have A0 = 0 . . . 0a0 . . . ak–1, A1 = ak . . . ak+7, . . . , Ad–1 = ak+8(d–2) . . . ak+8(d–2)+7, where k = 8 – (8d – l). Note that if l is already a multiple of 8, then 8d – l = 0, that is, no extra bits need to be added.

As an example, consider the input bit string 01110 01101011 of length 13. The output octet string should be of length ⌈13/8⌉ = 2. Padding gives the augmented bit string 00001110 01101011. The first octet in the output octet string will then be 00001110, that is, 0e; and the second octet will be 01101011, that is, 6b.
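These rules can be sketched in a few lines of Python; the function name bs2os is ours and is not part of the drafts.

```python
def bs2os(bits: str) -> bytes:
    """BS2OS: left-pad with the minimal number of zero bits, then pack into octets."""
    d = (len(bits) + 7) // 8              # output length d = ceil(l/8)
    padded = bits.zfill(8 * d)            # prepend 8d - l zero bits on the left
    return bytes(int(padded[8 * i : 8 * i + 8], 2) for i in range(d))

print(bs2os("0111001101011").hex())       # the example above -> "0e6b"
```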

Converting octet strings to bit strings (OS2BS)

The OS2BS primitive is designed to ensure that if we convert an octet string generated by BS2OS, we get back the original bit string (that is, the input to BS2OS) with which we started. Suppose that we want to convert an octet string A0A1 . . . Ad–1. Let us write the bits of Ai as ai,0ai,1 . . . ai,7. The desired length l of the output bit string also has to be specified. If d ≠ ⌈l/8⌉, the procedure OS2BS reports error and stops. If d = ⌈l/8⌉, we consider the bit string

a0,0a0,1 . . . a0,7a1,0a1,1 . . . a1,7 . . . ad–1,0ad–1,1 . . . ad–1,7

of length 8d. If the leftmost 8d – l bits of this flattened bit string are not all zero, OS2BS should quit after reporting error. Otherwise, the trailing l bits of the flattened bit string are returned.

The reader can check that when 0e 6b and l = 13 are input to OS2BS, it returns the bit string 01110 01101011. (See the example in connection with BS2OS.) Notice also that for this input octet string, OS2BS reports error if and only if a value l ≥ 17 or l ≤ 11 is supplied as the desired length of the output bit string.
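A matching sketch of OS2BS (again, the name is ours); note the two distinct error conditions of the draft: a length mismatch and non-zero padding bits.

```python
def os2bs(octets: bytes, l: int) -> str:
    """OS2BS: flatten the octets and strip the 8d - l leftmost (zero) padding bits."""
    d = len(octets)
    if d != (l + 7) // 8:
        raise ValueError("error: d != ceil(l/8)")
    flat = "".join(format(o, "08b") for o in octets)
    if "1" in flat[: 8 * d - l]:
        raise ValueError("error: non-zero padding bits")
    return flat[8 * d - l :]

print(os2bs(bytes.fromhex("0e6b"), 13))   # -> "0111001101011"
```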

Converting integers to bit strings (I2BS)

Let a non-negative integer n be given. The I2BS primitive outputs a bit string of length l representing n. If n ≥ 2l, this conversion cannot be done and the primitive reports error and quits. If n < 2l, we write the binary representation of n as

n = al–12l–1 + al–22l–2 + · · · + a1 · 2 + a0 with each ai ∈ {0, 1}.

Treating each ai as a bit[2], I2BS returns the bit string al–1al–2 . . . a1a0. One or more leading bits of the binary representation of n may be zero. There is no limit on how many leading zero bits are allowed during the conversion. In particular, the integer 0 gets converted to a sequence of l zero bits for any value of l supplied.

[2] Each ai is logically an integer which happens to assume one of two possible values: 0 and 1. A bit, on the other hand, is a quantity that can also assume only two possible values. Traditionally, the values of a bit are also denoted by 0 and 1. But one has the liberty to call these values off and on, or false and true, or black and white, or even armadillo and platypus. To many people, bit is an abbreviation for binary digit, which our ai’s logically are. To others, binit is a safer and more individualistic acronym for binary digit. For I2BS, we identify the two concepts.

A request to I2BS to convert n = 2357 = 211 + 28 + 25 + 24 + 22 + 20 with l = 12 returns 1001 00110101, one with l = 18 returns 00 00001001 00110101 and, finally, one with l ≤ 11 reports failure. Note that for a neater look we write bit strings in groups of eight, and grouping starts from the right. This convention reflects the relationship between bit strings and octet strings, as mentioned above.
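I2BS is essentially fixed-width binary formatting; a sketch (name ours):

```python
def i2bs(n: int, l: int) -> str:
    """I2BS: the binary representation of n, zero-extended to exactly l bits."""
    if n >= 1 << l:
        raise ValueError("error: n >= 2^l")
    return format(n, f"0{l}b")

print(i2bs(2357, 12))   # -> "100100110101"
print(i2bs(0, 5))       # -> "00000"
```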

Converting bit strings to integers (BS2I)

The primitive BS2I converts the bit string a0a1 . . . al–1 to the integer a02l–1 + a12l–2 + · · · + al–22 + al–1, where we again identify a bit with an integer (or a binary digit). As an illustrative example, the bit string 1001 00110101 (or 00 00001001 00110101) gets converted to the integer 211 + 28 + 25 + 24 + 22 + 20 = 2357. The null bit string (that is, the one of zero length) is converted to the integer 0.
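BS2I is the inverse operation; the null bit string needs explicit handling, since it must map to 0 (sketch, name ours):

```python
def bs2i(bits: str) -> int:
    """BS2I: read the bit string as a big-endian binary integer."""
    return int(bits, 2) if bits else 0    # the null bit string converts to 0

print(bs2i("100100110101"))   # -> 2357
```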

Converting integers to octet strings (I2OS)

In order to convert a non-negative integer n to an octet string of length d, we write the base-256 expansion of n as

n = Ad–1 · 256d–1 + Ad–2 · 256d–2 + · · · + A1 · 256 + A0,

where each Ai ∈ {0, 1, . . . , 255} can be naturally identified with an octet. I2OS returns the octet string Ad–1Ad–2 . . . A1A0. Note that the above representation of n to the base 256 is possible if and only if n < 256d. If n ≥ 256d, I2OS should return failure. As with bit strings, an arbitrary number of leading zero octets is allowed.

Consider the integer 2357 = 9 × 256 + 53. The two-digit hexadecimal representations of 9 and 53 are 09 and 35 respectively. Thus, a call of I2OS on this n with d = 3 (resp. d = 2, resp. d = 1) returns 00 09 35 (resp. 09 35, resp. failure).
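I2OS and OS2I map directly onto big-endian byte conversions; a sketch (names ours, using Python's built-in int/bytes conversions):

```python
def i2os(n: int, d: int) -> bytes:
    """I2OS: base-256 digits of n, most significant octet first."""
    if n >= 256 ** d:
        raise ValueError("error: n >= 256^d")
    return n.to_bytes(d, "big")

def os2i(octets: bytes) -> int:
    """OS2I: read octets as base-256 digits; the empty string gives 0."""
    return int.from_bytes(octets, "big")

print(i2os(2357, 3).hex())            # -> "000935"
print(os2i(bytes.fromhex("0935")))    # -> 2357
```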

Converting octet strings to integers (OS2I)

Let an octet string A0A1 . . . Ad–1 be given. Each Ai can be identified with a 256-ary digit. OS2I returns the integer A0256d–1 + A1256d–2 + · · · + Ad–2256 + Ad–1. If d = 0, the integer 0 should be output.

Converting field elements to octet strings (FE2OS)

In the IEEE P1363 jargon, a field element is an element of the finite field Fq, where q is a prime or an integral power of a prime. We want to convert an element β ∈ Fq to an octet string. Depending on the value of q, we have two cases:

If the characteristic of Fq is odd, β is represented as an integer in {0, 1, . . . , q – 1}. FE2OS converts β to an octet string of length ⌈log256 q⌉ by calling the primitive I2OS.

If q = 2m, β is represented as a bit string of length m. The primitive BS2OS is called to convert β to an octet string.

Converting octet strings to field elements (OS2FE)

Assume that an octet string is to be converted to an element of the finite field Fq. Again we have two possibilities depending on q.

If Fq is of odd characteristic, the primitive OS2I is called to convert the given octet string to an integer. This integer is returned as the field element.

If q = 2m, one calls the primitive OS2BS with the given octet string and with the length m supplied as inputs. The resulting bit string is returned by OS2FE. If OS2BS reports error, so does OS2FE.

Converting field elements to integers (FE2I)

Let β ∈ Fq, and suppose that the integer equivalent of β is sought. If q is odd, then β is already represented as an integer (in {0, 1, . . . , q – 1}) and is output as it is. If q = 2m, one first converts β to an octet string by FE2OS and subsequently converts this octet string to an integer by calling the primitive OS2I.

* Converting elliptic curve points to octet strings (EC2OS)

The point at infinity O (on an elliptic curve over Fq) is represented by an octet string comprising a single zero octet only. So let P = (h, k) be a finite point. The EC2OS primitive produces an octet string PO = PC ‖ H ‖ K which is the concatenation of a single octet PC with octet strings H and K representing h and k respectively. The values of PC and K depend on the type of compression used. One has PC = 0000 SUCỸ, where

S = 1 if and only if the SORT compression is used.

U = 1 if and only if uncompressed or hybrid form is used.

C = 1 if and only if compressed or hybrid form is used.

Ỹ = ỹ (the compression bit of the Y-coordinate k) if compression is used; Ỹ = 0 otherwise.

The first four bits of PC are reserved for (possible) future use and should be set to 0000 for this version of the standard. H is the octet string of length ⌈log256 q⌉ obtained by converting h using FE2OS. If the compressed form is used, K is the empty octet string, whereas if uncompressed or hybrid form is used, we have K = FE2OS(k, ⌈log256 q⌉). Finally, for the lossy compression we have PC = 0000 0001, H = FE2OS(h, ⌈log256 q⌉) and K is empty. Table 6.2 summarizes all these possibilities. Here, l := ⌈log256 q⌉, and p is an odd prime.

Table 6.2. The EC2OS primitive
Representation       PC         H            K            q
uncompressed         0000 0100  FE2OS(h, l)  FE2OS(k, l)  All
LSB compressed       0000 001Ỹ  FE2OS(h, l)  Empty        p, 2m
LSB hybrid           0000 011Ỹ  FE2OS(h, l)  FE2OS(k, l)  p, 2m
SORT compressed      0000 101Ỹ  FE2OS(h, l)  Empty        2m, pm
SORT hybrid          0000 111Ỹ  FE2OS(h, l)  FE2OS(k, l)  2m, pm
lossy compression    0000 0001  FE2OS(h, l)  Empty        All
point at infinity    0000 0000  Empty        Empty        All

* Converting octet strings to elliptic curve points (OS2EC)

The OS2EC data conversion primitive takes as input an octet string PO, the length l = ⌈log256 q⌉ and the method of compression. If PO contains only one octet and that octet is zero, the point at infinity O is output. Otherwise, the elliptic curve point P = (h, k) is computed as follows. OS2EC decomposes PO = PC ‖ H ‖ K, with PC the first octet and with H an octet string of length l. If PC does not match the method of compression, OS2EC returns error. Otherwise, it uses OS2FE to compute the field element h. If no compression or hybrid compression is used, the Y-coordinate k is also computed by applying OS2FE to K. If (h, k) is not a point on the elliptic curve, error is reported. For the LSB or SORT compression, the Y-coordinate is computed using h and the bit Ỹ stored in PC. If the hybrid scheme is used and the bit Ỹ in PC does not match the Y-coordinate k, OS2EC halts after reporting error. If all computations are successful till now, the point (h, k) is output.

Note that the checks for (h, k) being on the curve or for the equality of Ỹ with the compression bit of k are optional and may be omitted. For the lossy compression scheme, the Y-coordinate k is not necessarily uniquely determined from the input octet string PO. In that case, either of the two possibilities is output.

* Converting ring elements to octet strings (RE2OS)

Ring elements are elements of the convolution polynomial ring R = Z[x]/(xn – 1) and can be identified with polynomials with integer coefficients and of degrees < n. The element a(x) = a0 + a1x + · · · + an–1xn–1 (where each ai ∈ Z) is represented by the n-tuple of integers (a0, a1, . . . , an–1). The IEEE draft P1363.1 assumes that the coefficients ai are available modulo a positive integer β ≤ 256. But then each ai is an integer in {0, 1, . . . , β – 1} and can be naturally encoded by a single octet. RE2OS, upon receiving a(x) as input, outputs the octet string a0a1 . . . an–1 of length n.

An example: Let n = 7 and β = 128. The ring element a(x) = 2 + 11x + 101x3 + 127x4 + 71x5 = (2, 11, 0, 101, 127, 71, 0) is converted to the octet string 02 0b 00 65 7f 47 00.
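Since each coefficient fits in one octet, RE2OS is a direct packing; a sketch (name ours):

```python
def re2os(coeffs: list, beta: int) -> bytes:
    """RE2OS: one octet per coefficient, each reduced modulo beta <= 256."""
    if beta > 256 or any(not (0 <= a < beta) for a in coeffs):
        raise ValueError("coefficients must lie in {0, ..., beta - 1}")
    return bytes(coeffs)

# the example above: n = 7, beta = 128
print(re2os([2, 11, 0, 101, 127, 71, 0], 128).hex())   # -> "020b00657f4700"
```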

* Converting octet strings to ring elements (OS2RE)

Let an octet string a0a1 . . . an–1 of length n be given, which we want to convert to an element of R. Once again a modulus β ≤ 256 is assumed, so that each octet ai can be viewed as an integer reduced modulo β. Making the natural identification of ai with an integer, the polynomial a(x) = a0 + a1x + · · · + an–1xn–1 is output. Thus, for example, the octet string 02 0b 00 65 7f 47 00 gets converted to the ring element 2 + 11x + 101x3 + 127x4 + 71x5.

* Converting ring elements to bit strings (RE2BS)

The RE2BS primitive assumes that the modulus β is a power of 2, that is, β = 2t for some positive integer t ≤ 8. Let a ring element a(x) = a0 + a1x + · · · + an–1xn–1 be given, where each ai ∈ {0, 1, . . . , 2t – 1}. One applies the I2BS primitive on each ai to generate the bit string ai,0ai,1 . . . ai,t–1 of length t. The concatenated bit string

a0,0a0,1 . . . a0,t–1 a1,0a1,1 . . . a1,t–1 . . . an–1,0an–1,1 . . . an–1,t–1

of length nt is then returned by RE2BS.

As before, take the example of n = 7, β = 128 = 27 (so that t = 7) and a(x) = 2 + 11x + 101x3 + 127x4 + 71x5 = (2, 11, 0, 101, 127, 71, 0). The coefficients 2, 11, 0, . . . should first be converted to bit strings of length 7 each, that is, 2 gives 0000010, 11 gives 0001011 and so on. Thus, the bit string output by RE2BS will be 0000010 0001011 0000000 1100101 1111111 1000111 0000000. Note that here we have shown the bits in groups of 7 in order to highlight the intermediate steps (the outputs from I2BS). With the otherwise standard grouping in blocks of 8, the output bit string looks like 0 00001000 01011000 00001100 10111111 11100011 10000000 and hence transforms to the octet string 00 08 58 0c bf e3 80 by an invocation of BS2OS. This example illustrates that RE2BS followed by BS2OS does not necessarily give the same output as the direct conversion RE2OS, even when every underlying parameter (like β) remains unchanged.
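The RE2BS-then-BS2OS chain of this example can be checked mechanically (helper names ours):

```python
def re2bs(coeffs: list, t: int) -> str:
    """RE2BS for beta = 2^t: concatenate the t-bit strings of the coefficients."""
    return "".join(format(a, f"0{t}b") for a in coeffs)

def bs2os(bits: str) -> bytes:
    """BS2OS: left-pad with zero bits to a multiple of 8, then pack into octets."""
    d = (len(bits) + 7) // 8
    padded = bits.zfill(8 * d)
    return bytes(int(padded[8 * i : 8 * i + 8], 2) for i in range(d))

bits = re2bs([2, 11, 0, 101, 127, 71, 0], 7)   # 49 bits
print(bs2os(bits).hex())                       # -> "0008580cbfe380"
```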

* Converting bit strings to ring elements (BS2RE)

Once again we require the modulus β to be a power 2t of 2. Let a bit string a0a1 . . . al–1 of length l be given, from which we want to compute the equivalent ring element a(x). If l is not an integral multiple of t, the algorithm should quit after reporting error. Otherwise we let l = nt for some positive integer n, and repeatedly call the BS2I primitive on the bit strings a0a1 . . . at–1, atat+1 . . . a2t–1, . . . , a(n–1)ta(n–1)t+1 . . . ant–1 to get the integers α0, α1, . . . , αn–1 respectively. The polynomial a(x) = α0 + α1x + · · · + αn–1xn–1 is then output.

We urge the reader to verify that BS2RE with β = 128 and the bit string

0000010 0001011 0000000 1100101 1111111 1000111 0000000

as input produces the ring element 2 + 11x + 101x3 + 127x4 + 71x5.
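A sketch of BS2RE that recovers the coefficients of the example above (name ours):

```python
def bs2re(bits: str, t: int) -> list:
    """BS2RE for beta = 2^t: cut the bit string into t-bit blocks and apply BS2I."""
    if len(bits) % t:
        raise ValueError("error: length is not a multiple of t")
    return [int(bits[i : i + t], 2) for i in range(0, len(bits), t)]

bits = "0000010" "0001011" "0000000" "1100101" "1111111" "1000111" "0000000"
print(bs2re(bits, 7))   # -> [2, 11, 0, 101, 127, 71, 0]
```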

* Converting binary elements to octet strings (BE2OS)

A binary (ring) element is an element a(x) = a0 + a1x + · · · + an–1xn–1 of R with each ai ∈ {0, 1}. One can convert a(x) to an octet string A0A1 . . . Al–1 of any desired length l as follows. We denote the bits in the octet Ai as Ai,7Ai,6 . . . Ai,0. Here, the index of the bits increases from right to left.

First we rewrite the polynomial a(x) as one of degree 8l – 1, that is, as a(x) = a0 + a1x + · · · + a8l–1x8l–1. If n ≤ 8l, this can be done by setting an = an+1 = · · · = a8l–1 = 0. On the other hand, if n > 8l and one or more of the coefficients a8l, a8l+1, . . . , an–1 are non-zero (that is, 1), the above rewriting of a(x) cannot be done and BE2OS terminates after reporting failure.

When the above rewriting of a(x) becomes successful, one sets the bits of the output octets as A0,0 := a0, A0,1 := a1, . . . , A0,7 := a7, A1,0 := a8, A1,1 := a9, . . . , A1,7 := a15, A2,0 := a16, A2,1 := a17, . . . , A2,7 := a23, . . . , Al–1,0 := a8l–8, Al–1,1 := a8l–7, . . . , Al–1,7 := a8l–1.

As an example, take n = 20 and consider the binary element a(x) = 1 + x + x2 + x10 + x12. First let l = 1. Rewriting a(x) as a polynomial of degree 7 is not possible, since the coefficients of x10 and x12 are 1; so BE2OS outputs error in this case. If l = 2, then the output octet string will be 00000111 00010100, that is, 07 14. For l ≥ 3, the first two octets will be 07 and 14 as before, whereas the 3rd through l-th octet will be 00.

The BE2OS primitive can be quite effective for reducing storage requirements. For example, the polynomial a(x) of degree 12 of the previous paragraph, viewed as an element of R with n = 200, can be encoded in just two octets. Of course, by specifying l ≥ 3 one may add l – 2 trailing zero octets, if one desires. On the other hand, RE2OS requires exactly 200 octets, whereas RE2BS with β = 128 followed by BS2OS requires exactly ⌈(200 × 7)/8⌉ = 175 octets for storing the same a(x).
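BE2OS places coefficient a8i+j in bit j of octet Ai; a sketch (name ours):

```python
def be2os(coeffs: list, l: int) -> bytes:
    """BE2OS: pack binary coefficients into l octets, with bit A_{i,j} = a_{8i+j}."""
    if any(coeffs[8 * l :]):
        raise ValueError("error: a non-zero coefficient of index >= 8l")
    a = coeffs + [0] * max(0, 8 * l - len(coeffs))
    return bytes(sum(a[8 * i + j] << j for j in range(8)) for i in range(l))

# a(x) = 1 + x + x^2 + x^10 + x^12 with n = 20
coeffs = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(be2os(coeffs, 2).hex())   # -> "0714"
```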

* Converting octet strings to binary elements (OS2BE)

Assume that an octet string A0A1 . . . Al–1 of length l is given and the equivalent binary element in R is to be determined. As in the case of BE2OS, we index the bits in the octet Ai as Ai = Ai,7Ai,6 . . . Ai,0. Now, consider the polynomial a(x) = a0 + a1x + a2x2 + · · · + a8l–1x8l–1, where a8i+j = Ai,j. If n ≥ 8l, we set a8l = a8l+1 = · · · = an–1 = 0 and output a(x) as the binary element. On the other hand, if n < 8l and an = an+1 = · · · = a8l–1 = 0, then the truncated polynomial a0 + a1x + · · · + an–1xn–1 equals a(x) and is returned. Finally, if n < 8l and any of the coefficients an, an+1, . . . , a8l–1 is non-zero, then OS2BE returns error.[3]

[3] In this case, it still makes full algebraic sense to treat a(x) as an element of R, though not in the canonical representation.

For example, assume that the octet string 07 14 is given as input to OS2BE. If n ≤ 12, the algorithm outputs error, because the polynomial a(x) in this case has degree 12. For any n ≥ 13, the binary element 1 + x + x2 + x10 + x12 is returned.
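The inverse OS2BE, with the same bit-indexing convention (name ours):

```python
def os2be(octets: bytes, n: int) -> list:
    """OS2BE: unpack bits a_{8i+j} = A_{i,j} and reduce to n coefficients."""
    a = [(octets[i >> 3] >> (i & 7)) & 1 for i in range(8 * len(octets))]
    if any(a[n:]):
        raise ValueError("error: non-zero coefficient of index >= n")
    return (a + [0] * n)[:n]          # zero-extend (or drop trailing zeros) to length n

exps = [i for i, c in enumerate(os2be(bytes.fromhex("0714"), 20)) if c]
print(exps)   # exponents with coefficient 1 -> [0, 1, 2, 10, 12]
```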

6.3. RSA Standards

The public-key cryptography standards (PKCS) [254] refer to a set of standard specifications proposed by the RSA Laboratories. A one-line description of each of these documents is given in Table 6.3. In the rest of this section, we concentrate only on the documents PKCS #1 and #3.

Table 6.3. Public-key cryptography standards from the RSA Laboratories
Document    Description
PKCS #1     RSA encryption and signature
PKCS #2     Merged with PKCS #1
PKCS #3     Diffie–Hellman key exchange
PKCS #4     Merged with PKCS #1
PKCS #5     Password-based cryptography
PKCS #6     Extension of X.509 public-key certificates
PKCS #7     Syntax of cryptographic messages
PKCS #8     Syntax and encryption of private keys
PKCS #9     Attribute types for use in PKCS #6, #7, #8 and #10
PKCS #10    Syntax for certification requests
PKCS #11    Cryptoki, an application programming interface (API)
PKCS #12    Syntax for transferring personal information (private keys, certificates and so on)
PKCS #13    Elliptic curve cryptography (under preparation)
PKCS #15    Syntax for cryptographic token (like integrated circuit card) information

6.3.1. PKCS #1

PKCS #1 describes RSA encryption and RSA signatures. In this section, we summarize Version 2.1 (dated 14 June 2002) of the standard. This version specifies cryptographically stronger encoding procedures compared with the older versions. More specifically, the optimal asymmetric encryption procedure (OAEP [18]) for RSA encryption was incorporated in Version 2.0 of PKCS #1, whereas the new probabilistic signature scheme (PSS [19]) is introduced in Version 2.1. This latest draft also includes encryption and signature schemes compatible with older versions (1.5 and 2.0). However, adoption of the new algorithms is strongly recommended for enhanced security.

RSA keys

PKCS #1 Version 2.1 introduces the concept of multi-prime RSA, in which the RSA modulus n may have more than two prime divisors. For RSA encryption and decryption to work properly, we only need n to be square-free (Exercise 4.1). Using u > 2 prime divisors of n increases efficiency and does not degrade the security of the resulting system much, as long as u is not very large. More specifically, if T is the time for RSA private-key operation without CRT, then the cost of this operation with CRT is approximately T/u2 (neglecting the cost of CRT combination).

So an RSA modulus is of the form n = r1r2 . . . ru with u ≥ 2 and with pairwise distinct primes r1, . . . , ru. For the sake of conformity with the older versions of the standard, the first two primes are given the alternate special names p := r1 and q := r2. PKCS #1 does not mention any specific way of choosing the prime divisors ri of n, but encourages use of primes that make factorization of n difficult.

An RSA public exponent is an integer e, 3 ≤ e ≤ n – 1, with gcd(e, λ(n)) = 1, where λ(n) := lcm(r1 – 1, r2 – 1, . . . , ru – 1). An RSA public key is a pair (n, e) with n and e chosen as above.

The RSA private key corresponding to (n, e) can be stored in one of two formats. In the first format, one maintains the pair (n, d) with the private exponent d chosen so as to make ed ≡ 1 (mod λ(n)). In the second format, one stores the five quantities (p, q, dP, dQ, qInv) and, if u > 2, the triples (ri, di, ti) for each i = 3, . . . , u. The meanings of these quantities are as follows:

p    = r1
q    = r2
dP   ≡ e–1 (mod p – 1)
dQ   ≡ e–1 (mod q – 1)
qInv ≡ q–1 (mod p)
di   ≡ e–1 (mod ri – 1)
ti   ≡ (r1 . . . ri–1)–1 (mod ri)

For the sake of consistency, one should store the CRT coefficient r1–1 (mod r2), that is, p–1 (mod q). In order to ensure compatibility with older versions of PKCS, q–1 (mod p) is stored instead.
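A toy computation of the second-format quantities (tiny primes for exposition only; requires Python 3.9+ for multi-argument lcm and modular inverses via pow):

```python
from math import lcm, prod

r = [11, 13, 17]                   # toy primes r1, r2, r3; real primes are far larger
n, e = prod(r), 7
lam = lcm(*(ri - 1 for ri in r))   # lambda(n) = lcm(10, 12, 16) = 240

p, q = r[0], r[1]
dP   = pow(e, -1, p - 1)           # e^(-1) (mod p - 1)
dQ   = pow(e, -1, q - 1)           # e^(-1) (mod q - 1)
qInv = pow(q, -1, p)               # q^(-1) (mod p)
d3   = pow(e, -1, r[2] - 1)        # e^(-1) (mod r3 - 1)
t3   = pow(p * q, -1, r[2])        # (r1 r2)^(-1) (mod r3)
print(dP, dQ, qInv, d3, t3)        # -> 3 7 6 7 5
```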

RSA key operations

The RSA public-key operation is used to encrypt a message or to verify a signature. The PKCS draft calls these primitives RSAEP (encryption primitive) and RSAVP1 (verification primitive). Both are implemented in a straightforward manner as in Algorithm 6.1.

Algorithm 6.1. RSA encryption/signature verification primitive

Input: RSA public key (n, e) and message/signature representative x.

Output: The ciphertext/message representative y.

Steps:

if (x < 0) or (x ≥ n) { Return “Error: representative out of range”. }

y := xe (mod n).

The RSA decryption or signature-generation primitive is called RSADP or RSASP1 and is given in Algorithm 6.2. The operation depends on the format in which the private key K is stored. The correctness of the primitive is left to the reader as an easy exercise.

Algorithm 6.2. RSA decryption/signature generation primitive

Input: RSA private key K and the ciphertext/message representative y.

Output: The message/signature representative x.

Steps:

if (y < 0) or (y ≥ n) { Return “Error: representative out of range”. }
if (K is stored in the first format) {
   x := yd (mod n).
} else {  /* K is stored in the second format */
   x1 := ydP (mod p).
   x2 := ydQ (mod q).
   h := (x1 – x2)qInv (mod p).
   x := x2 + qh.
   if (u > 2) {
      R := r1.
      for i = 3, . . . , u {
         xi := ydi (mod ri).
         R := R × ri–1.
         h := (xi – x)ti (mod ri).
         x := x + Rh.
      }
   }
}
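Algorithm 6.2 translates almost line by line into Python; here the di and ti are recomputed on the fly instead of being stored, and the toy primes are for illustration only (Python 3.8+ for modular inverses via pow):

```python
def rsadp_crt(y: int, e: int, r: list) -> int:
    """RSA private-key operation via CRT (cf. Algorithm 6.2, second key format)."""
    p, q = r[0], r[1]
    x1 = pow(y, pow(e, -1, p - 1), p)        # x1 := y^dP (mod p)
    x2 = pow(y, pow(e, -1, q - 1), q)        # x2 := y^dQ (mod q)
    h = ((x1 - x2) * pow(q, -1, p)) % p      # h := (x1 - x2) qInv (mod p)
    x = x2 + q * h
    R = p
    for i in range(2, len(r)):               # the draft's i = 3, ..., u
        xi = pow(y, pow(e, -1, r[i] - 1), r[i])
        R *= r[i - 1]                        # R = r1 r2 ... r_{i-1}
        h = ((xi - x) * pow(R, -1, r[i])) % r[i]   # ti = R^(-1) (mod ri)
        x += R * h
    return x

# sanity check against plain exponentiation with d = e^(-1) mod lambda(n)
r = [11, 13, 17]
n, d = 11 * 13 * 17, pow(7, -1, 240)         # lambda(n) = lcm(10, 12, 16) = 240
print(rsadp_crt(1234, 7, r) == pow(1234, d, n))   # -> True
```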

RSAES–OAEP encryption scheme

The encryption scheme RSAES–OAEP is based on the optimal asymmetric encryption procedure (OAEP) proposed by Bellare and Rogaway [18, 98]. In this procedure, a string of length slightly less than the size of the modulus n is probabilistically encoded using a hash function and the encoded message is subsequently encrypted. The probabilistic encoding makes the encryption procedure semantically secure and (provably) provides resistance against chosen-ciphertext attacks. Under this scheme, an adversary can produce a valid ciphertext only if she knows the corresponding plaintext. Such an encryption scheme is called plaintext-aware. Given an ideal hash function, Bellare and Rogaway’s OAEP is plaintext-aware.

RSAES–OAEP uses a label L which is hashed by a hash function H. One may take L as the empty string. Other possibilities are not specified in the PKCS draft. SHA-1 (or SHA-256 or SHA-384 or SHA-512) is the recommended hash function. The hash values (in hex) of the empty string under these hash functions are given in Table 6.4.

Table 6.4. Hash values of the empty string
Function    Hash of the empty string
SHA-1       da39a3ee 5e6b4b0d 3255bfef 95601890 afd80709
SHA-256     e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855
SHA-384     38b060a7 51ac9638 4cd9327e b1b1e36a 21fdb711 14be0743 4c0cc7bf 63f6e1da 274edebf e76f65fb d51ad2f1 4898b95b
SHA-512     cf83e135 7eefb8bd f1542850 d66d8007 d620e405 0b5715dc 83f4a921 d36ce9ce 47d0d13c 5d85f2b0 ff8318d2 877eec2f 63b931bd 47417a81 a538327a f927da3e

The length of the hash output (in octets) is denoted by hLen. For SHA-1, hLen = 20. The RSA modulus n is assumed to be of octet length k. The octet length mLen of the input message M must be ≤ k–2hLen–2. RSAES–OAEP uses a mask-generation function designated as MGF (see Algorithm 6.11 for a recommended realization).

Algorithm 6.3 describes the RSA–OAEP encryption scheme which employs the EME–OAEP encoding scheme described in Algorithm 6.4. The use of a random seed makes the encryption probabilistic. We use the notation ‖ to denote string concatenation and ⊕ to denote bit-wise XOR.

Algorithm 6.3. RSA–OAEP encryption scheme

Input: The recipient’s public key (n, e), the message M (an octet string of length mLen) and an optional label L whose default value is the empty string.

Output: The ciphertext C of octet length k.

Steps:

/* Check lengths */

if (L is longer than what H can handle) { Return “Error: label too long”. }

/* For example, for SHA-1 the input must be of length ≤ 2^61 – 1 octets. */

if (mLen > k – 2hLen – 2) { Return “Error: message too long”. }

/* Encode M to EM (EME–OAEP encoding scheme) */

EM := EME-OAEP-encode(M, L).   /* Algorithm 6.4 */
/* RSA encryption */
m := OS2I(EM).                 /* Convert octet string to integer */
c := RSAEP((n, e), m).         /* RSA encryption primitive */
C := I2OS(c, k).               /* Convert integer back to octet string */

The matching decryption operation is shown in Algorithm 6.5 which calls the EME–OAEP decoding procedure of Algorithm 6.6. The only error message that the decryption and decoding algorithms issue is decryption error. This is to ensure that an adversary cannot distinguish between different kinds of errors, because such an ability of the adversary may lead her to guess partial information about the decryption process and thereby mount a chosen-ciphertext attack.

Algorithm 6.4. RSA–OAEP encoding scheme

Input: The message M of octet length mLen, the label L.

Output: The EME–OAEP encoded message EM.

Steps:

lHash := H(L).

Generate the padding string PS with k – mLen – 2hLen – 2 zero octets.

Generate the data block DB := lHash ‖ PS ‖ 01 ‖ M.

Let seed := a random string of length hLen octets.

Generate the data-block mask dbMask := MGF(seed, k – hLen – 1).

Generate the masked data-block maskedDB := DB ⊕ dbMask.

Generate mask for seed seedMask := MGF(maskedDB, hLen).

Generate the masked seed maskedSeed := seed ⊕ seedMask.

Generate the encoded message EM := 00 ‖ maskedSeed ‖ maskedDB.

Algorithm 6.5. RSA–OAEP decryption scheme

Input: The recipient’s private key K, the ciphertext C to be decrypted and an optional label L (the default value of which is the null string).

Output: The decrypted message M.

Steps:

if (the length of L is more than the limitation of H) or (the length of C is not k octets)
        or (k < 2hLen + 2) { Return “Decryption error”. }

c := OS2I(C).                  /* Convert octet string to integer */
m := RSADP(K, c).              /* RSA decryption primitive */
EM := I2OS(m, k).              /* Convert integer back to octet string */
M := EME-OAEP-decode(EM, L).   /* Algorithm 6.6 */

Algorithm 6.6. RSA–OAEP decoding scheme

Input: The encoded message EM and the label L.

Output: The EME–OAEP decoded message M.

Steps:

lHash := H(L).
Write EM = Y ‖ maskedSeed ‖ maskedDB, where Y is a single octet,
       maskedSeed is a string of length hLen octets and
       maskedDB is a string of length k – hLen – 1 octets.
seedMask := MGF(maskedDB, hLen).
seed := maskedSeed ⊕ seedMask.
dbMask := MGF(seed, k – hLen – 1).
DB := maskedDB ⊕ dbMask.
Try to decompose DB = lHash′ ‖ PS ‖ 01 ‖ M, where lHash′ is of length hLen
       and PS is a (possibly empty) padding string comprising octets 00 only.
if (DB cannot be decomposed as above) or (lHash′ ≠ lHash) or
       (Y ≠ 00) { Return “Decryption error”. }
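An illustrative Python sketch of the EME–OAEP encode/decode pair (Algorithms 6.4 and 6.6), with SHA-1 and the MGF1 of Algorithm 6.11; as the draft requires, all error causes collapse into one indistinguishable exception:

```python
import hashlib, os

H = lambda b: hashlib.sha1(b).digest()
hLen = 20                                     # octet length of SHA-1 output

def mgf1(seed: bytes, length: int) -> bytes:
    T = b"".join(H(seed + i.to_bytes(4, "big"))
                 for i in range((length + hLen - 1) // hLen))
    return T[:length]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def oaep_encode(M: bytes, k: int, L: bytes = b"") -> bytes:
    if len(M) > k - 2 * hLen - 2:
        raise ValueError("message too long")
    PS = b"\x00" * (k - len(M) - 2 * hLen - 2)
    DB = H(L) + PS + b"\x01" + M              # lHash || PS || 01 || M
    seed = os.urandom(hLen)
    maskedDB = xor(DB, mgf1(seed, k - hLen - 1))
    maskedSeed = xor(seed, mgf1(maskedDB, hLen))
    return b"\x00" + maskedSeed + maskedDB

def oaep_decode(EM: bytes, k: int, L: bytes = b"") -> bytes:
    Y, maskedSeed, maskedDB = EM[0], EM[1 : 1 + hLen], EM[1 + hLen :]
    seed = xor(maskedSeed, mgf1(maskedDB, hLen))
    DB = xor(maskedDB, mgf1(seed, k - hLen - 1))
    lHash, rest = DB[:hLen], DB[hLen:]
    sep = rest.find(b"\x01")                  # end of the zero padding string PS
    if Y != 0 or lHash != H(L) or sep < 0 or any(rest[:sep]):
        raise ValueError("decryption error")  # one message for all error causes
    return rest[sep + 1 :]

EM = oaep_encode(b"attack at dawn", 128)      # k = 128 octets (1024-bit modulus)
print(oaep_decode(EM, 128))                   # -> b'attack at dawn'
```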

RSASSA–PSS signature scheme with appendix

RSASSA–PSS employs the probabilistic signature scheme proposed by Bellare and Rogaway [19]. Under suitable assumptions about the hash function and the mask-generation function, the RSASSA–PSS scheme produces secure signatures which are also tight in the sense that forging RSASSA–PSS signatures is computationally equivalent to inverting RSA.

Algorithm 6.7. RSASSA–PSS signature generation

Input: The message M (an octet string) to be signed, the private key K of the signer.

Output: The signature S (an octet string of length k).

Steps:

EM := EMSA–PSS–encode(M, modBits – 1).   /* Encode by Algorithm 6.8 */
m := OS2I(EM).                           /* Convert octet string to integer */
s := RSASP1(K, m).                       /* RSA signature generation primitive */
S := I2OS(s, k).                         /* Convert integer back to octet string */

Algorithm 6.8. RSASSA–PSS encoding

Input: The message M to be encoded (an octet string), the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9.

Output: The encoded message EM, an octet string of length emLen := ⌈emBits/8⌉.

Steps:

if (M is longer than what H can handle) { Return “Error: message too long”. }

Generate the hashed message mHash := H(M).

if (emLen < hLen + sLen + 2) { Return “Encoding error”. }

Let salt := a random string of length sLen octets.

Generate the salted message M′ := 00 00 00 00 00 00 00 00 ‖ mHash ‖ salt.

Generate the hashed salted message mHash′ := H(M′).

Generate the padding string PS with emLen – sLen – hLen – 2 zero octets.

Generate the data block DB := PS ‖ 01 ‖ salt.

Generate the data block mask dbMask := MGF(mHash′, emLen – hLen – 1).

Generate the masked data block maskedDB := DB ⊕ dbMask.

Set to 0 the leftmost 8emLen – emBits bits of the leftmost octet of maskedDB.

Compute EM := maskedDBmHash′ ‖ bc.

RSASSA–PSS signature generation (Algorithm 6.7) uses the EMSA–PSS encoding method (Algorithm 6.8). Verification (Algorithm 6.9) uses the EMSA–PSS decoding method (Algorithm 6.10). We assume that k is the octet length of the RSA modulus n. Let modBits denote the bit length of n. The encoded message is of length emLen = ⌈(modBits – 1)/8⌉ octets. The probabilistic behaviour of the encoding scheme is incorporated by the use of a random salt, the octet length of which is sLen. A hash function H that produces hash values of octet length hLen is employed.

Algorithm 6.9. RSASSA–PSS signature verification

Input: The message M, the signature S to be verified and the signer’s public key (n, e).

Output: Verification status of the signature.

Steps:

if (the length of S is not k octets) { Return “Signature not verified”. }

s := OS2I(S).                                  /* Convert octet string to integer */
m := RSAVP1((n, e), s).                        /* RSA signature verification primitive */
EM := I2OS(m, emLen).                          /* Convert integer back to octet string */
status := EMSA–PSS–decode(M, EM, modBits – 1). /* Algorithm 6.10 */

if (status is “consistent”) { Return “Signature verified”. }

else { Return “Signature not verified”. }

Algorithm 6.10. RSASSA–PSS decoding

Input: The message M (an octet string), the encoded message EM (an octet string of length emLen = ⌈emBits/8⌉) and the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9.

Output: Decoding status: “consistent” or “inconsistent”.

Steps:

if (M is longer than what H can handle) { Return “inconsistent”. }
Generate the hashed message mHash := H(M).
if (emLen < hLen + sLen + 2) { Return “inconsistent”. }
Try to decompose EM = maskedDB ‖ mHash′ ‖ Y, where
       maskedDB is an octet string of length emLen – hLen – 1,
       mHash′ is an octet string of length hLen, and Y is a single octet.
if (Y ≠ bc) or (the leftmost 8emLen – emBits bits of the leftmost octet of
       maskedDB are not all 0) { Return “inconsistent”. }
dbMask := MGF(mHash′, emLen – hLen – 1).
DB := maskedDB ⊕ dbMask.
Set to 0 the leftmost 8emLen – emBits bits of the leftmost octet of DB.
Try to decompose DB = PS ‖ 01 ‖ salt, where PS is a string with
       emLen – sLen – hLen – 2 zero octets, and salt is of length sLen octets.
if (the above decomposition is unsuccessful) { Return “inconsistent”. }
Set M′ := 00 00 00 00 00 00 00 00 ‖ mHash ‖ salt.
if (H(M′) = mHash′) { Return “consistent”. } else { Return “inconsistent”. }

A mask-generation function

A mask-generation function (MGF1) is specified in the PKCS #1 draft. It is based on a hash function H. The mask-generation function is deterministic in the sense that its output is completely determined by its input. However, the (provable) security of the OAEP and PSS schemes is based on the pseudorandom nature of the output of the mask-generation function. This means that any part of the output should be statistically independent of the other parts. MGF1 derives this pseudorandomness from that of the underlying hash function H.

Algorithm 6.11. Mask-generation function MGF1

Input: The seed mgfSeed (an octet string) and the desired octet length maskLen of the output mask. One requires maskLen ≤ 2^32 hLen, where hLen is the octet length of the hash function output.

Output: An octet string mask of length maskLen.

Steps:

if (maskLen > 2^32 hLen) { Return “Error: mask too long”. }
Initialize T to the empty octet string.
for i = 0, 1, . . . , ⌈maskLen/hLen⌉ – 1 {
    I := I2OS(i, 4).
    T := T ‖ H(mgfSeed ‖ I).
}
mask := the leftmost maskLen octets of T.
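Algorithm 6.11 in Python (the helper name mgf1 is ours); the output is deterministic, and shorter masks are prefixes of longer ones generated from the same seed:

```python
import hashlib

def mgf1(mgf_seed: bytes, mask_len: int, H=hashlib.sha1) -> bytes:
    """MGF1: hash the seed with a 4-octet counter and concatenate the digests."""
    h_len = H().digest_size
    T = b""
    for i in range((mask_len + h_len - 1) // h_len):      # ceil(maskLen/hLen) blocks
        T += H(mgf_seed + i.to_bytes(4, "big")).digest()  # counter I = I2OS(i, 4)
    return T[:mask_len]                                   # leftmost maskLen octets

print(mgf1(b"seed", 10) == mgf1(b"seed", 45)[:10])        # -> True
```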

The RSA encryption scheme of PKCS #1, Version 1.5

The older encryption scheme RSAES–PKCS1–v1_5 is no longer recommended, since this scheme is not plaintext-aware, that is, with high probability, an adversary can generate ciphertexts without knowing the corresponding plaintexts. This allows the adversary to mount chosen-ciphertext attacks. The new drafts of PKCS #1 include this old scheme for backward compatibility. Encryption and decryption for RSAES–PKCS1–v1_5 are given in Algorithms 6.12 and 6.13. Here, k is the octet length of the modulus.

Algorithm 6.12. RSA–PKCS1 encryption scheme

Input: The recipient’s public key (n, e) and the message M (an octet string of length mLen).

Output: The ciphertext C which is an octet string of length k.

Steps:

if (mLen > k – 11) { Return “Error: message too long”. }
Generate a padding string PS of length k – mLen – 3 ≥ 8 octets consisting of
       random non-zero octets.
Generate the encoded message EM := 00 ‖ 02 ‖ PS ‖ 00 ‖ M.

m := OS2I(EM).                 /* Convert octet string to integer */
c := RSAEP((n, e), m).         /* RSA encryption primitive */
C := I2OS(c, k).               /* Convert integer back to octet string */

Algorithm 6.13. RSA–PKCS1 decryption scheme

Input: The recipient’s private key K and the ciphertext C (an octet string).

Output: The plaintext message M (an octet string of length ≤ k – 11).

Steps:

if (the length of the ciphertext is not k octets) { Return “decryption error”. }

c := OS2I(C). /* Convert octet string to integer */
m := RSADP(K, c). /* RSA decryption primitive */
EM := I2OS(m, k). /* Convert integer back to octet string */

Try to decompose EM = 00 ‖ 02 ‖ PS ‖ 00 ‖ M, where PS is an octet string of length ≥ 8 and containing only non-zero octets.

if (the above decomposition is unsuccessful) { Return “decryption error”. } else { Return M. }
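The padding and parsing halves of Algorithms 6.12 and 6.13 (everything except the RSAEP/RSADP exponentiations) can be sketched as follows; the function names and the choice k = 128 (a 1024-bit modulus) are our illustrative assumptions:

```python
import os

K = 128  # octet length k of the modulus (illustrative: a 1024-bit modulus)

def pkcs1_v1_5_pad(m: bytes, k: int = K) -> bytes:
    """EM = 00 || 02 || PS || 00 || M, with PS at least 8 random non-zero octets."""
    if len(m) > k - 11:
        raise ValueError("message too long")
    ps = b""
    while len(ps) < k - len(m) - 3:
        b = os.urandom(1)
        if b != b"\x00":                 # PS must contain no zero octets
            ps += b
    return b"\x00\x02" + ps + b"\x00" + m

def pkcs1_v1_5_unpad(em: bytes, k: int = K) -> bytes:
    """Try to decompose EM = 00 || 02 || PS || 00 || M with |PS| >= 8."""
    if len(em) != k or em[:2] != b"\x00\x02":
        raise ValueError("decryption error")
    sep = em.find(b"\x00", 2)            # position of the 00 separator
    if sep < 10:                         # no separator, or PS shorter than 8 octets
        raise ValueError("decryption error")
    return em[sep + 1:]
```

A real decoder must also take care to make all failure branches indistinguishable to the caller; distinguishable errors like those in this sketch are exactly the kind of oracle a chosen-ciphertext attacker exploits.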

The RSA signature scheme of PKCS #1, Version 1.5

The older RSA signature scheme RSASSA–PKCS1–v1_5 is not known to have security loopholes. (Nevertheless, the provably secure PSS scheme is recommended for future applications.) RSASSA–PKCS1–v1_5 uses the EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16). The signature generation and verification procedures are given in Algorithms 6.14 and 6.15. Here, k denotes the octet length of the modulus n.

The EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16) uses a hash function H. Although a member of the SHA family is recommended for future applications, MD2 and MD5 are also supported for compliance with older applications. An octet string hashAlgo is used whose value depends on the underlying hash algorithm and is given in Table 6.5.

Table 6.5. The string hashAlgo used by EMSA–PKCS1–v1_5
Function    The string hashAlgo
MD2         30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 02 05 00 04 10
MD5         30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 05 05 00 04 10
SHA-1       30 21 30 09 06 05 2b 0e 03 02 1a 05 00 04 14
SHA-256     30 31 30 0d 06 09 60 86 48 01 65 03 04 02 01 05 00 04 20
SHA-384     30 41 30 0d 06 09 60 86 48 01 65 03 04 02 02 05 00 04 30
SHA-512     30 51 30 0d 06 09 60 86 48 01 65 03 04 02 03 05 00 04 40

Algorithm 6.14. RSA–PKCS1 signature generation

Input: The signer’s private key K and the message M to be signed (an octet string).

Output: The signature S (an octet string of length k).

Steps:

Encode M to EM := EMSA–PKCS1–v1_5(M, k). /* Algorithm 6.16 */
m := OS2I(EM). /* Convert octet string to integer */
s := RSASP1(K, m). /* RSA signature generation primitive */
S := I2OS(s, k). /* Convert integer back to octet string */

Algorithm 6.15. RSA–PKCS1 signature verification

Input: The signer’s public key (n, e), the message M (an octet string) and the signature S to be verified (an octet string of length k).

Output: Verification status of the signature.

Steps:

if (the length of S is not k octets) { Return “Signature not verified”. }

s := OS2I(S). /* Convert octet string to integer */
m := RSAVP1((n, e), s). /* RSA signature verification primitive */
EM′ := I2OS(m, k). /* Convert integer back to octet string */
Encode M to EM := EMSA–PKCS1–v1_5(M, k). /* Algorithm 6.16 */

if (EM′ = EM) { Return “Signature verified”. }

else { Return “Signature not verified”. }

Algorithm 6.16. EMSA–PKCS1 encoding

Input: The message M (an octet string) and the intended length emLen of the encoded message. One requires emLen ≥ tLen + 11, where tLen is the octet length of hashAlgo plus the octet length of the hash output.

Output: The encoded message EM (an octet string of length emLen).

Steps:

if (M is longer than what H can handle) { Return “Error: message too long”. }
Compute the hash value mHash := H(M).
Let T := hashAlgo ‖ mHash.
/* Let tLen be the octet length of T */
if (emLen < tLen + 11) { Return “Error: encoded message length too short”. }
Generate a padding string PS of length emLen – tLen – 3 ≥ 8 octets each
      having the hexadecimal value ff.
Set EM := 00 ‖ 01 ‖ PS ‖ 00 ‖ T.
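Algorithm 6.16 is equally short in code. In the sketch below (ours), the dictionary carries the hashAlgo octet strings of Table 6.5 for two of the supported hash functions:

```python
import hashlib

# The string hashAlgo of Table 6.5, for two supported hash functions
HASH_ALGO = {
    "sha1":   bytes.fromhex("3021300906052b0e03021a05000414"),
    "sha256": bytes.fromhex("3031300d060960864801650304020105000420"),
}

def emsa_pkcs1_v1_5(m: bytes, em_len: int, hash_name: str = "sha256") -> bytes:
    """EM = 00 || 01 || PS || 00 || T, where T = hashAlgo || H(M) and PS is all ff."""
    t = HASH_ALGO[hash_name] + hashlib.new(hash_name, m).digest()
    if em_len < len(t) + 11:
        raise ValueError("encoded message length too short")
    ps = b"\xff" * (em_len - len(t) - 3)     # at least 8 octets of ff
    return b"\x00\x01" + ps + b"\x00" + t
```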

6.3.2. PKCS #3

PKCS #3 describes the Diffie–Hellman key-exchange algorithm. The draft assumes the existence of a central authority which generates the domain parameters that include a prime p of octet length k, an integer g satisfying 0 < g < p and optionally a positive integer l. The integer g need not be a generator of Zp* (the multiplicative group of integers modulo p), but is expected to be of sufficiently large multiplicative order modulo p. The integer l denotes the bit length of the private Diffie–Hellman key of an entity. Values of l ≪ 8k can be chosen for efficiency. However, for maintaining a desired level of security l should not be too small. Since the central authority determines p, g (and l), individual users need not bother about the generation of these parameters.

During a Diffie–Hellman key-exchange interaction of Alice with Bob, Alice performs the steps described in Algorithm 6.17. Bob performs an identical operation which is omitted here.

Algorithm 6.17. PKCS3 Diffie–Hellman key-exchange scheme

Input: p, g and optionally l.

Output: The shared secret SK (an octet string of length k).

Steps:

Alice generates a random private value x with 0 < x < p.

/* If l is specified, one should have 2^(l–1) ≤ x < 2^l. */

Alice computes y := g^x (mod p).

Alice converts y to an octet string PV := I2OS(y, k).

Alice sends the public value PV to Bob.

Alice receives Bob’s public value PV′.

Alice converts PV′ to the integer y′ := OS2I(PV′).

Alice computes z := (y′)^x (mod p) (with 0 < z < p).

Alice transforms z to the shared secret SK := I2OS(z, k).
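Algorithm 6.17 can be acted out end to end. The parameters below are toy illustrative choices of ours (a real deployment uses a prime of cryptographic size supplied by the central authority):

```python
import secrets

p = 2**127 - 1                            # a Mersenne prime; octet length k = 16
g = 3
k = (p.bit_length() + 7) // 8

def dh_keypair():
    x = secrets.randbelow(p - 2) + 1      # private value x, 0 < x < p
    pv = pow(g, x, p).to_bytes(k, "big")  # public value PV = I2OS(g^x mod p, k)
    return x, pv

def dh_shared(x: int, peer_pv: bytes) -> bytes:
    y = int.from_bytes(peer_pv, "big")    # y' := OS2I(PV')
    z = pow(y, x, p)                      # z := (y')^x (mod p)
    return z.to_bytes(k, "big")           # SK := I2OS(z, k)

# Alice and Bob exchange PV and PV' and arrive at the same shared secret
xa, pva = dh_keypair()
xb, pvb = dh_keypair()
assert dh_shared(xa, pvb) == dh_shared(xb, pva)
```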

Chapter Summary

In this chapter, we describe some standards for representation of cryptographic data in various formats and for conversion of data among different formats. We also present some standard encoding and decoding schemes that are applied before encryption and after decryption. These standards promote easy and unambiguous interfaces with the cryptographic primitives described in the previous chapter.

The IEEE P1363 range of standards defines several data types: bit strings, octet strings, integers, prime finite fields, finite fields of characteristic 2, extension fields of odd characteristic, elliptic curves, elliptic curve points and polynomial rings. The IEEE drafts also prescribe standard ways of converting data among these formats. For example, the primitive BS2OS converts a bit string to an octet string, and the primitive FE2I converts a finite-field element to an integer.

We subsequently mention some of the public-key cryptography standards (PKCS) propounded by RSA Laboratories. Draft PKCS #1 deals with RSA encryption and signature. In addition to the standard RSA moduli of the form pq, it also suggests the possibility of using multi-prime RSA, that is, moduli which are products of more than two (distinct) primes. The draft recommends use of optimal asymmetric encryption padding (OAEP). This probabilistic encryption scheme provides provable security against chosen-ciphertext attacks. A probabilistic signature scheme is also advocated for use. These probabilistic schemes call for using a mask-generation function (MGF). A concrete realization of an MGF is also provided. Draft PKCS #3 standardizes the Diffie–Hellman key-exchange algorithm.

Suggestions for Further Reading

The P1363 class of preliminary drafts [134] published by IEEE and the PKCS standards [254] from RSA Security Inc. are available for free download from Internet sites. However, IEEE’s published standard 1363-2000 must be purchased against a fee. In addition to the data types and data conversion primitives described in this chapter, the IEEE drafts (P1363, P1363a, P1363.1 and P1363.2) provide encryption/decryption and signature generation/verification primitives and also several encryption and signature schemes based on these primitives. These schemes are very similar to the algorithms that we described in Chapter 5, so we avoid repeating the same descriptions here. Elaborate encoding procedures are described in the PKCS drafts, but only for RSA- and Diffie–Hellman-based systems. We have reproduced the details in this chapter. The remaining PKCS drafts deal with topics that this book does not directly address. An exception is PKCS #13, which deals with elliptic-curve cryptography. This draft is not ready yet; when it is, it may be consulted to learn about RSA Laboratories’ standards on elliptic-curve cryptography.

At present, the different families of standards do not seem to have mutually conflicting specifications. The IEEE has a (free) mailing list for promoting the development and improvement of the IEEE P1363 standards, via e-mail discussions.

Other Internet standards include the Federal Information Processing Standards (FIPS) [221] from NIST, and the RFCs (Requests for Comments) from the Internet Engineering Task Force (IETF) [135].

7. Cryptanalysis in Practice

7.1  Introduction
7.2  Side-Channel Attacks
7.3  Backdoor Attacks
     Chapter Summary
     Suggestions for Further Reading

A man cannot be too careful in the choice of his enemies.

—Oscar Wilde (1854–1900), The Picture of Dorian Gray, 1891

If you reveal your secrets to the wind you should not blame the wind for revealing them to the trees.

—Kahlil Gibran (1883–1931)

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

—Charles Antony Richard Hoare

7.1. Introduction

The security of public-key cryptographic protocols is based on the apparent intractability of solving some computational problems. If one can factor large integers efficiently, one breaks RSA. In that sense, seeking good algorithms to solve these problems (like factoring integers) is part of cryptanalysis. Proving that no poly-time algorithm can break RSA would enhance the status of the security of the protocol from assumed to provable. On the other hand, developing a poly-time algorithm for breaking RSA (or for factoring integers) would make RSA (and many other protocols) unusable. Though a temporary setback to our existing cryptographic tools, such a discovery would enrich our understanding of the underlying computational problems. In short, breaking the trapdoors of public-key cryptosystems is of both theoretical and practical significance.

But research along these mathematical lines is open-ended. A desperate cryptanalyst may not wait indefinitely for a theoretical resolution. Instead, she tries to find loopholes in the systems that she can effectively exploit to gain secret information.

A cryptographic protocol must be implemented (in software or hardware) before it can be used. Careless implementations often supply the loopholes that cryptanalysts wait for. For example, a software implementation of a public-key system may allow the private key to be read only from a secure device (a removable medium, like CDROM), but may make copies of the key in the memory of the machine where the decryption routine is executed. If the decryption routine does not lock and eventually flush the memory holding the key, a second user having access to the machine can simply read off the secrets.

Software and hardware implementations often tend to leak out secrets at a level much more subtle than the example just mentioned. A public-key algorithm is a known algorithm and involves a sequence of well-defined steps dictated by the private key. Each step requires its private share of execution time and power consumption. Watching the decrypting device carefully during a private-key operation may reveal information about the exact sequence of basic steps in the algorithm. Random hardware faults during a private-key operation may also compromise security. Such attacks are commonly dubbed as side-channel attacks.

Let us now look at another line of attack. Not every user of cryptography is expected to implement all the routines she uses. On the contrary, most users run precompiled programs available from third parties. How can a user assess the soundness of the products she is using? That is, who will guarantee that there are no (intentional or unintentional) security snags in the products? The key-generation software available from a malicious software designer may initiate a clandestine e-mail every time a key pair is generated. It is also possible that a private key supplied by such a program is generated from a small predefined set known to the designer. Even when private keys look random, they need not come with the unpredictability necessary for cryptographic usage. Such attacks during key generation are called backdoor attacks.

In short, public-key cryptanalysis at present encompasses trapdoors, backdoors and side channels. The trapdoor methods have already been discussed in Chapter 4. In this chapter, we concentrate on the other attacks on public-key systems.

7.2. Side-Channel Attacks

Side-channel attacks refer to a class of cryptanalytic tools for determining a private key by measuring signals (like timing, power fluctuation, electromagnetic radiation) from or by inducing faults in the device performing operations involving the private key. In this section, we describe three methods of side-channel cryptanalysis: timing attack, power attack and fault attack.

7.2.1. Timing Attack

Paul C. Kocher introduced the concept of side-channel cryptanalysis in his seminal paper [155] on timing attacks. Though not unreasonable, timing attacks are somewhat difficult to mount in practice.

Details of the attack

The private-key operation in many cryptographic systems (like RSA or discrete-log-based systems) is usually a modular exponentiation of the form

y := x^d (mod n),

where d is the private key. The private-key procedure may involve other overheads (like message decoding), but the running time of the routine is usually dominated by, and so can be approximated by, the time of the modular exponentiation.

Assume that this exponentiation is carried out by a square-and-multiply algorithm known to Carol, the attacker. For example, suppose that Algorithm 3.9 is used. Each iteration of the for loop involves a modular squaring followed conditionally by a modular multiplication. The multiplication is done in an iteration if and only if the corresponding bit ei in the exponent is 1. Thus, an iteration runs slower if ei = 1 than if ei = 0. If Carol could measure the timing of each individual iteration of the for loop, she would correctly guess most (if not all) of the bits in the exponent. But it is unreasonable to assume that an attacker can collect such detailed timing data. Moreover, if Algorithm 3.10 is used, these detailed data do not help much, because in this case the timing of an individual iteration of the for loop can at best differentiate between the two cases ei = 0 and ei ≠ 0. There are 2^t – 1 non-zero values for each ei.

However, it is not difficult to think of a situation where the attacker can measure, to a reasonable accuracy, the total time of the exponentiation. In order to guess d, Carol requires the times of the modular exponentiations for several different values of x, say x1, . . . , xk, all known to her. (Note that the xi may be messages to be signed or intercepted ciphertexts.) The same exponent d is used for all these exponentiations. Let Ti be the time for computing xi^d (mod n), as measured by Carol. We may assume that all these k exponentiations are carried out on the same machine using the same routine.

Kocher considers the attack on the exponentiation routine of RSAREF, a cryptography toolkit available from the RSA Laboratories. This routine implements Algorithm 3.10 with t = 2. For the sake of convenience, the algorithm is reproduced below. We may assume that the exponent has an even number of bits—if not, pad a leading zero.

Algorithm 7.1. RSAREF’s exponentiation routine

Input: x ∈ Zn, the modulus n, and d = (d2l–1d2l–2 · · · d1d0)2.

Output: y := x^d (mod n).

Steps:

 (1)  z1 := x.
 (2)  z2 := z1 · x (mod n).
 (3)  z3 := z2 · x (mod n).
 (4)  y := 1.
 (5)  for j = l – 1, . . . , 0 {
 (6)     y := y^2 (mod n).
 (7)     y := y^2 (mod n).
 (8)     if ((d2j+1d2j)2 ≠ 0) {
 (9)         y := y · z(d2j+1d2j)2 (mod n).
(10)     }
(11)  }
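Functionally, Algorithm 7.1 is a left-to-right exponentiation that scans the exponent two bits at a time. The Python rendering below (ours; it ignores timing, which is the whole point of the attack) shows the structure:

```python
def rsaref_exp(x: int, d: int, n: int) -> int:
    """Algorithm 7.1: square twice per iteration, multiply by z[w] when the bit pair w != 0."""
    z = [1, x % n, (x * x) % n, (x * x * x) % n]   # z[w] = x^w mod n, w = 0..3
    l = (d.bit_length() + 1) // 2                  # number of bit pairs (pad a leading zero)
    y = 1
    for j in range(l - 1, -1, -1):
        y = (y * y) % n                            # Step (6)
        y = (y * y) % n                            # Step (7)
        w = (d >> (2 * j)) & 3                     # the bit pair (d_2j+1 d_2j)_2
        if w != 0:                                 # Step (8): the data-dependent branch
            y = (y * z[w]) % n                     # Step (9)
    return y

assert rsaref_exp(7, 123, 1000003) == pow(7, 123, 1000003)
```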

Every step of the above algorithm runs in a time dependent on the operands. For example, the modular multiplication in Step (9) takes time dependent on the operands y and z(d2j+1d2j)2. The variation in the timing depends on the implementation of the modular arithmetic routines and also on the machine’s architecture. However, we make the assumption that for fixed operands each step requires a constant time on a given machine (or on identical machines). This is actually a loss of generality, since the running time of a complex step (like modular multiplication or squaring) for fixed operands may vary for various reasons like process scheduling, availability of cache, page faults and so on. It may be difficult, perhaps impossible, for an attacker to arrange for herself a verbatim emulation of the victim’s machine at the time when the latter performed the private-key operations. Let us still proceed with our assumption, say by conceiving of a not-so-unreasonable situation where the effects of these other factors are not sizable enough.

We use the subscript i to denote the i-th private-key operation for 1 ≤ ik. The entire routine takes time Ti for the i-th exponentiation, that is, for the input xi. This measurement may involve some (unknown) error which we denote by ei. The first four steps are executed only once during each call and take a total time of pi (precomputation time). The for loop is executed l times. We ignore the time needed to maintain the loop (like decrementing j) and also the time taken by the if statement in Step (8). Let si,j and ti,j be the times taken respectively by Steps (6) and (7), when the loop variable (j) assumes the value j. If Step (9) is executed, we denote by mi,j the time taken by this step, else we set mi,j := 0. It follows that

Equation 7.1

     Ti = ei + pi + Σj=l–1,...,0 (si,j + ti,j + mi,j),
where the index in the sum decreases from l – 1 to 0 in steps of 1. Carol does not know this break-up (that is, the explicit values of ei, si,j, ti,j and mi,j), but she can make an inductive guess in the following way.

Carol manages a machine and a copy of the exponentiation software both identical to those of the victim. She then successively guesses the secret bit pairs d2l–1d2l–2, d2l–3d2l–4, d2l–5d2l–6 and so on. Assume that at some stage Carol has correctly determined the exponent bits d2j+1d2j for j = l–1, l–2, . . . , j′+1. Initially j′ = l–1. Using this information Carol computes d2j′+1d2j′ as follows. Carol’s knowledge at this stage allows her to measure pi and si,j, ti,j, mi,j for j = l – 1, . . . , j′ + 1 — she simply runs Algorithm 7.1 on xi. Carol then enters the loop with j = j′. The squaring operations are unconditional, and Carol has the exact operands as the victim for the squaring steps. So Carol also measures si,j′ and ti,j′.

The bit pair d2j′+1d2j′ (considered as a binary integer) can take any one of the four values g = 0, 1, 2, 3. Carol measures the time mi,j′(g) of Step (9) for each of the four choices of g (taking mi,j′(0) := 0, since Step (9) is skipped for g = 0) and adds this time to the time taken by the algorithm so far, in order to obtain:

Equation 7.2

     T̃i(g) := pi + Σj=l–1,...,j′+1 (si,j + ti,j + mi,j) + si,j′ + ti,j′ + mi,j′(g).

Kocher observed that the distribution of Ti, i = 1, . . . , k, is statistically related to that of T̃i(g) only for the correct guess g. In order to see how, we subtract Equation (7.2) from Equation (7.1) to get:

Equation 7.3

     Ti – T̃i(g) = ei + (mi,j′ – mi,j′(g)) + Σj=j′–1,...,0 (si,j + ti,j + mi,j).

Let us assume that the error term ei is distributed like a random variable E. Similarly suppose that each multiplication (resp. squaring) has the distribution of a random variable M (resp. S). Taking the variance of Equation (7.3) over the values i = 1, 2, . . . , k and assuming that the sample size k is so large that the sample variances are very close to the variances of the respective random variables, we obtain:

Equation 7.4

     Var(Ti – T̃i(g)) = Var(E) + Var(mi,j′ – mi,j′(g)) + 2j′ Var(S) + λ Var(M),

where λ denotes the number of times Step (9) is executed for j = j′ – 1, . . . , 0. Note that λ is dependent on the private key and not on the arguments to the exponentiation routine. For the correct guess g, we have mi,j′ = mi,j′(g) and so

     Var(Ti – T̃i(g)) = Var(E) + 2j′ Var(S) + λ Var(M).

On the other hand, for an incorrect guess g we have:

     Var(Ti – T̃i(g)) = Var(E) + 2j′ Var(S) + (λ + 1) Var(M)

if one of mi,j′ or mi,j′(g) is zero, or

     Var(Ti – T̃i(g)) = Var(E) + 2j′ Var(S) + (λ + 2) Var(M)

if both mi,j′ and mi,j′(g) are non-zero. (Recall that Var(αX + βY) = α^2 Var(X) + β^2 Var(Y) for independent X, Y and any real α, β.)

Calculation of the sample variances of Ti – T̃i(g) for the four choices of g gives Carol a handle to determine (or guess) the correct choice. Carol simply takes the g for which the variance is minimum. This is the fundamental observation that makes the timing attack work.

Of course, statistical irregularities exist in practice, and the approximation of the actual variances by the sample variances introduces errors in Equation (7.4). These errors are of particular concern for large values of j′, that is, during the beginning of the attack. However, if an incorrect guess is made at a certain stage, this is detected soon with high probability, as Carol proceeds further. Suppose that an erroneous guess of d2j″+1d2j″ has been made for some j″ > j′. This means that Carol’s values of y are different from the actual values starting from the iteration of the loop with j = j″ – 1. (We may assume that most, if not all, xi ≠ 1.) We then do not have a cancellation of the timings for j = j″ – 1, . . . , j′. More correctly, if the guesses for j = l – 1, . . . , j″ + 1 are correct and the first error occurs at j = j″, then denoting the timings measured by Carol for the subsequent iterations by ŝi,j, t̂i,j and m̂i,j, one gets

Equation 7.5

     Ti – T̃i(g) = ei + (mi,j″ – m̂i,j″) + Σj=j″–1,...,j′ (si,j – ŝi,j + ti,j – t̂i,j + mi,j – m̂i,j) + Σj=j′–1,...,0 (si,j + ti,j + mi,j).

Since each of the square and multiplication operations takes y as an operand, the original timings and the measured timings (the ones with a hat) behave like independent variables and, therefore, taking the variance of Equation (7.5) yields

     Var(Ti – T̃i(g)) = Var(E) + 2(2j″ – j′) Var(S) + λ′ Var(M)

for some λ′ depending on the private key and on the previous guesses, but independent of the current guess g. In other words, Carol loses a meaningful relation of Var(Ti – T̃i(g)) with the correctness of the current guess. Once Carol notices this, she backtracks and changes older guesses until the expected behaviour is restored. Thus, the timing attack comes with an error detection and correction strategy.

An analysis done by Kocher (neglecting E and assuming normal distributions for S and M) shows that Carol needs k = O(l) for a good probability of success.

Countermeasures

There are several ways in which timing attacks can be prevented.

7.2.2. Power Analysis

In connection with timing attacks, we mentioned that if an adversary were able to measure the timing of each iteration of the square-and-multiply loop during an RSA (or discrete-log-based) private-key exponentiation, she could guess the bits in the key quite efficiently from only a few timing measurements. But it is questionable whether such detailed timing data can be made available.

Now, think of a situation where Carol can measure patterns of power consumption made by the decrypting (or signing) device during one or more private-key operations with Alice’s private key. If Alice carries out the private-key operations on her personal workstation, it is difficult for Carol to conduct such measurements. So assume that Alice is using a smart card in a device to which Carol has access. Carol inserts a small resistor in series with the line which drives Alice’s smart card. The power consumed by the smart-card circuit is roughly proportional to the current through the resistor. By measuring the voltage across the resistor (and multiplying by a suitable factor), Carol can observe the power consumed by Alice’s decryption device. Carol has to use a power-measuring device that takes readings at a high frequency (100 MHz to several GHz, depending on Carol’s budget). A set of power measurements obtained during a cryptographic operation is called a power trace. We now study how power traces can reveal Alice’s secrets.

Simple power analysis (SPA)

The individual steps in a private-key operation may be nakedly exposed in a power trace. This is, in particular, the case when different steps consume different amounts of power and/or take different times. Obtaining information about the operation of the decrypting device and/or the secrets by a direct interpretation of power traces is referred to as simple power analysis or SPA in short.

As an example of SPA, consider an implementation of RSA exponentiation using the naive square-and-multiply Algorithm 3.9. Here, the most power-consuming operations are modular squaring and modular multiplication. Modular multiplication typically runs slower than modular squaring. Also, modular multiplication requires two different operands to be fetched from memory, whereas modular squaring requires only one operand. Thus, a multiplication operation draws more power, and for a longer time, than a squaring operation.

A hypothetical[1] SPA trace during a portion of an RSA private-key operation is shown in Figure 7.1. Each spike in the trace corresponds to either a square or a multiplication operation. Let us assume that the power consumption is measured with sufficient resolution, so that no spike is missed. Since multiplication runs longer (and requires more operands) than squaring, multiplication spikes are wider than squaring spikes.

[1] SPA traces from real-life experiments on smart cards, as reported in several references, look similar to this. We, however, generated the trace using a random number generator. Absolute conformity to reality is not always crucial for the purposes of illustration.

Figure 7.1. Simulated SPA trace for a portion of an RSA private-key operation


Let us denote a squaring operation by S and a multiplication operation by M. We observe that Alice’s smart card performs the sequence

SMSMSSMSSSSMSSSMSS

of operations during the measurement interval shown. Since multiplication in an iteration of the loop is skipped if and only if the corresponding bit in the exponent is zero, we can group the operations as

(SM)(SM)(S)(SM)(S)(S)(S)(SM)(S)(S)(SM)(S)(S . . .

This, in turn, reveals the bit string 110100010010 in Alice’s private key. (The trailing S is left ungrouped, since its matching multiplication, if any, may lie outside the measurement window.)
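The grouping step is mechanical. The sketch below (ours) converts an observed S/M sequence into exponent bits, leaving a trailing lone S undecided since its multiplication, if any, may fall outside the measurement window:

```python
def spa_bits(trace: str) -> str:
    """Group an S/M operation sequence: (SM) -> bit 1, lone (S) -> bit 0."""
    bits, i = [], 0
    while i < len(trace):
        if trace[i] != "S":
            raise ValueError("malformed trace")
        if trace[i:i + 2] == "SM":
            bits.append("1")
            i += 2
        elif i == len(trace) - 1:    # trailing lone S: undetermined bit
            break
        else:
            bits.append("0")
            i += 1
    return "".join(bits)

assert spa_bits("SMSMSSMSSSSMSSSMSS") == "110100010010"
```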

Effective as it appears, SPA, in practice, does not pose a huge threat to the security of conventional cryptographic systems. Using algorithms for which power traces do not bear direct relationships with the bits of the private key largely reduces the risk of fruitful SPA. The inefficient repeated square-and-multiply Algorithm 7.2 always performs a multiplication after each squaring and thereby eliminates chances of a successful SPA.

Algorithm 7.2. SPA-resistant exponentiation

Input: x ∈ Zn, the modulus n, and the private key d = (dl–1 · · · d1d0)2.

Output: y := x^d (mod n).

Steps:

y := 1.
for (j = l – 1, . . . , 0) {
    t0 := y^2 (mod n).
    t1 := t0 · x (mod n).
    y := tdj.
}
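A Python rendering of Algorithm 7.2 follows (a sketch of ours; on real hardware even the final selection y := tdj must itself avoid a key-dependent branch, which the conditional expression below does not by itself guarantee):

```python
def spa_resistant_exp(x: int, d: int, n: int) -> int:
    """Algorithm 7.2: always square AND multiply; the key bit only selects the result."""
    y = 1
    for j in range(d.bit_length() - 1, -1, -1):
        t0 = (y * y) % n                  # squaring, performed in every iteration
        t1 = (t0 * x) % n                 # multiplication, performed in every iteration
        y = t1 if (d >> j) & 1 else t0    # y := t_dj
    return y

assert spa_resistant_exp(5, 77, 9999991) == pow(5, 77, 9999991)
```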

Using the (more efficient) Algorithm 7.1 also frustrates SPA. Some chunks of two successive 0 bits are anyway revealed by power traces collected during the execution of this algorithm. But, for a decently large and random private key, this still leaves Carol with many unknown bits to be guessed. Note, however, that none of the three remedies suggested to thwart the timing attack on Algorithm 7.1 seems to be effective in the context of SPA. Delays normally do not consume much power (unless some power-intensive dummy computations fill up the delays). Also, the masking of (x, y) by (u, v) fails to produce any alteration in the power-consumption pattern during exponentiation.

If some private-key algorithm has unavoidable branchings due to individual bits in the private key, SPA can prove to be a serious nuisance.

Differential power analysis (DPA)

A carefully designed algorithm (like Algorithm 7.2) does not reveal key information from a simple observation of power traces. Moreover, the observed power traces may be corrupted by noise to an extent where SPA is not feasible. In such cases, differential power analysis (DPA) often helps the cryptanalyst reduce the effects of noise and exploit subtle correlation of power consumption patterns with specific bits in the operands. DPA requires availability of power traces from several private-key operations with the same key.

Consider the SPA-resistant Algorithm 7.2. Suppose that k power traces P1(t), . . . , Pk(t) for the computations of xi^d (mod n), i = 1, . . . , k, are available to Carol, that the ciphertexts x1, . . . , xk are known to Carol and that d = (dl–1 · · · d1d0)2. Carol successively guesses the bits dl–1, dl–2, dl–3, . . . of the exponent. Suppose that Carol has correctly guessed dj for j = l – 1, . . . , j′ + 1. She now uses DPA to guess dj′.

Let e := (dl–1dl–2 · · · dj′+1)2. At the beginning of the for loop with j = j′ the variable y holds the value x^e modulo n. The loop computes x^(2e) and x^(2e+1) and assigns y the appropriate value. If dj′ = 0, then in the next iteration the loop computes x^(4e) and x^(4e+1), whereas if dj′ = 1, then in the next iteration the loop computes x^(4e+2) and x^(4e+3). It follows that the algorithm handles the value x^(4e) if and only if dj′ = 0.

For each i = 1, . . . , k, Carol computes zi := xi^(4e) (mod n). Carol then chooses a particular bit position (say, the least significant bit) and considers the bit bi of zi at this position. We make the assumption that there is some subsequent step (or substep) in the implementation for which the average power consumption Π0 for bi = 0 is different from the average power consumption Π1 for bi = 1.[2]

[2] The exact step which exhibits differential bias toward an individual bit value is dependent on the implementation. If the implementation does not provide such a step, the attack cannot be mounted in this way. Initially, DPA was proposed for DES, a symmetric encryption algorithm, in which such a dependence is clearly available. With asymmetric-key encryption, such a strong dependence of the power consumed by a step on an individual bit value is not obvious. One may, however, use other dividing criteria, like low versus high Hamming weight (that is, number of one-bits) in the operand, which bear more direct relationships with power consumption.

Carol partitions {1, . . . , k} into two subsets:

I0 := {i | bi = 0},
I1 := {i | bi = 1}.

Carol computes the average power traces A0(t) (over the traces Pi(t) with i ∈ I0) and A1(t) (over those with i ∈ I1) and subsequently the differential power trace

     Δ(t) := A1(t) – A0(t).

First, let dj′ = 0. In this case, the routine handles zi = xi^(4e) (mod n), and so the power consumption at some time τ is correlated to the bit bi of zi. At any other instant, the power consumption is uncorrelated to this particular bit value. Therefore, if the sample size is sufficiently large and if the measurement noise has mean zero, we have:

     Δ(τ) ≈ Π1 – Π0 ≠ 0,  whereas Δ(t) ≈ 0 for t ≠ τ.

On the other hand, if dj′ = 1, the value xi^(4e) never appears in the execution of the algorithm, and so at every time t the power consumption is uncorrelated to the particular bit of zi. We then expect

     Δ(t) ≈ 0 for all t.

Figure 7.2 illustrates the two cases.[3] If the differential power trace has a distinct spike, the guess dj′ = 0 is correct. So by observing the existence or otherwise of a spike, Carol determines whether dj′ = 0 or dj′ = 1.

[3] Once again, these are hypothetical traces obtained by random number generators.

Figure 7.2. Simulated DPA trace for a portion of an RSA private-key operation

(a) for the correct guess
(b) for an incorrect guess


The number k of samples required for a good probability of success depends on the bias Π1 – Π0 relative to the measurement noise. We assume that |I0| ≈ |I1| ≈ k/2. If the noise has a variance of σ^2, then by the central limit theorem the noise in each average power trace A0(t) or A1(t) has at each t an approximate variance 2σ^2/k, and so in the differential power trace Δ(t) the noise has an approximate variance 4σ^2/k. In order that the bias Π1 – Π0 stands out against the noise, we require |Π1 – Π0| to be a few times the noise standard deviation 2σ/√k, say |Π1 – Π0| ≥ 8σ/√k, that is, k ≥ 64σ^2/(Π1 – Π0)^2.
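The averaging argument can be checked in simulation. The sketch below (every parameter is an illustrative choice of ours) fabricates k noisy traces whose consumption at a single instant τ depends on a target bit, partitions them into I0 and I1, and watches the differential trace peak at τ:

```python
import random

random.seed(1)
k, T, tau, sigma = 4000, 50, 17, 2.0
PI0, PI1 = 1.0, 1.5       # average power at time tau for b = 0 and b = 1
bits = [random.randrange(2) for _ in range(k)]
traces = [[random.gauss(PI1 if (b and t == tau) else (PI0 if t == tau else 0.8), sigma)
           for t in range(T)] for b in bits]

def avg(rows):            # pointwise average of a set of traces
    return [sum(r[t] for r in rows) / len(rows) for t in range(T)]

a0 = avg([tr for tr, b in zip(traces, bits) if b == 0])   # A0(t), over I0
a1 = avg([tr for tr, b in zip(traces, bits) if b == 1])   # A1(t), over I1
delta = [a1[t] - a0[t] for t in range(T)]                 # Delta(t) = A1(t) - A0(t)

# The spike stands out at t = tau because only there the partition matters
assert max(range(T), key=lambda t: delta[t]) == tau
```

With k = 4000, σ = 2 and Π1 – Π0 = 0.5, the sample-size bound k ≥ 64σ^2/(Π1 – Π0)^2 = 1024 is comfortably met, so the spike is clearly visible above the averaged-out noise.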

Countermeasures

Several countermeasures can be adopted to prevent DPA, both in the software level and in the hardware level.

Paul Kocher asserts: “DPA highlights the need for people who design algorithms, protocols, software, and hardware to work closely together when producing security products.”

7.2.3. Fault Analysis

We finally come to the third genre of side-channel cryptanalysis. We investigate how hardware faults occurring during private-key operations can reveal the secret to an adversary. There are situations where a single fault suffices. Boneh et al. [30] classify hardware faults into three broad categories.

  1. Transient faults: These are faults caused by random (unpredictable) hardware malfunctioning. They may be the outcomes of occasional flips of bit values in registers or of temporary erroneous outputs from logic or arithmetic circuits in the processor. These faults are called transient because they are not repeated. It is rather difficult to detect such (silent) faults.

  2. Latent faults: These are faults generated by some permanent malfunctioning and/or bugs inherent in the processor. For example, the floating-point bug in the early releases of the Pentium processor may lead to latent faults. Latent faults are permanent, that is, repeatable, but may be difficult to locate in practice.

  3. Induced faults: An induced fault is deliberately caused by an adversary. For example, a short surge of electromagnetic radiation may cause a smart card to malfunction temporarily. A malicious adversary can induce such temporary hardware faults to extract secret information from the smart card. It is, however, difficult to induce deliberate faults in a remote workstation.

Although induced faults appear to be the ones to guard against most seriously, the other two types of faults are also of relevance. Consider a certifying authority signing many messages. Transient and/or unknown latent faults may reveal the authority’s private key to a user who can later utilize this knowledge to produce false certificates.

Fault attack on RSA based on CRT

Consider the implementation of the RSA private-key operation based on the CRT combination of the values obtained by exponentiation modulo the prime divisors p and q of the modulus n (Algorithm 5.4). Suppose that m is a message to be signed and s := m^d (mod n) the corresponding signature, where d is the signer's private key. The CRT-based implementation computes s1 := s (mod p) and s2 := s (mod q). Assume that due to hardware fault(s) exactly one of s1 and s2 is wrongly computed. Say, s1 is incorrectly computed as ŝ1 ≠ s1. The corresponding faulty signature is denoted by ŝ. We assume that the CRT combination of ŝ1 and s2 is correctly computed.

An adversary requires the faulty signature ŝ and the correct signature s on the same message m in order to obtain the factor q of n. To see how, note that ŝ ≡ ŝ1 (mod p), s ≡ s1 (mod p) and ŝ1 ≢ s1 (mod p), so that ŝ ≢ s (mod p), that is, p does not divide ŝ – s. On the other hand, ŝ ≡ s ≡ s2 (mod q), that is, q divides ŝ – s. Therefore,

q = gcd(ŝ – s, n).

This is how the fault analysis of Boneh et al. [30] works.

Arjen K. Lenstra et al. [142] point out that the knowledge of the faulty signature ŝ alone reveals the secret divisor q, that is, one does not require the genuine signature s on m. The verification key e of the signer is publicly known. Since RSA exponentiation is bijective, ŝ^e ≢ m (mod n). However, ŝ^e ≡ m (mod q), and so ŝ^e ≢ m (mod p). It follows that

q = gcd(ŝ^e – m, n).
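Both variants are easy to verify with toy parameters. The sketch below (the primes and the message are illustrative assumptions, not from the text) flips one bit of s1 and then factors n, first from the pair (ŝ, s) as in Boneh et al., then from ŝ alone as in Lenstra's observation.

```python
from math import gcd

p, q = 10007, 10009                    # toy primes (assumed)
n, e = p * q, 17
d = pow(e, -1, (p - 1) * (q - 1))
m = 1234

def crt(a, b):
    # combine a mod p and b mod q into a residue mod n
    return (a * q * pow(q, -1, p) + b * p * pow(p, -1, q)) % n

s1, s2 = pow(m, d, p), pow(m, d, q)
s = crt(s1, s2)
assert s == pow(m, d, n)               # correct signature

s_faulty = crt(s1 ^ 1, s2)             # fault: one bit of s1 flipped

assert gcd(s_faulty - s, n) == q              # Boneh et al.: needs s and s_faulty
assert gcd(pow(s_faulty, e, n) - m, n) == q   # Lenstra: s_faulty alone suffices
```

Note that the second gcd succeeds because the e-th power map is injective modulo p, exactly as argued above.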

Fault attack on RSA without CRT

Now, consider an implementation of RSA decryption based on a single exponentiation modulo n. For such an implementation, several models of fault attacks have been proposed. These attacks are less practical than the attack on CRT-based RSA just mentioned, because now one requires several faulty signatures in order to deduce the entire private key. Here, we present an attack due to Bao et al. [17].

As usual, the RSA modulus is n = pq and the signer’s key pair is (e, d). Consider a valid signature s on a message m. Let d = (dl–1 · · · d1d0)2 be the binary representation of the private key. Consider the powers:

si ≡ m^(2^i) (mod n) for i = 0, 1, . . . , l – 1.

The signature s can be written as:

s ≡ s0^d0 s1^d1 · · · s(l–1)^d(l–1) (mod n).
We assume that the attacker knows m and s and hence can compute si and si^(–1) modulo n for i = 0, . . . , l – 1. There is no harm in assuming that the message m is randomly chosen. (We may assume that randomly chosen integers are invertible modulo n, because encountering a non-invertible non-zero integer by chance is a stroke of unimaginable good luck and is tantamount to knowing the factors of n.)

In order to guess a bit of d, the attacker induces a fault in exactly one of the bits dj, changing it from dj to its complement. The position j is random, that is, not under the control of the attacker. Now, the algorithm outputs the faulty signature

ŝ ≡ s sj (mod n) if dj = 0, or ŝ ≡ s sj^(–1) (mod n) if dj = 1,

and so

ŝ s^(–1) ≡ sj^(±1) (mod n).
A repetition in the values s(l–1), . . . , s0, s(l–1)^(–1), . . . , s0^(–1) modulo n is again an incident of minuscule probability. Hence the attacker can uniquely identify the bit position j and the bit value dj in d by comparing ŝ s^(–1) with these 2l values.

Statistical analysis implies that the attacker needs to repeat this procedure about l log l times (on the same or different (m, s) pairs) in order to ensure that the probability of identifying all the bits of d is at least 1/2.
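A single iteration of the attack can be simulated as follows; the primes, message, and faulted position are illustrative assumptions. Flipping bit j of d multiplies the signature by sj^(±1), so comparing ŝ s^(–1) against the 2l precomputed values exposes both j and dj.

```python
p, q = 10007, 10009                 # toy primes (assumed)
n, e = p * q, 17
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)
m = 4321
s = pow(m, d, n)                    # valid signature, known to the attacker

l = d.bit_length()
j = 5                               # faulted bit (unknown to the attacker)
s_faulty = pow(m, d ^ (1 << j), n)  # signature under the mutilated key

s_i = [pow(m, 1 << i, n) for i in range(l)]   # s_i = m^(2^i) mod n
ratio = s_faulty * pow(s, -1, n) % n

# d_j = 0 flipped up gives ratio = s_j; d_j = 1 flipped down gives s_j^(-1)
matches = [(i, b) for i in range(l)
           for b, v in ((0, s_i[i]), (1, pow(s_i[i], -1, n)))
           if ratio == v]
assert (j, (d >> j) & 1) in matches
```

A repetition among the 2l comparison values would create a spurious match, but, as the text notes, this happens with only minuscule probability.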

Fault attack on the Rabin digital signature algorithm

Recall from Algorithm 5.34 that the Rabin signature algorithm uses CRT to combine s1 (mod p) and s2 (mod q). Thus, the attack on CRT-based RSA, described earlier, is applicable mutatis mutandis to the Rabin signature scheme. The computation of the square roots s1 and s2 demands the major portion of the running time of the routine. Inducing a fault during the execution is, therefore, expected to affect exactly one of s1 and s2, as desired by the attacker.

Fault attack on DSA

Bao et al. [17] propose a fault attack on the digital signature algorithm (DSA). We work with the notations of Algorithm 5.43 and Algorithm 5.44, except that, for maintaining uniformity in this section, we use m (instead of M) to denote the message to be signed. The (public) parameters are a prime p, a prime divisor r of p – 1 of length 160 bits, and an element g ∈ Z_p^* of multiplicative order r. The signer's DSA key pair is (d, g^d (mod p)) with 1 < d < r.

Suppose that during the generation of a DSA signature, an attacker induces a fault in exactly one bit position of d, changing it to d̃. The routine generates the faulty signature (s, t̃), where

s ≡ (g^d′ (mod p)) (mod r),
t̃ ≡ d′^(–1)(H(m) + d̃ s) (mod r),

(d′, g^d′) being the session key pair (not mutilated). As in the DSA signature-verification scheme, the attacker computes the following:

u ≡ t̃^(–1) H(m) (mod r),
v ≡ t̃^(–1) s (mod r).

For each i = 0, . . . , l – 1 (where the bit length of d is l), the attacker also computes

wi^+ ≡ (g^u (y g^(2^i))^v (mod p)) (mod r) and wi^– ≡ (g^u (y g^(–2^i))^v (mod p)) (mod r),

where y ≡ g^d (mod p) is the signer's public key.

Assume that the j-th bit dj of d is altered. If dj = 0, then d̃ = d + 2^j, and so

wj^+ = s.

On the other hand, if dj = 1, then d̃ = d – 2^j, and a similar calculation shows that

wj^– = s.

Thus, the attacker computes wj^+ and wj^– for all j = 0, . . . , l – 1 and notices a unique match (with s). This discloses the position j and the corresponding bit dj.
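The attack can be rehearsed with toy DSA-like parameters. Every value below is an illustrative assumption (a 10-bit private key stands in for a 160-bit one, and a small r replaces the 160-bit prime).

```python
r = 1019                              # small prime in the role of the 160-bit r
k = 1
while True:                           # find a prime p = k*r + 1
    p = k * r + 1
    if all(p % t for t in range(2, int(p**0.5) + 1)):
        break
    k += 1
g = pow(2, (p - 1) // r, p)           # element of multiplicative order r
assert g != 1

d = 723                               # signer's private key (assumed)
y = pow(g, d, p)
l = d.bit_length()
H = 400                               # hash of the message (assumed)
j = 4                                 # faulted bit position
d_f = d ^ (1 << j)                    # mutilated key

k_sess = 577                          # session key (not mutilated)
while True:                           # skip degenerate signatures
    s = pow(g, k_sess, p) % r
    if s != 0 and (H + d_f * s) % r != 0:
        break
    k_sess += 1
t_f = pow(k_sess, -1, r) * (H + d_f * s) % r   # faulty signature part

# attacker's side: only (s, t_f), H, y and the public parameters are known
u = H * pow(t_f, -1, r) % r
v = s * pow(t_f, -1, r) % r
matches = []
for i in range(l):
    for sign, b in ((1, 0), (-1, 1)):          # try d_i = 0 and d_i = 1
        y_g = y * pow(g, sign * (1 << i) % (p - 1), p) % p
        if pow(g, u, p) * pow(y_g, v, p) % p % r == s:
            matches.append((i, b))
assert (j, (d >> j) & 1) in matches            # position j and bit d_j disclosed
```

The match occurs because u + d̃v ≡ d′ (mod r), so verification succeeds exactly when the guessed public key equals g^d̃.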

Fault attack on the ElGamal signature scheme

A fault attack similar to that on the DSA scheme can be mounted on the ElGamal signature scheme. Here, we describe an alternative method proposed by Zheng and Matsumoto [315]. The novelty in their approach is that it cryptanalyzes the ElGamal signature scheme by inducing a fault in the pseudorandom bit generator of the signer's smart card.

Algorithms 5.36 and 5.37 describe the ElGamal signature scheme on a general cyclic group G. Here, we restrict our attention to the specific group Z_p^* (though the following exposition works perfectly well for a general G). The parameters are a prime modulus p and a generator g of Z_p^*. The signer's key pair is (d, g^d (mod p)) for some d, 2 ≤ d ≤ p – 2.

In order to generate a signature (s, t) on a message m, a random session key d′ is generated and subsequently the following computations are carried out:

sgd (mod p),
td–1(H(m) – dH(s)) (mod p – 1).

Zheng and Matsumoto attack the generation of the session key d′. They propose the possibility that an abnormal physical stress (like low voltage) forces a constant output d0 for d′ from the pseudorandom bit generator (software or hardware) in the smart card. First, assume that this particular value d0 is known a priori to the attacker. She then lets a message m generate a signature (s, t) with the session secret d0. The private key d is then immediately available from the equation:

dH(s)–1(H(m) – d0t) (mod p – 1).

Here, we assume that H(s) is invertible modulo p – 1.

If d0 is not known a priori, the attacker generates two signatures (s1, t1) and (s2, t2) on messages m1 and m2 respectively. Since d′ is always d0, we have s1 = s2 = s0, say. One can then easily calculate

d0 ≡ (t1 – t2)^(–1)(H(m1) – H(m2)) (mod p – 1),

which, in turn, yields

dH(s0)–1(H(m1) – d0t1) (mod p – 1).

Fault attack on the Feige–Fiat–Shamir identification protocol

Let us conclude our repertoire of fault attack examples by explaining an attack on the FFS zero-knowledge identification protocol. This attack is again from Boneh et al. [30].

We use the notations of Algorithm 5.69. A modulus n = pq, with primes p, q ≡ 3 (mod 4), is first chosen (by Alice or by a trusted third party). Alice selects random x1, . . . , xt ∈ Z_n^* and random bits δ1, . . . , δt, computes yi := (–1)^δi xi^(–2) (mod n), publishes (y1, . . . , yt) and keeps (x1, . . . , xt) secret.

During an identification session with Bob, Alice generates a random commitment c and sends to Bob the witness w := c^2 (mod n). (For simplicity, we take γ of Algorithm 5.69 to be 0.) While Alice is waiting for a challenge from Bob, a fault occurs in her smart card changing the commitment c to c + E. Assume that the fault is at exactly one bit position, that is, E = ±2^j for some j ∈ {0, 1, . . . , l – 1}, l being the bit length of c (or of n). This fault may be purposely induced by Bob with the malicious intention of guessing Alice's secret (x1, . . . , xt).

Bob then generates a random challenge (∊1, . . . , ∊t) ∈ {0, 1}^t as usual. Upon reception of this challenge, Alice computes and sends to Bob the faulty response

r̂ ≡ (c + E) x1^∊1 x2^∊2 · · · xt^∊t (mod n).

The knowledge of r̂ now aids Bob to obtain the product T ≡ x1^∊1 x2^∊2 · · · xt^∊t (mod n) as follows. First, note that

r̂^2 ≡ (c + E)^2 x1^(2∊1) · · · xt^(2∊t) (mod n),

so that

r̂^2 y1^∊1 · · · yt^∊t ≡ (–1)^δ (c + E)^2 (mod n)

for some δ ∈ {0, 1}.

There are only 4l possible values of (E, δ). Bob tries all these possibilities one by one. To simplify matters we assume that only one value of (E, δ) with E of the special form ±2^j and with δ ∈ {0, 1} satisfies the last congruence. In practice, the existence of two (or more) solutions for (E, δ) is an extremely improbable phenomenon. For a guess of (E, δ), the commitment c can be computed as

c ≡ ((–1)^δ r̂^2 y1^∊1 · · · yt^∊t – w – E^2)(2E)^(–1) (mod n).
The correctness of the guess (E, δ) can be verified from the relation w ≡ c^2 (mod n). Bob can now compute the desired product

T ≡ r̂ (c + E)^(–1) (mod n).

In order to strengthen the confidence about the correctness of T, Bob may repeat the protocol once more with the same values of ∊1, . . . , ∊t, but under normal conditions (that is, without faults). This time he obtains w′ ≡ (c′)^2 (mod n) and r′ ≡ c′T (mod n), which together give (r′)^2 ≡ w′T^2 (mod n), a relation that proves the correctness of T.
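One faulted round of the protocol can be simulated as follows; the modulus, secrets, challenge, and fault position are all illustrative assumptions. Bob guesses (E, δ), verifies the guess via w ≡ c^2, and extracts T.

```python
from math import gcd

p, q = 499, 547                    # toy primes with p, q = 3 (mod 4) (assumed)
n = p * q
t = 3
x = [12, 35, 101]                  # Alice's secrets (assumed, coprime to n)
delta = [1, 0, 1]                  # the secret sign bits
y = [(-1) ** delta[i] * pow(x[i], -2, n) % n for i in range(t)]

c = 5000                           # Alice's random commitment
w = pow(c, 2, n)                   # witness sent to Bob
eps = [1, 1, 0]                    # Bob's challenge
E = 1 << 7                         # fault: bit 7 of c flipped up
prod_x = 1
for i in range(t):
    prod_x = prod_x * pow(x[i], eps[i], n) % n
r_f = (c + E) * prod_x % n         # the faulty response

# Bob's side: try all (E', delta') and keep every verified guess
l = n.bit_length()
A0 = pow(r_f, 2, n)
for i in range(t):
    A0 = A0 * pow(y[i], eps[i], n) % n     # r_f^2 * prod y_i^eps_i mod n
found = []
for j in range(l):
    for E_g in (1 << j, -(1 << j)):
        for d_g in (0, 1):
            A = (-1) ** d_g * A0 % n       # candidate for (c + E)^2 mod n
            c_g = (A - w - E_g * E_g) * pow(2 * E_g % n, -1, n) % n
            if pow(c_g, 2, n) == w and gcd((c_g + E_g) % n, n) == 1:
                found.append(r_f * pow((c_g + E_g) % n, -1, n) % n)
assert prod_x in found             # T = x1^e1 ... xt^et recovered
```

For the correct guess, A equals (c + E)^2 modulo n, so the recovered c_g satisfies the witness relation and T drops out by division.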

Bob repeats the above procedure t times in order to generate the system:

Equation 7.6

Tk ≡ x1^∊k1 x2^∊k2 · · · xt^∊kt (mod n), k = 1, 2, . . . , t.
Here, ∊ki and Tk are known to Bob. Moreover, the exponents ∊ki can be so selected that the matrix (∊ki) is invertible modulo 2. In order to determine x1, Bob tries to find u1, . . . , ut ∈ {0, 1} satisfying

T1^u1 T2^u2 · · · Tt^ut ≡ x1^(1+2v1) x2^(2v2) · · · xt^(2vt) (mod n)

for some integers v1, . . . , vt. Comparing the exponents gives the linear system

∊1i u1 + ∊2i u2 + · · · + ∊ti ut = δ1i + 2vi, i = 1, 2, . . . , t,

which can be solved for u1, . . . , ut, since the matrix (∊ki) is invertible modulo 2. The solution gives v1, . . . , vt and hence

x1 ≡ ± T1^u1 · · · Tt^ut y1^v1 · · · yt^vt (mod n).

Similarly, x2, . . . , xt can be determined up to sign. Plugging in these values of xi in System (7.6) and solving another linear system modulo 2 gives the exact signs of all xi.

Notice that Bob could have selected ∊ki = δki (where δ is the Kronecker delta). For this choice, System (7.6) immediately gives x1, . . . , xt. But, in practice, Alice may refuse to respond to such simplistic challenges. Moreover, Bob must not raise any suspicion about a possible malpractice. For a general choice, all Bob has to do additionally is a small amount of simple linear algebra. The parameter t is rather small (typically less than 20); so this extra effort is of little concern to Bob.

Countermeasures

Fault analysis could be a serious threat, especially to smart-card users and certification authorities. We mention here some precautions to guard against such attacks. Some of these work against fault attacks in general, while others are specific to the algorithms they aim to protect.

Exercise Set 7.2

7.1 Consider the notations of Section 7.2.1. Assume that mi,j is constant for all i, j (and irrespective of d2j+1d2j), but the squaring times si,j and ti,j vary according to their operands. Devise a timing attack on such a system.
7.2 Show that under reasonable assumptions the SPA-resistant Algorithm 7.2 can be cryptanalyzed by timing attacks.
7.3 Recall that SPA of Algorithm 7.1 may leak partial information on the private key (some 00 sequences in the key). Rewrite the algorithm to prevent this leakage.
7.4 Assume that in Bao et al.'s attack on RSA described in the text, the attacker can induce faults in exactly two bit positions of d. Suggest how the two bits of d at these positions can be revealed from the resulting faulty signature.
7.5 Consider a variant of Bao et al.'s attack on RSA described in the text, in which the valid signature s on m is unknown to the attacker. Explain how the position j of the erroneous bit and the bit dj at this position can still be identified. [H]
7.6 Bao et al. [17] propose an alternate fault analysis on RSA with square-and-multiply exponentiation. Use the notations (n, e, d, m, s, si) as in the text. Assume that the attacker knows an (m, s) pair and can induce a fault in exactly one of the values sj (and nowhere else) and generate the corresponding faulty signature. Suggest a strategy for how the position j and the bit dj can be recovered in this case.
7.7 Propose a fault attack on the ElGamal signature scheme (Algorithms 5.36 and 5.37), similar to the attack on DSA described in the text.

7.3. Backdoor Attacks

Backdoor attacks on a public-key cryptosystem refer to attacks embedded in the key generation procedure (hardware or software) by the designer of the procedure. A contaminated cryptosystem is one in which the key generation procedure comes with hidden backdoors. A good backdoor attack should remain undetectable to the user and be exploitable exclusively by the designer.

Young and Yung [307] have proposed using public-key cryptography itself for generating backdoors. In their schemes, the attacker (the designer) embeds the encryption routine and the encryption key of the attacker in the key generation procedure of the contaminated system. The decryption key of the attacker is not embedded in the contaminated system and is known only to the attacker. The attacker’s encryption system is assumed to be honest and unbreakable and, thereby, it gives the attacker the exclusive power to decrypt contaminated keys. Young and Yung call such a backdoor a secretly embedded trapdoor with universal protection (SETUP). They also coined the term kleptography to denote such use of cryptography against cryptography.

In the rest of this section, we denote the attacker’s encryption and decryption functions by fe and fd respectively. We often do not restrict these functions to public-key routines only. Since public-key routines are slow, symmetric-key routines can be employed in practice. Simple XOR-ing with a fixed bit string (known to the designer) may also suffice. However, for these faster alternatives of fe, fd, reverse engineering reveals the symmetric key or the XOR operand to the user who can subsequently mimic the attacker to steal keys generated elsewhere by the same contaminated system.

We use the following shorthand notations. Here, n stands for a positive integer that can be naturally identified with a unique bit string having the most significant (that is, leftmost) bit equal to 1.

|n|=the bit length of n.
lsbk(n)=the least significant k bits of n.
msbk(n)=the most significant k bits of n.
(a1 ‖ a2 ‖ · · · ‖ ar)=the concatenation of the bit strings a1, a2, . . . , ar.

7.3.1. Attacks on RSA

RSA, (seemingly) being the most popular public-key cryptosystem, has been the target of most cryptanalytic attacks. Backdoor attacks are not an exception. The backdoor attacks on RSA work by cleverly hiding some secret information in the public key (n, e) of a user. As earlier, we denote the corresponding private exponent by d and the prime factors of n by p and q.

Hiding prime factor

The simplest attack is to choose a fixed p known to the designer. The other prime q is generated randomly, and correspondingly n = pq and the key pair (e, d) are computed. Reverse engineering such a scheme is pretty simple, since two different moduli n1 = pq1 and n2 = pq2 belch out p = gcd(n1, n2) easily.
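A two-line computation shows why the fixed-p scheme is so fragile (the primes are, of course, illustrative assumptions):

```python
from math import gcd

p = 10007                     # the designer's fixed prime (assumed)
q1, q2 = 10009, 10037         # fresh random primes for two different users
n1, n2 = p * q1, p * q2       # the two users' moduli
assert gcd(n1, n2) == p       # anyone holding both moduli recovers p
```

Once p is exposed, both moduli factor immediately, so the "backdoor" is available to anybody, not just the designer.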

A better approach is given in Algorithm 7.3. The function fe may be RSA encryption under the designer’s public key. In that case, the RSA modulus of the attacker should be so chosen that the condition e < n is satisfied with good probability. On the other hand, if this modulus is too small, then this scheme will generate values of e much smaller than n.

In order to determine the secret exponent from a public key generated using this scheme, the attacker runs Algorithm 7.4. If fe and fd are RSA functions under the attacker’s keys, nobody other than the attacker can apply fd to generate p from e. This provides the designer with the exclusive capability of stealing keys.

A problem with Algorithm 7.3 is that the attacker has little control over the length of the public exponent e. If the user demands a small exponent (like e = 3 or e = 257), this scheme fails to produce one. Algorithm 7.5 overcomes this difficulty by hiding p in the high-order bits of the modulus n (instead of in the exponent e). Young and Yung [307] proposed this algorithm under the name PAP (pretty awful privacy). The name contrasts with PGP (pretty good privacy), a popular and widely used RSA implementation.

Algorithm 7.3. A simple backdoor attack on RSA

Input: The bit length k.

Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).

Steps:

Generate a random k-bit prime q.
while (1) {
    Generate a random k-bit prime p.
    n := pq.
    e := fe(p).
    if ((e < n) and (gcd(e, φ(n)) = 1)) {
        Compute d with ed ≡ 1 (mod φ(n)).
        Return (n, e, d).
    }
}

Algorithm 7.4. Retrieving the secret exponent

Input: An RSA public key (n, e).

Output: The corresponding secret (p, q, d) or failure.

Steps:

p := fd(e).
if (p|n) {
    q := n/p.
    φ := (p – 1)(q – 1).
    d := e–1 (mod φ).
    Return (p, q, d).
} else {
    /* The key is not generated by Algorithm 7.3 */
    Return failure.
}
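The pair of algorithms can be exercised end to end with the cheap XOR realization of fe, fd mentioned earlier (the pad and the prime sizes below are illustrative assumptions):

```python
from math import gcd

PAD = 0x5A5A                       # designer's fixed XOR pad (assumed)
fe = fd = lambda v: v ^ PAD        # XOR with a constant is its own inverse

def next_prime(v):
    # smallest prime >= v (trial division; fine for toy sizes)
    while not all(v % t for t in range(2, int(v**0.5) + 1)):
        v += 1
    return v

# contaminated key generation in the spirit of Algorithm 7.3
q = next_prime(10000)
cand = q + 1
while True:
    p = next_prime(cand)
    n, e, phi = p * q, fe(p), (p - 1) * (q - 1)
    if e < n and gcd(e, phi) == 1:
        d = pow(e, -1, phi)
        break
    cand = p + 1

# the attacker's side (Algorithm 7.4): e alone leaks the factor p
p_rec = fd(e)
assert p_rec == p and n % p_rec == 0
```

With fe realized by RSA under the designer's key instead of XOR, only the designer can invert e, which is precisely the SETUP property; the XOR variant, as the text warns, falls to reverse engineering.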

Algorithm 7.5 works as follows. Following Young and Yung [307], we assume that the attacker uses RSA to realize fe and fd. The RSA modulus of the attacker is denoted by N. The attack requires |N| = k, where |p| = |q| = k. To start with, a random prime p of the desired bit length k is generated. This prime is to be encrypted using fe, and so the quantity actually encrypted must be less than N. Instead of encrypting p directly, the attacker uses a permutation function π keyed by K + i for some fixed K and for i = 1, 2, . . . , B, where B is a small bound (typically B = 16). This permutation helps the attacker in two ways. First, one may now have p > N, so a suspicion regarding bounded values of p does not arise. Second, it is cheaper to apply the permutation instead of generating fresh candidates for p. (In an (honest) RSA key generation routine, the prime generation part typically takes most of the running time.)

Algorithm 7.5. Backdoor attack on RSA: Young and Yung’s PAP scheme

Input: The bit length k, the attacker's public modulus N with |N| = k, the key K, and the bounds B and B′.

Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).

Steps:

while (1) {
    /* Try to generate a suitable p */
    Generate a random k-bit prime p.
    i = 1.
    while (i ≤ B) {
        p′ := πK+i(p).    /* Use a keyed permutation πK+i*/
        if (p′ < N) { break } else { i++ }
    }

    /* Try to generate n and q */
    if (i ≤ B) {
        p″ := fe(p′).  /* Encrypt p′ by the designer's public key */
        j := 1.
        while (j ≤ B′) {
            p‴ := π′K+j(p″).   /* π′K+j is a keyed permutation and |p‴| = k or k – 1. */
            Generate a pseudorandom bit string a of length k.
            X := (p‴ ‖ a).
            q := X quot p.
            if (|q| = k) and (q is prime) {
                n := pq.
                e := 17.
                while (gcd(e, φ(n)) ≠ 1) { e += 2. }
                d := e–1 (mod φ(n)).
                Return (n, e, d).
            } else { j ++ }
        }
    }
}

Once a suitable p and the corresponding p′ = πK+i(p) are generated, the encryption function fe is applied to generate p″ = fe(p′). Now, instead of embedding p″ directly in the modulus n, another keyed permutation π′K+j is applied on p″ to generate p‴. This permutation facilitates investigating several choices for q and so is a faster alternative than restarting the entire process afresh every time an unsuitable q is computed. A pseudorandom bit string a of length k is appended to p‴ to obtain an approximation X for n. If q := ⌊X/p⌋ happens to be a prime of bit length k, the exact n = pq is computed, else another j is tried. If all values of j = 1, 2, . . . , B′ fail, the entire procedure is repeated with a new k-bit prime p.

For random choices of a, the quotients q = ⌊X/p⌋ behave like random integers, and so the probability that q is prime is almost the same as that for random integers of bit length k. Write X = qp + r with r = X rem p. If r > a, then n = X – r has p‴ – 1 embedded in its higher bits, whereas if r ≤ a, then p‴ itself is embedded in the higher bits of n.

Once suitable p and q are found, the PAP routine generates (like PGP) a small encryption exponent e relatively prime to φ(n), and its inverse d modulo φ(n). One can anyway opt for bigger values of e. In that case, instead of choosing e successively from the sequence 17, 19, 21, 23, . . . , one writes one's customized steps for generating candidate values for e. Choosing a small e in Algorithm 7.5 merely illustrates the resemblance with PGP and the flexibility of doing so.

The authors of PAP compare their implementation of Algorithm 7.5 with that of the honest PGP key generation procedure. The contaminated routine has been found to run on average only 20 per cent slower than the honest routine.

Algorithm 7.6 recovers the prime factor p of n from a public key (n, e) generated by PAP, using the RSA decryption function fd of the attacker. Reverse engineering may make available to the user the permutation functions π and π′, the fixed constants K, B, B′ and the designer’s public key. But this knowledge alone does not empower the user to steal PAP-generated keys.

Algorithm 7.6. Retrieving the prime divisor

Input: An RSA public key (n, e) with n = pq.

Output: The prime divisor p of n or failure.

Steps:

Write n = (U ‖ V) with |V| = k.
for U′ ∈ {U, U + 1} {    /* The higher bits of n contain p‴ or p‴ – 1. */
    for j = 1, 2, . . . , B′ {
        p″ := (π′K+j)–1(U′).
        p′ := fd(p″).
        for i = 1, 2, . . . , B {
            p := (πK+i)–1(p′).
            if (p|n) { Return p. }
        }
    }
}
/* (n, e) is not generated by Algorithm 7.5 */
Return failure.

Hiding small private exponent

Another possible backdoor is hiding an RSA key pair (∊, δ) with small δ inside a key pair (e, d). Crépeau and Slakmon [70] realize this backdoor using a result from Boneh and Durfee [32], which describes a polynomial-time (in |n|) algorithm for computing δ from the public key (n, ∊), provided that δ is less than n^0.292. This attack is explained in Algorithm 7.7. Here, the modulus n is a genuine random RSA modulus. The mischievous key ∊ is neatly hidden by the attacker's encryption routine fe. The resulting output key pair (e, d) looks reasonably random. However, this scheme has a drawback similar to Algorithm 7.3; that is, it cannot easily generate small values of e.

Algorithm 7.7. Backdoor attack on RSA: small private exponent

Input: The bit length k.

Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).

Steps:

Generate random primes p ≠ q of bit length ≈ k/2, such that n := pq has |n| = k.
do {
   Generate random δ with gcd(δ, φ(n)) = 1 and |δ| < 0.292 |n|.
   ∊ := δ–1 (mod φ(n)).
   e := fe(∊).    /* Hide ∊ */
} while (gcd(e, φ(n)) ≠ 1).
d := e–1 (mod φ(n)).
Return (n, e, d).

Algorithm 7.8 retrieves d from a public key (n, e) generated by Algorithm 7.7.

Algorithm 7.8. Retrieving the secret exponent

Input: An RSA public key (n, e) generated by Algorithm 7.7.

Output: The corresponding private key d.

Steps:

∊ := fd(e).     /* Recover the hidden exponent */
Use Boneh and Durfee’s algorithm to recover δ ≡ ∊–1 (mod φ(n)).
Use ∊ and δ to compute φ(n).
Compute d ≡ e–1 (mod φ(n)).

The correctness of Algorithm 7.8 is evident. In order to see how the knowledge of ∊ and δ reveals φ(n), note that x := ∊δ – 1 is a multiple of φ(n); that is,

Equation 7.7

x = lφ(n)
for some integer l. Since δ < n^0.292 and ∊ < n, we have x < n^1.292. But φ(n) ≈ n and so l cannot be much larger than n^0.292. Since |p| ≈ k/2 ≈ |q|, we have l(p + q – 1) < n. Now, if we write

x = an + b = (a + 1)n – (n – b)

with a = x quot n and b = x rem n, comparison with Equation (7.7) reveals that l = a + 1. This gives φ(n) = x/l.
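The recovery of φ(n) from x = ∊δ – 1 can be checked numerically; the primes and δ below are illustrative assumptions, with δ well under n^0.292.

```python
p, q = 10007, 10009            # toy primes (assumed)
n = p * q
phi = (p - 1) * (q - 1)

delta = 101                    # small hidden private exponent, delta < n**0.292
eps = pow(delta, -1, phi)      # the hidden public exponent
x = eps * delta - 1            # a multiple of phi(n): x = l*phi(n)

a = x // n                     # a = x quot n
l = a + 1                      # by the derivation above
assert x % l == 0 and x // l == phi
# the factors then follow from p*q = n and p + q = n - phi(n) + 1
```

Note that l(p + q – 1) < n holds comfortably here, which is exactly the condition the derivation relies on.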

Although not needed explicitly here, the factorization of n can be easily obtained by solving the equations pq = n and p + q = n – φ(n) + 1. If ∊ and δ are not small, we may have l(p + q – 1) ≥ n, and φ(n) cannot be calculated so easily as above. A randomized polynomial-time algorithm can still factor n from the knowledge of ∊, δ and n. For the details, solve Exercise 7.9.

Hiding small public exponent

Crépeau and Slakmon propose another backdoor attack based on the following result due to Boneh et al. [33]. Let (∊, δ) be a key pair for an RSA modulus n = pq. Further, let t ≤ |n|/4 and 2^(t–1) ≤ ∊ < 2^t. There exists a polynomial-time algorithm that, given n, ∊, the t most significant and the |n|/4 least significant bits of δ, recovers the full private exponent δ.

Algorithm 7.9. Backdoor attack on RSA: small public exponent

Input: The bit lengths k and t.

Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).

Steps:

Generate random primes p ≠ q of bit length ≈ k/2, such that n := pq has |n| = k.
do {
   Generate random ∊ with gcd(∊, φ(n)) = 1 and |∊| = t.
   δ := ∊–1 (mod φ(n)).
   e := fe(∊ ‖ msbt(δ) ‖ lsb|n|/4(δ)).    /* Hide ∊ and partial bits of δ */
}
while (gcd(e, φ(n)) ≠ 1).
d := e–1 (mod φ(n)).
Return (n, e, d).

Algorithm 7.9 uses fe to hide in e a small ∊, the t most significant bits of δ and the |n|/4 least significant bits of δ. A string of bit length 2t + k/4 is encrypted by fe. Applying the decryption routine fd on e recovers these hidden values, from which ∊ and δ, and hence φ(n), can be obtained. Algorithm 7.10 does this task. This scheme also fails, in general, to produce small public exponents e.

Algorithm 7.10. Retrieving the secret exponent

Input: An RSA public key (n, e) generated by Algorithm 7.9 and the matching parameter t.

Output: The corresponding private key d.

Steps:

Compute fd(e) and retrieve the following:
   (a) the hidden public exponent ∊,
   (b) the t most significant bits of the hidden private exponent δ and
   (c) the |n|/4 least significant bits of δ.
Apply the Boneh-Durfee-Frankel algorithm to recover δ completely.
Use ∊ and δ to compute φ(n).       /* See Exercise 7.9 */
Compute d ≡ e–1 (mod φ(n)).

7.3.2. An Attack on ElGamal Signatures

We now describe a backdoor attack on the ElGamal signature Algorithm 5.36. This attack does not tamper with the generation of the user's permanent key pair. Instead, it manipulates the session-key generation in such a way that the user's permanent private key is revealed to the attacker from two successive signatures.

Let p be a prime, g a generator of Z_p^*, and (d, g^d (mod p)) the permanent key pair of Alice. The attacker uses the same field and a key pair (D, g^D (mod p)), with g^D supplied to the signing device. Suppose that Alice signs two messages m1 and m2 with session keys d1 and d2 to generate signatures (s1, t1) and (s2, t2) respectively, where

si ≡ g^di (mod p),
ti ≡ di^(–1)(H(mi) – d H(si)) (mod p – 1).
The attack proceeds by letting d1 be arbitrary, but by taking

d2 ≡ (gD)d1 (mod p).

Since d2 ≡ s1^D (mod p) can be computed from the first signature using the attacker's private key D, we have

t2 d2 ≡ H(m2) – d H(s2) (mod p – 1),

that is,

d ≡ H(s2)^(–1)(H(m2) – d2 t2) (mod p – 1).
The private key D of the attacker (or d1) is required for computing d; so nobody other than the designer can retrieve Alice’s secret by observing the contaminated signatures (s1, t1) and (s2, t2).
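The whole interaction can be sketched as follows; the toy prime, generator, stand-in hash, and all keys are assumptions for illustration.

```python
from math import gcd

p, g = 2039, 7                   # toy prime and generator (assumed)
pm1 = p - 1
H = lambda v: v % pm1            # stand-in hash (assumption)
d = 1234                         # Alice's permanent private key
D = 771                          # designer's private key
Y = pow(g, D, p)                 # g^D, embedded in the signing device

def sign(m, dk):
    # ElGamal signature with session key dk
    s = pow(g, dk, p)
    t = pow(dk, -1, pm1) * (H(m) - d * H(s)) % pm1
    return s, t

d1 = 1357                        # first session key: arbitrary
while True:                      # skip degenerate cases lacking inverses
    d2 = pow(Y, d1, p)           # contaminated second session key
    if gcd(d1, pm1) == gcd(d2 % pm1, pm1) == gcd(H(pow(g, d2, p)), pm1) == 1:
        break
    d1 += 2

s1, t1 = sign(111, d1)
s2, t2 = sign(222, d2)

# designer's side: d2 = s1^D mod p, then solve the second signature for d
d2_rec = pow(s1, D, p)
d_rec = pow(H(s2), -1, pm1) * (H(222) - d2_rec * t2) % pm1
assert d2_rec == d2 and d_rec == d
```

The key step is d2 ≡ (g^D)^d1 ≡ (g^d1)^D ≡ s1^D (mod p): the first (ordinary-looking) signature silently transports d2 to the designer.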

7.3.3. An Attack on ElGamal Encryption

For ElGamal encryption (Algorithm 5.15) and for Diffie–Hellman key exchange (Algorithm 5.27) over Z_p^*, a party (Alice) generates random session key pairs of the form (d′, g^d′ (mod p)) and communicates the public session key g^d′ to another party. The following backdoor manipulates the session-key generation in such a way that two public session keys reveal the second private session key (but not the permanent private key). We assume that the attacker learns the public session keys by eavesdropping. The attacker's key pair is (D, g^D (mod p)). The contaminated routine contains the public key g^D (mod p), but not the private key D.

Let (d1, r1) and (d2, r2) be two session keys used by Alice, where

r1gd1 (mod p),
r2gd2 (mod p).

The contaminated routine that generates the session keys uses a fixed odd integer u, a hash function H and a random bit b ∈ {0, 1} to generate d2 from d1 as follows:

zgd1+ub(gD)d1 (mod p),
d2H(z) (mod p – 1).

The attacker knows r1 and r2 by eavesdropping. She computes d2 by Algorithm 7.11, the correctness of which is established from the fact that z ≡ r1^(D+1) g^(ub) (mod p).

Algorithm 7.11. Backdoor attack on ElGamal encryption

z0 := r1^(D+1) (mod p).                                                                     /* corresponding to b = 0 */
if (r2 ≡ gH(z0) (mod p)) { Return H(z0). }
z1 := z0gu (mod p).                                                                   /* corresponding to b = 1 */
if (r2 ≡ gH(z1) (mod p)) { Return H(z1). }
Return failure.              /* The attacker's routine was not used for key generation. */

Algorithm 7.11 requires the attacker’s private key D (or d1) and can be performed only by the attacker. Now, d2 can be analogously used to generate the third session key d3 and so on, that is, the attacker can steal all the private session keys (except the first).

The odd integer u is used for additional safety. In order to see what might happen without it (that is, with b = 0 always), assume that H can be inverted. This gives z and hence y ≡ z r1^(–1) ≡ r1^D (mod p). If D is even, y is always a quadratic residue modulo p. If D is odd, y is a quadratic residue or non-residue modulo p depending on whether d1 is even or odd. The randomly added odd bias u destroys this correlation of z with quadratic residues.
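Algorithm 7.11 can be exercised as follows; the prime, generator, bias u, hash stand-in, and keys below are illustrative assumptions.

```python
import hashlib

p, g, u = 2039, 7, 5               # toy prime, generator, fixed odd bias (assumed)
pm1 = p - 1
H = lambda z: int.from_bytes(hashlib.sha256(str(z).encode()).digest(), "big") % pm1
D = 771                            # attacker's private key
Y = pow(g, D, p)                   # embedded public key g^D

d1, b = 901, 1                     # first session key and the random bit
z = pow(g, d1 + u * b, p) * pow(Y, d1, p) % p
d2 = H(z)                          # contaminated second session key
r1, r2 = pow(g, d1, p), pow(g, d2, p)   # what the attacker eavesdrops

# Algorithm 7.11, run by the attacker
z0 = r1 * pow(r1, D, p) % p        # guess b = 0: z0 = r1^(D+1)
z1 = z0 * pow(g, u, p) % p         # guess b = 1
matches = [H(zz) for zz in (z0, z1) if pow(g, H(zz), p) == r2]
assert d2 in matches               # the second session key is stolen
```

Each subsequent session key can then be peeled off in the same way, so the attacker steals every private session key after the first.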

7.3.4. Countermeasures

Using trustworthy implementations (hardware or software) of cryptographic routines (in particular, key generation routines) eliminates or reduces the risk of backdoor attacks. Preference should be given to software applications with source code (rather than to the more capable ones without source code). Random number generators should be given specific attention. Cascading products from different independent sources also minimizes the possibility of hidden backdoors.

If the desired grain of trust is missing from the available products, the only safe alternative is to write the code oneself. Complete trust in cryptographic devices and packages, using them as black boxes without bothering about the internals, is often called black-box cryptography. Users should learn to question black-box cryptography. The motto is: Be aware or bring peril.

Exercise Set 7.3

7.8 Argue that reverse engineering the PAP routine (Algorithm 7.5) can enable a user to distinguish in polynomial time between key pairs generated by PAP and those generated by honest procedures.
7.9 Let n = pq be an RSA modulus and (e, d) a key pair under this modulus. Write ed – 1 = 2^s t, where s = v2(ed – 1) (so that t is odd). Since ed – 1 is a multiple of φ(n) = (p – 1)(q – 1) with odd primes p, q, we have s ≥ 2.
  1. Show that for any a ∈ Z_n^* the multiplicative order ordn(a^t) divides 2^s. [H]

  2. Let a ∈ Z_n^* be such that a^t has different orders modulo p and modulo q. Show that gcd(a^(2^σ t) – 1, n) is a non-trivial divisor of n for some σ ∈ {0, 1, . . . , s – 1}.

  3. Let g be a generator of Z_p^*. Take a := g^k (mod p) for some k and let ordp(a^t) = 2^σ. Show that σ = v2(p – 1) if k is odd, and σ < v2(p – 1) if k is even. [H] An analogous result holds for the other prime q.

  4. Demonstrate that there are at least φ(n)/2 elements a in Z_n^* with the property that a^t has different orders modulo p and q. [H]

  5. Suggest a randomized poly-time algorithm for factoring n from the knowledge of n, e and d.

Chapter Summary

In this chapter, we discuss some indirect ways of attacking public-key cryptosystems. These attacks do not attempt to solve the underlying intractable problems, but watch the decryption device and/or use malicious key generation routines in order to gain information about private keys.

The timing attack works based on the availability of the total times of several private-key operations under the same private key. It successively keeps on guessing bits of the private key by performing some variance calculations.

The power attack requires the availability of the power consumption patterns (also called power traces) of the decrypting (or signing) device during one or more private-key operations. If the measurements are done with good accuracy and resolution, a single power trace may reveal the private key to the attacker; this is called simple power analysis. In practice, however, such power measurements are often contaminated with noise. Differential power analysis requires power traces from several decryption operations under the same private key. The different traces are combined using a technique that reduces the effect of noise.

A fault attack can be mounted by injecting one or more faults in the device performing private-key operations. Fault attacks are discussed in connection with several encryption (RSA), signature (ElGamal, DSA and so on) and authentication (FFS) schemes.

The above three kinds of attacks are collectively called side-channel attacks. Several general and algorithm-specific countermeasures against side-channel attacks are discussed.

Backdoor attacks, on the other hand, are mounted by malicious key generation routines. Young and Yung propose the concept of secretly embedded trapdoor with universal protection (SETUP). In a SETUP-contaminated system, the designer of the key generation routine possesses the exclusive right to steal keys from users. Several examples of backdoor attacks on RSA and ElGamal cryptosystems are described.

Suggestions for Further Reading

Kocher introduces the concept of side-channel attacks in his seminal paper [155]. This paper describes further details about the timing attack (like a derivation of the choice of the sample size k) and some experimental results.

Timing attacks in various forms are applicable to other systems. Kocher [155] himself suggests a chosen-message attack on an RSA implementation based on CRT (Algorithm 5.4). Carol, in an attempt to recover Alice’s private key d, tries to guess the factor p (or q) of the modulus n using a timing attack. She starts by letting Alice sign a message y (c in Algorithm 5.4) close to an initial guess of p. The CRT-based algorithm first reduces y modulo p and modulo q before performing the modular exponentiations. If y < p already, then the initial reduction modulo p returns (almost) immediately, whereas if y ≥ p, the reduction involves at least one subtraction. This gives a variation in the timings based on the value of p. The attack exploits this fact to arrive at better and better approximations of p.

A known-message timing attack (in addition to the chosen message attack mentioned in the last paragraph) on the CRT-based RSA signature scheme is proposed by Kocher in the same paper [155]. Kocher also explains a timing attack on the signature algorithm DSA (Algorithm 5.43), based on the dependence of the modular reduction of H(M) + ds modulo r on the bits of the signer’s private key d.

Large scale implementations of timing attacks are reported in the technical reports [77, 259] from the Crypto group of Université catholique de Louvain. These implementations study Montgomery exponentiation.

Kocher [155] mentions the possibility of power attacks. However, a concrete description is first published in Kocher et al. [156], which explains both SPA and DPA. DES is the basic target of this paper, though possibilities for using these techniques against public-key systems are also mentioned.

Several variants of the basic DPA model described in the text have been proposed. Messerges et al. [200] describe attacks against smart-card implementations of exponentiation-based public-key systems. Also consult Aigner and Oswald’s tutorial [9] for a recent survey.

DPA seems to be the most threatening of all side-channel attacks. Many papers suggesting countermeasures against DPA have appeared. Chari et al. [45] propose a masking method. Messerges [199] applies this idea to a form suitable for AES.[4] Messerges’ countermeasure is broken in [63] using a multi-bit DPA. Some other useful papers on DPA include [10, 55, 201].

[4] AES is an abbreviation for advanced encryption standard which is a US-government standard that supersedes the older standard DES. AES uses the Rijndael cipher [219].

Boneh et al. [30, 31] from the Bellcore Lab. announce the first systematic study of fault attacks on asymmetric-key cryptosystems. They explain fault attacks on RSA (with and without CRT), the Rabin signature scheme, the Feige–Fiat–Shamir identification protocol and on the Schnorr identification protocol. These attacks are collectively known as Bellcore attacks.

Arjen K. Lenstra points out that the fault attack on CRT-based RSA does not require the valid signature: a single faulty signature of a known message suffices. Joye and Quisquater propose some generalizations of the Bellcore–Lenstra attack. A form of this attack is applicable to elliptic-curve cryptosystems. The paper [142] discusses these developments.

Bao et al. [17] propose fault attacks on DSA, ElGamal and Schnorr signatures. They also describe variants of the fault analysis of RSA based on square-and-multiply algorithms. Zheng and Matsumoto [315] indicate the possibilities of attacking the random bit generator in a smart card.

Biham and Shamir [22] investigate fault analysis of symmetric-key ciphers and introduce the concept of differential fault analysis. Anderson and Kuhn [11] also study fault analysis of symmetric-key ciphers. Aumüller et al. [15] publish their practical experiences regarding physical realizations of faults in smart cards. They also suggest countermeasures against such attacks.

James A. Muir’s work [215] is a very readable and extensive survey on side-channel cryptanalysis. Also look at Boneh’s survey [29].

Because of small key sizes, elliptic-curve cryptosystems are very attractive for implementation in smart cards. It is, therefore, necessary to provide effective countermeasures against side-channel attacks (most importantly, against the DPA) for elliptic-curve cryptosystems. Many recent articles discuss this issue. Coron [62] suggests the use of random projective coordinates to avoid the costly (and power-consuming) field inversion operation needed for adding and doubling of points. Möller [206] proposes a non-conventional way of carrying out the double-and-add procedure. Izu and Takagi [138] describe a Montgomery-type point addition scheme resistant against side-channel attacks. An improved version of this algorithm, that works for a more general class of elliptic curves, is presented in Izu et al. [137].

Young and Yung introduce the concept of SETUP in [307]. The PAP SETUP on RSA and the ElGamal signature SETUP are from this paper, which also includes attacks on DSA and the Kerberos authentication protocol. In a later paper [308], Young and Yung categorize SETUPs into three types: regular, weak and strong. Strong SETUPs are proposed for Diffie–Hellman key exchange and for RSA. The third reference [309] from the same authors extends the ideas of kleptography further and provides backdoor routines for several other cryptographic schemes.

Crépeau and Slakmon [70] adopt a more informal approach and discuss several backdoors for RSA key generation. In addition to the trapdoors with hidden small private and public exponents, described in the text, they propose a trapdoor that hides a small prime public exponent. They also present an improved version of the PAP routine. Unlike Young and Yung, they suggest symmetric techniques for designing f_e, f_d. Symmetric techniques endanger the universal protection of the attacker, but continue to make perfect sense in the context of black-box cryptography.

8. Quantum Computation and Cryptography

8.1Introduction
8.2Quantum Computation
8.3Quantum Cryptography
8.4Quantum Cryptanalysis
 Chapter Summary
 Suggestions for Further Reading

Our best theories are not only truer than common sense, they make far more sense than common sense does.

—David Deutsch [76]

One can be a masterful practitioner of computer science without having the foggiest notion of what a transistor is, not to mention how it works.

—N. David Mermin [197]

But suppose I could buy a truly powerful quantum computer off the shelf today — what would I do with it? I don’t know, but it appears that I will have plenty of time to think about it!

—John Preskill [243]

8.1. Introduction

So far, we have studied algorithms in cryptology that can be implemented on classical computers (Turing machines or von Neumann’s stored-program computers). Now, we shift our attention to a different paradigm of computation, known as quantum computation. The working of a quantum computer is specified by the laws of quantum mechanics, a branch of physics developed in the twentieth century. However counterintuitive, contrived or artificial these laws sound at first, they have been accepted by the physics community as robust models of certain natural phenomena. A bit, modelled as a quantum mechanical system, appears to be a more powerful unit than a classical bit for building a computing device.

This enhanced power of a computing device has many important ramifications in cryptology. On one hand, we have polynomial-time quantum algorithms to solve the integer factorization and the discrete-log problems. This implies that most of the cryptographic algorithms that we discussed earlier become (provably) insecure. On the other hand, there are proposals for a quantum key-exchange method that possesses unconditional (and provable) security.

Unfortunately, it is not clear how one can manufacture a quantum computer. The technological difficulties involved appear enormous, and some researchers even question the feasibility of building such a machine. However, no laws or proofs rule out the possibility of success in the (near or distant) future. Legend has it that Thomas Alva Edison, after several hundred futile attempts to manufacture an electric light bulb, asserted that he knew hundreds of ways how one cannot make an electric bulb. Edison succeeded eventually, and the dream turned into reality.

But we will not build quantum computers in this chapter. That is well beyond the scope of this book, or, for that matter, of computer science in general. It is thoroughly unimportant to understand the I-V curves of a transistor (or even to know what a transistor actually is), when one designs and analyses (classical) algorithms. In order to design and analyse quantum algorithms, it is equally unimportant to know how a quantum computer can be realized.

8.2. Quantum Computation

We start with a formal description of quantum computation. Quantum mechanical laws govern this paradigm. We will pay little attention to the physical interpretations of these laws. A mathematical formulation suffices for our purpose.

For defining a quantum mechanical system, we need to enrich our mathematical vocabulary. Let V be a vector space over ℂ (or ℝ). Using Dirac’s ket notation, we denote a vector ψ in V as |ψ〉.

Definition 8.1.

An inner product (also called a dot product or a scalar product) on V is a function 〈·|·〉 : V × V → ℂ satisfying the following properties:

  1. Positivity For any |ψ〉 ∈ V, the inner product 〈ψ|ψ〉 is real and non-negative. Moreover, 〈ψ|ψ〉 = 0 if and only if |ψ〉 = 0.

  2. Linearity For a₁, a₂ ∈ ℂ and |ψ₁〉, |ψ₂〉, |φ〉 ∈ V, we have 〈φ| (a₁|ψ₁〉 + a₂|ψ₂〉) = a₁〈φ|ψ₁〉 + a₂〈φ|ψ₂〉.

  3. Skew symmetry For any |ψ〉, |φ〉 ∈ V, we have 〈ψ|φ〉 = 〈φ|ψ〉*, where the star denotes complex conjugation.

A vector space V with an inner product is called an inner product space.

Example 8.1.

For n ∈ ℕ, the space ℂ^n is an inner product space with the inner product of |ψ〉 = (ψ₁, . . . , ψₙ) and |φ〉 = (φ₁, . . . , φₙ) defined as

〈ψ|φ〉 := ψ₁*φ₁ + ψ₂*φ₂ + · · · + ψₙ*φₙ,

where the star denotes complex conjugation.
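The inner product of Example 8.1 on ℂ^n, namely 〈ψ|φ〉 = Σᵢ ψᵢ* φᵢ (conjugating the first argument, the usual convention with Dirac notation), can be checked numerically. A small NumPy sketch, with arbitrarily chosen example vectors, verifies positivity and skew symmetry; NumPy's vdot conjugates its first argument, matching this convention:

```python
import numpy as np

psi = np.array([1 + 1j, 0, 2j])
phi = np.array([2, 1j, 1 - 1j])

ip = np.vdot(psi, phi)              # <psi|phi> = sum_i conj(psi_i) * phi_i
assert np.isclose(ip, np.sum(np.conj(psi) * phi))

# Positivity: <psi|psi> is real and non-negative
assert np.isclose(np.vdot(psi, psi).imag, 0)
assert np.vdot(psi, psi).real >= 0

# Skew symmetry: <psi|phi> is the complex conjugate of <phi|psi>
assert np.isclose(ip, np.conj(np.vdot(phi, psi)))
```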

Definition 8.2.

The inner product on a vector space V induces a norm (Definition 2.115) on V:

‖ψ‖ := √〈ψ|ψ〉.

An inner product space which is complete (Definition 2.119) under the norm induced by its inner product is called a Hilbert space. We will typically consider finite-dimensional Hilbert spaces (over ℂ) and for n ∈ ℕ denote the n-dimensional Hilbert space by ℋₙ.

Definition 8.3.

We define an equivalence relation ~ on a Hilbert space ℋ as |ψ〉 ~ |φ〉 if and only if |ψ〉 = a|φ〉 for some non-zero a ∈ ℂ. An equivalence class under this relation is called a ray in ℋ. One typically considers a vector |ψ〉 with 〈ψ|ψ〉 = 1 as a representative of its equivalence class. Such a representative is unique up to multiplication by complex numbers of the form e^(iθ).

Definition 8.4.

An orthonormal basis of a Hilbert space ℋ is a subset B of ℋ with the following properties:

  1. B is a ℂ-basis of ℋ.

  2. 〈ψ|ψ〉 = 1 for every |ψ〉 ∈ B.

  3. 〈ψ|φ〉 = 0 for every pair of distinct vectors |ψ〉, |φ〉 ∈ B.

It is customary to denote the n vectors in an orthonormal basis of ℋₙ by the symbols |0〉, |1〉, . . . , |n – 1〉.

Example 8.2.

|0〉 := (1, 0, 0, . . . , 0), |1〉 := (0, 1, 0, . . . , 0), . . . , |n – 1〉 := (0, 0, . . . , 0, 1) form an orthonormal basis of ℂ^n under the inner product of Example 8.1.

8.2.1. System

The following axiom describes the model of a quantum mechanical system.

Axiom 8.1. First axiom of quantum mechanics

A system is a ray in a (finite-dimensional) Hilbert space (over ℂ).

Definition 8.5.

The simplest non-trivial quantum mechanical system is a ray in a 2-dimensional Hilbert space ℋ₂. Such a system is assumed to be the basic building block of a quantum computer and is called a quantum bit or a qubit.

In order to distinguish a qubit from a classical bit, we call the latter a cbit.

ℋ₂ has an orthonormal basis {|0〉, |1〉}. In the classical interpretation, a cbit can assume only the two values |0〉 and |1〉, whereas a qubit can assume any value of the form

a|0〉 + b|1〉  with  a, b ∈ ℂ, |a|² + |b|² = 1.

Such a state of the qubit is called a superposition of the classical states.
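A qubit state can be modelled concretely as a normalized complex 2-vector. The following NumPy sketch (the amplitudes a, b are example values of our choosing) checks the normalization condition |a|² + |b|² = 1:

```python
import numpy as np

# The classical basis states |0> and |1> as vectors of C^2
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)

# A superposition a|0> + b|1> with |a|^2 + |b|^2 = 1
a, b = 3 / 5, 4j / 5
psi = a * ket0 + b * ket1

assert np.isclose(abs(a) ** 2 + abs(b) ** 2, 1.0)
assert np.isclose(np.vdot(psi, psi).real, 1.0)   # <psi|psi> = 1
```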

Though we don’t care much, at least for the moment, about physical realizations, two promising candidates for realizing a qubit are the two polarization states of a photon and the two spin states of a spin-½ particle such as an electron.

A conceptual example of a 2-state quantum system is the Schrödinger cat. The two independent states of a cat, as we classically know it, are |alive〉 and |dead〉. However, if we think of the cat confined in a closed room and isolated from our observations, quantum mechanics models the state of the cat as a superposition (that is, a complex-linear combination) of these two states. But then, if the quantum model were true, opening the room might reveal the cat in a non-trivial state a|alive〉 + b|dead〉 for some complex numbers a, b with |a|² + |b|² = 1. It would indeed be an exciting experience. But alas, quantum mechanics precludes the possibility of such an observation. Read on to know what we would actually see if we open the room.

8.2.2. Entanglement

A single qubit is too small to build a useful computer. We need to use several (albeit a finite number of) qubits and hence must have a way to describe the combined system in terms of the individual qubits. As the simplest base case, we first concentrate on combining two quantum systems into one.

Axiom 8.2. Second axiom of quantum mechanics

Let A and B be two quantum mechanical systems with respective Hilbert spaces ℋ_A and ℋ_B. Let {|i〉_A | i = 0, . . . , m – 1} and {|j〉_B | j = 0, . . . , n – 1} be orthonormal bases of these Hilbert spaces. The quantum mechanical system AB having A and B as its two parts is described by the tensor product

ℋ_A ⊗ ℋ_B,

where ℋ_A ⊗ ℋ_B is an mn-dimensional Hilbert space with an orthonormal basis

{|i〉_A ⊗ |j〉_B | i = 0, . . . , m – 1 and j = 0, . . . , n – 1}.

It is customary to abbreviate the normalized vector |i〉_A ⊗ |j〉_B as |i〉_A|j〉_B or even as |ij〉_AB. A general state of AB is of the form

Σ_{i,j} a_{i,j}|ij〉_AB,  where the amplitudes a_{i,j} ∈ ℂ satisfy Σ_{i,j} |a_{i,j}|² = 1.

We can generalize this construction to describe a system having components A₁, . . . , Aₖ. If ℋᵢ is the Hilbert space of Aᵢ with an orthonormal basis {|j〉ᵢ | 0 ≤ j < nᵢ}, the composite system A₁ · · · Aₖ has the n₁ · · · nₖ-dimensional Hilbert space ℋ₁ ⊗ ℋ₂ ⊗ · · · ⊗ ℋₖ with an orthonormal basis comprising the vectors

|j₁〉₁ ⊗ |j₂〉₂ ⊗ · · · ⊗ |jₖ〉ₖ = |j₁〉₁|j₂〉₂ · · · |jₖ〉ₖ = |j₁j₂ . . . jₖ〉

with 0 ≤ jᵢ < nᵢ for all i = 1, . . . , k.

Definition 8.6.

An n-bit quantum register is a system having exactly n qubits.

Let A₁, . . . , Aₙ denote the individual bits in an n-bit quantum register A. Each Aᵢ has the Hilbert space ℋ₂ with orthonormal basis {|0〉, |1〉}. So A has the 2^n-dimensional Hilbert space ℋ₂ ⊗ ℋ₂ ⊗ · · · ⊗ ℋ₂ (n factors) with an orthonormal basis consisting of the vectors

|j₁〉 ⊗ |j₂〉 ⊗ · · · ⊗ |jₙ〉 = |j₁〉|j₂〉 · · · |jₙ〉 = |j₁j₂ · · · jₙ〉

with each jᵢ ∈ {0, 1}. Viewed as an integer in binary notation, j₁j₂ . . . jₙ is an integral value between 0 and 2^n – 1. This gives us a canonical numbering |0〉, |1〉, . . . , |2^n – 1〉 of the basis vectors for the register A. These 2^n values are precisely the states that a classical n-bit register can have. The quantum register can, however, be in any state |ψ〉 which is a superposition of the classical states:

|ψ〉 = a₀|0〉 + a₁|1〉 + · · · + a_{2^n–1}|2^n – 1〉  with  |a₀|² + |a₁|² + · · · + |a_{2^n–1}|² = 1.

Let us once again look at the general composite system A = A₁ · · · Aₖ. In the classical sense, each state of A is composed of the individual states of the subsystems Aᵢ. For example, each of the 2^n classical states of an n-bit register corresponds to a choice between |0〉 and |1〉 for each individual bit. That is, each individual component retains its own state in a classical composite system. This is, however, not the case with a quantum composite system. Just think of a 2-bit quantum register C := AB. A state

|ψ〉_C = c₀|0〉_C + c₁|1〉_C + c₂|2〉_C + c₃|3〉_C

of C equals a tensor product

|ψ₁〉_A ⊗ |ψ₂〉_B = (a₀|0〉_A + a₁|1〉_A) ⊗ (b₀|0〉_B + b₁|1〉_B)
 = a₀b₀|0〉_C + a₀b₁|1〉_C + a₁b₀|2〉_C + a₁b₁|3〉_C,

if and only if c₀c₃ = c₁c₂.
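The product-state criterion c₀c₃ = c₁c₂ can be verified numerically. In the NumPy sketch below, np.kron builds a tensor-product state, which passes the test, while the state (1/√2)(|00〉 + |11〉) (our choice of an entangled example) fails it:

```python
import numpy as np

def is_product(c):
    """Test whether c0|0> + c1|1> + c2|2> + c3|3> is a tensor product."""
    c0, c1, c2, c3 = c
    return bool(np.isclose(c0 * c3, c1 * c2))

# A tensor-product state: np.kron gives coefficients (a0b0, a0b1, a1b0, a1b1)
a = np.array([3 / 5, 4 / 5])                  # a0|0> + a1|1>
b = np.array([1, 1j]) / np.sqrt(2)            # b0|0> + b1|1>
product = np.kron(a, b)
assert is_product(product)

# The state (|00> + |11>)/sqrt(2): c0c3 = 1/2 but c1c2 = 0, so entangled
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
assert not is_product(bell)
```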

Definition 8.7.

The state |ψ〉 of a quantum register A = A₁ · · · Aₙ is called entangled, if |ψ〉 cannot be written as a tensor product of the states of any two parts of A. In other words, |ψ〉 is entangled if and only if no set of fewer than n qubits of A possesses its individual state.

Entanglement essentially implies correlation or interaction between the components. In a composite quantum system, we cannot treat the components individually. A quantum system, as we have defined (axiomatically) earlier, is a completely isolated system. In reality, interactions with the surroundings make a (non-isolated) system change its state and get entangled. This is one of the biggest problems in the realization of a quantum computer. Quantum error correction is an important topic in quantum computation. For our purpose, we stick to the abstract model of an isolated system (quantum register) immune from external disturbances.

8.2.3. Evolution

Quantum registers give us a way to store quantum information. A computation involves manipulating the information stored in the registers. In quantum mechanics, all such operations must be reversible, that is, it must be possible to invert every operation. The only invertible operations on the classical states |0〉, |1〉, . . . , |2^n – 1〉 of an n-bit quantum register A are precisely all the permutations of the classical states. Now that A can be in many more (quantum) states, there are other allowed operations on A. Any such operation must be reversible and of a particular type. This is the third axiom of quantum mechanics, which is detailed shortly.

A classical n-bit register supports many non-invertible operations. For example, erasing the content of the register (that is, resetting all the bits to zero) is a non-invertible process, since the pre-erasure state of the register cannot be uniquely determined after the erase operation is carried out. Classical computation is based on (classical) gates (like NOT, AND, OR, XOR, NOR, NAND), most of which are non-invertible. XOR, as an example, requires two input bits and outputs a single bit. It is impossible to determine the inputs uniquely from the output only. All such non-reversible operations are disallowed in the quantum world. An invertible version of the XOR operation takes two bits x and y as input and outputs the two bits x and x ⊕ y (where ⊕ denotes XOR of bits). Given the output (x, x ⊕ y), the input can be uniquely determined as (x, y) = (x, x ⊕ (x ⊕ y)), that is, by applying the reversible XOR operation once more.

Like XOR, all bit operations that build up a classical computer can be realized using reversible operations only. This gives us the (informal) assurance that quantum computers are at least as powerful as classical computers.
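The reversible XOR just described can be sketched in a few lines of Python; applying the map (x, y) ↦ (x, x ⊕ y) twice recovers the input, so the operation is its own inverse:

```python
def cnot(x, y):
    """Reversible XOR (controlled-NOT): (x, y) -> (x, x XOR y)."""
    return x, x ^ y

# Applying the operation twice recovers the original input for all inputs
for x in (0, 1):
    for y in (0, 1):
        out = cnot(x, y)
        assert cnot(*out) == (x, y)   # self-inverse
```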

Back to the business—the third axiom of quantum mechanics.

Definition 8.8.

Let U be a square matrix (that is, an m × m matrix for some m ∈ ℕ) with complex entries. The conjugate transpose of U is denoted by the symbol U†, that is, if U = (u_{ij}), then U† = (u_{ji}*). U is called unitary, if U†U = UU† = I, where I is the m × m identity matrix. Every unitary matrix U is invertible with U⁻¹ = U†, and preserves the inner product of ℂ^m, that is, 〈Uψ|Uφ〉 = 〈ψ|φ〉 for |ψ〉, |φ〉 ∈ ℂ^m.

Let A be a quantum system (like a quantum register) with Hilbert space ℂ^m. An m × m unitary matrix U defines a unitary linear transformation on ℂ^m taking a normalized vector |ψ〉 to a normalized vector U|ψ〉. Moreover, the transformation maps an orthonormal basis of ℂ^m to another orthonormal basis of ℂ^m (Exercise 8.4).

Axiom 8.3. Third axiom of quantum mechanics

A quantum system evolves unitarily, that is, any operation on a quantum mechanical system is a unitary transformation.

Example 8.3.

The Hadamard transform H on one qubit is defined as:

H|0〉 := (1/√2)(|0〉 + |1〉),  H|1〉 := (1/√2)(|0〉 – |1〉).

(Recall that a linear transformation is completely specified by its images of the elements of a basis.) If one takes |0〉 = (1, 0) and |1〉 = (0, 1), the Hadamard transform corresponds to the unitary matrix

H = (1/√2) ( 1   1
             1  –1 ).

By linearity, H transforms a general state |ψ〉 = a|0〉 + b|1〉 to the state

H|ψ〉 = (1/√2)((a + b)|0〉 + (a – b)|1〉).

Some other unitary operators are described in Exercises 8.5 and 8.6.
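The unitarity of H, the identity H² = I, and the action on a general state can all be checked numerically (a NumPy sketch; the amplitudes a, b are example values of our choosing):

```python
import numpy as np

# The Hadamard matrix of Example 8.3
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

assert np.allclose(H.conj().T @ H, np.eye(2))   # unitary: H'H = I
assert np.allclose(H @ H, np.eye(2))            # H is its own inverse

# H maps a|0> + b|1> to ((a+b)|0> + (a-b)|1>)/sqrt(2)
a, b = 0.6, 0.8
psi = np.array([a, b])
assert np.allclose(H @ psi, np.array([a + b, a - b]) / np.sqrt(2))
```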

An important consequence of quantum mechanical dynamics is that cloning of a state of a system is not permissible. In other words, there does not exist an operator that copies an arbitrary state (content) of one quantum register to another.

Theorem 8.1. No-cloning theorem

For two n-bit registers A and B, there do not exist a unitary transform U of the composite system AB and a state |s〉 of B such that U(|ψ〉|s〉) = |ψ〉|ψ〉 for every state |ψ〉 of A.

Proof

Assume that such a state |s〉 of B and a unitary transform U of AB exist. Take two states |ψ₁〉 and |ψ₂〉 of A and a, b ∈ ℂ such that |aψ₁ + bψ₂〉 := a|ψ₁〉 + b|ψ₂〉 is a state of A. Then, U(|ψ₁〉|s〉) = |ψ₁〉|ψ₁〉 and U(|ψ₂〉|s〉) = |ψ₂〉|ψ₂〉. By linearity, we have U(|aψ₁ + bψ₂〉|s〉) = a|ψ₁〉|ψ₁〉 + b|ψ₂〉|ψ₂〉. Now, since U clones |aψ₁ + bψ₂〉 also, U(|aψ₁ + bψ₂〉|s〉) = |aψ₁ + bψ₂〉|aψ₁ + bψ₂〉 = a²|ψ₁〉|ψ₁〉 + ab|ψ₁〉|ψ₂〉 + ab|ψ₂〉|ψ₁〉 + b²|ψ₂〉|ψ₂〉. The two expressions for U(|aψ₁ + bψ₂〉|s〉) are different, unless a = 0, b = 1 or a = 1, b = 0.

8.2.4. Measurement

We have seen how to represent a quantum mechanical system and do operations on the system. Now comes the final part of the game, namely observing or measuring or reading the state of a quantum system. In classical computation, reading the value stored in a classical register is a trivial exercise—just read it! In quantum mechanics, this is not the case.

Axiom 8.4. Fourth axiom of quantum mechanics—the Born rule

Let A be a quantum mechanical system with an orthonormal basis {|0〉, |1〉, . . . , |m – 1〉}. Assume that A is in a state |ψ〉 = a₀|0〉 + a₁|1〉 + · · · + a_{m–1}|m – 1〉 with |a₀|² + · · · + |a_{m–1}|² = 1. A measurement of A at this state is a mechanism (or device) that outputs one of the integers 0, 1, . . . , m – 1, and i is output with probability |aᵢ|². If i is output by the measurement, the system collapses from the state |ψ〉 to the state |i〉 after the measurement.

This means that whatever the state |ψ〉 of A was before the measurement, the process of measurement can reveal only one of m possible integer values. Moreover, the measurement causes a total loss of information about the pre-measurement amplitudes ai. Thus, it is impossible to measure A repeatedly at the state |ψ〉 to see a statistical pattern in the occurrences of different values of i so as to guess the probabilities |ai|2.
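The Born rule can be simulated classically (a NumPy sketch; the measure function and the example state are our own illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)

def measure(psi, rng):
    """Born rule: return outcome i with probability |a_i|^2 together with
    the collapsed post-measurement state |i>."""
    probs = np.abs(psi) ** 2
    probs = probs / probs.sum()        # guard against rounding error
    i = rng.choice(len(psi), p=probs)
    collapsed = np.zeros_like(psi)
    collapsed[i] = 1                   # the system collapses to |i>
    return int(i), collapsed

psi = np.array([1, 1j]) / np.sqrt(2)   # outcomes 0 and 1, each with prob 1/2
i, post = measure(psi, rng)
assert i in (0, 1)
assert np.isclose(np.vdot(post, post).real, 1.0)   # post-state is normalized
```

Note that the simulation keeps the amplitudes in memory; a physical measurement destroys them, which is exactly why repeated measurement cannot recover the |aᵢ|².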

If we open the room, we can see the Schrödinger cat in only one of the two possible states: |alive〉 or |dead〉. Well, then, what else can we expect? Quantum mechanics models the cat in the isolated room only as a system evolving under the unitary dynamics.

At first glance, this is rather frustrating. We claim that the system went through a series of classically meaningless states, but the classical states are all we can see. What is the guarantee that the system really evolved in the quantum mechanical way? Well, there is no guarantee actually. The solace is that the axioms of quantum mechanics can explain certain natural phenomena. Also it is perfectly consistent with the classical behaviour in that if the system A evolves classically and is measured at the state |i〉 (so that ai = 1 and aj = 0 for ji), measuring A reveals i with probability one and causes the system to collapse to the state |i〉, that is, to remain in the state |i〉 itself.

There is a positive side of the quantum mechanical axioms. A quantum mechanical system is inherently parallel. An n-bit classical register at any point of time can hold only one of the classical values |0〉, . . . , |2^n – 1〉. An n-bit quantum register, on the other hand, can simultaneously hold all these classical values, with respective probabilities. This inherent parallelism seems to impart a good deal of power to a computing device. Of course, as long as we cannot harness some physical objects to build a real quantum mechanical computing device, quantum computation continues to remain science fiction. But on an algorithmic level, the inherent parallelism of a (hypothetical) quantum computer can be exploited to do miracles, for example, to design a polynomial-time integer factorization algorithm. This is where we win—at least conceptually. Our failure to see a cat in the state (1/√2)(|alive〉 – |dead〉) should not bother us at all!

Measurement of a quantum register gives us a way to initialize a quantum register A to a state |ψ〉. Suppose that we get the value i upon measuring A. We then apply any unitary transform on A that changes A from the post-measurement state |i〉 to the desired state |ψ〉.

The measurement described in Axiom 8.4 is called measurement in the classical basis. The system A has, in general, many orthonormal bases other than the classical one {|0〉, . . . , |m – 1〉}. If B is any such basis, we can conceive of measuring A in the basis B. All we need to perform is to rewrite the state of A in terms of the new basis B. This can be achieved by applying to A a unitary transformation (the change-of-basis transformation) before the measurement in the classical basis is carried out.

A generalization of the Born rule is also worth mentioning here. Suppose that we have an (m + n)-bit quantum register A and we want to measure not all but some of the bits of A. To be more specific, let us say that we want to measure the leftmost m bits of A, though the generalized Born rule works for any arbitrary choice of m bit positions in the register A. Denoting by |i〉_m, i = 0, . . . , 2^m – 1, the canonical basis vectors for the left m bits and by |j〉_n, j = 0, . . . , 2^n – 1, those for the right n bits, a general state of A can be written as

|ψ〉 = Σ_{i,j} a_{i,j}|i, j〉_{m+n}

with Σ_{i,j} |a_{i,j}|² = 1 and with |i, j〉_{m+n} identified as |i〉_m|j〉_n = |i〉_m ⊗ |j〉_n. A measurement of the left m bits of A yields an integer i, 0 ≤ i ≤ 2^m – 1, with probability p_i = Σ_j |a_{i,j}|². Also this measurement causes A to collapse to the state (1/√p_i) Σ_j a_{i,j}|i〉_m|j〉_n.

Now, if we immediately apply the generalized Born rule once again on the right n bits of A, we get an integer j, 0 ≤ j ≤ 2^n – 1, with probability |a_{i,j}|²/p_i, and the system collapses to the state |i〉_m|j〉_n. The probability of getting |i〉_m|j〉_n by this two-step process is then p_i · |a_{i,j}|²/p_i = |a_{i,j}|². This is consistent with a single application of the original Born rule.
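The generalized Born rule for the left bits can likewise be simulated (a NumPy sketch with m = n = 1 and an example state of our choosing; the function name is ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def measure_left(a, rng):
    """Generalized Born rule: measure the left part of a register whose
    state a[i, j] is indexed by the left value i and the right value j."""
    p = np.sum(np.abs(a) ** 2, axis=1)     # p_i = sum_j |a_ij|^2
    p = p / p.sum()                        # guard against rounding error
    i = rng.choice(a.shape[0], p=p)
    post = np.zeros_like(a)
    post[i] = a[i] / np.sqrt(p[i])         # renormalized conditional state
    return int(i), post

# 1+1 bit example: the state (|00> + |01> + |10>)/sqrt(3)
a = np.array([[1.0, 1.0], [1.0, 0.0]]) / np.sqrt(3)
i, post = measure_left(a, rng)
assert np.isclose(np.sum(np.abs(post) ** 2), 1.0)   # post-state is normalized
```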

8.2.5. The Deutsch Algorithm

We start with a general framework of doing computations using quantum registers. Suppose we want to compute a function f which requires an m-bit integer as input and which outputs an n-bit integer. A general function f need not be invertible, but we cannot afford non-invertible operations on quantum registers. This is why we work on an (m + n)-bit quantum register A in which the left m bits represent the input and the right n bits the output. Computing f(x) for a given x is tantamount to designing a unitary transformation U_f that acts on A and converts its state from |x〉_m|y〉_n to |x〉_m|f(x) ⊕ y〉_n, where ⊕ is the bitwise XOR operation, and where the subscripts (m and n) indicate the number of bits in the input or output part of A. It is easy to verify that U_f is unitary. Moreover, the inverse of U_f is U_f itself. For y = 0, we, in particular, have U_f(|x〉_m|0〉_n) = |x〉_m|f(x)〉_n.
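As a sanity check, U_f can be built explicitly as a permutation matrix on the 2^(m+n) classical basis states; being a permutation matrix, it is unitary, and applying it twice XORs f(x) into the output part twice, so U_f is its own inverse. A NumPy sketch (make_Uf and the sample f are our own names):

```python
import numpy as np

def make_Uf(f, m, n):
    """Permutation matrix for U_f|x>|y> = |x>|f(x) XOR y> on m+n bits."""
    dim = 2 ** (m + n)
    U = np.zeros((dim, dim))
    for x in range(2 ** m):
        for y in range(2 ** n):
            src = (x << n) | y            # basis index of |x>|y>
            dst = (x << n) | (f(x) ^ y)   # basis index of |x>|f(x) XOR y>
            U[dst, src] = 1
    return U

f = lambda x: x % 2              # a sample, non-invertible function f
U = make_Uf(f, 2, 1)             # m = 2 input bits, n = 1 output bit

assert np.allclose(U @ U, np.eye(8))      # U_f is its own inverse
assert np.allclose(U.T @ U, np.eye(8))    # unitary (real orthogonal)
```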

It may still be unclear to the reader what one really gains by using this quantum model. The answer lies in the parallelism inherent in a quantum register. In order to see how this parallelism can be exploited, we describe David Deutsch’s algorithm which, being the first known quantum algorithm, has enough historical importance to be included here in spite of its apparent irrelevance in the context of cryptology.

Assume that f : {0, 1} → {0, 1} is a function that operates on one bit and outputs one bit. There are four such functions: Two of these are constant functions (f(0) = f(1)) and the remaining two non-constant (f(0) ≠ f(1)). We are given a black box Df representing f. We don’t know which one of the four functions Df actually implements, but we can supply a bit to Df as input and read its output on this bit. Our task is to determine whether Df represents a constant function or not. Classically, we make two invocations of Df on the inputs 0 and 1 and make a comparison of the output values f(0) and f(1). It is impossible to solve the problem classically using only one invocation of the black box. The Deutsch algorithm makes this task possible using quantum computational techniques.

Following the general quantum computational model, we assume that D_f is a unitary transformation on a 2-bit register A (with m = n = 1) that computes D_f|x〉|y〉 = |x〉|f(x) ⊕ y〉, with the left (resp. the right) bit corresponding to the input (resp. the output) of f. Instead of supplying a classical input to D_f, we initialize the register A to the state

(H|1〉)(H|1〉) = (1/2)(|0〉 – |1〉)(|0〉 – |1〉).

Linearity shows that on this input, D_f ends its execution leaving A in the state

((–1)^f(0)/2)(|0〉 – (–1)^(f(0) ⊕ f(1))|1〉)(|0〉 – |1〉).

Here, f(0) ⊕ f(1) = 0 if and only if f is constant. We won’t measure A right now, but apply the Hadamard transform on the left bit. This transforms A to the state

±(1/√2)|1〉(|0〉 – |1〉) if f is constant, and ±(1/√2)|0〉(|0〉 – |1〉) otherwise.

Now, if we measure the input bit, we deterministically get the integer 1 or 0 according as f is constant or not. That’s it!
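The whole algorithm can be simulated classically. In the NumPy sketch below we initialize the register to (H|1〉)(H|1〉); this initialization is our assumption, chosen to be consistent with the outcome just stated (measuring 1 for a constant f and 0 otherwise):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard matrix
ket1 = np.array([0.0, 1.0])

def deutsch(f):
    """One call to D_f decides whether f: {0,1} -> {0,1} is constant."""
    psi = np.kron(H @ ket1, H @ ket1)          # initial state (H|1>)(H|1>)
    U = np.zeros((4, 4))                       # D_f|x>|y> = |x>|f(x) XOR y>
    for x in (0, 1):
        for y in (0, 1):
            U[2 * x + (f(x) ^ y), 2 * x + y] = 1
    psi = np.kron(H, np.eye(2)) @ (U @ psi)    # apply D_f, then H on left bit
    prob_left_1 = psi[2] ** 2 + psi[3] ** 2    # probability of measuring 1
    return round(float(prob_left_1))           # deterministically 0 or 1

assert deutsch(lambda x: 0) == 1               # constant
assert deutsch(lambda x: 1) == 1               # constant
assert deutsch(lambda x: x) == 0               # non-constant
assert deutsch(lambda x: 1 - x) == 0           # non-constant
```

Note that D_f is invoked exactly once, which is impossible classically.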

Deutsch’s algorithm solves a rather artificial problem, but it opened up the possibility of exploring a new paradigm of computation. To date, (good) quantum algorithms are known for many interesting computational problems. In the rest of this chapter, we concentrate on some of the quantum algorithms that have an impact on cryptology.

Exercise Set 8.2

8.1 Let S be a finite set and let l²(S) denote the set of all functions f : S → ℂ.
  1. Show that l²(S) is a Hilbert space under the inner product

     〈f|g〉 := Σ_{x ∈ S} f(x)* g(x),  where the star denotes complex conjugation.

  2. Let B := {δₓ | x ∈ S}, where δₓ(y) is 1 if y = x, and is 0 otherwise. Show that B is an orthonormal basis of l²(S).

8.2 Show that the vectors (1/√2)(|0〉 + |1〉) and (1/√2)(|0〉 – |1〉) form an orthonormal ℂ-basis of ℋ₂.
8.3 Show that (1/√2)(|00〉 + |11〉) is an entangled state of a 2-bit quantum register.
8.4Prove the following assertions.
  1. The matrix H of Example 8.3 is unitary.

  2. A unitary matrix preserves the inner product, that is, if U is an m × m unitary matrix and |ψ〉, |φ〉 ∈ ℂ^m, then 〈Uψ|Uφ〉 = 〈ψ|φ〉.

  3. The determinant of a unitary matrix has absolute value 1.

  4. Every eigenvalue of a unitary matrix has absolute value 1.

  5. An m × m matrix A is unitary if and only if the columns of A constitute an orthonormal basis of ℂ^m (over ℂ).

8.5
  1. Show that the following operators are unitary on a qubit. Also construct the corresponding transformation matrices.

    Identity operator: I|0〉 = |0〉, I|1〉 = |1〉.
    Exchange operator: X|0〉 = |1〉, X|1〉 = |0〉.
    Z operator: Z|0〉 = |0〉, Z|1〉 = –|1〉.
    Hadamard operator: H|0〉 = (1/√2)(|0〉 + |1〉), H|1〉 = (1/√2)(|0〉 – |1〉).

  2. Deduce the following identities:

  3. Let . Show that defines a unitary operator on a qubit and that , where the last X is the matrix of the exchange operator.

8.6Let A be an n-bit quantum register. Let us plan to number the bits of A as 1, . . . , n from left to right. One can apply the operators like X, Z, H of Exercise 8.5 on each individual bit of A. A qubit operation B applied on bit i of A will be denoted by Bi.
  1. Let Sij be the operator that swaps bit i with bit j. Show that

  2. Let C be the reversible XOR operation (also called the controlled-NOT operation) on a two-bit register A = (A₁A₂), that is, C|x〉|y〉 = |x〉|x ⊕ y〉. Show that C can be realized as

8.7Suppose that whenever you switch on your quantum computer, every bit in its registers is initialized to the state |0〉. Describe how you can use the operators I, X, Z and H defined in Exercise 8.5, in order to change the state of a qubit from |0〉 to the following:
  1. |1〉

  2. –|1〉

8.8 Let A be an n-bit quantum register in the state |0〉_n (that is, with every bit in the state |0〉). Show that the application of the Hadamard transform individually to each bit of A transforms A to the state |ψ〉 = (1/√(2^n))(|0〉 + |1〉 + · · · + |2^n – 1〉). This is precisely the state of A in which all of the 2^n possible outcomes in a measurement of A are equally likely. What happens if we apply H a second time individually to each bit of A, that is, what is H₁H₂ · · · Hₙ|ψ〉, where Hᵢ denotes the Hadamard transform on the i-th bit of A?
8.9We know that any arithmetic or Boolean operation can be implemented using AND and NOT gates. This exercise suggests a reversible way to implement these operations. The Toffoli gate is a function T : {0, 1}3 → {0, 1}3 that maps (x, y, z) ↦ (x, y, zxy), where ⊕ means XOR, and xy means AND of x and y. Thus, T flips the third bit, if and only if the first two bits are both 1.
  1. Show that T is a unitary transformation on a 3-bit quantum register. What is the inverse of T?

  2. Use T to realize the Boolean AND and NOT operations.
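The classical truth-table behaviour of the Toffoli gate, and the realizations of AND and NOT asked for in Exercise 8.9, can be sketched as follows (Python; the fixed bit settings used for AND and NOT are the standard ones):

```python
def toffoli(x, y, z):
    # T(x, y, z) = (x, y, z XOR (x AND y)): flip z exactly when x = y = 1.
    return (x, y, z ^ (x & y))

def AND(x, y):
    return toffoli(x, y, 0)[2]   # with z = 0, the third output bit is x AND y

def NOT(z):
    return toffoli(1, 1, z)[2]   # with x = y = 1, the third output is z XOR 1

# T is its own inverse, so the construction is reversible.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            assert toffoli(*toffoli(a, b, c)) == (a, b, c)
print(AND(1, 1), AND(1, 0), NOT(0), NOT(1))   # prints: 1 0 1 0
```

Since T composed with itself is the identity, T is its own inverse, which answers the second half of Part 1.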

8.3. Quantum Cryptography

We now describe the quantum key-exchange algorithm due to Bennett and Brassard. The original paper also describes a practical implementation of the algorithm using polarization of photons. For the moment, we do not highlight such implementation-specific issues, but describe the algorithm in terms of the conceptual computational units called qubits.

The usual actors Alice and Bob want to agree upon a shared secret using communication over an insecure channel. A third party, who gave her name as Carol, plans to eavesdrop during the transmission. Alice and Bob repeat the following steps. Here, H stands for the Hadamard transform.

Algorithm 8.1. Quantum key-exchange algorithm

Alice generates a random classical bit i ∈ {0, 1}.

Alice makes a random choice x ∈ {0, 1}.

Alice computes the quantum bit A := H^x|i〉.

Alice sends A to Bob.

Bob makes a random choice y ∈ {0, 1}.

Bob computes B := H^y A.

Bob measures B to get the classical bit j ∈ {0, 1}.

Bob sends y to Alice.

Alice sends x to Bob.

if (x = y) { Alice and Bob retain the bit i = j }

The algorithm works as follows. Alice generates a random bit i and a random decision x whether she is going to use the Hadamard transform H. If x = 0, she sends the quantum bit |0〉 or |1〉 to Bob. If x = 1, she sends either H|0〉 = (|0〉 + |1〉)/√2 or H|1〉 = (|0〉 – |1〉)/√2 to Bob. At this point Bob does not know whether Alice applied H before the transmission. So Bob makes a random guess y and accordingly skips/applies the Hadamard transform on the qubit received. If x = y = 0, then Bob has the qubit B = H^0H^0|i〉 = |i〉 and a measurement of this qubit reveals i with probability 1. On the other hand, if x = y = 1, then B = H²|i〉 = |i〉, since H² is the identity transform (Exercise 8.5). In this case also, Bob retrieves Alice’s classical bit i with certainty by measuring B.

If x ≠ y, then B is generated from Alice’s initial choice |i〉 using a single application of H, that is, B = H|i〉 in this case. A measurement of this bit outputs 0 or 1, each with probability 1/2, that is, Bob gathers no idea about the initial choice of Alice. So after it is established that x ≠ y, they both discard the bit.
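The three cases above can be checked with a small classical simulation of Algorithm 8.1, representing a qubit by its pair of amplitudes (a Python sketch; the seeded random generator is ours, used only for reproducibility):

```python
import math
import random

SQ = 1 / math.sqrt(2)
H = [[SQ, SQ], [SQ, -SQ]]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def measure(v, rng):
    # Born rule: outcome 0 with probability |v[0]|^2, outcome 1 otherwise.
    return 0 if rng.random() < v[0] ** 2 else 1

def bb84_iteration(rng):
    i = rng.randrange(2)                  # Alice's random classical bit
    x = rng.randrange(2)                  # Alice's choice: apply H or not
    A = [1.0, 0.0] if i == 0 else [0.0, 1.0]
    if x:
        A = apply(H, A)                   # A = H^x |i>
    y = rng.randrange(2)                  # Bob's guess
    B = apply(H, A) if y else A           # B = H^y A
    j = measure(B, rng)
    return i, x, y, j

rng = random.Random(1)
secret = []
for _ in range(2000):
    i, x, y, j = bb84_iteration(rng)
    if x == y:
        assert i == j                     # matching bases: Bob recovers i exactly
        secret.append(i)
print(len(secret), "bits kept out of 2000 iterations")
```

The assertion never fires: whenever x = y, Bob's qubit is |i〉 and the measurement is deterministic. About half of the iterations are kept, matching the discussion below.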

If we assume that x and y are uniformly chosen, Bob and Alice succeed in having x = y about half of the time. They eventually set up an n-bit secret after about 2n invocations of the above protocol. Table 8.1 illustrates a sample session between Alice and Bob. After 20 iterations of the above procedure, they agree upon the shared secret 0001110111.

Table 8.1. A sample session of the quantum key-exchange algorithm
Iteration   i   x   A      y   B      j   Common bit
    1       0   1   H|0〉   0   H|0〉   1
    2       0   0   |0〉    1   H|0〉   1
    3       0   1   H|0〉   1   |0〉    0   0
    4       0   1   H|0〉   0   H|0〉   0
    5       1   1   H|1〉   0   H|1〉   1
    6       0   0   |0〉    0   |0〉    0   0
    7       0   0   |0〉    0   |0〉    0   0
    8       1   0   |1〉    0   |1〉    1   1
    9       0   0   |0〉    1   H|0〉   0
   10       1   1   H|1〉   0   H|1〉   0
   11       0   1   H|0〉   0   H|0〉   1
   12       0   0   |0〉    1   H|0〉   0
   13       1   0   |1〉    1   H|1〉   1
   14       1   1   H|1〉   1   |1〉    1   1
   15       1   1   H|1〉   1   |1〉    1   1
   16       0   1   H|0〉   1   |0〉    0   0
   17       1   1   H|1〉   1   |1〉    1   1
   18       1   0   |1〉    0   |1〉    1   1
   19       0   1   H|0〉   0   H|0〉   0
   20       1   0   |1〉    0   |1〉    1   1

(Here H|0〉 = (|0〉 + |1〉)/√2 and H|1〉 = (|0〉 – |1〉)/√2 denote the superposition states.)

What remains to explain is how this protocol guards against eavesdropping by Carol. Let us model Carol as a passive adversary who intercepts the qubit A transmitted by Alice, investigates the bit to learn about Alice’s secret i, and subsequently transmits the qubit to Bob. In order to guess i, Carol mimics the role of Bob. At this point Carol does not know x, so she makes a guess z about x, accordingly skips/applies the Hadamard transform on the intercepted qubit in order to get a qubit C, measures C to get a bit value k, and sends the measured qubit D to Bob. (Recall from Theorem 8.1 that it is impossible for Carol to make a copy of A, work on this copy and transmit the original qubit A to Bob.) Bob receives D, assumes that it is the qubit A transmitted by Alice, and carries out his part of the work to generate the bit j. Bob and Alice later reveal x and y. If x ≠ y, they reject the bits obtained from this iteration anyway. Carol should also reject her bit k in this case. So let us concentrate only on the case that x = y. The introduction of Carol in the protocol changes A to D and hence Alice and Bob may eventually agree upon distinct bits. A sample session of the protocol in the presence of Carol is illustrated in Table 8.2. The three parties generate their secrets as:

Alice:   0110 0111 1000 1011
Bob:     0101 1101 1100 1011
Carol:   0100 0101 0100 1011

Table 8.2. Eavesdropping during a key-exchange session
Iteration   i   x   A      z   C = H^z A   k   D     y   B = H^y D   j
    1       0   1   H|0〉   1   |0〉         0   |0〉   1   H|0〉        0
    2       1   0   |1〉    0   |1〉         1   |1〉   0   |1〉         1
    3       1   0   |1〉    1   H|1〉        0   |0〉   0   |0〉         0
    4       0   1   H|0〉   0   H|0〉        0   |0〉   1   H|0〉        1
    5       0   1   H|0〉   1   |0〉         0   |0〉   1   H|0〉        1
    6       1   1   H|1〉   1   |1〉         1   |1〉   1   H|1〉        1
    7       1   1   H|1〉   0   H|1〉        0   |0〉   1   H|0〉        0
    8       1   0   |1〉    0   |1〉         1   |1〉   0   |1〉         1
    9       1   1   H|1〉   0   H|1〉        0   |0〉   1   H|0〉        1
   10       0   1   H|0〉   0   H|0〉        1   |1〉   1   H|1〉        1
   11       0   0   |0〉    1   H|0〉        0   |0〉   0   |0〉         0
   12       0   0   |0〉    0   |0〉         0   |0〉   0   |0〉         0
   13       1   1   H|1〉   1   |1〉         1   |1〉   1   H|1〉        1
   14       0   0   |0〉    0   |0〉         0   |0〉   0   |0〉         0
   15       1   0   |1〉    0   |1〉         1   |1〉   0   |1〉         1
   16       1   0   |1〉    1   H|1〉        1   |1〉   0   |1〉         1

In this example, Alice and Bob’s shared secrets differ in five bit positions. Carol’s intervention causes a shared bit to differ with a probability of 3/8 (Exercise 8.11). Thus, the more Carol eavesdrops, the more mismatched bits she introduces in the secret shared by Alice and Bob.

Once Alice and Bob generate a shared secret of the desired bit length, they can check for the equality of their secret values without revealing them. For example, if the shared secret is a 64-bit DES key, Alice can send Bob one or more plaintext–ciphertext pairs generated by the DES algorithm using her shared key. Bob also generates the ciphertexts on Alice’s plaintexts using his secret key. If the ciphertexts generated by Bob differ from those generated by Alice, Bob becomes confident that their shared secrets are different and this happened because of the presence of some adversary (or because of communication errors). They then repeat the key-exchange protocol.

Another possible way in which Alice and Bob can gain confidence about the equality of their shared secrets is the use of parity checks. Suppose Alice breaks up her secret into blocks of eight bits, computes the parity bit of each block and sends these bits to Bob. Bob generates the parity bits on the blocks of his secret and compares the two sets of parity bits. If the shared secrets of Alice and Bob differ, this parity check reveals the fact with high probability.
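As an illustration (a Python sketch, using the two secrets from the eavesdropped session above), a block that happens to differ in an even number of positions escapes detection, which is why the parity check succeeds only with high probability:

```python
def parities(bits, block=8):
    # One parity bit (the XOR of all bits) per 8-bit block of the secret.
    return [sum(bits[k:k + block]) % 2 for k in range(0, len(bits), block)]

alice = [0,1,1,0, 0,1,1,1, 1,0,0,0, 1,0,1,1]   # Alice's 16-bit secret
bob   = [0,1,0,1, 1,1,0,1, 1,1,0,0, 1,0,1,1]   # Bob's secret from the same session

pa, pb = parities(alice), parities(bob)
print(pa, pb, "match" if pa == pb else "mismatch detected")
```

Here the first 8-bit block differs in four positions and so passes the parity test unnoticed, while the second block differs in a single position and is caught: the parity lists are [1, 0] and [1, 1].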

A minor variant of the key-exchange algorithm just described comes with an implementation strategy. The polarization of a photon is measured by an angle θ, 0° ≤ θ < 180°.[1] A photon polarized at an angle θ passes through a φ-filter with the probability cos²(φ – θ) and gets absorbed in the filter with the probability sin²(φ – θ). Therefore, photons polarized at the angles 0°, 90°, 45°, 135° can be used to represent the quantum states |0〉, |1〉, H|0〉, H|1〉, respectively. Alice and Bob use 0°- and 45°-filters. Alice makes a random choice (x) between the two filters. If x = 0, she sends a photon polarized at an angle 0° or 90°. If x = 1, a photon polarized at an angle 45° or 135° is sent. When Bob receives the photon transmitted by Alice, he makes a random guess y. If y = 0, he uses the 0°-filter to detect its polarization, and if y = 1, he uses the 45°-filter. Then, Alice and Bob reveal their choices x and y, and if the two choices agree, they share a common secret bit. See Exercise 8.12 for a mathematical formulation of this strategy.

[1] Ask a physicist!

One of the most startling features of this Bennett–Brassard algorithm (often called the BB84 algorithm) is that there have been successful experimental implementations of the strategy. The first prototype was built by the authors themselves at the T. J. Watson Research Center, using a quantum channel of length 32 cm. Using longer channels requires many technological barriers to be overcome. For example, fiber-optic cables tend to weaken and may even destroy the polarization of photons. Using boosters to strengthen the signal is impossible in the quantum-mechanical world, since doing so produces an effect similar to eavesdropping. Interference patterns (instead of polarization) have been proposed and utilized to build longer quantum channels for key exchange. At present, Stucki et al. [293] hold the record for performing quantum key exchange over an (underwater) channel of length 67 km between Geneva and Lausanne.

Exercise Set 8.3

8.10We have exploited the property that H² = I in order to prove the correctness of the quantum key-exchange algorithm. Exercise 8.5 lists some other operators (X and Z) which also satisfy the same property (X² = Z² = I). Can one use one of these transforms in place of H in the quantum key-exchange algorithm?
8.11Assume that Carol eavesdrops (in the manner described in the text) during the execution of the quantum key-exchange protocol between Alice and Bob. Derive for different choices of i, x and z the following probabilities Pixz of having i ≠ j in the case x = y.
i   x   z   Pixz
0   0   0    0
0   0   1   1/2
0   1   0   1/2
0   1   1   1/2
1   0   0    0
1   0   1   1/2
1   1   0   1/2
1   1   1   1/2

If all these choices of i, x, z are equally likely, show that the probability that Carol introduces a mismatch (that is, i ≠ j) in a shared bit during a random execution of the key-exchange protocol with x = y is 3/8.

(Note that if x = y = z = 0, that is, if the execution of the algorithm proceeds entirely in the classical sense, Carol goes unnoticed. It is the application of the classically meaningless Hadamard transform that introduces the desired security in the protocol.)

8.12In the key-exchange algorithm described in the text, Bob (and also Carol) always measures qubits in the classical basis {|0〉, |1〉}. Now, consider the following variant of this algorithm. Alice sends, as before, one of the four qubits |0〉, |1〉, H|0〉, H|1〉, depending on her choice of i and x. Bob, upon receiving the qubit A, generates a random guess y ∈ {0, 1}. If y = 0, Bob measures A in the classical basis, whereas if y = 1, Bob measures A in the basis {H|0〉, H|1〉}. After this, they exchange x and y, and retain/discard the bits as in the original algorithm.
  1. Assume that there is no eavesdropping. Argue that this modified strategy works, that is, if x = y, we have i = j, whereas if x ≠ y, then i = j with probability 1/2.

  2. Explain the role of a passive adversary (Carol) in this modified strategy.

  3. Calculate for this variant the probability that Carol introduces an error in a shared bit (when x = y).

8.4. Quantum Cryptanalysis

Quantum parallelism has been effectively exploited to design fast (polynomial-time) algorithms for some of the intractable mathematical problems discussed in Chapter 4. With the availability of quantum computers, cryptographic systems that derive their security from the intractability of these problems will become unusable (completely insecure). Nobody, however, has a proof that these intractable problems cannot have fast classical algorithms. It is interesting to wait and see which (if either) is invented first: a quantum computer or a polynomial-time classical algorithm.

Let us set up some terminology for the rest of this chapter. Let P be a unitary operator on a qubit. One can apply P individually on the i-th bit of an n-bit register. In this case, we denote the operation by Pi. If Pi is operated for each i = 1, . . . , n (in succession or simultaneously), then we abbreviate P1 · · · Pn by the short-hand notation P(n). The parentheses distinguish the operation from Pn which is the n-fold application of P on a single qubit.

If P and Q are unitary transforms on n1- and n2-bit quantum registers respectively, we let P ⊗ Q denote the unitary transform on an (n1 + n2)-bit register, with P operating on the left n1 bits and Q on the right n2 bits of the register.

8.4.1. Shor’s Algorithm for Computing Period

Let N := 2^n for some n ∈ ℕ. Let f be an integer-valued function defined on ℤ, periodic with (least) period r, that is, f(x + kr) = f(x) for every x, k ∈ ℤ. Suppose further that 1 ≪ r ≤ 2^(n/2) and also that f(0), f(1), . . . , f(r – 1) are pairwise distinct. Shor proposed an algorithm for an efficient computation of the period r in this case.

Let’s first look at the problem classically. If one evaluates f at randomly chosen points, by the birthday paradox (Exercise 2.172) one requires O(√r) evaluations of f on an average in order to find two different integers x and y with f(x) = f(y). But then r | (x – y). If sufficiently many such pairs (x, y) are available, the period can be obtained by computing the gcd of the integers x – y. If r is large, say, r = O(2^(n/2)), this gives us an algorithm for computing r in expected time exponential in n. Shor’s quantum algorithm determines r in expected time polynomial in n.
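This classical collision search can be sketched as follows (Python; the toy function and the parameters are ours):

```python
import math
import random
from functools import reduce

def classical_period(f, N, trials, rng):
    # Evaluate f at random points; a collision f(x) = f(y) means r divides x - y.
    seen, diffs = {}, []
    for _ in range(trials):
        x = rng.randrange(N)
        v = f(x)
        if v in seen and seen[v] != x:
            diffs.append(abs(x - seen[v]))
        seen[v] = x
    # The gcd of enough random multiples of r is (very likely) r itself.
    return reduce(math.gcd, diffs) if diffs else None

r = 20                                   # hidden period
f = lambda x: x % r                      # toy function: f(0), ..., f(r-1) distinct
period = classical_period(f, 1 << 16, 400, random.Random(7))
print(period)
```

Every collected difference is a multiple of r, so the returned gcd is always a multiple of r, and with many collisions it is almost surely r itself. The number of evaluations needed grows like √r, which is exponential in n for large periods.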

Let us assume that we have an oracle Uf which, on input the 2n-bit value |x〉n|y〉n, computes |x〉n|f(x) ⊕ y〉n. We prepare a 2n-bit register A in the state |0〉n|0〉n. Then, we apply the Hadamard transform H(n) on the left n bits. By Exercise 8.8, the state of A becomes

    (1/√N) Σ_{x=0}^{N–1} |x〉|0〉n.

Supplying this state as the input to the oracle Uf yields the state

    (1/√N) Σ_{x=0}^{N–1} |x〉|f(x)〉.
We then measure the output register (right n bits). By the generalized Born rule, we get a value f(x0) for some x0 ∈ {0, 1, . . . , r – 1}, and the state of the register A collapses to the uniform superposition of all those |x〉|f(x)〉 for which f(x) = f(x0). By the given periodicity properties of f, the post-measurement state of the input register (left n bits) can be written as

Equation 8.1

    |ψ〉 = (1/√M) Σ_{j=0}^{M–1} |x0 + jr〉

for some M determined by the relations:

x0 + (M – 1)r < N ≤ x0 + Mr.

This is an interesting state, for if we were allowed to make copies of this state and measure the different copies, we could collect some values x0 + j1r, . . . , x0 + jkr, which in turn would reveal r with high probability. But the no-cloning theorem disallows making copies of quantum states. Shor proposed a trick to work around this difficulty. He considered the following transform:

Equation 8.2

    F|x〉 = (1/√N) Σ_{y=0}^{N–1} e^(2πixy/N) |y〉

By Exercise 8.13, F is a unitary transform. F is known as the Fourier transform. Applying F to State (8.1) transforms the input register to the state

    F|ψ〉 = (1/√(MN)) Σ_{y=0}^{N–1} Σ_{j=0}^{M–1} e^(2πi(x0 + jr)y/N) |y〉.

A measurement of this state gives an integer y ∈ {0, 1, . . . , N – 1} with the probability

    p(y) = (1/(MN)) |Σ_{j=0}^{M–1} e^(2πijry/N)|².
Application of the Fourier transform to State (8.1) helps us to concentrate the probabilities of measurement outcomes in strategic states. More precisely, consider a value of y of the form yk = kN/r + ∊k, where k ∈ ℤ and –1/2 ≤ ∊k < 1/2, that is, a value of y close to an integral multiple of N/r. In this case,

    p(yk) = (1/(MN)) |Σ_{j=0}^{M–1} e^(2πijr∊k/N)|².

The last summation is that of a geometric series and we have

    p(yk) = (1/(MN)) · sin²(πMr∊k/N) / sin²(πr∊k/N).

Now, we use the inequalities (2/π)x ≤ sin x ≤ x for 0 ≤ x ≤ π/2 and the facts that Mr ≈ N and that |∊k| ≤ 1/2 to get

    p(yk) ≥ 4/(π²r).
Since N/r has about r positive integral multiples less than N and each such multiple kN/r has a closest integer yk, the probability that we obtain one such yk as the outcome of the measurement is at least 4/π² = 0.40528 . . . , that is, after O(1) iterations of the above procedure we get some yk. The Fourier transform thus raises the likelihood of getting some yk to a level bounded below by a positive constant.

What remains is to show that r can be retrieved from such a useful observation yk. We have |yk/N – k/r| = |∊k|/N ≤ 1/(2N) < 1/(2r²). If a/b and c/d are two distinct rationals with b, d ≤ r and with |yk/N – a/b| < 1/(2r²) and |yk/N – c/d| < 1/(2r²), then by the triangle inequality we have |a/b – c/d| < 1/r². On the other hand, since a/b ≠ c/d, we have |a/b – c/d| = |ad – bc|/(bd) ≥ 1/(bd) ≥ 1/r², a contradiction. Therefore, since r² < N, there is a unique rational k/r satisfying |yk/N – k/r| < 1/(2r²), and this rational k/r can be determined by efficient classical algorithms, for example, using the continued fraction expansion[2] of yk/N.

[2] Consult Zuckerman et al. [316] to learn about continued fractions and their applications in approximating real numbers.

If gcd(k, r) = 1, we get r. We can verify this by checking whether f(x) = f(x + r). If gcd(k, r) > 1, we get a factor of r. Repeating the entire procedure gives another k′/r, from which we get (hopefully) another factor of r (if not r itself). After a few (O(1)) iterations, we obtain r as the lcm of the factors obtained.
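The classical post-processing, recovering k/r from a measured yk and combining several runs by an lcm, can be sketched with Python's Fraction type, whose limit_denominator method implements exactly the continued-fraction approximation needed here (the toy values of N, r and the "measured" values of k are ours):

```python
from fractions import Fraction
import math

N = 1 << 12            # N = 2^n with n = 12
r = 21                 # hidden period; the algorithm only knows r <= 2^(n/2) = 64
bound = 1 << 6

factors = []
for k in [7, 6, 5]:    # pretend these values of k arose from three measurements
    y_k = round(k * N / r)                       # y_k is within 1/2 of kN/r
    frac = Fraction(y_k, N).limit_denominator(bound)
    factors.append(frac.denominator)             # equals r / gcd(k, r)

print(factors, "lcm =", math.lcm(*factors))      # the factors combine to r = 21
```

The first two measurements yield only the factors 3 and 7 of r (because gcd(k, r) > 1), but their lcm, 21, already equals r, illustrating the final sentence above.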

Much of the quantum magic is obtained by the use of the Fourier transform F on a suitably prepared quantum register. The question is then how easy it is to implement F. We will not go into the details, but only mention that a circuit consisting of basic quantum gates and of size O(n²) can be used to realize the Fourier transform (cf. Exercise 8.14).

To sum up, we have a polynomial-time (in n) randomized quantum algorithm for computing the period r of f. This leads to efficient quantum algorithms for solving many classically intractable problems of cryptographic significance.

8.4.2. Breaking RSA

Let m = pq with p, q distinct primes. We have φ(m) = (p – 1)(q – 1). Choose an RSA key pair (e, d) with gcd(e, φ(m)) = 1 and ed ≡ 1 (mod φ(m)). Given a message a ∈ ℤm, the ciphertext message is b ≡ a^e (mod m). The task of a cryptanalyst is to compute a from the knowledge of m, e and b. If gcd(b, m) > 1, then this gcd is a non-trivial factor of m. So assume that gcd(b, m) = 1, that is, b ∈ ℤm*. But then a ∈ ℤm* also. Since b ≡ a^e (mod m), b is in the subgroup of ℤm* generated by a. Similarly, a ≡ b^d (mod m), that is, a is in the subgroup of ℤm* generated by b. It follows that these two subgroups are equal and, in particular, the multiplicative orders of a and b modulo m are the same. This order, call it r, divides φ(m) and hence is ≤ (p – 1)(q – 1) < m.

Choose n ∈ ℕ with N := 2^n ≥ m² > r². The function sending x ↦ b^x (mod m) is periodic of (least) period r. By Shor’s algorithm, one computes r efficiently. Since gcd(e, φ(m)) = 1 and r | φ(m), we have gcd(e, r) = 1, that is, using the extended gcd algorithm one obtains an integer d′ with d′e ≡ 1 (mod r). But then b^d′ ≡ a^(d′e) ≡ a (mod m).

The private key d is the inverse of e modulo φ(m). It is not necessary to compute d for decrypting b. The inverse d′ of e modulo r = ordm(a) = ordm(b) suffices.
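The attack can be sketched as follows (a Python sketch with a toy modulus; brute-force order computation stands in for the quantum period-finding step):

```python
from math import gcd

# A toy RSA instance (illustrative parameters, far too small for real use).
p, q = 61, 53
m = p * q                       # modulus 3233
phi = (p - 1) * (q - 1)
e = 17                          # public exponent, gcd(e, phi) = 1
a = 1234                        # plaintext
b = pow(a, e, m)                # ciphertext

# Shor's algorithm would return r = ord_m(b); here we find it by brute force.
r = 1
while pow(b, r, m) != 1:
    r += 1

assert gcd(e, r) == 1           # guaranteed, since r divides phi(m)
d1 = pow(e, -1, r)              # d' with d'e = 1 (mod r)
recovered = pow(b, d1, m)
print(recovered == a)           # plaintext recovered without knowing phi(m)
```

Indeed b^d′ = a^(d′e) = a^(1 + tr) = a · (a^r)^t ≡ a (mod m), since a and b have the same order r.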

8.4.3. Factoring Integers

Let m be a composite integer that we want to factor. Choose a non-zero integer a ∈ ℤm. If gcd(a, m) > 1, then we already know a non-trivial factor of m. So assume that gcd(a, m) = 1, that is, a ∈ ℤm*. Let r := ordm(a).

As in the case of breaking RSA, choose n ∈ ℕ with N := 2^n ≥ m² > r². The function x ↦ a^x (mod m) is periodic of least period r. Shor’s algorithm computes r. If r is even, we can write:

(a^(r/2) – 1)(a^(r/2) + 1) ≡ 0 (mod m).

Since ordm(a) = r, we have a^(r/2) – 1 ≢ 0 (mod m). If we also have a^(r/2) + 1 ≢ 0 (mod m), then gcd(a^(r/2) + 1, m) is a non-trivial factor of m. It can be shown that the probability of finding an even r with a^(r/2) + 1 ≢ 0 (mod m) is at least half (cf. Exercise 4.9). Thus, trying a few integers a, one can factor m.
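A classical sketch of the whole factoring procedure (Python; brute-force order computation again stands in for Shor's quantum period finding):

```python
from math import gcd

def shor_factor(m, a):
    # Classical stand-in: compute r = ord_m(a) by brute force instead of quantumly.
    g = gcd(a, m)
    if g > 1:
        return g                          # lucky: a already shares a factor with m
    r = 1
    while pow(a, r, m) != 1:
        r += 1
    if r % 2 == 1:
        return None                       # odd order: try another a
    t = pow(a, r // 2, m)
    if t == m - 1:
        return None                       # a^(r/2) = -1 (mod m): try another a
    return gcd(t + 1, m)                  # a non-trivial factor of m

m = 3233                                  # = 61 * 53
a, f = 2, None
while f is None:
    f = shor_factor(m, a)
    a += 1
print(f, m // f)
```

For m = 3233 the base a = 2 fails (its order r is even but 2^(r/2) ≡ –1 (mod m)), while a = 3 succeeds, matching the at-least-half success probability quoted above.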

8.4.4. Computing Discrete Logarithms

A variant of Shor’s algorithm in Section 8.4.1 can be used to compute discrete logarithms in the finite field 𝔽q, q = p^s, p prime, s ∈ ℕ. For the sake of simplicity, let us concentrate only on prime fields (s = 1). Let g be a generator of ℤp*; our task is to compute, for a given a ∈ ℤp*, an integer r with a ≡ g^r (mod p). We assume that p is a large prime; in particular, p is odd.

Choose n ∈ ℕ with N := 2^n satisfying p < N < 2p. We use a 3n-bit quantum register A in which the left 2n bits constitute the input part and the right n bits the output part. The input part is initialized to the uniform superposition of all pairs (x, y) with x, y ∈ {0, 1, . . . , N – 1}, that is, A has the initial state:

    (1/N) Σ_{x=0}^{N–1} Σ_{y=0}^{N–1} |x〉|y〉|0〉n
(see Exercise 8.15). Then, we use an oracle

Uf : |x〉n|y〉n|z〉n ↦ |x〉n|y〉n|f(x, y) ⊕ z〉n

to compute the function f(x, y) := g^x a^(–y) (mod p) in the output register. Applying Uf transforms A to the state

    (1/N) Σ_{x=0}^{N–1} Σ_{y=0}^{N–1} |x〉|y〉|g^x a^(–y) rem p〉.

Measurement of the output register now gives a value z ≡ g^k (mod p) for some k ∈ {0, 1, . . . , p – 2} and causes the input register to jump to the state

    (1/√(p–1)) Σ_{y=0}^{p–2} |(ry + k) rem (p–1)〉 |y〉.
Note that g^x a^(–y) ≡ g^k (mod p) if and only if x – ry ≡ k (mod p – 1), that is, only those pairs (x, y) that satisfy this congruence contribute to the post-measurement state. For each value of y modulo p – 1, we get a unique x ≡ ry + k (mod p – 1), that is, there are exactly p – 1 such pairs (x, y).

If we were allowed to make copies of this state and observe two copies separately, we would get pairs (x1, y1) and (x2, y2) with x1 – ry1 ≡ x2 – ry2 ≡ k (mod p – 1). Now, if gcd(y1 – y2, p – 1) = 1, we would get r ≡ (y1 – y2)^(–1)(x1 – x2) (mod p – 1). But we are not allowed to copy quantum states. So Shor used his old trick, that is, applied the Fourier transform F ⊗ F

    F ⊗ F : |x〉|y〉 ↦ (1/N) Σ_{u=0}^{N–1} Σ_{v=0}^{N–1} e^(2πi(xu + yv)/N) |u〉|v〉

to obtain the state

    (1/(N√(p–1))) Σ_{y=0}^{p–2} Σ_{u=0}^{N–1} Σ_{v=0}^{N–1} e^(2πi(((ry + k) rem (p–1))u + yv)/N) |u〉|v〉.

A measurement of the input register at this state yields a pair (u, v) with probability:

Equation 8.3

    p_{u,v} = (1/(N²(p–1))) |Σ_{y=0}^{p–2} e^(2πi(((ry + k) rem (p–1))u + yv)/N)|²


As in Shor’s period-finding algorithm, we now need to identify a set of useful pairs (u, v) which are sufficiently many in number so as to make the probability of observing one of them bounded below by a positive constant. We also need to demonstrate how a useful pair can reveal the unknown discrete logarithm r of a. The jugglery with inequalities and approximations is much more involved in this case. Let us still make a patient attempt to see the end of the story.

First, we eliminate one of x, y from Equation (8.3). Since x ≡ ry + k (mod p – 1) and 0 ≤ x ≤ p – 2, we have x = (ry + k) rem (p – 1). Let j be the integer closest to u(p – 1)/N, that is, u(p – 1) = jN + ∊ with –N/2 < ∊ ≤ N/2. This yields

Equation 8.4


where

Equation 8.5


Since is an integer, substituting Equation (8.4) in Equation (8.3) gives

Writing S = lN + σ with –N/2 < σ ≤ N/2 then gives

We now impose the usefulness conditions on u, v:

Equation 8.6


Equation 8.7


Involved calculations show that the probability pu,v for a (u, v) satisfying these two conditions is at least . Let us now see how many pairs (u, v) satisfy the conditions. From Equation (8.5), it follows that for each u there exists a unique v, such that Condition (8.6) is satisfied. Condition (8.7), on the other hand, involves only u. If w := v2(p – 1), then 2w must divide ∊. For each multiple of 2w not exceeding N/12 in absolute value, we get 2w distinct solutions for u modulo N. (We are solving for u the congruence u(p – 1) ≡ ∊ (mod 2n).) There is a total of at least N/12 of them. Therefore, the probability of making any one of the useful observations (u, v) is at least , since N < 2p.

We finally explain the extraction of r from a useful observation (u, v). Condition (8.6) and Equation (8.5) give . Dividing throughout by N and using the fact that u(p – 1) = jN + ∊, we get

that is, the fractional part of must lie between and . The measurement of the input gives us v and we know N. We approximate to the nearest multiple of and get rj ≡ λ (mod p – 1). Now, j, being the integer closest to u(p – 1)/N, is also known to us. If gcd(j, p – 1) = 1, we have r ≡ j^(–1)λ (mod p – 1). We don’t go into the details of determining the likelihood of the invertibility of j modulo p – 1. A careful analysis shows that Shor’s quantum discrete-log algorithm runs in probabilistic polynomial time (in n).

Exercise Set 8.4

8.13Let F be the Fourier Transform (8.2). For basis vectors |x〉 and |x′〉, show that

Conclude that F is a unitary transform.

8.14Let N = 2^n. Let x, y ∈ {0, 1, . . . , N – 1} have binary expansions (xn–1 · · · x1x0)2 and (yn–1 · · · y1y0)2 respectively.
  1. Show that xy/N equals an integer plus the quantity

    yn–1 (.x0) + yn–2(.x1x0) + yn–3(.x2x1x0) + · · · + y0(.xn–1 xn–2 . . . x0),

    where (.xk xk–1 . . . x0) denotes the binary fraction xk/2 + xk–1/4 + · · · + x0/2^(k+1).

  2. Deduce that the quantum Fourier Transform (8.2) can be written as

    where the i-th expression in parentheses applies to the i-th bit from the left.

8.15Let n ∈ ℕ, N := 2^n and f : {0, 1}n → {0, 1}. Consider an (n + 1)-bit quantum register with input consisting of the left n bits and the output the rightmost bit. Suppose there is an oracle Uf that takes an n-bit input x and outputs the bit:

First prepare the register in the state . Then, apply Uf on this register and finally measure the output bit. Describe the state of the input register after this measurement depending on the outcome of the measurement.

8.16Recall that the Fourier Transform (8.2) is defined for N equal to a power of 2. It turns out that for such values of N the quantum Fourier transform is easy to implement. For this exercise, assume hypothetically that one can efficiently implement F for other values of N too. In particular, take N = p – 1 in Shor’s quantum discrete-log algorithm. Show that in this case, the probability pu,v of Equation (8.3) becomes:

Conclude that an outcome (u, v) of measuring the input register yields

r ≡ –u–1v (mod p – 1),

provided gcd(u, p – 1) = 1.

Chapter Summary

This chapter is a gentle introduction to the recent applications of quantum computation in public-key cryptography. These developments have both good and bad consequences for cryptologists. It is still a big question whether a quantum computer can ever be manufactured. So, at present, a study of quantum cryptology is mostly theoretical in nature.

Quantum mechanics is governed by a set of four axioms that define a quantum system and prescribe its properties. A quantum bit (qubit) is a quantum mechanical system that has two orthogonal states |0〉 and |1〉. A quantum register is a collection of qubits of a fixed size.

As an example of what we can gain by using quantum algorithms, we first describe the Deutsch algorithm that determines whether a function f : {0, 1} → {0, 1} is constant by invoking f only once. A classical algorithm requires two invocations.

Next we present the BB84 algorithm for key exchange over a quantum mechanical channel. The algorithm guarantees perfect security. This algorithm has been implemented in hardware, and key agreement is carried out over a channel of length 67 km.

Finally, we describe Shor’s polynomial-time quantum algorithms for factoring integers and for computing discrete logarithms in finite fields. These algorithms are based on a technique called quantum Fourier transform.

If quantum computers can ever be realized, RSA and most other popular cryptosystems described and not described in this book will forfeit all security guarantees. And what will happen to this book? If you don’t possess a copy of this wonderful book, just rush to your nearest book store now—they have not yet mastered the quantum technology!

Suggestions for Further Reading

There was a time when the newspapers said that only twelve men understood the theory of relativity. I do not believe there ever was such a time . . . On the other hand, I think I can safely say that nobody understands quantum mechanics.

—Richard Feynman, The Character of Physical Law, BBC, 1965

Quantum mechanics came into existence when Werner Heisenberg, at the age of 25, proposed the uncertainty principle in 1927. It created an immediate stir in the physics community. Eventually Heisenberg and Niels Bohr came up with an interpretation of quantum mechanics, known as the Copenhagen interpretation. While many physicists (like Max Born, Wolfgang Pauli and John von Neumann) subscribed to this interpretation, many other eminent ones (including Albert Einstein, Erwin Schrödinger, Max Planck and Bertrand Russell) did not. Interested readers may consult textbooks by Sakurai [255] and Schiff [258] to study this fascinating area of fundamental science.[3]

[3] Well! We are not physicists. These books are followed in graduate and advanced undergraduate courses in many institutes and universities.

For a comprehensive treatment of quantum computation (including cryptographic and cryptanalytic quantum algorithms), we refer the reader to the book by Nielsen and Chuang [218]. Mermin’s paper [197] and course notes [198] are also good sources for learning quantum mechanics and computation, and are suitable for computer scientists. Preskill’s course notes [244] are also useful, though a bit more physics-oriented. The very readable article [243] by Preskill on the realizability of quantum computers is also worth mentioning in this context. The first known quantum algorithm is due to Deutsch [75].

Bennett and Brassard’s quantum key-exchange algorithm (BB84) appeared in [20]. The implementation due to Stucki et al. of this algorithm is reported in [293].

Shor’s polynomial-time quantum factorization and discrete-log algorithms are described in [271]. All the details missing in Section 8.4.4 can be found in this paper. No polynomial-time quantum algorithms are known to solve the elliptic curve discrete logarithm problem. Proos and Zalka [245] present an extension of Shor’s algorithm for a special class of elliptic curves. See [146] for an adaptation of this algorithm applicable to fields of characteristic 2.

Appendices

 


A. Symmetric Techniques

A.1Introduction
A.2Block Ciphers
A.3Stream Ciphers
A.4Hash Functions

Sour, sweet, bitter, pungent, all must be tasted.

—Chinese Proverb

Unless we change direction, we are likely to end up where we are going.

—Anonymous

Not everything that can be counted counts, and not everything that counts can be counted.

—Albert Einstein

A.1. Introduction

Cryptography, today, cannot bank solely on public-key (that is, asymmetric) algorithms. Secret-key (that is, symmetric) techniques also have important roles to play. This chapter is an attempt to introduce to the readers some rudimentary notions about symmetric cryptography. The sketchy account that follows lacks both the depth and the breadth of a comprehensive treatment. Given the focus of this book, Appendix A could have been omitted. Nonetheless, some attention to the symmetric technology is never irrelevant for any book on cryptology.

It remains debatable whether hash functions can be treated under the banner of this chapter—a hash function need not even use a key. If the reader is willing to accept symmetric as an abbreviation for not asymmetric, some justifications can perhaps be given. How does it matter anyway?

A.2. Block Ciphers

Block ciphers encrypt plaintext messages in blocks of fixed lengths and are used even more widely than public-key encryption routines. In a sense, public-key encryption is also block encryption. Since public-key routines are much slower than (secret-key) block ciphers, it is customary to use public-key algorithms only in specific situations, for example, for encrypting single blocks of data, like keys of symmetric ciphers.

In the rest of this chapter, we use the word bit in the conventional sense, that is, to denote a quantity that can take only two possible values, 0 and 1. It is convenient to use the symbol 𝔹 to refer to the set {0, 1}. We also let 𝔹m stand for the set of all bit strings of length m. Whenever we plan to refer to the field (or group) structure of 𝔹 (or 𝔹m), we will use the alternative notation 𝔽2 (or 𝔽2m).

Definition A.1.

A block cipher f of block-size n and of key-size r is a map

    f : 𝔹r × 𝔹n → 𝔹n

that encrypts a plaintext block m of bit length n to a ciphertext block c = f(K, m) of bit length n under a key K, a bit string of length r. To ensure unique decryption, the map

    fK : 𝔹n → 𝔹n,  m ↦ f(K, m),

for a fixed key K has to be a permutation of (that is, a bijective function on) 𝔹n. In that case, the decryption of c to get back m is carried out as m = fK^(–1)(c).
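A toy instance of Definition A.1 (cryptographically worthless, for illustration only): take n = r = 8 and let fK XOR the key into the block and then rotate it. Both steps are bijections, so fK is a permutation of the set of 8-bit blocks:

```python
def rotl8(b, s):
    # Cyclic left rotation of an 8-bit value by s positions.
    return ((b << s) | (b >> (8 - s))) & 0xFF

def f(K, msg):
    return rotl8(msg ^ K, 3)     # XOR the key in, then rotate: both bijective

def f_inv(K, c):
    return rotl8(c, 5) ^ K       # undo the rotation (5 + 3 = 8), then the XOR

K = 0b10110100
# f_K is a permutation of all 256 possible blocks, as the definition demands.
assert sorted(f(K, x) for x in range(256)) == list(range(256))
c = f(K, 0b01100001)
print(f_inv(K, c) == 0b01100001)
```

Decryption simply inverts each step in reverse order, which is exactly the requirement m = fK⁻¹(c) in the definition.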

A good block cipher has the following desirable properties:

A block cipher provably possessing all these good characteristics (in particular, the randomness properties) is difficult to construct in practice. Practical block ciphers are designed for reasonably big n and r and come with the hope of representing reasonably unpredictable permutations. We dub a block cipher good or safe if it stands the test of time. Table A.1 lists some widely used block ciphers.

Table A.1. Some popular block ciphers
Name                                                      n             r
DES (Data Encryption Standard)                            64            56
FEAL (Fast Data Encipherment Algorithm)                   64            64
SAFER (Secure And Fast Encryption Routine)                64            64
IDEA (International Data Encryption Algorithm)            64            128
Blowfish                                                  64            ≤ 448
Rijndael, accepted as AES (Advanced Encryption
Standard) by NIST (National Institute of Standards
and Technology, a US government organization)             128/192/256   128/192/256

A.2.1. A Case Study: DES

The data encryption standard (DES) was proposed as a federal information processing standard (FIPS) in 1975. DES has been the most popular and the most widely used among all block ciphers ever designed. Although its relatively small key-size offers questionable security under today’s computing power, DES still enjoys large-scale deployment in not-so-serious cryptographic applications.

DES encryption requires a 64-bit plaintext block m and a 56-bit key K.[1] We use the notations DESK and DESK^(–1) to stand respectively for the DES encryption and decryption functions under the key K.

[1] A DES key K = k1k2 . . . k64 is actually a 64-bit string. Only 56 bits of K are used for encryption. The remaining 8 bits are used as parity-check bits. Specifically, for each i = 1, . . . , 8 the bit k8i is adjusted so that the i-th byte (k8i – 7k8i – 6 . . . k8i) has an odd number of one-bits.

DES key schedule

The DES algorithm first computes sixteen 48-bit keys K1, K2, . . . , K16 from K using a procedure known as the DES key schedule described in Algorithm A.1. These 16 keys are used in the 16 rounds of encryption. The key schedule uses two fixed permutations PC1 and PC2 described after Algorithm A.1 and to be read in the row-major order. Here, PC is an abbreviation for permuted choice.

Algorithm A.1. The DES key schedule

Input: A DES key K = k1k2 . . . k64 (containing the parity-check bits).

Output: Sixteen 48-bit round keys K1, K2, . . . , K16.

Steps:

Use PC1 to generate U0 := PC1(K) = k57k49 . . . k4.
Write U0 = C0 ‖ D0 with C0, D0 ∈ {0, 1}^28.
for i = 1, 2, . . . ,16 {
   Take s := 1 if i ∈ {1, 2, 9, 16}, and s := 2 otherwise.
   Cyclically left shift Ci–1 by s bits to get Ci.
   Cyclically left shift Di–1 by s bits to get Di.
   Let Ui := Ci ‖ Di.
   Compute the i-th round key Ki := PC2(Ui) = u14u17u11 . . . u29u32.
}

PC1
57 49 41 33 25 17  9
 1 58 50 42 34 26 18
10  2 59 51 43 35 27
19 11  3 60 52 44 36
63 55 47 39 31 23 15
 7 62 54 46 38 30 22
14  6 61 53 45 37 29
21 13  5 28 20 12  4

PC2
14 17 11 24  1  5
 3 28 15  6 21 10
23 19 12  4 26  8
16  7 27 20 13  2
41 52 31 37 47 55
30 40 51 45 33 48
44 49 39 56 34 53
46 42 50 36 29 32

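Algorithm A.1 can be prototyped in a few lines of Python. The sketch below follows the steps and the two tables above; the 1-indexed bit positions of PC1 and PC2 are translated to 0-indexed list accesses, and the function name key_schedule is ours.

```python
# A sketch of the DES key schedule (Algorithm A.1).
PC1 = [57,49,41,33,25,17, 9,  1,58,50,42,34,26,18,
       10, 2,59,51,43,35,27, 19,11, 3,60,52,44,36,
       63,55,47,39,31,23,15,  7,62,54,46,38,30,22,
       14, 6,61,53,45,37,29, 21,13, 5,28,20,12, 4]
PC2 = [14,17,11,24, 1, 5,  3,28,15, 6,21,10,
       23,19,12, 4,26, 8, 16, 7,27,20,13, 2,
       41,52,31,37,47,55, 30,40,51,45,33,48,
       44,49,39,56,34,53, 46,42,50,36,29,32]
SHIFTS = [1,1,2,2,2,2,2,2,1,2,2,2,2,2,2,1]   # s = 1 for rounds 1, 2, 9, 16

def key_schedule(key_bits):
    """key_bits: list of the 64 bits k1..k64 (index 0 holds k1).
    Returns the sixteen 48-bit round keys K1..K16 as bit lists."""
    u = [key_bits[p - 1] for p in PC1]        # U0 = PC1(K), 56 bits
    c, d = u[:28], u[28:]                     # U0 = C0 || D0
    round_keys = []
    for s in SHIFTS:
        c = c[s:] + c[:s]                     # cyclic left shift of C by s bits
        d = d[s:] + d[:s]                     # cyclic left shift of D by s bits
        ui = c + d                            # Ui = Ci || Di
        round_keys.append([ui[p - 1] for p in PC2])   # Ki = PC2(Ui)
    return round_keys
```

As a sanity check, the weak key 0101 0101 0101 0101 of Exercise A.3 selects only parity-check bits through PC1, so C0 and D0 are all-zero and all sixteen round keys coincide. Note also that the shift amounts sum to 28, so C16 = C0 and D16 = D0.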
DES encryption

DES encryption, as described in Algorithm A.2, proceeds in 16 rounds. The i-th round uses the key Ki (obtained from the key schedule) in tandem with the encryption primitive e. A fixed permutation IP and its inverse IP–1 are also used.[2]

[2] A block cipher that executes several encryption rounds with the i-th round computing the two halves as Li := Ri–1 and Ri := Li–1 ⊕ e(Ri–1, Ki) for some round key Ki and for some encryption primitive e, is called a Feistel cipher. Most popular block ciphers mentioned earlier are of this type. Rijndael is an exception, and its acceptance as the new standard has been interpreted as an end of the Feistel dynasty.

To complete the description of DES encryption, it remains to specify the round encryption function e. The function e can be compactly depicted as:

e(X, J) := P(S(E(X) ⊕ J)),

Algorithm A.2. DES encryption

Input: Plaintext block m = m1m2 . . . m64 and the round keys K1, . . . , K16.

Output: The ciphertext block c = DESK(m).

Steps:

Apply the initial permutation on m to get
     V := IP(m) = m58m50 . . . m7.
Write V = L0 ‖ R0 with L0, R0 ∈ {0, 1}^32.
for i = 1, 2, . . . , 16 {
   /* The i-th encryption round */
   Li := Ri–1.
   Ri := Li–1 ⊕ e(Ri–1, Ki).
}
Let W := R16 ‖ L16.
Apply the inverse of the initial permutation on W to get the ciphertext block
   c := IP–1(W) = w40w8 . . . w25.

IP
58 50 42 34 26 18 10  2
60 52 44 36 28 20 12  4
62 54 46 38 30 22 14  6
64 56 48 40 32 24 16  8
57 49 41 33 25 17  9  1
59 51 43 35 27 19 11  3
61 53 45 37 29 21 13  5
63 55 47 39 31 23 15  7

IP–1
40  8 48 16 56 24 64 32
39  7 47 15 55 23 63 31
38  6 46 14 54 22 62 30
37  5 45 13 53 21 61 29
36  4 44 12 52 20 60 28
35  3 43 11 51 19 59 27
34  2 42 10 50 18 58 26
33  1 41  9 49 17 57 25

where E : {0, 1}^32 → {0, 1}^48 is an expansion function, S : {0, 1}^48 → {0, 1}^32 is a contraction function, and P is a fixed permutation of {0, 1}^32 (called the permutation function). S uses eight S-boxes (substitution boxes) S1, S2, . . . , S8. Each S-box Sj is a 4 × 16 matrix with each row a permutation of 0, 1, 2, . . . , 15 and is used to convert a 6-bit string y1y2y3y4y5y6 to a 4-bit string z1z2z3z4 as follows. Let μ denote the integer with binary representation y1y6 and ν the integer with binary representation y2y3y4y5. Then, z1z2z3z4 is the 4-bit binary representation of the (μ, ν)-th entry in the matrix Sj. (Here, the numbering of the rows and columns starts from 0.) In this case, we write Sj(y1y2y3y4y5y6) = z1z2z3z4. Algorithm A.3 provides the description of e.
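The slightly unusual indexing convention (outer bits select the row, inner bits the column) is worth a small sanity check in Python. S1 below is the first DES S-box, listed again among the tables later in this section, and the helper name sbox_lookup is ours.

```python
# Row-major DES S-box lookup: 6 input bits -> 4 output bits.
S1 = [
    [14, 4,13, 1, 2,15,11, 8, 3,10, 6,12, 5, 9, 0, 7],
    [ 0,15, 7, 4,14, 2,13, 1,10, 6,12,11, 9, 5, 3, 8],
    [ 4, 1,14, 8,13, 6, 2,11,15,12, 9, 7, 3,10, 5, 0],
    [15,12, 8, 2, 4, 9, 1, 7, 5,11, 3,14,10, 0, 6,13],
]

def sbox_lookup(sbox, y):
    """y is a 6-character bit string y1 y2 y3 y4 y5 y6."""
    mu = int(y[0] + y[5], 2)        # row index: bits y1 y6
    nu = int(y[1:5], 2)             # column index: bits y2 y3 y4 y5
    return format(sbox[mu][nu], '04b')

print(sbox_lookup(S1, '011011'))    # row 01 = 1, column 1101 = 13 -> 5 = '0101'
```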

Algorithm A.3. The DES round encryption primitive e

Input: X ∈ {0, 1}^32 and J ∈ {0, 1}^48.

Output: e(X, J).

Steps:

Y := E(X) ⊕ J (where E(x1x2 . . . x32) = x32x1x2 . . . x32x1).
Write Y = Y1 ‖ Y2 ‖ . . . ‖ Y8 with each Yj ∈ {0, 1}^6.
for j = 1, 2, . . . , 8 {
   Zj := Sj(Yj).
}
e(X, J) := P(Z1 ‖ Z2 ‖ . . . ‖ Z8) (where P(z1z2 . . . z32) = z16z7z20 . . . z4z25).

The tables for E and P are as follows.

E
32  1  2  3  4  5
 4  5  6  7  8  9
 8  9 10 11 12 13
12 13 14 15 16 17
16 17 18 19 20 21
20 21 22 23 24 25
24 25 26 27 28 29
28 29 30 31 32  1

P
16  7 20 21
29 12 28 17
 1 15 23 26
 5 18 31 10
 2  8 24 14
32 27  3  9
19 13 30  6
22 11  4 25

Finally, the eight S-boxes are presented:

S1
14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
 0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
 4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13

S2
15  1  8 14  6 11  3  4  9  7  2 13 12  0  5 10
 3 13  4  7 15  2  8 14 12  0  1 10  6  9 11  5
 0 14  7 11 10  4 13  1  5  8 12  6  9  3  2 15
13  8 10  1  3 15  4  2 11  6  7 12  0  5 14  9

S3
10  0  9 14  6  3 15  5  1 13 12  7 11  4  2  8
13  7  0  9  3  4  6 10  2  8  5 14 12 11 15  1
13  6  4  9  8 15  3  0 11  1  2 12  5 10 14  7
 1 10 13  0  6  9  8  7  4 15 14  3 11  5  2 12

S4
 7 13 14  3  0  6  9 10  1  2  8  5 11 12  4 15
13  8 11  5  6 15  0  3  4  7  2 12  1 10 14  9
10  6  9  0 12 11  7 13 15  1  3 14  5  2  8  4
 3 15  0  6 10  1 13  8  9  4  5 11 12  7  2 14

S5
 2 12  4  1  7 10 11  6  8  5  3 15 13  0 14  9
14 11  2 12  4  7 13  1  5  0 15 10  3  9  8  6
 4  2  1 11 10 13  7  8 15  9 12  5  6  3  0 14
11  8 12  7  1 14  2 13  6 15  0  9 10  4  5  3

S6
12  1 10 15  9  2  6  8  0 13  3  4 14  7  5 11
10 15  4  2  7 12  9  5  6  1 13 14  0 11  3  8
 9 14 15  5  2  8 12  3  7  0  4 10  1 13 11  6
 4  3  2 12  9  5 15 10 11 14  1  7  6  0  8 13

S7
 4 11  2 14 15  0  8 13  3 12  9  7  5 10  6  1
13  0 11  7  4  9  1 10 14  3  5 12  2 15  8  6
 1  4 11 13 12  3  7 14 10 15  6  8  0  5  9  2
 6 11 13  8  1  4 10  7  9  5  0 15 14  2  3 12

S8
13  2  8  4  6 15 11  1 10  9  3 14  5  0 12  7
 1 15 13  8 10  3  7  4 12  5  6 11  0 14  9  2
 7 11  4  1  9 12 14  2  0  6 10 13 15  3  5  8
 2  1 14  7  4 10  8 13 15 12  9  0  3  5  6 11

DES decryption

DES decryption is analogous to DES encryption. To obtain m = DESK–1(c), one first computes the round keys K1, K2, . . . , K16 using Algorithm A.1. One then calls a minor variant of Algorithm A.2. First, the roles of m and c are interchanged. That is, one inputs c instead of m, and obtains m in place of c as output. Moreover, the right half Ri in the i-th round is computed as Ri := Li–1 ⊕ e(Ri–1, K17–i). In other words, DES decryption is the same as DES encryption, only with the sequence of using the keys K1, K2, . . . , K16 reversed. Solve Exercise A.1 in order to establish the correctness of this decryption procedure.

DES test vectors

Some test vectors for DES are given in Table A.2.

Table A.2. DES test vectors
Key                Plaintext block    Ciphertext block
0101010101010101   0000000000000000   8ca64de9c1b123a7
fefefefefefefefe   ffffffffffffffff   7359b2163e4edc58
3101010101010101   1000000000000001   958e6e627a05557b
1010101010101010   1111111111111111   f40379ab9e0ec533
0123456789abcdef   1111111111111111   17668dfc7292532d
1010101010101010   0123456789abcdef   8a5ae1f81ab8f2dd
fedcba9876543210   0123456789abcdef   ed39d950fa74bcc4

Cryptanalysis of DES

DES, being a popular block cipher, has gone through a good amount of cryptanalytic study. At present, linear cryptanalysis and differential cryptanalysis are the most sophisticated attacks on DES. But the biggest problem with DES is its relatively small key size (56 bits). An exhaustive key search for a given plaintext–ciphertext pair needs carrying out a maximum of 2^56 encryptions in order to obtain the correct key. But how big is this number 2^56 = 72,057,594,037,927,936 (nearly 72 quadrillion) in a cryptographic sense?

In order to review this question, RSA Security Inc. posed several challenges for obtaining the DES key from given plaintext–ciphertext pairs. The first challenge, posed in January 1997, was broken by Rocke Verser of Loveland, Colorado, with approximately 96 days of computing. DES Challenge II-1 was broken in February 1998 by distributed.net with 41 days of computing, and DES Challenge II-2 was cracked in July 1998 by the Electronic Frontier Foundation (EFF) in just 56 hours. Finally, DES Challenge III was broken in a record 22 hours 15 minutes in January 1999. The computations were carried out in EFF's special-purpose machine Deep Crack with collaborative efforts from nearly 100,000 PCs on the Internet guided by distributed.net. These figures demonstrate that DES offers hardly any security against a motivated adversary.

Another problem with DES is that its design criteria (most importantly, the objectives behind choosing the particular S-boxes) were never made public. Chances remain that there are hidden backdoors, though none has been discovered to date.

A.2.2. The Advanced Standard: AES

The advanced encryption standard (AES) [219] has superseded the older standard DES. The Rijndael cipher designed by Daemen and Rijmen has been accepted as the advanced standard. As mentioned in Footnote 2, Rijndael is not a Feistel cipher. Its working is based on the arithmetic in the finite field F_{2^8} and in the finite ring A = F_{2^8}[Y]/⟨Y^4 + 1⟩.

Data representation

AES encrypts data in blocks of 128 bits. Let B = b0b1 . . . b127 be a block of data, where each bi is a bit. Keeping in view typical 32-bit processors, each such block B is represented as a sequence of four 32-bit words, that is, B = B0B1B2B3, where Bi represents the bit string b32ib32i+1 . . . b32i+31. Each word C = c0c1 . . . c31, in turn, is viewed as a sequence of four octets, that is, C = C0C1C2C3, where Ci stores the bit string c8ic8i+1 . . . c8i+7. Each octet is identified with an element of F_{2^8}, whereas an entire 32-bit word is identified with an element of the ring A.

The field F_{2^8} is represented as F_2[X]/⟨f(X)⟩, where f(X) is the irreducible polynomial X^8 + X^4 + X^3 + X + 1. Let x := X + ⟨f(X)⟩. The element d7x^7 + d6x^6 + · · · + d1x + d0 is identified with the octet d7d6 . . . d1d0. Thus, the i-th octet c8ic8i+1 . . . c8i+7 in a word is treated as the finite field element c8ix^7 + c8i+1x^6 + · · · + c8i+6x + c8i+7.

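This octet arithmetic is easy to prototype. The sketch below (function names ours) multiplies two octets by shift-and-add with reduction modulo f(X), and computes inverses by raising to the power 254, which works because the multiplicative group of the field has order 255.

```python
def gmul(a, b):
    """Multiply two octets in F_2[X]/<X^8 + X^4 + X^3 + X + 1> (the AES field)."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a                   # add (XOR) the current multiple of a
        carry = a & 0x80
        a = (a << 1) & 0xff          # multiply a by x
        if carry:
            a ^= 0x1b                # reduce: x^8 = x^4 + x^3 + x + 1
        b >>= 1
    return p

def ginv(a):
    """Inverse via a^254 (a^255 = 1 for a != 0); ginv(0) = 0 by convention."""
    r = 1
    for _ in range(254):
        r = gmul(r, a)
    return r
```

For example, gmul(0x57, 0x83) evaluates to 0xc1, a product worked out in the AES specification.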
Now, let us explain the interpretation of a 32-bit word C = C0C1C2C3. The F_{2^8}-algebra A = F_{2^8}[Y]/⟨Y^4 + 1⟩ is not a field, since the polynomial Y^4 + 1 is reducible (over F_2 and so over F_{2^8}). However, each element β of A can be uniquely expressed as a polynomial β = α3y^3 + α2y^2 + α1y + α0, where y := Y + ⟨Y^4 + 1⟩ and where each αi is an element of F_{2^8}. As described in the last paragraph, each αi is represented as an octet. We take Ci to be the octet representing α3–i, that is, the 32-bit word α3α2α1α0 stands for the element α3y^3 + α2y^2 + α1y + α0.

F_{2^8} and A are rings and hence are equipped with arithmetic operations (addition and multiplication). These operations are different from the usual addition and multiplication operations defined on octets and words. For example, the addition of two octets or words under the AES interpretation is the same as the bit-wise XOR of the octets or words. The AES multiplication of octets and words, on the other hand, involves polynomial arithmetic and reduction modulo the defining polynomials and so cannot be expressed as simply as addition. To resolve ambiguities, let us denote the multiplication of F_{2^8} by ⊙ and that of A by ⊗, whereas regular multiplication symbols (·, × and juxtaposition) stand for the standard multiplication on octets or words. Exercises A.5, A.6 and A.7 discuss efficient implementations of the arithmetic in F_{2^8} and A.

Every non-zero element α ∈ F_{2^8} is invertible; the inverse is denoted by α–1 and can be computed by the extended gcd algorithm on polynomials over F_2. With an abuse of notation, we take 0–1 := 0. In contrast, not every non-zero element of A is invertible (under the multiplication of A). The AES algorithm uses the following invertible element β := 03010102 (in hex notation); its inverse is β–1 = 0b0d090e.

The AES algorithm uses an object called a state, comprising 16 octets arranged in a 4 × 4 array. Each message block also consists of 16 octets. Let M = μ0μ1 . . . μ15 be a message block (of 16 octets). This block is translated to a state as follows:

Equation A.1

     sr,c := μr+4c    (0 ≤ r ≤ 3, 0 ≤ c ≤ 3),

where sr,c denotes the octet in row r and column c of the state.
Thus, each word in the block is relocated in a column of the state. At the end of the encryption procedure, AES makes the reverse translation of a state to a block:

Equation A.2

     γr+4c := sr,c    (0 ≤ r ≤ 3, 0 ≤ c ≤ 3).
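The two translations (A.1) and (A.2) amount to nothing more than a column-major reshaping, as this small sketch (helper names ours) shows:

```python
def block_to_state(block):
    """(A.1): a list of 16 octets -> a 4x4 state, with s[r][c] = block[r + 4c]."""
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]

def state_to_block(state):
    """(A.2): the reverse translation from a 4x4 state back to 16 octets."""
    return [state[r][c] for c in range(4) for r in range(4)]

msg = list(range(16))
st = block_to_state(msg)
assert st[1][2] == 9                 # row 1, column 2 holds octet mu[1 + 4*2]
assert state_to_block(st) == msg     # the two translations are mutual inverses
```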
AES key schedule

A collection of round keys is generated from the given AES key K. The number of rounds of the AES encryption algorithm depends on the size of the key. Let us denote the number of words in the AES key by Nk and the corresponding number of rounds by Nr. We have:

Nk = 4 (128-bit key):  Nr = 10
Nk = 6 (192-bit key):  Nr = 12
Nk = 8 (256-bit key):  Nr = 14

One first generates an initial 128-bit key K0K1K2K3. Subsequently, for the i-th round, 1 ≤ i ≤ Nr, a 128-bit key K4iK4i+1K4i+2K4i+3 is required. Here, each Kj is a 32-bit word. The key schedule (also called key expansion) generates a total of 4(Nr + 1) words K0, K1, . . . , K4Nr+3 from the given secret key K using a procedure described in Algorithm A.4. Here, (02)^(j–1) stands for the octet that represents the element x^(j–1) of F_{2^8}. The following table summarizes these values for j = 1, 2, . . . , 15.

j          1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
x^(j–1)    01  02  04  08  10  20  40  80  1b  36  6c  d8  ab  4d  9a

The transformation SubWord on a word T = τ0τ1τ2τ3 is the octet-wise application of AES S-box substitution SubOctet, that is,

SubWord(T) = SubOctet(τ0) ‖ SubOctet(τ1) ‖ SubOctet(τ2) ‖ SubOctet(τ3).

Algorithm A.4. AES key schedule

Input: (Nk and) the secret key K = κ0κ1 ... κ4Nk – 1, where each κi is an octet.

Output: The expanded keys K0, K1, . . . , K4Nr+3.

Steps:

/* Initially copy the bytes of K */
for i = 0, 1, . . . , Nk – 1 { Ki := κ4iκ4i+1κ4i+2κ4i+3. }

/* Recursively define the round keys */
for i = Nk, Nk + 1, . . . , 4Nr + 3 {
      T := Ki–1;       /* T is a temporary word variable. */
      /* Let T = τ0τ1τ2τ3, where each τi is an octet. */
      if (i rem Nk = 0) { T := SubWord(τ1τ2τ3τ0) ⊕ [(02)^((i/Nk)–1) ‖ 000000]. }
      else if (Nk > 6) and (i rem Nk = 4) { T := SubWord(T). }
      Ki := Ki–Nk ⊕ T.
}
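For concreteness, here is a sketch of Algorithm A.4 for the AES-128 case (Nk = 4, Nr = 10), where the Nk > 6 branch never fires. SBOX is the byte array of Table A.3 written row-major, and the function name expand_key_128 is ours.

```python
SBOX = bytes.fromhex(
    '637c777bf26b6fc53001672bfed7ab76' 'ca82c97dfa5947f0add4a2af9ca472c0'
    'b7fd9326363ff7cc34a5e5f171d83115' '04c723c31896059a071280e2eb27b275'
    '09832c1a1b6e5aa0523bd6b329e32f84' '53d100ed20fcb15b6acbbe394a4c58cf'
    'd0efaafb434d338545f9027f503c9fa8' '51a3408f929d38f5bcb6da2110fff3d2'
    'cd0c13ec5f974417c4a77e3d645d1973' '60814fdc222a908846eeb814de5e0bdb'
    'e0323a0a4906245cc2d3ac629195e479' 'e7c8376d8dd54ea96c56f4ea657aae08'
    'ba78252e1ca6b4c6e8dd741f4bbd8b8a' '703eb5664803f60e613557b986c11d9e'
    'e1f8981169d98e949b1e87e9ce5528df' '8ca1890dbfe6426841992d0fb054bb16')

RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1b, 0x36]

def expand_key_128(key):
    """key: 16 bytes. Returns the 44 words K0..K43, each as a 4-byte list."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]   # copy the key into K0..K3
    for i in range(4, 44):
        t = list(w[i - 1])
        if i % 4 == 0:                       # i rem Nk = 0
            t = t[1:] + t[:1]                # rotate tau0..tau3 -> tau1 tau2 tau3 tau0
            t = [SBOX[b] for b in t]         # SubWord
            t[0] ^= RCON[i // 4 - 1]         # XOR with (02)^(i/Nk - 1) || 000000
        w.append([w[i - 4][j] ^ t[j] for j in range(4)])   # Ki = K(i-Nk) xor T
    return w
```

With the FIPS-197 sample key 2b7e1516 28aed2a6 abf71588 09cf4f3c, the first computed word K4 comes out as a0fafe17.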

The transformation SubOctet is also used in each encryption round and is now described. Let A = a0a1 . . . a7 be an octet, identified with an element of F_{2^8} as mentioned earlier. Let B = b0b1 . . . b7 denote the octet representing the inverse of this finite field element. (We take 0–1 = 0.) One then applies the following affine transformation on B to generate the final value C := SubOctet(A) := c0c1 . . . c7. Here, D = d0d1 . . . d7 is the constant octet 63 = 01100011.

Equation A.3

     ci = bi ⊕ b(i+1) mod 8 ⊕ b(i+2) mod 8 ⊕ b(i+3) mod 8 ⊕ b(i+4) mod 8 ⊕ di,    i = 0, 1, . . . , 7.
In order to speed up this octet substitution, one may use table lookup. Since the output octet C depends only on the input octet A, one can precompute a table of the values SubOctet(A) for the 256 possible values of A. This list is given in Table A.3. The table is to be read in the row-major fashion. In other words, if hi and lo respectively represent the most and the least significant four bits of A, then SubOctet(A) can be read off from the entry in the table having row number hi and column number lo. For example, SubOctet(a7) = 5c. In an actual implementation, a one-dimensional array is to be used. We use a two-dimensional format in Table A.3 for the sake of clarity of presentation.
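SubOctet can also be computed on the fly instead of by table lookup. The sketch below (helper names ours) inverts in the field by raising to the power 254, and applies the affine step in the standard equivalent form: B XOR-ed with four of its cyclic rotations, plus the constant 63.

```python
def gmul(a, b):
    """Octet multiplication in F_2[X]/<X^8 + X^4 + X^3 + X + 1>."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xff
        if hi:
            a ^= 0x1b
        b >>= 1
    return p

def sub_octet(a):
    """SubOctet: field inverse (with 0 mapped to 0) followed by the affine map."""
    b = 1
    for _ in range(254):            # b = a^254, the inverse of a (0 stays 0)
        b = gmul(b, a)
    c, r = b, b
    for _ in range(4):              # XOR in four cyclic rotations of b
        r = ((r << 1) | (r >> 7)) & 0xff
        c ^= r
    return c ^ 0x63                 # add the constant octet 63
```

As a check, sub_octet(0x00) gives 0x63 and sub_octet(0xa7) gives 0x5c, matching the table entries.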

Table A.3. AES S-box
    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0  63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
1  ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
2  b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
3  04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
4  09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
5  53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
6  d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
7  51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
8  cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
9  60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
a  e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
b  e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
c  ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
d  70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
e  e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
f  8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16

AES encryption

AES encryption is described in Algorithm A.5. The algorithm first converts the input plaintext message block to a state, applies a series of transformations on this state and finally converts the state back to a message (the ciphertext).

The individual state transformations are now explained. The transformation SubState is an octet-by-octet application of the substitution function SubOctet, that is, SubState maps the state (sr,c) to the state (s′r,c),

where s′r,c = SubOctet(sr,c) for all r, c. The transformation ShiftRows cyclically left rotates the r-th row by r byte positions, that is, it maps

sr,c to s′r,c = sr,(c+r) mod 4 for all r, c.

The AddKey operation uses four 32-bit round keys L0, L1, L2, L3. Name the octets of Li as λi0λi1λi2λi3. The i-th key Li is XORed with the i-th column of the state, that is, AddKey transforms

sr,i to s′r,i = sr,i ⊕ λir for all r, i.

Finally, the MixCols transform multiplies each column of the state, regarded as an element of A, by the element [03]y^3 + [01]y^2 + [01]y + [02], where the coefficients (expressions within square brackets) are octet values in hexadecimal that can be identified with elements of F_{2^8}. For the c-th column, this transformation can be represented as:

s′0,c = ([02] ⊙ s0,c) ⊕ ([03] ⊙ s1,c) ⊕ s2,c ⊕ s3,c,
s′1,c = s0,c ⊕ ([02] ⊙ s1,c) ⊕ ([03] ⊙ s2,c) ⊕ s3,c,
s′2,c = s0,c ⊕ s1,c ⊕ ([02] ⊙ s2,c) ⊕ ([03] ⊙ s3,c),
s′3,c = ([03] ⊙ s0,c) ⊕ s1,c ⊕ s2,c ⊕ ([02] ⊙ s3,c).

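On a single column, MixCols is thus four ⊙-multiplications and XORs per octet. A sketch (helper names ours, with the octet multiplication written out inline):

```python
def gmul(a, b):
    """Octet multiplication in the AES field."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        hi = a & 0x80
        a = (a << 1) & 0xff
        if hi:
            a ^= 0x1b
        b >>= 1
    return p

def mix_column(col):
    """Multiply one state column [s0, s1, s2, s3] by [03]y^3 + [01]y^2 + [01]y + [02]."""
    s0, s1, s2, s3 = col
    return [gmul(2, s0) ^ gmul(3, s1) ^ s2 ^ s3,
            s0 ^ gmul(2, s1) ^ gmul(3, s2) ^ s3,
            s0 ^ s1 ^ gmul(2, s2) ^ gmul(3, s3),
            gmul(3, s0) ^ s1 ^ s2 ^ gmul(2, s3)]
```

For instance, the column db 13 53 45 maps to 8e 4d a1 bc, a worked example given in the AES specification. A column of four equal octets is left unchanged, since [02] ⊙ a ⊕ [03] ⊙ a ⊕ a ⊕ a = a.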
Algorithm A.5. AES encryption

Input: The plaintext message M = μ0μ1 . . . μ15 and the round keys K0, K1, . . . , K4Nr+3.

Output: Ciphertext message C = γ0γ1 . . . γ15.

Steps:

Convert M to the state S.                                      /* Use Transform (A.1) */
S := AddKey(S, K0, K1, K2, K3).
for i = 1, 2, . . . , Nr {
      S := SubState(S).
      S := ShiftRows(S).
      if (i ≠ Nr) { S := MixCols(S). }
      S := AddKey(S, K4i, K4i+1, K4i+2, K4i+3).
}
Convert S to the message C.                                /* Use Transform (A.2) */

AES decryption

AES decryption involves taking the inverse of each state transformation performed during encryption. The key schedule used for encryption is used during decryption too. The straightforward decryption routine is given in Algorithm A.6.

Algorithm A.6. AES decryption

Input: The ciphertext message C = γ0γ1 . . . γ15 and the round keys K0, K1, . . . , K4Nr+3.

Output: The recovered plaintext message M = μ0μ1 . . . μ15.

Steps:

Convert C to the state S.                                      /* Use Transform (A.1) */
S := AddKey(S, K4Nr, K4Nr+1, K4Nr+2, K4Nr+3).
for i = Nr – 1, Nr – 2, . . . , 1, 0 {
      S := ShiftRows–1(S).
      S := SubState–1(S).
      S := AddKey(S, K4i, K4i+1, K4i+2, K4i+3).
      if (i ≠ 0) { S := MixCols–1(S). }
}
Convert S to the message M.                                /* Use Transform (A.2) */

What remains is a description of the inverses of the basic state transformations. AddKey involves octet-by-octet XOR-ing and so is its own inverse. Table A.4 summarizes the inverse of the substitution SubOctet (Exercise A.8). For computing SubState–1(S), one applies SubOctet–1 on each octet of S. The inverse of ShiftRows is also straightforward: ShiftRows–1 cyclically right rotates the r-th row by r byte positions, that is, it maps sr,c to s′r,c = sr,(c–r) mod 4.

Finally, MixCols–1 involves multiplication of each column by the inverse of the element [03]y^3 + [01]y^2 + [01]y + [02], that is, by the element [0b]y^3 + [0d]y^2 + [09]y + [0e]. So MixCols–1 transforms each column of the state as follows:

s′0,c = ([0e] ⊙ s0,c) ⊕ ([0b] ⊙ s1,c) ⊕ ([0d] ⊙ s2,c) ⊕ ([09] ⊙ s3,c),
s′1,c = ([09] ⊙ s0,c) ⊕ ([0e] ⊙ s1,c) ⊕ ([0b] ⊙ s2,c) ⊕ ([0d] ⊙ s3,c),
s′2,c = ([0d] ⊙ s0,c) ⊕ ([09] ⊙ s1,c) ⊕ ([0e] ⊙ s2,c) ⊕ ([0b] ⊙ s3,c),
s′3,c = ([0b] ⊙ s0,c) ⊕ ([0d] ⊙ s1,c) ⊕ ([09] ⊙ s2,c) ⊕ ([0e] ⊙ s3,c).

Table A.4. Inverse of AES S-box
    0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0  52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb
1  7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb
2  54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e
3  08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25
4  72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92
5  6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84
6  90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06
7  d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b
8  3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73
9  96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e
a  47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b
b  fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4
c  1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f
d  60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef
e  a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61
f  17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d

AES decryption is as efficient as AES encryption, since each state transformation primitive has the same structure as its inverse. However, the sequence of application of these primitives in the loop (rounds) for decryption differs from that for encryption. For some implementations, mostly in hardware, this may be a problem. Compare this with DES, for which the encryption and decryption algorithms are identical save the sequence of using the round keys (Exercise A.1). With a little additional effort, AES can also be furnished with this useful property of DES. All we have to do is to use a different key schedule for decryption. The necessary modifications are explored in Exercise A.9.

AES test vectors

Table A.5 provides the ciphertexts for the plaintext block

M = 00112233445566778899aabbccddeeff

under different keys.

Table A.5. AES test vectors
CipherKeyCiphertext block
Cipher    Key                                   Ciphertext block
AES-128   0001020304050607 08090a0b0c0d0e0f     69c4e0d86a7b0430 d8cdb78070b4c55a
AES-192   0001020304050607 08090a0b0c0d0e0f     dda97ca4864cdfe0 6eaf70a0ec0d7191
          1011121314151617
AES-256   0001020304050607 08090a0b0c0d0e0f     8ea2b7ca516745bf eafc49904b496089
          1011121314151617 18191a1b1c1d1e1f

Cryptanalysis of AES

AES has been designed so that linear and differential attacks are infeasible. Another attack known as the square attack has been proposed by Lucks [184] and Ferguson et al. [93], but at present it can tackle fewer rounds than are used in Rijndael encryption. Also see Gilbert and Minier [112] for the collision attack.

The distinct algebraic structure of AES encryption invites special algebraic attacks. One such potential attack (the XSL attack) has been proposed by Courtois and Pieprzyk [68]. Although this attack has not yet been proved to be effective, a better understanding of the algebra may, in the foreseeable future, lead to disturbing consequences for the advanced standard.

For more information on AES, read the book [71] from the designers of the cipher. Also visit the following Internet sites:

Rijndael home page: http://www.esat.kuleuven.ac.be/~rijmen/rijndael/
NIST site for AES: http://csrc.nist.gov/CryptoToolkit/aes/index1.html
Algebraic attacks: http://www.cryptosystem.net/aes/

A.2.3. Multiple Encryption

Multiple encryption presents a way to achieve a desired level of security by using block ciphers of small key sizes. The idea is to cascade several stages of encryption and/or decryption, with different stages working under different keys. Figure A.1 illustrates double and triple encryption for a block cipher f. Each gi or hj represents either the encryption or the decryption function of f under the given key.

Figure A.1. Multiple encryption


For double encryption, we have K1 ≠ K2, and both g1 and g2 are usually the encryption function. Provided that fK2 ∘ fK1 is not the same as fK for any single key K, and that the permutations of f are reasonably random, it appears at first glance that double encryption doubles the effective key size. Unfortunately, this is not the case. The meet-in-the-middle attack on double encryption works as follows.

Suppose that an adversary knows a plaintext–ciphertext pair (m, c) under the unknown keys K1, K2. We assume as before that f has block size n and key size r. The adversary computes, for each possible key i ∈ {0, 1}^r, the encrypted message xi := fi(m). She also computes, for each j ∈ {0, 1}^r, the decrypted message yj := fj–1(c). Now, (i, j) is a possible value of (K1, K2) if and only if xi = yj.

A given pair (m, c) usually yields many such candidates (i, j) for (K1, K2). More precisely, if each fi is assumed to be a random permutation of {0, 1}^n, then for a given i we have the equality xi = yj for an expected number of 2^r/2^n values of j. Considering all possibilities for i gives an expected number of 2^r × 2^r/2^n = 2^(2r–n) candidate pairs (i, j). If f = DES, this number is 2^(2×56–64) = 2^48.

If a second pair (m′, c′) under (K1, K2) is also known to the adversary, then for a given i the pair (i, j) is consistent with both (m, c) and (m′, c′) for an expected number of 2^r/(2^n × 2^n) values of j. Thus, we get an expected number of (2^r × 2^r)/(2^n × 2^n) = 2^(2r–2n) candidates (i, j). For DES, this number is 2^(–16). This implies that it is very unlikely that a false candidate (i, j) satisfies both (m, c) and (m′, c′). Thus, with high probability the adversary uniquely identifies the double DES key (K1, K2) from two plaintext–ciphertext pairs.

This attack calls for O(2^r) encryptions and O(2^r) decryptions. With the assumption that each encryption takes roughly the same time as each decryption (as in the case of DES), the adversary spends the time of about 2^(r+1) single encryptions. Moreover, she can find all the matches in O(r 2^r) time (for example, by sorting the lists of xi and yj values). This implies that double encryption increases the effective key size (over single encryption) by a few bits only. On the other hand, both the actual key size and the encryption time get doubled. In view of these shortcomings, double encryption is rarely used in practice.
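The attack is easy to demonstrate on a toy cipher. The block below uses a hypothetical 8-bit cipher with an 8-bit key (n = r = 8; not DES): it tabulates xi = fi(m) for all 2^r keys i, matches against yj = fj–1(c), and filters the surviving (i, j) pairs with a second known pair.

```python
# Toy 8-bit block cipher: for every key k, m -> 5*(m ^ k) mod 256 is a permutation.
INV5 = 205                                    # 5 * 205 = 1025 = 1 (mod 256)

def enc(k, m):
    return (5 * (m ^ k)) % 256

def dec(k, c):
    return ((c * INV5) % 256) ^ k

def double_enc(k1, k2, m):
    return enc(k2, enc(k1, m))

def mitm(pairs):
    """Recover candidate (K1, K2) pairs from known plaintext-ciphertext pairs."""
    m, c = pairs[0]
    table = {}
    for i in range(256):                      # forward table: x_i = f_i(m)
        table.setdefault(enc(i, m), []).append(i)
    cand = [(i, j) for j in range(256)        # match against y_j = f_j^-1(c)
            for i in table.get(dec(j, c), [])]
    for m2, c2 in pairs[1:]:                  # filter with further known pairs
        cand = [(i, j) for (i, j) in cand if double_enc(i, j, m2) == c2]
    return cand

k1, k2 = 0x3a, 0xc7
pairs = [(m, double_enc(k1, k2, m)) for m in (0x42, 0x99)]
assert (k1, k2) in mitm(pairs)
```

With n = r = 8, the first pair leaves about 2^(2r–n) = 256 candidates, and the second pair almost surely cuts these down to the true key pair alone.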

For the triple encryption scheme of Figure A.1, a meet-in-the-middle attack at x or y demands an effort equivalent to O(2^(2r)) encryptions, that is, the effective key size gets doubled. It is, therefore, customary to take K1 = K3 and K2 different from this common value. With this choice, the actual key size is also only doubled, since one does not have to store K3 separately. It is also a common practice to take h1 and h3 to be the encryption function (under K1 = K3) and h2 the decryption function (under K2). One often calls this particular triple encryption an E-D-E scheme.
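A small illustration of the E-D-E arrangement with the same toy 8-bit cipher as above (not DES): choosing K2 equal to K1 collapses the scheme to single encryption, which is one practical reason for the E-D-E ordering, since triple-encryption hardware can then interoperate with single-encryption peers.

```python
def enc(k, m):
    return (5 * (m ^ k)) % 256        # toy 8-bit block cipher under key k

def dec(k, c):
    return ((c * 205) % 256) ^ k      # 205 = 5^-1 mod 256

def ede(k1, k2, m):
    """E-D-E triple encryption with K3 = K1."""
    return enc(k1, dec(k2, enc(k1, m)))

# With K2 = K1, the middle D undoes the inner E, leaving single encryption.
assert all(ede(0x5d, 0x5d, m) == enc(0x5d, m) for m in range(256))
```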

A.2.4. Modes of Operation

In practice, the length of the message m to be encrypted need not equal the block length n of the block cipher f. One then has to break up m into blocks of some fixed length n′ ≤ n and encrypt each block using the block cipher. In order to make the length of m an integral multiple of n′, one may have to pad extra bits to m (say, zero bits at the end). It is often necessary to store the initial size of m in a separate block, say, after the last message block. In what follows, we shall assume that the input message m gives rise to l blocks m1, m2, . . . , ml each of size n′. The corresponding ciphertext blocks c1, c2, . . . , cl will also be of bit length n′ each. The reason for choosing the block size n′ ≤ n will be clear soon.

The ECB mode

The easiest way to encrypt multiple blocks m1, . . . , ml is to take n′ = n and encrypt each block mi as ci := fK(mi). Decryption is analogous: mi := fK–1(ci). This mode of operation of a block cipher is called the electronic code-book or the ECB mode. Algorithms A.7 and A.8 describe this mode.

Algorithm A.7. ECB encryption

Input: The plaintext blocks m1, . . . , ml and the key K.

Output: The ciphertext c = c1 . . . cl.

Steps:

for i = 1, . . . , l { ci := fK(mi) }

Algorithm A.8. ECB decryption

Input: The ciphertext blocks c1, . . . , cl and the key K.

Output: The plaintext m = m1 . . . ml.

Steps:

for i = 1, . . . , l { mi := fK–1(ci). }

In this mode, identical message blocks encrypt to identical ciphertext blocks (under the same key), that is, partial information about the plaintext may be leaked out. The following three modes overcome this problem.

The CBC mode

In the cipher-block chaining or the CBC mode, one takes n′ = n and each plaintext block is first XOR-ed with the previous ciphertext block and then encrypted. In order to XOR the first plaintext block, one needs an n-bit initialization vector (IV). The IV need not be kept secret and may be sent along with the ciphertext blocks.

Algorithm A.9. CBC encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

c0 := IV.

for i = 1, . . . , l { ci := fK(mici – 1). }

Algorithm A.10. CBC decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

c0 := IV.

for i = 1, . . . , l { mi := fK–1(ci) ⊕ ci–1. }
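The contrast between ECB and CBC is easy to visualize with a toy 8-bit block cipher (one octet per block; the cipher below is a stand-in, not a real one): ECB sends a repeated plaintext block to a repeated ciphertext block, while CBC masks the repetition.

```python
def enc(k, m):
    return (5 * (m ^ k)) % 256        # toy 8-bit block cipher f_K

def ecb(key, blocks):
    return [enc(key, m) for m in blocks]

def cbc(key, iv, blocks):
    out, prev = [], iv
    for m in blocks:
        prev = enc(key, m ^ prev)     # XOR with the previous ciphertext block
        out.append(prev)
    return out

msg = [0x41, 0x41, 0x41, 0x41]        # four identical plaintext blocks
e = ecb(0x2f, msg)
c = cbc(0x2f, 0x77, msg)
assert len(set(e)) == 1               # ECB: all ciphertext blocks identical
assert len(set(c)) > 1                # CBC: the repetition is hidden
```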

The CFB mode

In the cipher feedback or the CFB mode, one chooses any n′ with 1 ≤ n′ ≤ n. In this mode, the plaintext blocks are not encrypted directly, but masked by XOR-ing with a stream of keys generated from the secret key K and a (not necessarily secret) n-bit IV. In this sense, the CFB mode works like a stream cipher (see Section A.3).

Algorithm A.11. CFB encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

k0 := IV.   /* Initialize the key stream */
for i = 1, . . . , l {
   /* Mask the current key by block encryption and the message by XOR-ing */
   ci := mi ⊕ msbn′(fK(ki–1)).
   /* Generate the next key from the previous key and the current ciphertext block */
   ki := lsbn–n′(ki–1) ‖ ci.
}

Algorithm A.11 explains CFB encryption. The notation msbk(z) (resp. lsbk(z)) stands for the most (resp. least) significant k bits of a bit string z. For CFB decryption (Algorithm A.12), the identical key stream k0, k1, . . . , kl is generated and used to mask off the message blocks from the ciphertext blocks.

Algorithm A.12. CFB decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

k0 := IV.
for i = 1, . . . , l {
   mi := ci ⊕ msbn′(fK(ki–1)).
   ki := lsbn–n′(ki–1) ‖ ci.
}
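Algorithms A.11 and A.12 can be checked with a toy 8-bit cipher and n′ = 4 (helper names ours). The shift register keeps the low n – n′ bits of the previous key and appends the n′ ciphertext bits; one routine serves for both directions because only the feedback value differs.

```python
def enc(key, x):
    return (5 * (x ^ key)) % 256      # toy 8-bit block cipher f_K

N, NP = 8, 4                          # block size n = 8, segment size n' = 4

def cfb(key, iv, blocks, decrypt=False):
    out, k = [], iv
    for b in blocks:                  # each b is an n'-bit (4-bit) block
        mask = enc(key, k) >> (N - NP)          # msb_{n'}(f_K(k_{i-1}))
        o = b ^ mask
        cipher_block = b if decrypt else o      # feedback is always the ciphertext
        k = ((k << NP) & 0xff) | cipher_block   # lsb_{n-n'}(k_{i-1}) || c_i
        out.append(o)
    return out

msg = [0x3, 0xa, 0xa, 0xf, 0x0]
ct = cfb(0x6b, 0x55, msg)
assert cfb(0x6b, 0x55, ct, decrypt=True) == msg
```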

The OFB mode

The output feedback or the OFB mode also works like a stream cipher by masking the plaintext blocks using a stream of keys. The key stream in the OFB mode is generated by successively applying the block encryption function on an n-bit (not necessarily secret) IV. Here, one chooses any n′ with 1 ≤ n′ ≤ n.

OFB encryption is explained in Algorithm A.13. OFB decryption (Algorithm A.14) is identical, with only the roles of m and c interchanged, and requires the generation of the same key stream k0, k1, . . . , kl used during encryption.

Algorithm A.13. OFB encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

k0 := IV.      /* Initialize the key stream */
for i = 1, . . . , l {
    ki := fK(ki–1).     /* Generate the next key in the stream */
    ci := mi ⊕ msbn′(ki).    /* Mask the plaintext block */
}

Algorithm A.14. OFB decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

k0 := IV.     /* Initialize the key stream */
for i = 1, . . . , l {
   ki := fK(ki–1).    /* Generate the next key in the stream */
   mi := ci ⊕ msbn′(ki).    /* Remove the mask from the ciphertext block */
}
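Since the OFB key stream depends only on K and the IV (never on the data), the encryption routine is its own inverse, as this sketch with the earlier toy cipher shows:

```python
def enc(key, x):
    return (5 * (x ^ key)) % 256      # toy 8-bit block cipher f_K

def ofb(key, iv, blocks, np=4):
    out, k = [], iv
    for b in blocks:                  # each b is an np-bit block
        k = enc(key, k)               # next key in the stream: k_i = f_K(k_{i-1})
        out.append(b ^ (k >> (8 - np)))   # mask with msb_{n'}(k_i)
    return out

msg = [0x1, 0x2, 0x3, 0x4]
assert ofb(0x9c, 0x31, ofb(0x9c, 0x31, msg)) == msg   # applying it twice restores msg
```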

Exercise Set A.2

A.1 Let us use the notations of Algorithm A.2. For a message m and round keys Ki, we have the values V, Li, Ri, W, c. For another message m′ and another set of round keys K′i, let us denote these values by V′, L′i, R′i, W′, c′. Show that if m′ = c and if K′i = K17–i for i = 1, . . . , 16, then L′i = R16–i and R′i = L16–i for all i = 0, 1, . . . , 16. Deduce that in this case we have c′ = m. (This shows that DES decryption is the same as DES encryption with the key schedule reversed.)
A.2 For a bit string z, let ~z denote the bit-wise complement of z. Deduce that DES~K(~m) = ~DESK(m), that is, complementing both the plaintext message and the key complements the ciphertext message. [H]
A.3 A DES key K is said to be weak, if the DES key schedule on K gives K1 = K2 = · · · = K16. Show that there are exactly four weak DES keys, which in hexadecimal notation are:
0101 0101 0101 0101
FEFE FEFE FEFE FEFE
1F1F 1F1F 0E0E 0E0E
E0E0 E0E0 F1F1 F1F1

A.4 A DES key K is said to be anti-palindromic, if the DES key schedule on K gives Ki = ~K17–i for all i = 1, . . . , 16 (where ~z denotes the bit-wise complement of z). Show that the following four DES keys (in hexadecimal notation) are anti-palindromic:
01FE 01FE 01FE 01FE
FE01 FE01 FE01 FE01
1FE0 1FE0 0EF1 0EF1
E01F E01F F10E F10E

A.5 Represent F_{2^8} = F_2[X]/⟨f(X)⟩, where f(X) = X^8 + X^4 + X^3 + X + 1 (Section A.2.2).
  1. Show that multiplication by x (the octet 02) in F_{2^8} can be computed by a left shift followed conditionally (derive the condition) by XOR-ing with the octet 1b.

  2. Design an algorithm for multiplying two elements of F_{2^8} using bit manipulations on octets only.

A.6 The multiplication of F_{2^8} can be made table-driven. Since this field contains 256 elements, a 256 × 256 array suffices to store all the products. That requires a storage of 64 KB. We can considerably reduce the storage by using discrete logs.
  1. Show that the multiplicative order of x (in F_{2^8}*) is 51.

  2. Show that x + 1 is a generator of F_{2^8}*.

  3. Write a computer program to generate the table of discrete logarithms of elements of F_{2^8}* to the base x + 1 (Table A.6).

    Table A.6. Discrete-log table for AES
         0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
    0    -  00 19 01 32 02 1a c6 4b c7 1b 68 33 ee df 03
    1    64 04 e0 0e 34 8d 81 ef 4c 71 08 c8 f8 69 1c c1
    2    7d c2 1d b5 f9 b9 27 6a 4d e4 a6 72 9a c9 09 78
    3    65 2f 8a 05 21 0f e1 24 12 f0 82 45 35 93 da 8e
    4    96 8f db bd 36 d0 ce 94 13 5c d2 f1 40 46 83 38
    5    66 dd fd 30 bf 06 8b 62 b3 25 e2 98 22 88 91 10
    6    7e 6e 48 c3 a3 b6 1e 42 3a 6b 28 54 fa 85 3d ba
    7    2b 79 0a 15 9b 9f 5e ca 4e d4 ac e5 f3 73 a7 57
    8    af 58 a8 50 f4 ea d6 74 4f ae e9 d5 e7 e6 ad e8
    9    2c d7 75 7a eb 16 0b f5 59 cb 5f b0 9c a9 51 a0
    a    7f 0c f6 6f 17 c4 49 ec d8 43 1f 2d a4 76 7b b7
    b    cc bb 3e 5a fb 60 b1 86 3b 52 a1 6c aa 55 29 9d
    c    97 b2 87 90 61 be dc fc bc 95 cf cd 37 3f 5b d1
    d    53 39 84 3c 41 a2 6d 47 14 2a 9e 5d 56 f2 d3 ab
    e    44 11 92 d9 23 20 2e 89 b4 7c b8 26 77 99 e3 a5
    f    67 4a ed de c5 31 fe 18 0d 63 8c 80 c0 f7 70 07
    (The entry for 0 is left blank, since the discrete log of 0 is undefined.)

  4. Write a computer program to generate the table of powers of x + 1 (Table A.7).

    Table A.7. Power table for AES
         0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
    0    01 03 05 0f 11 33 55 ff 1a 2e 72 96 a1 f8 13 35
    1    5f e1 38 48 d8 73 95 a4 f7 02 06 0a 1e 22 66 aa
    2    e5 34 5c e4 37 59 eb 26 6a be d9 70 90 ab e6 31
    3    53 f5 04 0c 14 3c 44 cc 4f d1 68 b8 d3 6e b2 cd
    4    4c d4 67 a9 e0 3b 4d d7 62 a6 f1 08 18 28 78 88
    5    83 9e b9 d0 6b bd dc 7f 81 98 b3 ce 49 db 76 9a
    6    b5 c4 57 f9 10 30 50 f0 0b 1d 27 69 bb d6 61 a3
    7    fe 19 2b 7d 87 92 ad ec 2f 71 93 ae e9 20 60 a0
    8    fb 16 3a 4e d2 6d b7 c2 5d e7 32 56 fa 15 3f 41
    9    c3 5e e2 3d 47 c9 40 c0 5b ed 2c 74 9c bf da 75
    a    9f ba d5 64 ac ef 2a 7e 82 9d bc df 7a 8e 89 80
    b    9b b6 c1 58 e8 23 65 af ea 25 6f b1 c8 43 c5 54
    c    fc 1f 21 63 a5 f4 07 09 1b 2d 77 99 b0 cb 46 ca
    d    45 cf 4a de 79 8b 86 91 a8 e3 3e 42 c6 51 f3 0e
    e    12 36 5a ee 29 7b 8d 8c 8f 8a 85 94 a7 f2 0d 17
    f    39 4b dd 7c 84 97 a2 fd 1c 24 6c b4 c7 52 f6 01

  5. Design an algorithm for multiplying two elements of F_{2^8} using table lookup.

A.7 Denote the multiplication of A by ⊗ (Section A.2.2).
  1. Let α = a3y^3 + a2y^2 + a1y + a0 and β = b3y^3 + b2y^2 + b1y + b0 be elements of A and γ = c3y^3 + c2y^2 + c1y + c0 = α ⊗ β. Show that

     [c0]   [a0 a3 a2 a1] [b0]
     [c1] = [a1 a0 a3 a2] [b1]
     [c2]   [a2 a1 a0 a3] [b2]
     [c3]   [a3 a2 a1 a0] [b3]

     where the matrix arithmetic on the right side follows the arithmetic of F_{2^8}.

  2. Verify that the inverse of the element of A represented by the word 03010102 (in hex) is 0b0d090e.

A.8
  1. Show that Transform (A.3) can be represented as

    where the matrix arithmetic on the right side is that of .

  2. Let M denote the 8 × 8 matrix of Part (a). Prove that M is invertible over GF(2) with

  3. Conclude that the transformation A ↦ SubOctet(A) is invertible.

A.9
  1. Argue that the transforms SubState and ShiftRows commute with one another.

  2. Show that MixCols⁻¹(AddKey(S, L0, L1, L2, L3)) = AddKey(MixCols⁻¹(S), MixCols⁻¹(L0, L1, L2, L3)) for a suitable meaning of the application of MixCols⁻¹ on four 32-bit keys L0, L1, L2 and L3.

  3. Conclude that one can obtain a decryption key schedule in such a way that Algorithm A.15 correctly performs AES decryption. [H]

Algorithm A.15. Equivalent form of AES decryption

Input: The ciphertext message C = γ0γ1 . . . γ15 and the decryption key schedule .

Output: Plaintext message M = μ0μ1 . . . μ15.

Steps:

Convert C to the state S.                                 /* Use Transform (A.1) */

for i = Nr – 1, Nr – 2, . . . , 0 {
      S := SubState⁻¹(S).
      S := ShiftRows⁻¹(S).
      if (i ≠ 0) { S := MixCols⁻¹(S). }
      
}
Convert S to the message M.                            /* Use Transform (A.2) */

A.10Show that a multiple encryption scheme with exactly k stages provides an effective security of ⌈k/2⌉ keys against the meet-in-the-middle attack.
A.11Consider a message m broken into blocks m1, . . . , ml, encrypted to c1, . . . , cl and sent to an entity.
  1. Suppose that during the transmission exactly one ciphertext block gets corrupted. Show that for the different modes of encryption, the numbers ν of blocks that are incorrectly decrypted due to this transmission error are as listed in the following table.

    Mode    ν
    ECB     1
    CBC     ≤ 2
    CFB     ≤ 1 + ⌈n/n′⌉
    OFB     1

  2. For each of the four modes, discuss the effects on decryption caused by the insertion or deletion of a ciphertext block during transmission (say, by an active adversary).

A.3. Stream Ciphers

A block cipher encrypts large blocks of data using a fixed key. A stream cipher, on the other hand, encrypts small blocks of data (typically single bits or bytes) using a different key for each block. The security of a stream cipher stems from the unpredictability of the keys in the key stream. Here, we deal with stream ciphers that encrypt bit-by-bit.

Definition A.2.

A stream cipher F encrypts a plaintext m = m1m2 . . . ml to a ciphertext c = c1c2 . . . cl using a key stream k = k1k2 . . . kl, where each mi, ci, ki ∈ {0, 1}. F uses a function f that yields f(mi, ki) = ci. In order to effect unique decryption, the map fκ : {0, 1} → {0, 1}, μ ↦ f(μ, κ), must be a bijection for each κ ∈ {0, 1}. F encrypts and decrypts bit-by-bit using the formulas ci = fki(mi) and mi = fki⁻¹(ci).

Example A.1.

An obvious choice for fκ is fκ(μ) := μ ⊕ κ, so that fκ⁻¹ = fκ. Suppose that the bits k1, k2, . . . , kl in the key stream are generated randomly and uniformly, independent of the plaintext bits. Let us assume that for an index i the probability Pr(mi = 0) is p, so that Pr(mi = 1) = 1 – p. Since Pr(ki = 0) = Pr(ki = 1) = 1/2, and mi and ki are independent, we have:

Pr(ci = 0) = Pr(mi = 0, ki = 0) + Pr(mi = 1, ki = 1)
           = Pr(mi = 0) Pr(ki = 0) + Pr(mi = 1) Pr(ki = 1)
           = p × (1/2) + (1 – p) × (1/2) = 1/2.

So Pr(ci = 1) is 1/2 too, that is, the two values of ci are equally likely, irrespective of the probability p. This, in turn, implies that the ciphertext bit ci provides absolutely no information about the plaintext bit mi. In this sense, this stream cipher, called Vernam’s one-time pad, offers unconditional security.

Generating a truly random key stream of arbitrary length is a difficult problem. Moreover, the same key stream is used for decryption and has to be reproduced at the recipient’s end. In view of these difficulties, Vernam’s one-time pad is used only very rarely.
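With fκ(μ) = μ ⊕ κ, encryption and decryption are the same XOR operation. A minimal sketch in Python (operating on bytes rather than individual bits; the function name is ours):

```python
import secrets

def otp(message: bytes, pad: bytes) -> bytes:
    """Vernam one-time pad: XOR each message byte with a pad byte.

    Decryption is the same operation, since (m ^ k) ^ k = m.
    """
    assert len(pad) >= len(message)
    return bytes(m ^ k for m, k in zip(message, pad))

pad = secrets.token_bytes(5)       # a fresh, uniformly random pad
c = otp(b"hello", pad)
assert otp(c, pad) == b"hello"     # the round trip recovers the plaintext
```

Note that the pad must be as long as the message and must never be reused; this is exactly the key-management burden described above.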

A practical solution is to use a pseudorandom key stream k1, k2, k3, . . . generated from a secret key J of fixed small length. The bits in the pseudorandom stream should be sufficiently unpredictable and the length of J adequately large, so as to preclude the possibility of mounting a successful attack in feasible time.

Depending on how the key stream is generated from J, stream ciphers can be broadly classified in two categories. In a synchronous stream cipher, each key in the key stream is generated independent of any plaintext or ciphertext bit, whereas in a self-synchronizing (or asynchronous) stream cipher each key in the stream is generated based only on J and a fixed number of previous ciphertext bits. Algorithms A.16 and A.17 explain the workings of these two classes of stream ciphers.

Algorithm A.16. Encryption in a synchronous stream cipher

Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state S of the key stream generator.

Output: The ciphertext c = c1c2 . . . cl.

Steps:

s0 := S.                             /* Initialize the state of the key stream generator */
for i = 1, . . . , l {
   ki := g(si–1, J).               /* Generate the key ki */
   si := δ(si–1, J).                /* Transition to the next state */
   ci := fki (mi).                  /* Encrypt the plaintext bit mi */
}

Algorithm A.17. Encryption in an asynchronous stream cipher

Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state (c–t+1, c–t+2, . . . , c0).

Output: The ciphertext c = c1c2 . . . cl.

Steps:

for i = 1, . . . , l {
   ki := g(ci–t, ci–t+1, . . . , ci–1, J).         /* Generate the key ki */
   ci := fki (mi).                                     /* Encrypt the plaintext bit mi */
}

A block cipher in the OFB mode works like a synchronous stream cipher, whereas a block cipher in the CFB mode works like an asynchronous stream cipher.

A.3.1. Linear Feedback Shift Registers

Linear feedback shift registers (LFSRs), being suitable for hardware implementation and possessing good cryptographic properties, are widely used as basic building blocks for many stream ciphers. Figure A.2 depicts an LFSR L with d stages or delay elements D0, D1, . . . , Dd–1, each capable of storing one bit. The state of the LFSR is described by the d-tuple s := (s0, s1, . . . , sd–1), where si is the bit stored in Di. It is often convenient to treat s as the column vector (s0 s1 . . . sd–1)t.

Figure A.2. A linear feedback shift register (LFSR) with d stages


There are d control bits a0, a1, . . . , ad–1. The working of the LFSR is governed by a clock. At every clock pulse the bits stored in the delay elements are bit-wise AND-ed with the respective control bits and the AND gate outputs are XOR-ed to obtain the bit sd. The bit s0 stored in D0 is delivered to the output. Finally, for each i = 0, 1, . . . , d – 2 the delay element Di sets its stored bit to si+1, that is, the register experiences a right shift by one bit with the feedback bit sd filling up the leftmost delay element.

Thus, a clock pulse changes the state of the LFSR from s := (s0, s1, . . . , sd–1) to t := (t0, t1, . . . , td–1), where s and t are related as:

    ti = si+1 for i = 0, 1, . . . , d – 2,  and  td–1 = sd = a0s0 ⊕ a1s1 ⊕ · · · ⊕ ad–1sd–1.
If s and t are treated as column vectors, this can be compactly represented as

Equation A.4

    t ≡ ΔLs (mod 2),
where the transition matrix ΔL is given by

Equation A.5

    ΔL = ( 0    1    0   · · ·   0   )
         ( 0    0    1   · · ·   0   )
         ( :    :    :           :   )
         ( 0    0    0   · · ·   1   )
         ( a0   a1   a2  · · ·  ad–1 )

When the LFSR L is initialized to a non-zero state, the bit stream output by it can be used as a pseudorandom bit sequence. For a given set of control bits a0, . . . , ad–1, the next state of L is uniquely determined by its previous state only. Since L has only finitely many (2^d – 1) non-zero states, the output bit sequence of L must be (eventually) periodic. For cryptographic use, the period of the bit sequence should be as large as possible. If the period is the maximum possible, namely 2^d – 1, L is called a maximum-length LFSR.
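The clocking rule just described can be sketched directly (the function names are ours). Each pulse outputs s0, XORs the AND-ed control bits into the feedback bit, and shifts:

```python
def lfsr_step(state, control):
    """One clock pulse of an LFSR.

    state holds (s0, ..., s_{d-1}) and control holds (a0, ..., a_{d-1});
    the output is s0, the register shifts, and the feedback bit
    s_d = a0 s0 XOR a1 s1 XOR ... XOR a_{d-1} s_{d-1} enters at the far end.
    """
    feedback = 0
    for a, s in zip(control, state):
        feedback ^= a & s
    return state[0], state[1:] + [feedback]

def lfsr_stream(state, control, nbits):
    """Collect nbits of output from the given initial state."""
    out = []
    for _ in range(nbits):
        bit, state = lfsr_step(state, control)
        out.append(bit)
    return out
```

For example, d = 4 with control bits (1, 1, 0, 0) realizes the recurrence s(i+4) = s(i) ⊕ s(i+1); starting from any non-zero state the register cycles through all 15 non-zero states, so this is a maximum-length LFSR of period 2^4 – 1 = 15.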

Many properties of the LFSR L can be explained in terms of its connection polynomial defined as:

Equation A.6

    CL(X) := 1 + ad–1X + ad–2X^2 + · · · + a1X^(d–1) + a0X^d.

For example, assume that a0 = 1, so that deg CL(X) = d. Assume further that CL(X) is irreducible (over GF(2)). Consider the extension GF(2^d) of GF(2), represented as GF(2)[X]/⟨CL(X)⟩, where x denotes the class of X. It turns out that if x is a generator of the cyclic group GF(2^d)*, then L is a maximum-length LFSR. In this case, the polynomial CL(X) is called a primitive polynomial of GF(2^d).[3]

[3] A primitive polynomial defined in this way has nothing to do with a primitive polynomial over a UFD, defined in Exercise 2.54. Mathematicians often go for such multiple definitions of the same terms and phrases.

A.3.2. Stream Ciphers Based on LFSRs

The bit sequence output by an LFSR L can be used as the key stream k1k2 . . . kl in order to encrypt a plaintext stream m1m2 . . . ml to the ciphertext stream c1c2 . . . cl with ci := mi ⊕ ki. The number d of stages in L should be chosen reasonably large and the control bits a0, . . . , ad–1 should be kept secret. The initial state of L may or may not be a secret. For suitable choices of a0, . . . , ad–1, the output sequences from L possess good statistical properties and hence L appears to be an efficient key stream generator.

Unfortunately, such a key stream generator is vulnerable to a known-plaintext attack as follows. Suppose that mi and ci are known for i = 1, 2, . . . , 2d. One can easily compute ki = mi ⊕ ci for all these i. Let si := (ki, ki+1, . . . , ki+d–1) denote the state of L while outputting ci. By Congruence (A.4), si+1 ≡ ΔLsi (mod 2) for i = 1, 2, . . . , d. Define the d × d matrices S := (s1 s2 . . . sd) and T := (s2 s3 . . . sd+1), where the si are treated as column vectors as before. We then have T ≡ ΔLS (mod 2). If S is invertible modulo 2, then ΔL and hence the secret control bits can be easily computed. In order to avoid this known-plaintext attack, one should introduce some non-linearity in the LFSR outputs.
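The attack can be sketched end-to-end for a toy LFSR (all names here are ours). Given 2d keystream bits, we form the matrices S and T and solve T ≡ ΔLS (mod 2) by Gauss-Jordan elimination over GF(2); the last row of ΔL reveals the control bits:

```python
def lfsr_bits(state, control, n):
    """Generate n keystream bits from the LFSR."""
    out = []
    for _ in range(n):
        out.append(state[0])
        fb = 0
        for a, s in zip(control, state):
            fb ^= a & s
        state = state[1:] + [fb]
    return out

def solve_transition(cols_S, cols_T):
    """Return Delta with Delta * S = T (mod 2); S must be invertible.

    cols_S, cols_T are lists of d column vectors.  We reduce the
    augmented system [S^t | T^t] to [I | Delta^t] over GF(2).
    """
    d = len(cols_S)
    A = [cols_S[i][:] + cols_T[i][:] for i in range(d)]
    for col in range(d):
        piv = next(r for r in range(col, d) if A[r][col])
        A[col], A[piv] = A[piv], A[col]
        for r in range(d):
            if r != col and A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
    return [[A[i][d + j] for i in range(d)] for j in range(d)]

def recover_control(keystream, d):
    """Recover the secret control bits from 2d known keystream bits."""
    cols_S = [keystream[i:i + d] for i in range(d)]          # s1, ..., sd
    cols_T = [keystream[i + 1:i + d + 1] for i in range(d)]  # s2, ..., s_{d+1}
    delta = solve_transition(cols_S, cols_T)
    return delta[d - 1]           # last row of Delta_L holds a0, ..., a_{d-1}

ks = lfsr_bits([1, 0, 0, 0], [1, 1, 0, 0], 8)   # 2d = 8 known key bits
print(recover_control(ks, 4))                    # [1, 1, 0, 0] recovered
```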

A non-linear combination generator combines the output bits u1, u2, . . . , ur from r LFSRs by a non-linear function f in order to generate the key k := f(u1, u2, . . . , ur). The Geffe generator of Figure A.3 gives a well-known example. It uses the non-linear function f(u1, u2, u3) := u1u2 ⊕ u2u3 ⊕ u3, that is, k ≡ u1u2 + u2u3 + u3 (mod 2).

Figure A.3. The Geffe generator


A non-linear filter generator generates the key as k = ψ(s0, s1, . . . , sd–1), where s0, . . . , sd–1 are the bits stored in the delay elements of a single LFSR and where ψ is a non-linear function.

Several other ad hoc schemes can destroy the linearity of an LFSR’s output. The shrinking generator, for example, uses two LFSRs L1 and L2. Both L1 and L2 are simultaneously clocked. If the output of L1 is 1, the output of L2 goes to the key stream, whereas if the output of L1 is 0, the output of L2 is discarded. The resulting key stream is an irregularly (and non-linearly) decimated subsequence of the output sequence of L2.

The non-linear function (f or ψ) eliminates the chance of mounting the straightforward known-plaintext attack described above. However, for polynomial non-linearities certain algebraic attacks are known; see, for example, Courtois and Pieprzyk [67, 66].[4] Solving non-linear polynomial equations is usually more difficult than solving linear equations, but ample care should be taken to avoid accidental encounters with easily solvable systems. Complacency is a word forever excluded from a cryptologist's world.

[4] Visit the Internet site http://www.cryptosystem.net/ for more papers in related areas.

Exercise Set A.3

A.12For each of the two classes of stream ciphers (Algorithms A.16, A.17) discuss the effects on decryption of
  1. alteration

  2. insertion or deletion

of a ciphertext bit during transmission.

A.13Suppose that the LFSR L of Figure A.4 is initialized to the state (1, 0, 0, 0). Derive the sequence of state transitions of the LFSR, and hence determine the output bit sequence of L. Argue that L is a maximum-length LFSR. Verify (according to the definition) that the connection polynomial CL(X) is primitive.

Figure A.4. An LFSR with four stages


A.14Let ΔL and CL(X) be as in Equations (A.5) and (A.6). Show that:
  1. ΔL is invertible modulo 2 if and only if a0 = 1.

  2. The characteristic polynomial of ΔL (a matrix over GF(2)) is X^d CL(1/X). [H]

A.15Let L be an LFSR with d stages and connection polynomial CL(X). Further let S(X) := s0 + s1X + s2X^2 + · · · denote a power series[5] over GF(2). Show that L generates the (infinite) bit sequence s0, s1, s2, . . . if and only if the product CL(X)S(X) modulo 2 is a polynomial of degree < d.

[5] A power series over a ring A is a (formal) expression of the form a0 + a1X + a2X^2 + · · · with each ai ∈ A. The set of all such power series is denoted by A[[X]]. For two power series f = a0 + a1X + a2X^2 + · · · and g = b0 + b1X + b2X^2 + · · · over A, the sum f + g is defined to be the power series (a0 + b0) + (a1 + b1)X + (a2 + b2)X^2 + · · · and the product fg is defined as the power series c0 + c1X + c2X^2 + · · · , where ci = a0bi + a1bi–1 + · · · + aib0. Under these operations A[[X]] is a ring. A polynomial over A can be identified with an element of A[[X]] in which all but finitely many coefficients are zero.

A.16Let σ = s0s1 . . . sd–1 ≠ 00 . . . 0 be a bit string of length d ≥ 1. The linear complexity L(σ) of σ is defined to be the length of the shortest LFSR that generates σ as the leftmost part of its output (after it is initialized to a suitable state). Prove that:
  1. L(σ) ≤ d.

  2. L(σ) = d if and only if σ = 00 . . . 01. [H]

A.17Assume that the three LFSR outputs u1, u2, u3 in the Geffe generator are uniformly distributed. Show that Pr(k = u1) = 3/4 = Pr(k = u3). Thus, partial information about the internal details of the Geffe generator is leaked out in the key stream.
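Exercise A.17 can be checked by brute force over the eight equally likely values of (u1, u2, u3), assuming the standard Geffe combining function k = u1u2 ⊕ u2u3 ⊕ u3 (when u2 = 1 the output copies u1, when u2 = 0 it copies u3):

```python
from itertools import product

# Count how often the Geffe output agrees with u1 and with u3.
matches_u1 = matches_u3 = 0
for u1, u2, u3 in product((0, 1), repeat=3):
    k = (u1 & u2) ^ (u2 & u3) ^ u3
    matches_u1 += (k == u1)
    matches_u3 += (k == u3)
print(matches_u1 / 8, matches_u3 / 8)   # 0.75 0.75
```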

A.4. Hash Functions

A hash function maps bit strings of any length to bit strings of a fixed length n. For practical uses, hash functions should be easy to compute, that is, computing the hash of x should be doable in time polynomial in the size of x.

Since a hash function H maps an infinite set to a finite set, there must exist pairs (x1, x2) of distinct strings with H(x1) = H(x2). Such a pair is called a collision for H. For cryptographic applications (for example, for generating digital signatures), it should be computationally infeasible to find collisions for hash functions. To elaborate this topic further we mention the following two desirable properties of hash functions used in cryptography.

Definition A.3.

A hash function H is called second pre-image resistant, if it is computationally infeasible[6] to find, for a given bit string x1, a second bit string x2 with H(x1) = H(x2).

[6] A problem P is said to be computationally infeasible if any known or possible algorithm (deterministic or randomized) to solve P runs in infeasible (like super-polynomial) time, except perhaps for a set of some input instances, the density of which in the input space is zero (or, more generally, negligibly small).

Definition A.4.

A hash function H is called collision resistant, if it is computationally infeasible to find any two distinct bit strings x1 and x2 with H(x1) = H(x2).

In order to prevent existential forgery (Exercise 5.15) of digital signatures, hash functions should also be difficult to invert.

Definition A.5.

An n-bit hash function H is called first pre-image resistant (or simply pre-image resistant), if it is computationally infeasible to find, for almost all bit strings y of length n, a bit string x (of any length) such that y = H(x). The qualification almost all in the last sentence was necessary, since one can compute and store the pairs (xi, H(xi)), i = 1, 2, . . . , k, for some small k and for some xi of one’s choice. If the given y turns out to be one of these hash values H(xi), a pre-image of y is easily available.

A hash function (provably or believably) satisfying all these three properties is called a cryptographic hash function. A hash function having first and second pre-image resistance is often called a one-way hash function. Some authors require both second pre-image resistance and collision resistance to define a collision-resistant hash function, but here we stick to Definitions A.3 and A.4. In what follows, an unqualified use of the phrase hash function indicates a cryptographic hash function.

Most of the properties of a cryptographic hash function are mutually independent. However, we have the following implication.

Proposition A.1.

A collision resistant hash function is second pre-image resistant.

Proof

Let H be a (non-cryptographic) hash function which is not second pre-image resistant. This means that there is an algorithm A that efficiently computes second pre-images, except perhaps for a vanishingly small fraction of inputs. Choose a random bit string x1. The probability that x1 is not a bad input to A is very high and, in that case, A outputs a second pre-image x2 quickly. This gives us an efficient randomized algorithm to compute collisions (x1, x2) for H.

The converse of Proposition A.1 is not true: A second pre-image resistant hash function need not be collision resistant (Exercise A.19). Also collision resistance (or second pre-image resistance) does not imply first pre-image resistance (Exercise A.20), and first pre-image resistance does not imply second pre-image resistance (Exercise A.21).

A hash function may or may not be used in conjunction with a secret key. An unkeyed hash function is typically used to check the integrity of a message and is often called a modification detection code (MDC). A keyed hash function, on the other hand, is usually employed to authenticate the origin of a message (in addition to verifying the integrity of the message) and so is often called a message authentication code (MAC).

A.4.1. Merkle’s Meta Method

Let us now describe a generic method of constructing hash functions. We start by defining the following basic building block.

Definition A.6.

Let m, n, r be positive integers with m = n + r. A function F that maps bit strings of length m to bit strings of length n is called a compression function. Henceforth, we will consider only those compression functions that can be computed easily, that is, in time polynomial in the input size.

Since m > n, collisions must exist for F. For cryptographic use, collisions should be difficult to locate. We can define first and second pre-image resistance and collision resistance of compression functions as before.

Algorithm A.18. Merkle’s meta method

Input: A compression function F with m = n + r and a bit string x of length < 2^r.

Output: The hash value H(x).

Steps:

Let λ be the bit length of x.
Set l := ⌈λ/r⌉.
If (λ is not a multiple of r) { Append rl – λ zero bits to the right of x. }
Break the padded x into blocks x1, . . . , xl each of length r.
Store in a new block xl+1 the r-bit representation of λ.
Initialize h0 := 0^n.
for i = 1, 2, . . . , l + 1 { hi := F (hi–1 ‖ xi) }
Set H(x) := hl+1.

Algorithm A.18 demonstrates how a compression function can be used to design an n-bit hash function H. The input message x is first broken into l ≥ 0 blocks, each of bit length r, after padding zero bits, if necessary. The initial bit length λ of x is then stored in a new block. This implies that H cannot handle bit strings of length ≥ 2^r. For a reasonably big r, this is not a practical limitation. Storing λ is necessary for several reasons. First, it ensures that the for loop is executed at least once for any message. This prevents the trivial hash value 0^n (the bit string of length n containing zero bits only) for the null message. Moreover, if hi = 0^n for some i with 1 ≤ i < l, then, without the length block, we would get H(x1 ‖ . . . ‖ xl) = H(xi+1 ‖ . . . ‖ xl), which leads to a collision for H.
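Merkle's meta method is easy to prototype. The sketch below is our own illustration, not the book's code: it instantiates F by SHA-256 truncated to n = 32 bits (so it is only a toy), and follows the padding, length-block and chaining steps of Algorithm A.18 with an all-zero initial chaining value:

```python
import hashlib

N, R = 32, 32   # toy parameters: n-bit chaining value, r-bit message blocks

def F(block):
    """Toy compression function from {0,1}^(n+r) to {0,1}^n."""
    assert len(block) == N + R
    digest = hashlib.sha256(block.encode()).digest()
    return format(int.from_bytes(digest[:4], "big"), "032b")

def merkle_hash(x):
    """Hash a bit string x (a str of '0'/'1') by Merkle's meta method."""
    lam = len(x)
    assert lam < 2 ** R                     # the length block must fit
    l = -(-lam // R)                        # number of r-bit blocks
    x = x.ljust(l * R, "0")                 # pad with zero bits
    blocks = [x[i * R:(i + 1) * R] for i in range(l)]
    blocks.append(format(lam, "0%db" % R))  # the length block x_{l+1}
    h = "0" * N                             # the initial chaining value
    for b in blocks:
        h = F(h + b)                        # h_i := F(h_{i-1} || x_i)
    return h
```

Note that merkle_hash("") still invokes F once, on the length block alone, exactly as the discussion above requires.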

We now show that if F possesses the desired properties for use in cryptography, then so does H.

Proposition A.2.

If F is first pre-image resistant, then so is H.

Proof

Assume that H is not first pre-image resistant, that is, an efficient algorithm A exists that computes x with H(x) = y for most (if not all) y. Since y = hl+1 = F(hl ‖ xl+1), a pre-image (namely, hl ‖ xl+1) of y under F is easily computable.

Proposition A.3.

If F is collision resistant, then H is collision resistant (and hence also second pre-image resistant).

Proof

Given a collision (x, x′) for H, we can find a collision for F with little additional effort. We use the notations of Algorithm A.18 with primed variables for x′.

First consider l ≠ l′. But then, in particular, the length blocks xl+1 and x′l′+1 are different, and thus the pair (hl ‖ xl+1, h′l′ ‖ x′l′+1) is a collision for F. So for the rest of the proof we take l = l′.

Now, suppose that hi ≠ h′i for some i. Choose the largest such i and note that hi+1 and h′i+1 are defined and equal for this choice. This gives us the collision (hi ‖ xi+1, h′i ‖ x′i+1) for F.

The only case that remains to be treated is hi = h′i for all i. Since x ≠ x′, there is at least one i with xi ≠ x′i. For such an i, the equality hi = h′i implies that (hi–1 ‖ xi, h′i–1 ‖ x′i) is a collision for F.

In order to design cryptographic hash functions, it suffices to design cryptographic compression functions. Block ciphers can be used for that purpose. Let f be a block cipher with block size n and key size r. Take m := n + r and consider the map F that sends x = LR, with L of length n and R of length r, to the encrypted bit string fR(L). If the maps fR are assumed to be random permutations of the set of n-bit strings, the resulting compression function F possesses the desirable properties.

A.4.2. The Secure Hash Algorithm

Several custom-designed hash functions have been popularly used by the cryptography community. MD4 and MD5 are somewhat older 128-bit hash functions. Soon after its conception, MD4 was found to be vulnerable to several attacks. Also collisions for the compression function of MD5 are known. Therefore, these two hash functions have lost the desired level of confidence for cryptographic uses.

NIST has proposed a family of four hash algorithms. These algorithms are called secure hash algorithms and have the short names SHA-1, SHA-256, SHA-384 and SHA-512, which respectively produce 160-, 256-, 384- and 512-bit hash values. No collisions for these SHA algorithms are known to date. In the rest of this section, we explain the SHA-1 algorithm. The workings of the other SHA algorithms are very similar and can be found in the FIPS document [222]. RIPEMD-160 is another popular 160-bit hash function.

SHA-1 (like the other custom-designed hash functions mentioned above) is suitable for implementation on 32-bit processors. Suppose that we want to compute the hash SHA-1(M) of a message M of bit length λ. First, M is padded to get the bit string M′ := M ‖ 1 ‖ 0^k ‖ Λ, where Λ is the 64-bit representation of λ, and where k is the smallest non-negative integer for which the bit length of M′, that is, λ + 1 + k + 64, is a multiple of 512. M′ is broken into blocks M(1), M(2), . . . , M(l), each of length 512 bits. Each M(i) is represented as a collection of sixteen 32-bit words Mj(i), j = 0, 1, . . . , 15. SHA-1 uses big-endian packing, that is, M0(i) stores the leftmost 32 bits of M(i), M1(i) the next 32 bits, . . . , M15(i) the rightmost 32 bits of M(i).

The SHA-1 computations are given in Algorithm A.19. One starts with a fixed initial 160-bit hash H(0). Successively for i = 1, 2, . . . , l the i-th message block M(i) is considered and the previous hash value H(i–1) is updated to H(i). At the end of the loop the 160-bit string H(l) is returned as SHA-1(M). Each H(i) is represented by five 32-bit words Hj(i), j = 0, 1, 2, 3, 4. Here also, big-endian notation is used, that is, H0(i) stores the leftmost 32 bits of H(i), . . . , H4(i) the rightmost 32 bits of H(i).

The updating procedure uses logical functions fj. Here, juxtaposition (as in xy) denotes bit-wise AND, a bar (as in x̄) denotes bit-wise complementation, and ⊕ denotes bit-wise XOR, each on 32-bit operands. The notation LRk(z) (resp. RRk(z)) stands for a left (resp. right) rotation, that is, a cyclic left (resp. right) shift, of the 32-bit string z by k positions.

The bits of H(i) are well-defined transformations of the bits of H(i–1) under the guidance of the bits of M(i). The good amount of non-linearity, introduced by the functions fj and the modulo 2^32 sums, makes it difficult to invert the transformation H(i–1) ↦ H(i) and thereby makes SHA-1 an (apparently) secure hash function.

Algorithm A.19. The SHA-1 algorithm

Input: A message M.

Output: The hash SHA-1(M) of M.

Steps:

Generate the message blocks M(i), i = 1, 2, . . . , l.
/* Initialize the hash value */
H(0) := 67452301 efcdab89 98badcfe 10325476 c3d2e1f0 (in hex).
for i = 1, 2, . . . , l {
   /* Compute the message schedule Wj, 0 ≤ j ≤ 79. */
   for j = 0, 1, . . . , 15 { Wj := Mj(i) }
   for j = 16, 17, . . . , 79 { Wj := LR1(Wj–3 ⊕ Wj–8 ⊕ Wj–14 ⊕ Wj–16) }
   /* Store the previous hash words */
   for j = 0, 1, . . . , 4 { tj := Hj(i–1) }
   /* Compute the updating values */
   for j = 0, 1, . . . , 79 {
      T := LR5(t0) + fj(t1, t2, t3) + t4 + Kj + Wj (mod 2^32), where

          fj(x, y, z) := xy ⊕ x̄z          for  0 ≤ j ≤ 19,
                         x ⊕ y ⊕ z         for 20 ≤ j ≤ 39,
                         xy ⊕ yz ⊕ zx      for 40 ≤ j ≤ 59,
                         x ⊕ y ⊕ z         for 60 ≤ j ≤ 79,

          and the constants Kj are (in hex)

          Kj := 5a827999 for  0 ≤ j ≤ 19,    6ed9eba1 for 20 ≤ j ≤ 39,
                8f1bbcdc for 40 ≤ j ≤ 59,    ca62c1d6 for 60 ≤ j ≤ 79.

      t4 := t3, t3 := t2, t2 := RR2(t1), t1 := t0, t0 := T.
   }
   /* Update the hash value */
   for j = 0, 1, . . . , 4 { Hj(i) := Hj(i–1) + tj (mod 2^32) }
}
Set SHA-1(M) := H(l).

A test vector for SHA-1 is the following (here 616263 is the string “abc”):

SHA-1(616263) = a9993e364706816aba3e25717850c26c9cd0d89d.
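This vector can be confirmed with any standard SHA-1 implementation, for example the one in Python's hashlib:

```python
import hashlib

# Confirm the SHA-1 test vector for the string "abc" (hex 616263).
digest = hashlib.sha1(b"abc").hexdigest()
print(digest)   # a9993e364706816aba3e25717850c26c9cd0d89d
```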

Exercise Set A.4

A.18Let x be a bit string. Break up x into blocks x1, . . . , xl each of bit size n (after padding, if necessary). Define H1(x) := x1 ⊕ . . . ⊕ xl. Show that H1 possesses none of the desirable properties of a cryptographic hash function.
A.19Let H be an n-bit cryptographic hash function and S a finite set of strings with #S ≥ 2. Define the function . Here, 0n+1 refers to a bit string of length n + 1 containing zero-bits only. Show that H2 is second pre-image resistant, but not collision resistant. [H]
A.20Let H be an n-bit cryptographic hash function. Show that the function H3 defined as is collision resistant (and hence second pre-image resistant), but not first pre-image resistant. [H]
A.21Let m be a product of two (unknown) big primes and let the binary representation of m (with leading one-bit) have n bits. Assume that it is computationally infeasible to compute square roots modulo m. We can identify bit strings with integers in a natural way. For a bit string x, take y := 1 ‖ x and let H4(x) denote the n-bit binary representation of y^2 (mod m). Show that H4 is first pre-image resistant, but not second pre-image resistant (and hence not collision-resistant). [H]
A.22Let H be an n-bit cryptographic hash function. Assume that H produces random hash values on random input strings. Prove that O(2^(n/2)) hash values need to be computed to detect a collision for H with high probability. [H] Deduce also that nearly 2^(n–1) hash values need to be computed on an average to obtain a second pre-image x′ of a given H(x).
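The 2^(n/2) bound is easy to observe experimentally. The sketch below (names ours) truncates SHA-256 to a toy n = 24 bits and searches for a collision by storing hashes in a dictionary; roughly 2^(n/2) = 2^12 trials typically suffice, far below the work a pre-image search would need:

```python
import hashlib
from itertools import count

def toy_hash(msg: bytes) -> bytes:
    """SHA-256 truncated to n = 24 bits (a deliberately weak toy hash)."""
    return hashlib.sha256(msg).digest()[:3]

def find_collision():
    """Birthday search: store hashes until one value repeats."""
    seen = {}
    for i in count():
        m = str(i).encode()
        h = toy_hash(m)
        if h in seen:
            return seen[h], m, i + 1      # colliding pair and trial count
        seen[h] = m

m1, m2, trials = find_collision()
assert m1 != m2 and toy_hash(m1) == toy_hash(m2)
```

The number of trials fluctuates from run to run of the hash choice, but its expectation is on the order of the square root of the 2^24 output space.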
A.23Let F1 be a collision resistant compression function mapping bit strings of length 2n to bit strings of length n.
  1. Define a compression function F2, mapping bit strings of length 4n to bit strings of length n, as follows. Let x be a bit string of length 4n. Write x = LR, where each of L and R is of length 2n bits. Define F2(x) := F1(F1(L) ‖ F1(R)). Show that F2 is also collision-resistant.

  2. Inductively define, for k ≥ 2, the compression function Fk, mapping bit strings of length 2^k n to bit strings of length n, as Fk(x) := F1(Fk–1(L) ‖ Fk–1(R)), where L and R are the left and right halves of x. Show that each Fk is collision resistant.

  3. Show that if F1 is first pre-image resistant, then so is each Fk.

  4. Define an n-bit hash function H as follows. Let x be a bit string of length l. If l < n, take k := 1, else choose k such that 2^(k–1)n ≤ l < 2^k n. Construct the string y := x ‖ 1 ‖ 00 . . . 0 of length 2^k n and define H(x) := Fk(y). Is H collision resistant? [H] (Appending a one-bit at the end of x delimits x and thereby prevents trivial collisions.)

A.24
  1. Let F1 and F2 be cryptographic compression functions, F1 mapping m1-bit strings to n1-bit strings and F2 mapping m2-bit strings to n2-bit strings. Show that F defined as F(L ‖ R) := F1(L) ‖ F2(R) (where L is of length m1 and R of length m2) is again a cryptographic compression function.

  2. The hash function H derived from DES (Section A.4.1) produces 64-bit hash values. For reasonable security, we require n-bit hash values with n at least 128. Use Part (a) to propose a method to make H achieve this desired level of security.

A.25Assume that in the SHA-1 algorithm the designers opted for Algorithm A.19 with the following minor modifications: They defined fj as fj(x, y, z) := x ⊕ y ⊕ z for all j, and they replaced all costly mod 2^32 addition operations (+) by cheap bit-wise XOR operations (⊕). Do you sense anything wrong with this design? [H]

B. Key Exchange in Sensor Networks

B.1Introduction
B.2Security Issues in a Sensor Network
B.3The Basic Bootstrapping Framework
B.4The Basic Random Key Predistribution Scheme
B.5Random Pairwise Scheme
B.6Polynomial-pool-based Key Predistribution
B.7Matrix-based Key Predistribution
B.8Location-aware Key Predistribution

One of the keys to happiness is a bad memory.

—Rita Mae Brown

That theory is worthless. It isn’t even wrong!

—Wolfgang Pauli

You’re only as sick as your secrets.

—Anonymous

B.1. Introduction

Public-key cryptography is not a solution to every security problem. Asymmetric routines are bulky and slow; in practice, they augment symmetric cryptography by eliminating the need for the prior secret establishment of keys between communicating parties. On a workstation built with today's computing technology, this is an acceptable trade-off. A 1 GHz processor runs one public-key encryption or key-exchange primitive in tens to hundreds of milliseconds, using at least hundreds of kilobytes of memory. That is reasonable for most applications, given that these routines are invoked rather infrequently.

Now, imagine a situation, where many tiny computing nodes, called sensor nodes, are scattered in an area for the purpose of sensing some data and transmitting the data to nearby base stations for further processing. This transmission is done by short-range radio communications. The base stations are assumed to be computationally well-equipped, but the sensor nodes are resource-starved. Such networks of sensor nodes are used in many important applications including tracking of objects in an enemy’s area for military purposes and scientific, engineering and medical explorations like wildlife monitoring, distributed seismic measurement, pollution tracking, monitoring fire and nuclear power plants and tracking patients. In some cases, mostly for military and medical applications, data collected by sensor nodes need to be encrypted before transmitting to neighbouring nodes and base stations.

Evidently one has to resort to symmetric-key cryptography in order to meet the security needs in a sensor network. Appendix B provides an overview of some key exchange schemes suitable for sensor networks.

B.2. Security Issues in a Sensor Network

Several issues make secure communication in sensor networks different from that in usual networks:

Limited resources in sensor nodes

Each sensor node contains a primitive processor featuring very low computing speed and only a small amount of programmable memory. The popular Atmel ATmega 128L, as an example, is an 8-bit 4 MHz RISC processor with only 128 kbytes of programmable memory. The processor does not support instructions for multiplying or dividing integers. Performing a single RSA or Diffie-Hellman exponentiation for cryptographic key sizes takes tens of minutes to several hours on such a processor.

Limited lifetime of sensor nodes

Each sensor node is battery-powered and is expected to operate for only a few days. Once the deployed sensor nodes die, it becomes necessary to add fresh nodes to the network to continue the data-collection operation. This calls for dynamic management of security objects (like keys).

Limited communication ability of sensor nodes

Sensor nodes communicate with each other and with the base stations by wireless radio transmission at low bandwidth and over small communication ranges. For sensor nodes built around the Atmel ATmega 128L, the maximum bandwidth is 40 kbps, and the communication range is at most 100 feet (30 m).

Moreover, the deployment area may have irregularities (like physical obstacles) that further limit the communication abilities of the nodes. One, therefore, expects that a deployed sensor node can directly communicate with only a few other nodes in the network.

Possibility of node capture

A sensor network is vulnerable to the capture of nodes by the enemy. The captured nodes may be physically destroyed, or utilized to send misleading signals and/or disrupt the normal activity of the network. As a result, no node should fully trust the nodes with which it communicates. The relevant security goal in this context is that captured nodes should not divulge to the enemy enough secrets to jeopardize the communication among the uncaptured nodes.

Lack of knowledge about deployment configuration

In many situations (like scattering of nodes from airplanes or trucks), the post-deployment configuration of the sensor network is not known a priori. It is unreasonable to use security algorithms that depend strongly on the locations of nodes in the network. For example, each sensor node u is expected to have only a few neighbours with which it can directly communicate. This is precisely the set of nodes with which u needs to share keys. However, this list cannot be determined before the actual deployment. An approximate knowledge of the locations of the nodes may strengthen the protocols, but robustness for handling run-time variations must be built into the protocols.

Mobility of sensor nodes

Sensor nodes may be static or mobile. Mobile nodes change the network configurations (like the lists of neighbours) as functions of time and call for time-varying security tools.

Still, sensor nodes need to communicate secretly. The clear impracticality of using public-key routines forces one to use symmetric ciphers. But setting up symmetric keys among communicating nodes is a difficult task. The number n of nodes in a sensor network can range up to several hundred thousand. Storing a symmetric key for each pair of nodes is impossible, since that requires each sensor to have a memory large enough to store n – 1 keys. On the other extreme, every communication may use a single network-wide symmetric key. In that case, the capture of a single node makes communication over the entire network completely insecure.

The plot thickens. There are graceful ways out. A host of algorithms has been recently proposed to address key establishment issues in sensor networks. In the rest of this appendix, we provide a quick survey of these tools. For the sake of simplicity, we assume here that our sensor network is static, that is, the nodes have no (or negligibly small) mobility. Though the schemes described below may be adapted to mobile networks, the required modifications are not necessarily easy and the current literature does not seem to be ready to take mobility into account.

We continue to deal with sensor processors of the capability of Atmel ATmega 128L. In practice, better processors (with speed, storage and cost roughly one order of magnitude higher) are available. We assume that the size (number of nodes) n of a sensor network is (usually) not bigger than a million, and also that a sensor node has of the order of 100 neighbours in its communication range.

B.3. The Basic Bootstrapping Framework

Key establishment in a sensor network is effected by a three-stage process called bootstrapping. Subsequent node-to-node communication uses the keys established during the bootstrapping phase. The three stages of bootstrapping are as follows:

Key predistribution

This step is carried out before the deployment of the sensors. A key set-up server chooses a pool K of randomly generated keys and assigns to each sensor node ui a subset Ki of K. The set Ki is called the key ring of the node ui. The key predistribution algorithms essentially differ in the ways the sets K and Ki are selected. Each key is associated with an ID that need not be kept secret and can even be transmitted in plaintext. Similarly, each sensor node is given a unique ID which need not be maintained secretly.

Direct key establishment

Immediately after deployment, each sensor node tries to determine all other sensor nodes with which it can communicate directly and secretly. Two nodes that are within the communication ranges of one another are called physical neighbours, whereas two nodes sharing one (or more) key(s) in their key rings are called key neighbours. Two nodes can secretly (and directly) communicate with one another if and only if they are both physical and key neighbours; let us call such pairs direct neighbours.

In the direct key establishment phase, each sensor node u locates its direct neighbours. To that end, u broadcasts its own ID and the IDs of the keys in its key ring. Each physical neighbour v of u responds with the matching key IDs, if any, stored in the key ring of v. This is how u identifies its direct neighbours.

If sending unencrypted key IDs poses a potential threat to the security of the network, each node u can instead encrypt some plaintext message m with each of the keys in its ring and broadcast the corresponding ciphertexts in place of the key IDs. Those physical neighbours of u that can decrypt one of the transmitted ciphertexts using one of the keys in their respective key rings establish themselves as direct neighbours of u.
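The key-ID matching step can be sketched as follows. This is a toy in-memory model of our own; a real node would broadcast its IDs over the radio rather than consult shared tables, and the node and key IDs here are purely illustrative.

```python
# Sketch of direct key establishment: each node broadcasts the IDs of the
# keys in its ring; physical neighbours reply with the matching IDs.

def find_direct_neighbours(node_id, key_rings, physical_neighbours):
    """Return {neighbour_id: shared_key_ids} for one node's handshake."""
    my_ring = key_rings[node_id]
    direct = {}
    for v in physical_neighbours[node_id]:
        shared = my_ring & key_rings[v]   # matching key IDs, if any
        if shared:                        # key neighbour AND physical neighbour
            direct[v] = shared
    return direct

key_rings = {
    "u": {1, 5, 9},
    "v": {2, 5, 7},   # shares key 5 with u
    "w": {3, 4, 8},   # shares nothing with u
}
physical_neighbours = {"u": ["v", "w"]}

print(find_direct_neighbours("u", key_rings, physical_neighbours))
# v is a direct neighbour (shared key ID 5); w is only a physical neighbour
```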

Path key establishment

This is an optional stage and, if executed, adds to the connectivity of the network. Suppose that two physical neighbours u and v fail to establish a direct link between them in the direct key establishment phase, but there exists a path u = u0, u1, u2, . . . , uh–1, uh = v in the network with each ui a direct neighbour of ui+1 (for i = 0, 1, . . . , h – 1). The node u then generates a random key k, encrypts k with the key shared between u and u1, and sends the encrypted key to u1. Subsequently, u1 retrieves k by decryption, encrypts k with the key shared by u1 and u2, and sends this encrypted version of k to u2. This process is repeated until the key k reaches the desired destination v. Now, u and v can communicate secretly and directly using k and thereby become direct neighbours.

The main difficulty in this process is the discovery of a path between u and v. This can be achieved by u initiating a message reflecting its desire to communicate with v. Let u1 be a direct neighbour of u. If u1 is also a direct neighbour of v, a path between u and v is discovered. Else u1 retransmits u’s request to its own direct neighbours. This process is repeated until a path is established between u and v, or the number of hops exceeds a certain limit. Note that path discovery may incur substantial communication overhead, and so the maximum number h of hops allowed needs to be fixed at a small value. Typically, h = 2 or 3 is recommended.
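The hop-limited path discovery just described can be sketched as a bounded breadth-first search over the direct-neighbour graph. The graph below is our own illustration, not part of any of the schemes discussed here.

```python
from collections import deque

# Hop-limited path discovery for path key establishment: breadth-first
# search over the direct-neighbour graph, abandoning any branch once the
# number of hops reaches the small limit h (typically h = 2 or 3).

def find_path(graph, u, v, max_hops):
    """Return a u-v path of at most max_hops links, or None."""
    queue = deque([[u]])
    while queue:
        path = queue.popleft()
        last = path[-1]
        if last == v:
            return path
        if len(path) - 1 == max_hops:     # hop budget exhausted
            continue
        for w in graph[last]:
            if w not in path:             # avoid revisiting nodes
                queue.append(path + [w])
    return None

graph = {                                  # direct-neighbour lists (toy data)
    "u": ["a"], "a": ["u", "b"], "b": ["a", "v"], "v": ["b"],
}
print(find_path(graph, "u", "v", 3))   # ['u', 'a', 'b', 'v']
print(find_path(graph, "u", "v", 2))   # None: the only path needs 3 hops
```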

A bootstrapping algorithm, or more precisely, a key predistribution algorithm must fulfill the following requirements. These requirements often turn out to be mutually contradictory. A key predistribution scheme attempts to achieve suitable trade-offs among them.

Compactness

Each key ring should be small enough to fit in a sensor node’s memory. Typically, 50–200 cryptographic keys (say, 128-bit keys of block ciphers) can be stored in each processor. This number lies between the two extremes of n – 1 keys per node (a key for each pair) and a single master key for the entire network.

Randomness

The key rings in different nodes are to be chosen randomly from a big pool, so that the rings of two different nodes do not overlap too much.

Network connectivity

The resulting network should be connected in the following sense: the undirected graph G = (V, E), with V comprising the nodes in the network and E containing a link (u, v) if and only if u and v are direct neighbours, must be connected (or at least connected with high probability).

Resilience against node capture

Ideally, the capture of any number of nodes must not divulge the secret key(s) between uncaptured direct neighbours. Practically, the fraction of communication links among uncaptured nodes that are compromised by node captures must remain small, at least as long as the fraction of captured nodes is not too high.

Scalability

Arbitrarily (but not impractically) big networks should be supported.

Future addition of nodes

One should allow new nodes to join the network at any time after the initial deployment, for example, to replace captured, faulty and dead nodes.

Additional requirements may also be conceived of in order to take curative measures against active attacks and/or faults. However, a study of active attacks and of countermeasures against those is beyond the scope of our treatment here.

Detection of bad nodes

There should be a mechanism to detect the presence and identities of dead, malfunctioning and rogue nodes. Here, a rogue node stands for a captured node that is used by the enemy to disrupt the natural working of the network. Active attacks mountable by the enemy include transmission of unauthorized and misleading data across the network, making neighbours always busy and letting them run out of battery sooner than the expected lifetime (sleep deprivation attack), and so on.

Revocation of bad nodes

Faulty and rogue nodes must be pruned out of the network before they can cause sizeable harm.

Resilience against node replication

Captured nodes can be replicated and the copies deployed by the enemy with the intention that these added nodes outnumber the legitimate nodes and eventually take control of the network. There should be a strategy to detect and cure replication of malicious nodes.

We now concentrate on some concrete realizations of the bootstrapping scheme. The optional third stage (path key establishment) will often be excluded from our discussion, because there are few algorithm-specific issues in this stage.

Before we introduce specific algorithms, let us summarize the notation we are going to use in the rest of this appendix:

n = Number of nodes in the sensor network
n′ = (Expected) number of nodes in the physical neighbourhood of each node
d = Degree of connectivity of each node in the key/direct neighbourhood graph
Pc = Global connectivity (a high probability like 0.9999)
p′ = Local connectivity (probability that two physical neighbours share a key)
M = Size of the key pool
m = Size of the key ring of each node (in number of cryptographic keys)
Fq = The underlying field for the poly-pool and the matrix-pool schemes
S = Size of the polynomial (or matrix) pool
s = Number of polynomial (or matrix) shares in the key ring of each node
t = Degree of a polynomial (or dimension of a matrix)
c = Number of nodes captured
Pe = Probability of successful eavesdropping expressed as a function of c

B.4. The Basic Random Key Predistribution Scheme

The paper [88] by Eschenauer and Gligor is a pioneering work on bootstrapping in sensor networks. Their scheme, henceforth referred to as the EG scheme, is essentially the basic bootstrapping method just described.

The key set-up server starts with a pool K of randomly generated keys. The number M of keys in K is taken to be a small multiple of the network size n. For each sensor node u to be deployed, a random subset of m keys from K is selected and given to u as its key ring. Upon deployment, each node discovers its direct neighbours as specified in the generic description. We now explain how the parameters M and m are to be chosen so as to make the resulting network connected with high probability.

Let us first look at the key neighbourhood graph Gkey on the n sensor nodes, in which a link exists between two nodes if and only if these nodes are key neighbours. Let p denote the probability that a link exists between two randomly selected nodes of this graph. A result on random graphs due to Erdős and Rényi indicates that in the limit n → ∞, the probability that Gkey is connected is

Equation B.1

Pc = e^(−e^(−ξ)),   where p = (ln n)/n + ξ/n.

We fix Pc at a high value, say, 0.9999, and express the expected degree of each node in Gkey as

Equation B.2

d = ((n − 1)/n) (ln n − ln(−ln Pc)).

In practice, we should also bring physical neighbourhood into consideration and look at the direct neighbourhood graph G = Gdirect on the n deployed sensor nodes. In this graph, two nodes are connected by an edge if and only if they are direct neighbours. G is not random, since it depends on the geographical distribution of the nodes in the deployment area. However, we assume that the above result for random graphs continues to hold for G too. In particular, we fix the degree of direct connectivity of each node to be (at least) d and require

Equation B.3

p′ = d/n′,

where n′ denotes the expected number of physical neighbours of each node, and p′ is the probability that two physical neighbours share one or more keys in their key rings. (Pc is often called the global connectivity and p′ the local connectivity.)

For the determination of p′, we first note that there is a total of C(M, m) (the binomial coefficient “M choose m”) key rings of size m that can be chosen from the pool of size M. For a fixed key ring Ki, the total number of ways of choosing a key ring Kj that does not share a key with Ki is equal to the number of ways of choosing m keys from the M − m keys outside Ki. This number is C(M − m, m). It then follows that

Equation B.4

p′ = 1 − C(M − m, m)/C(M, m).

Equations (B.2), (B.3) and (B.4) dictate how the key-pool size M is to be chosen, given the values of n, n′ and m.

Example B.1.

As a specific numerical example, consider a sensor network with n = 10,000 nodes. For the desired probability Pc = 0.9999 of connectedness of Gkey, we use Equation (B.2) to obtain the desired degree d as d ≥ 18.419. Let us take d = 20. Now, suppose that the expected number of physical neighbours of each deployed node is n′ = 50. By Equation (B.3), we then require p′ = d/n′ = 0.4. Finally, assume that each sensor can hold m = 150 keys in its memory. Equation (B.4) indicates that we should have M ≤ 44,195 in order to ensure p′ ≥ 0.4. In particular, we may take M = 40,000.
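The figures of Example B.1 can be checked numerically. A minimal sketch, using Equation (B.2) for the required degree and Equation (B.4) for the local connectivity (the function names are ours):

```python
import math

# Numerical check of Example B.1 under the random-graph model.

def required_degree(n, Pc):
    # d = ((n-1)/n) * (ln n - ln(-ln Pc))        -- Equation (B.2)
    return ((n - 1) / n) * (math.log(n) - math.log(-math.log(Pc)))

def local_connectivity(M, m):
    # p' = 1 - C(M-m, m)/C(M, m)                 -- Equation (B.4)
    return 1 - math.comb(M - m, m) / math.comb(M, m)

n, Pc, nprime, m = 10_000, 0.9999, 50, 150
d = required_degree(n, Pc)
print(round(d, 3))                                   # about 18.419; take d = 20
print(local_connectivity(40_000, m) >= 20 / nprime)  # M = 40,000 gives p' >= 0.4
```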

Let us now study the resilience of the EG scheme against node captures. Assume that c nodes are captured at random from the network and that u and v are two uncaptured nodes that are direct neighbours. We compute the probability Pe that an eavesdropper can decipher encrypted communication between u and v based on the knowledge of the keys available from the c captured key rings. Clearly, smaller values of Pe indicate higher resilience against node captures.

Suppose that u and v use the key k for communication between them. Then, Pe is equal to the probability that k resides in one of the key rings of the c captured nodes. Since each key ring consists of m keys randomly chosen from a pool of M keys, the probability that a particular key k is not available in a key ring is 1 − m/M, and consequently the probability that k does not appear in any of the c compromised key rings is (1 − m/M)^c. Thus, the probability of successful eavesdropping is

Pe = 1 − (1 − m/M)^c.

Example B.2.

As in Example B.1, take n = 10,000, n′ = 50, m = 150 and M = 40,000. If c = 100 nodes are captured, the fraction of compromised communication is Pe ≈ 0.313. Thus, a capture of only 100 nodes leads to a compromise of about one-third of the traffic. That is not a satisfactory figure. We need better algorithms.
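The eavesdropping probability Pe = 1 − (1 − m/M)^c of the EG scheme is a one-liner to evaluate; the sketch below reproduces the figures of Example B.2 (the function name is ours):

```python
# Resilience of the EG scheme: probability that the key used by two
# uncaptured direct neighbours leaks from c captured key rings.

def eg_eavesdrop_probability(m, M, c):
    # Pe = 1 - (1 - m/M)^c
    return 1 - (1 - m / M) ** c

m, M = 150, 40_000
for c in (10, 50, 100):
    print(c, round(eg_eavesdrop_probability(m, M, c), 3))
# c = 100 gives Pe close to 0.313: about one third of the traffic is compromised
```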

B.4.1. The q-composite Scheme

Chan et al. [44] propose several modifications of the basic EG scheme in order to improve upon the resilience of the network against node capture. The q-composite scheme, henceforth abbreviated as the qC scheme, is based on the requirement of a bigger overlap of key rings for enabling nodes to communicate.

As in the EG scheme, the key set-up server chooses a pool K of M random keys and loads the key ring of each node with a random subset of K of size m. Let the network consist of n nodes.

In the direct key establishment phase, each node u discovers all its physical neighbours that share q or more keys with u, where q is a predetermined system-wide parameter. Those physical neighbours that do so are now called direct neighbours of u. Let v be a direct neighbour of u, and let q′ ≥ q be the actual number of keys shared by u and v. Call these keys k1, k2, . . . , kq′. The nodes use the key

k := H(k1‖k2‖ · · · ‖kq′)

for future communication, where ‖ denotes string concatenation and H is a hash function. A pair of physical neighbours that share fewer than q predistributed keys do not communicate directly.

Recall that for the basic EG scheme q = 1, and the key k for communication between direct neighbours is taken to be one shared key instead of a hash value of all shared keys. The motivation behind going for the qC scheme is that requiring a bigger overlap between the key rings of a pair of physical neighbours leads to a smaller probability Pe of successful eavesdropping, since the eavesdropper now has to possess at least q shared keys (not just one). However, the requirement of q (or more) matching keys between communicating nodes restricts the key-pool size M more than in the EG scheme, and consequently a capture of fewer nodes reveals a bigger fraction of the total key pool to the eavesdropper. Chan et al. [44] report that the best trade-off is achieved for the value q = 2 or 3.
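The derivation k := H(k1‖k2‖ · · · ‖kq′) can be sketched as follows. SHA-256 stands in for the unspecified hash H, and the canonical sorting of the shared keys is our own device to make both endpoints produce identical input to H.

```python
import hashlib

# Sketch of the q-composite link-key derivation.

def composite_key(shared_keys, q=2):
    """Derive a link key from q' >= q shared keys (each given as bytes)."""
    if len(shared_keys) < q:
        raise ValueError("fewer than q shared keys: no direct link")
    material = b"".join(sorted(shared_keys))  # both sides must agree on the order
    return hashlib.sha256(material).digest()

k1, k2, k3 = b"\x01" * 16, b"\x02" * 16, b"\x03" * 16
# both endpoints compute the same key regardless of discovery order
assert composite_key([k1, k3, k2]) == composite_key([k2, k1, k3])
print(composite_key([k1, k2, k3]).hex()[:16])
```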

Let us now derive the explicit expressions for M and Pe. Equations (B.1), (B.2) and (B.3) hold for the qC scheme with the sole exception that the interpretation of the probability p′ of direct neighbourhood is now different. There is a total of C(M, m)² ways of choosing an ordered pair of random key rings of size m from a pool of M keys. Let us compute the number of such pairs of key rings sharing exactly r keys. First, the shared r keys can be chosen in C(M, r) ways. Out of the remaining M − r keys, the remaining m − r keys for the first ring can be chosen in C(M − r, m − r) ways. Finally, the remaining m − r keys for the second ring can be chosen in C(M − m, m − r) ways from the M − m keys not present in the first ring. Thus, if p(r) denotes the probability that two random key rings share exactly r keys, we have

p(r) = C(M, r) C(M − r, m − r) C(M − m, m − r) / C(M, m)²,

that is,

p′ = 1 − (p(0) + p(1) + · · · + p(q − 1))

is the equivalent of Equation (B.4) for the qC scheme.

Example B.3.

As in Example B.1, consider n = 10,000, n′ = 50, m = 150. For d = 20, we require p′ ≥ 0.4. This, in turn, demands M ≤ 16,387 for q = 2 and M ≤ 9,864 for q = 3. Compare these with the requirement M ≤ 44,195 for the EG scheme.
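The local connectivity of the qC scheme can be evaluated exactly with the formula for p(r). The sketch below (function names ours) checks that pool sizes slightly below the bounds of Example B.3 indeed keep p′ near the required 0.4:

```python
from math import comb

# Local connectivity of the q-composite scheme:
# p(r)  = probability that two random key rings share exactly r keys,
# p'    = 1 - (p(0) + ... + p(q-1)).

def p_exactly(M, m, r):
    return comb(M, r) * comb(M - r, m - r) * comb(M - m, m - r) / comb(M, m) ** 2

def local_connectivity(M, m, q):
    return 1 - sum(p_exactly(M, m, r) for r in range(q))

m = 150
print(round(local_connectivity(16_000, m, 2), 3))  # near 0.4 for q = 2
print(round(local_connectivity(9_800, m, 3), 3))   # near 0.4 for q = 3
```

Note that p(r) equals the hypergeometric probability C(m, r) C(M − m, m − r)/C(M, m), so the values p(0), p(1), . . . sum to 1 as they must.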

Let us now calculate the probability Pe of successfully deciphering the communication between two uncaptured nodes u and v, given that c nodes are already captured by the eavesdropper. Let q′ ≥ q be the actual number of keys shared by u and v; given that u and v are direct neighbours, this happens with probability p(q′)/p′. Each of these shared keys is available to the eavesdropper with probability 1 − (1 − m/M)^c. It follows that

Pe = Σ from q′ = q to m of (1 − (1 − m/M)^c)^q′ · p(q′)/p′.

Example B.4.

Let us continue with the network of Examples B.1, B.2 and B.3. The following table summarizes the probabilities Pe for various values of c. For the EG scheme, we take M = 40,000, whereas for the qC scheme, we take M = 16,000 for q = 2 and M = 9,800 for q = 3.

Pe (probability of successful eavesdropping)

Scheme   c = 10   c = 20   c = 30   c = 40   c = 50   c = 75   c = 100   c = 150
EG       0.037    0.072    0.107    0.140    0.171    0.246    0.313     0.431
2C       0.005    0.019    0.041    0.068    0.101    0.196    0.300     0.499
3C       0.002    0.011    0.032    0.066    0.111    0.255    0.413     0.678

This table indicates that when the number of nodes captured is small, the qC scheme outperforms the EG scheme. However, for large values of c, the effects of smaller values of the key-pool size show up, leading to a poorer performance of the qC schemes compared to the EG scheme.

B.4.2. Multi-path Key Reinforcement

Another way to improve the resilience of the network against node captures is the multi-path key reinforcement scheme, proposed again by Chan et al. [44]. As in the EG scheme, sensor nodes are deployed each with m keys in its key ring chosen randomly from a pool of M keys. Let u and v establish themselves as direct neighbours sharing the key k. Instead of using k itself as the key for future communication, the nodes try to locate several pairwise node-disjoint paths between them. Such a path u = v0, v1, . . . , vl = v consists of pairs of direct neighbours (vi, vi+1) for i = 0, . . . , l – 1. A randomly generated key is then routed securely along each such path from u to v.

Assume that r node-disjoint paths between u and v are discovered and that the random keys k1, k2, . . . , kr are transferred securely along these paths. The nodes u and v then use the key

k′ := k ⊕ k1 ⊕ k2 ⊕ · · · ⊕ kr

for future communication (here ⊕ denotes bit-wise XOR).

The reason why this scheme improves resilience against node captures is that even if the original k resides in the memory of a captured node, the new key k′ is computable by the adversary if and only if she can obtain all of the r session secrets k1, k2, . . . , kr. The bigger r is, the more difficult it is for the adversary to eavesdrop on all of the r node-disjoint paths. On the other hand, if the lengths of these paths are large, then the probability of eavesdropping at some links of the paths increases. Moreover, increasing the lengths of the paths incurs bigger communication overhead. The proponents of the scheme recommend only 2-hop multi-path key reinforcement.

We do not go into the details of the analysis of the multi-path key reinforcement scheme, but refer the reader to Chan et al. [44]. We only note that though it is possible to use multi-path key reinforcement for the q-composite scheme, it is not a lucrative option. The smaller size of the key pool for the q-composite scheme tends to nullify the effects of multi-path key reinforcement.

B.5. Random Pairwise Scheme

A pairwise key predistribution scheme offers perfect resilience against node captures, that is, the capture of any number c of nodes does not reveal any information about the secrets used by uncaptured nodes. This corresponds to Pe = 0 irrespective of c. This desirable property of the network is achieved by giving each key to the key rings of only two nodes. Moreover, the sharing of a key k between two unique nodes u and v implies that these nodes can authenticate themselves to one another: no other node possesses k, so no third node can prove itself as u to v or as v to u.

Pairwise keys can be distributed to nodes in many ways. Now, we deal with random distribution. Let m denote the size of the key ring of each sensor node. For each node u in the network, the key set-up server randomly selects m other nodes v1, . . . , vm and distributes a new random key ki to each of the pairs (u, vi) for i = 1, . . . , m. This distribution mechanism should also ensure that two nodes u, v in the network share at most one key. If k is given to u and v, the set-up server also attaches the ID of v to the copy of k in the key ring of u and the ID of u to the copy of k in the key ring of v.

In the direct key establishment phase, each node u broadcasts its own ID. Each physical neighbour v of u that finds the ID of u stored against a key in its key ring identifies u as its direct neighbour, along with the unique key shared by u and v.
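The ID-tagged predistribution and the subsequent lookup can be sketched as follows. This is a simplified toy of ours: a real set-up server would guarantee that every ring fills to exactly m keys, whereas this sketch only caps rings at m.

```python
import random

# Sketch of random pairwise predistribution: each pair of partnered nodes
# receives a fresh random key, stored at both ends and tagged with the
# partner's ID, so that deployment-time discovery is a simple ID lookup.

def predistribute(node_ids, m, rng):
    """Give each chosen pair one shared key; aim for m keys per ring."""
    rings = {u: {} for u in node_ids}          # ring: partner ID -> key
    for u in node_ids:
        candidates = [v for v in node_ids
                      if v != u and v not in rings[u] and len(rings[v]) < m]
        rng.shuffle(candidates)
        for v in candidates[:max(0, m - len(rings[u]))]:
            k = rng.getrandbits(128)
            rings[u][v] = rings[v][u] = k      # same key, tagged with partner ID
    return rings

def discover(u, v, rings):
    """Direct key establishment: v looks up u's broadcast ID in its ring."""
    return rings[v].get(u)

rng = random.Random(7)
rings = predistribute(list(range(20)), 3, rng)
partner = next(iter(rings[0]))
print(discover(0, partner, rings) == discover(partner, 0, rings))  # True
```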

The analysis of the random pairwise scheme is a bit tricky. Here, the global connectivity graph Gkey is m-regular, that is, each node has degree exactly m (so the global link probability is p = m/(n − 1) ≈ m/n), and we cannot expect to maintain this degree locally too. On the other hand, it is reasonable to assume under a random deployment model that the fraction of nodes with which a given node shares pairwise keys remains the same both locally and globally. More precisely, we equate p′ with p, that is,

Equation B.5

p′ = d/n′ = m/n = p.

Here, d denotes the desired local degree of a node. Equation (B.2) gives the formula for d in terms of the global connectivity Pc. For Pc = 0.9999, we have d = 16.11 for n = 1,000, d = 18.42 for n = 10,000, d = 20.72 for n = 100,000, and d = 23.03 for n = 1,000,000. That is, the value of d does not depend heavily on n, as long as n ranges over practical values. In particular, one may fix d = 20 (or d = 25 more conservatively) for all applications.

Equation (B.5) implies

n = m n′/d.

This equation reflects the drawback of the random pairwise scheme. The value m is limited by the memory of a sensor node, n′ is dictated by the density of nodes in the deployment area, and d can be taken as a constant; so the network size n is bounded above by the quantity m n′/d, called the maximum supportable network size. The basic scheme (and its variants) supports networks of arbitrarily large sizes, whereas the random pairwise scheme offers only limited support.

Example B.5.

Take m = 150, n′ = 50 and d = 20. The maximum supportable network size is then m n′/d = (150 × 50)/20 = 375. This is too small to be useful. We require modifications of the random pairwise scheme in order to be able to use it in practice.

B.5.1. Multi-hop Range Extension

Since m and d are limited by hard constraints, the only way to increase the maximum supportable network size is to increase the effective size n′ of the physical neighbourhood of a node. The multi-hop range extension strategy accomplishes that. In the direct key establishment phase, each node u broadcasts its ID. Each physical neighbour v of u re-broadcasts the ID of u. Each physical neighbour w of v then re-re-broadcasts the ID of u. This process is continued for a predetermined number r of hops. Any node u′ reachable from u in ≤ r hops and sharing a pairwise key with u can now establish a path of secure communication with u. During a future communication between u and u′, the intermediate nodes in the path simply forward a message encrypted by the pairwise key between u and u′. Using r hops thereby increases the effective radius of physical neighbourhood by a factor of r, and consequently the number of effective neighbours of each node gets multiplied by a factor of r². Thus, the maximum supportable network size now becomes

n = r² m n′/d.

For r = 3 and for the parameters of Example B.5, this size now attains a more decent value of 3375.
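The arithmetic of the maximum supportable network size, with and without range extension, is easily checked (the function name is ours):

```python
# Maximum supportable network size of the random pairwise scheme,
# n = m * n' / d, and its r-hop range extension, n = r^2 * m * n' / d
# (the effective neighbourhood grows quadratically with the hop count r).

def max_network_size(m, nprime, d, r=1):
    return (r * r * m * nprime) // d

m, nprime, d = 150, 50, 20
print(max_network_size(m, nprime, d))        # 375  (Example B.5)
print(max_network_size(m, nprime, d, r=3))   # 3375 (three-hop range extension)
```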

Increasing r incurs some cost. First, the communication overhead increases quadratically with r. Second, since intermediate nodes in a multi-hop path simply retransmit messages without authentication, chances of specific active attacks at these nodes increase. Large values of r are, therefore, discouraged.

B.6. Polynomial-pool-based Key Predistribution

Liu and Ning’s polynomial-pool-based key predistribution scheme (abbreviated as the poly-pool scheme) [181, 183] is based on the idea presented by Blundo et al. [28]. Let Fq be a finite field with q just large enough to accommodate a symmetric encryption key. For a 128-bit block cipher, one may take q to be the smallest prime larger than 2^128 (prime field) or 2^128 itself (extension field of characteristic 2). Let f(X, Y) ∈ Fq[X, Y] be a bivariate polynomial that is assumed to be symmetric, that is, f(X, Y) = f(Y, X). Let t be the degree of f in each of X and Y. A polynomial share of f is a univariate polynomial f^(α)(X) := f(X, α) for some element α ∈ Fq. Two shares f^(α) and f^(β) of the same polynomial f satisfy

Equation B.6

f^(α)(β) = f(β, α) = f(α, β) = f^(β)(α).

Thus, if the shares f^(α) and f^(β) are given to two nodes, they can come up with the common value f(α, β) = f(β, α) as a shared secret between them.

Given t + 1 or more shares of f, one can reconstruct f(X, Y) uniquely using Lagrange’s interpolation formula (Exercise 2.53). On the other hand, if only t or fewer shares are available, there are many (at least q) possibilities for f, and it is impossible to determine f uniquely. So the disclosure of up to t shares does not reveal the polynomial f to an adversary, and uncompromised shared keys based on f remain secure.
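The share mechanism of Equation (B.6) can be sketched over a toy field. The modulus below is far too small for real use (a deployment would use a prime of at least 128 bits), and the node IDs are illustrative; a symmetric coefficient matrix a[i][j] = a[j][i] defines f(X, Y), and node α stores the univariate share f(X, α).

```python
import random

# Toy sketch of Blundo-style symmetric bivariate polynomial shares.

P = 2**31 - 1          # field modulus (illustrative; too small in practice)
T = 4                  # degree t in each variable

rng = random.Random(1)
a = [[0] * (T + 1) for _ in range(T + 1)]
for i in range(T + 1):
    for j in range(i, T + 1):
        a[i][j] = a[j][i] = rng.randrange(P)   # enforce f(X, Y) = f(Y, X)

def share(alpha):
    """Coefficients of the univariate share f(X, alpha)."""
    return [sum(a[i][j] * pow(alpha, j, P) for j in range(T + 1)) % P
            for i in range(T + 1)]

def evaluate(coeffs, x):
    return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P

alpha, beta = 17, 42                   # two node IDs
# f^(alpha)(beta) = f(beta, alpha) = f(alpha, beta) = f^(beta)(alpha):
assert evaluate(share(alpha), beta) == evaluate(share(beta), alpha)
print("shared key:", evaluate(share(alpha), beta))
```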

Using a single polynomial for the entire network is not a good proposal, since t is limited by memory constraints in a sensor node. In order to increase resilience against node captures, many bivariate polynomials need to be used, and shares of random subsets of this polynomial pool are assigned to the key rings of individual nodes. This is how the poly-pool scheme works. If the degree t equals 0, this scheme degenerates to the EG scheme.

The key set-up server first selects a random pool F of S symmetric bivariate polynomials in Fq[X, Y], each of degree t in X and Y. Some IDs are also generated for the nodes in the network. For each node u in the network, s polynomials f1, f2, . . . , fs are randomly picked from F, and the polynomial shares f1(X, α), f2(X, α), . . . , fs(X, α) are loaded in the key ring of u, where α is the ID of u. Each key ring now requires space for storing s(t + 1) log₂ q bits, that is, for storing m := s(t + 1) symmetric keys.

Upon deployment, each node u broadcasts the IDs of the polynomials, the shares of which reside in its key ring. Each physical neighbour v of u, that has shares of some common polynomial(s), establishes itself as a direct neighbour of u. The exact pairwise key k between u and v is then calculated using Equation (B.6). If broadcasting polynomial IDs in plaintext is too unsafe, each node u can send some message encrypted by potential pairwise keys based on its polynomial shares. Those physical neighbours that can decrypt one of these encrypted messages have shares of common polynomials.

Like the EG scheme, the poly-pool scheme can be analysed under the framework of random graphs. Equations (B.1), (B.2) and (B.3) continue to hold under the poly-pool scheme. However, in this case the local connection probability p′ is computed as

Equation B.7

p′ = 1 − C(S − s, s)/C(S, s).

Given constraints on the network and the nodes, the desired size S of the polynomial pool can be determined from this formula.

Let us now compute the probability Pe of compromise of communication between two uncaptured nodes u, v as a function of the number c of captured nodes. If c ≤ t, the eavesdropper cannot gather enough polynomial shares to learn anything about any polynomial in F, that is, Pe = 0. So assume that c > t, and let pr denote the probability that exactly r shares of a given polynomial f (say, the one whose shares are used by the two uncaptured nodes u, v) are available in the key rings of the c captured nodes. The probability that a share of f is present in a given key ring is s/S, and so (by the binomial distribution)

Equation B.8

pr = C(c, r) (s/S)^r (1 − s/S)^(c − r).

Since t + 1 or more shares of f are required for the determination of f, we have

Equation B.9

Pe = p(t+1) + p(t+2) + · · · + pc = 1 − (p0 + p1 + · · · + pt).

Example B.6.

Let n = 10,000 (network size), n′ = 50 (expected size of physical neighbourhood of a node), m = 150 (key ring size in number of symmetric keys) and Pc = 0.9999 (global connectivity). Let us plan to choose bivariate polynomials of degree t = 49, so that each key ring can hold s = 3 polynomial shares.

For the determination of S, we first compute d = 20 as in Example B.1. We then require p′ ≥ d/n′ = 0.4. The biggest size S satisfying this bound is derived from Equation (B.7) as S = 20.

The following table lists the probability Pe for various values of c.

c     50           100          150         200         250      300     350     400
Pe    6.38×10^−42  2.30×10^−16  1.70×10^−8  1.52×10^−4  0.0196   0.231   0.668   0.932

The table shows substantial improvement in resilience against node capture as achieved by the poly-pool scheme over the EG and qC schemes.
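The tail probability of Equations (B.8) and (B.9) is straightforward to evaluate numerically. A sketch (function name ours) for the parameters of Example B.6:

```python
from math import comb

# Resilience of the poly-pool scheme: Pe is the upper tail of a binomial
# distribution, since each captured key ring holds a share of the target
# polynomial independently with probability s/S (Equations (B.8), (B.9)).

def poly_pool_Pe(c, t, s, S):
    if c <= t:
        return 0.0                     # too few shares to reconstruct f
    p = s / S
    # Pe = 1 - sum_{r=0}^{t} C(c, r) p^r (1-p)^(c-r)
    return 1 - sum(comb(c, r) * p**r * (1 - p)**(c - r) for r in range(t + 1))

s, S, t = 3, 20, 49                    # parameters of Example B.6
for c in (100, 200, 300):
    print(c, poly_pool_Pe(c, t, s, S))
# c = 300 yields a value close to the 0.231 listed in the table
```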

B.6.1. Pairwise Key Predistribution

The poly-pool scheme can be made pairwise by allowing no more than t + 1 shares of any polynomial to be distributed among the nodes. The best that the adversary can achieve is a capture of nodes with all these t + 1 shares and a subsequent determination of the corresponding bivariate polynomial. But this knowledge does not help the adversary, since no other node in the network uses a share of this compromised polynomial. That is, two uncaptured nodes continue to communicate with perfect secrecy.

However, like the random pairwise scheme, the pairwise poly-pool scheme suffers from the drawback that the maximum supportable network size is now limited by the quantity S(t + 1)/s. For the parameters of Example B.6, this size turns out to be an impractically low 333.

B.6.2. Grid-based Key Predistribution

The grid-based key predistribution considerably enhances the resilience of the network against node captures. To start with, let us play a bit with Example B.6.

Example B.7.

Take n = 10,000, n′ = 50 and m = 150. We calculated that the optimal value of S that keeps the network connected with high probability is S = 20. Now, let us instead take a much bigger value of S, say, S = 200. First, let us look at the brighter side of this choice. The probability Pe is listed in the following table as a function of c.

c     500          1000         1500        2000        2500     3000    3500    4000
Pe    1.90×10^−25  4.88×10^−13  3.10×10^−7  4.68×10^−4  0.0282   0.245   0.655   0.917

That is a dramatic improvement in the resilience figures. It, however, comes at a cost. The optimal value S = 20 was selected in Example B.6 in order to achieve a desired connectivity in the network. With S = 200, the probability p′ reduces from 0.404 to about 0.045, and each node is expected to have only about 2 direct neighbours. As a result, the network is likely to remain disconnected with high probability.

The grid-based key predistribution scheme allocates polynomial shares cleverly to the nodes so as to achieve the resilience figures of the last example with a reasonable guarantee that the resulting network remains connected. Let n be the size of the network, and take σ = ⌈√n⌉. For the sake of simplicity, let us assume that n = σ². The n nodes are then placed on a σ × σ square grid. The node at the (i, j)-th grid location (where i, j ∈ {0, 1, . . . , σ − 1}) is identified by the pair (i, j). The set-up server generates 2σ random symmetric bivariate polynomials f^(r)_i and f^(c)_j (for i, j ∈ {0, 1, . . . , σ − 1}) in Fq[X, Y], each of degree t in both X and Y. The polynomial f^(r)_i corresponds to the i-th row, and f^(c)_j to the j-th column, in the grid. The key ring of the node at location (i, j) in the grid is given the two polynomial shares f^(r)_i(X, j) and f^(c)_j(X, i). The memory required for this is equivalent to the storage for 2(t + 1) symmetric keys.

Now, look at the key establishment phase. Let two nodes u, v with IDs (i, j) and (i′, j′) be physical neighbours after deployment. First, consider the simple case i = i′. Both the nodes have shares of the row polynomial f^(r)_i and can arrive at the common secret value f^(r)_i(j, j′) using the column identities of one another. Similarly, if j = j′, the nodes can compute the shared secret f^(c)_j(i, i′). It follows that each node can establish keys directly with 2(σ – 1) other nodes in the network. That is, however, a truly small fraction of the entire network.

Assume now that i ≠ i′ and j ≠ j′. If the node w with identity either (i, j′) or (i′, j) is in the physical neighbourhood of both u and v, then there is a secure link between u and w, and also one between w and v. The nodes u and v can then establish a path key via the intermediate node w.

So suppose also that neither (i, j′) nor (i′, j) lies in the communication ranges of both u and v. Consider the nodes w1 := (i, k) and w2 := (i′, k) for some k ≠ j, j′. Suppose further that w1 is in the physical neighbourhood of u, w2 in that of w1, and v in that of w2. But then there is a secure u, v-path comprising the links uw1, w1w2 and w2v. Similarly, the nodes (k, j) and (k, j′) for each k ≠ i, i′ can help u and v establish a path key. To sum up, there are 2(σ – 2) potential three-hop paths between u and v.
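
The enumeration of candidate intermediaries described above can be sketched as follows. Node IDs are grid coordinate pairs; the function names are ours.

```python
def two_hop_intermediaries(u, v):
    """With u = (i, j) and v = (i', j'), i != i', j != j': the two grid nodes
    that share a row or column polynomial with each endpoint."""
    (i, j), (ip, jp) = u, v
    return [(i, jp), (ip, j)]

def three_hop_paths(u, v, sigma):
    """The 2*(sigma - 2) potential three-hop paths u -> w1 -> w2 -> v."""
    (i, j), (ip, jp) = u, v
    paths = []
    for k in range(sigma):
        if k != j and k != jp:
            paths.append([(i, k), (ip, k)])   # w1 shares row i with u; w1, w2 share column k
        if k != i and k != ip:
            paths.append([(k, j), (k, jp)])   # w1 shares column j with u; w1, w2 share row k
    return paths

assert two_hop_intermediaries((1, 2), (3, 4)) == [(1, 4), (3, 2)]
assert len(three_hop_paths((1, 2), (3, 4), sigma=10)) == 2 * (10 - 2)
```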

If all these three-hop paths fail, one may go for four-hop, five-hop, . . . paths, but at the cost of increased communication overhead. As argued in Liu and Ning [181, 183], exploring paths with ≤ 3 hops is expected to give the network high connectivity.

For the grid-based scheme, we have S = 2σ (the size of the polynomial pool) and s = 2 (the number of polynomial shares in each node’s key ring). The probability Pe can now be derived in the same manner as Equations (B.8) and (B.9):

Pe = 1 – (p0 + p1 + · · · + pt) = pt+1 + pt+2 + · · · + pc,

where pi is the probability that shares of a given polynomial are held by exactly i of the c captured nodes:

pi = C(c, i)(s/S)^i (1 – s/S)^(c–i).

Example B.8.

Take n = 10,000 and m = 150. Since each node has to store only two polynomial shares, we now take t = 74. Moreover, σ = 100, that is, the size of the polynomial pool is S = 200. The probability Pe can now be tabulated as a function of c (number of nodes captured) as follows:

c      1000        2000        3000        4000       5000       6000     7000
Pe     2.45×10–40  1.99×10–21  2.68×10–12  4.35×10–7  5.41×10–4  0.0334   0.290

This is very good performance. The capture of even 60 per cent of the nodes leads to a compromise of only 3.34 per cent of the communication among uncaptured nodes.
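
Assuming pi is the binomial probability that exactly i of the c captured key rings contain a share of a given polynomial (each independently with probability s/S), the tabulated values of Pe can be recomputed with a short script. Exact rational arithmetic is used because the smallest tail probabilities lie far below floating-point round-off.

```python
from math import comb
from fractions import Fraction

def p_compromised(c, t, s, S):
    """Pe: probability that more than t of the c captured nodes hold a share
    of a fixed polynomial, each independently with probability s/S."""
    p = Fraction(s, S)
    tail = sum(comb(c, i) * p**i * (1 - p)**(c - i) for i in range(t + 1))
    return float(1 - tail)   # Pe = p_{t+1} + ... + p_c

# Parameters of Example B.8: pool size S = 2*sigma = 200, s = 2 shares per ring, t = 74.
for c in (1000, 3000, 6000):
    print(c, p_compromised(c, t=74, s=2, S=200))
```

With exact fractions the c = 1000 value comes out around 10^–40, matching the order of magnitude in the table; a naive floating-point sum would report it as zero.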

This robustness of the grid-based distribution comes at a cost, though. The path key establishment stage is communication-intensive and is mandatory for ensuring good connectivity. Moreover, this stage is based on the assumption that not many nodes are captured during bootstrapping. If this assumption cannot be enforced, the scheme loses much of its expected resilience guarantee.

B.7. Matrix-based Key Predistribution

The matrix-based key predistribution scheme is derived from the idea proposed by Blom [25]. It is similar to the polynomial-based key predistribution but employs symmetric matrices (in place of symmetric polynomials). Let 𝔽q be a finite field with q just large enough to accommodate a symmetric key, and let G be a t × n matrix over 𝔽q, where t is determined by the memory of a sensor node and n is the number of nodes in the network. G need not be kept secret: anybody, even the adversary, may know G. We only require that any t columns of G be linearly independent (in particular, G has rank t). If g is a primitive element of 𝔽q, the following matrix is recommended.

Equation B.10

         ⎡  1          1            1            · · ·   1            ⎤
         ⎢  g          g²           g³           · · ·   g^n          ⎥
G   =    ⎢  g²         (g²)²        (g³)²        · · ·   (g^n)²       ⎥
         ⎢  ···        ···          ···                  ···          ⎥
         ⎣  g^(t–1)    (g²)^(t–1)   (g³)^(t–1)   · · ·   (g^n)^(t–1)  ⎦

(The j-th column consists of the powers (g^j)^0, (g^j)^1, . . . , (g^j)^(t–1).)
In a memory-starved environment, this G has a compact representation, since its j-th column is uniquely identified by the value g^j. The remaining elements in the column can be easily computed by performing a few multiplications.

Let D be a secret t × t symmetric matrix, and A the n × t matrix defined by:

A := (DG)^T = G^T D^T = G^T D,

where M^T denotes the transpose of a matrix M and the last equality holds because D is symmetric.

Finally, define the n × n matrix

K := AG.

It follows that K^T = (AG)^T = G^T A^T = G^T (G^T D)^T = G^T D^T G = G^T D G = AG = K, that is, K is a symmetric matrix. If the (i, j)-th element of K is denoted by kij, we have kij = kji, and this common value can be used as a pairwise key between the i-th and j-th nodes.

Let the (i, j)-th element of A be denoted by aij for 1 ≤ i ≤ n and 1 ≤ j ≤ t. Also let gij, 1 ≤ i ≤ t and 1 ≤ j ≤ n, denote the (i, j)-th element of G. But then the pairwise key kij = kji is expressed as:

kij = ai1g1j + ai2g2j + · · · + aitgtj.

Thus, the i-th row of A and the j-th column of G suffice for the i-th node to compute kij. Similarly, the j-th row of A and the i-th column of G allow the j-th node to compute kji. In view of this, every node, say, the i-th node, is required to store the i-th row of A and the i-th column of G. If G is as in Equation (B.10), only g^i needs to be stored instead of the full i-th column of G. Thus, the storage of t + 1 elements of 𝔽q (equivalent to t + 1 symmetric keys) suffices.
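
The whole construction can be checked on toy parameters. The sketch below uses an illustrative field size and matrix dimensions of our choosing; primitivity of g is not verified, since only the symmetry of K is being demonstrated.

```python
import random

def blom_demo(q=1009, t=4, n=8, g=3):
    """Toy Blom construction over GF(q), q prime: public Vandermonde-style G,
    secret symmetric D, per-node rows of A = (DG)^T = G^T D."""
    # G is t x n with column j determined by g^(j+1): entries (g^(j+1))^i.
    G = [[pow(g, i * (j + 1), q) for j in range(n)] for i in range(t)]
    # Secret symmetric t x t matrix D.
    D = [[0] * t for _ in range(t)]
    for i in range(t):
        for j in range(i, t):
            D[i][j] = D[j][i] = random.randrange(q)
    # A = G^T D  (n x t); row i of A is stored in node i.
    A = [[sum(G[k][i] * D[k][j] for k in range(t)) % q for j in range(t)]
         for i in range(n)]

    def key(i, j):
        # Node i computes k_ij from its own row of A and node j's public column of G.
        return sum(A[i][l] * G[l][j] for l in range(t)) % q

    return key

key = blom_demo()
assert key(2, 5) == key(5, 2)  # k_ij == k_ji: a common pairwise key
```

Because D is symmetric, K = AG = G^T D G equals its own transpose, so any two nodes derive the same value regardless of the random choice of D.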

During direct key establishment, two physical neighbours exchange their respective columns of G for the computation of the common key. Since G is allowed to be public knowledge, this communication reveals no secret information to the adversary.

Suppose that the adversary gains knowledge of some t′ ≥ t rows of A (say, by capturing nodes). We also assume that the matrix G is completely known to the adversary. The adversary picks any t known rows of A and forms the t × t matrix A′ comprising these rows. But then A′ = G′^T D, where G′ is the t × t submatrix of G consisting of the corresponding t columns. Since any t columns of G are linearly independent, G′^T is invertible, and so the secret matrix D = (G′^T)^(–1)A′ can be easily computed. Conversely, if D is known to the adversary, she can compute A and, in particular, any t′ ≥ t rows of A.

If only t′ < t rows are known to the adversary, then every assignment of values to some t – t′ additional rows of A yields a consistent candidate for the matrix D, from which the remaining rows of A can then be constructed. In other words, D cannot be uniquely recovered from a knowledge of fewer than t rows of A. Guessing is infeasible too, since there is an astronomically large number of choices for the elements of the t – t′ unknown rows of A.

To sum up, the matrix-based key predistribution scheme is completely secure if fewer than t nodes are captured. On the other hand, if t or more nodes are captured, then the system is completely compromised. Thus, the resilience of this scheme against node capture is determined solely by t and is independent of the size n of the network. The parameter t, in turn, is restricted by the memory of a sensor node (a node has to store t + 1 elements of 𝔽q).

In order to overcome this difficulty, Du et al. [79] propose a matrix-pool-based scheme. Here, S matrices A1, A2, . . . , AS are computed from S pairwise different secret matrices D1, D2, . . . , DS. The same G may be used for all these key spaces. Each node is given shares (that is, rows) of s matrices randomly chosen from the pool {A1, A2, . . . , AS}. The resulting details of the matrix-pool-based scheme are quite analogous to those pertaining to the polynomial-pool-based scheme described in the earlier section, and are omitted here.

B.8. Location-aware Key Predistribution

The key predistribution algorithms discussed so far are based on a random deployment model. In practice, the deployment model (like the expected location of each node and the overall geometry of the deployment area) may be known a priori. This knowledge can be effectively exploited to tune the key predistribution algorithms so as to achieve better connectivity and higher resilience against node capture. As an example, consider sensor nodes deployed from airplanes in groups or scattered uniformly from trucks. Since the approximate tracks of these vehicles are planned a priori, the key rings of the nodes can be loaded appropriately to achieve the expected performance enhancements.

Only nodes that are in the physical neighbourhood of one another need to share a pairwise key. Therefore, the basic objective in designing location-aware schemes is to predistribute keys in such a way that two nodes that are expected to remain close in the deployment area are given common pairwise keys, whereas two nodes that are expected to be far away after deployment need not share any pairwise key. The actual deployment locations of the nodes cannot usually be predicted accurately. Nonetheless, an approximate knowledge of the locations can boost the performance of the network considerably. The smaller the errors between the expected and actual locations of the nodes, the better a location-aware scheme is expected to perform.

B.8.1. Closest Pairwise Keys Scheme

Liu and Ning [182] propose a modification of the random pairwise key scheme (Section B.5) based on deployment knowledge. Let there be n sensor nodes in the network with each node capable of storing m cryptographic keys. The expected deployment location of each node is provided to the key set-up server. For each node u in the network, the server determines m other nodes whose expected locations of deployment are closest to that of u and for which pairwise keys with u have not already been established. For every such node v, a new random key kuv is generated. The key-plus-ID combination (kuv, v) is loaded in u’s key ring, whereas the pair (kuv, u) is loaded in v’s key ring.

This natural and simple-minded strategy provides complete security against node capture, since it is a pairwise key distribution scheme. Moreover, there is no limitation on the maximum supportable network size (under the reasonable assumption that there are far fewer than 2^l nodes in the network, where l is the bit length of a cryptographic key, say, 64 or 128). Finally, the incorporation of deployment knowledge increases the connectivity of the network. In order to analyse this gain, we first introduce some formal notation.
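
A minimal sketch of the server-side predistribution follows, assuming expected locations are known. It ignores the scheme's finer points: in this simplification a node's key ring may exceed m keys, because other nodes may also select it as a close neighbour.

```python
import secrets

def predistribute_closest(expected, m):
    """Load each node's key ring with pairwise keys for the m nodes whose
    expected deployment locations are nearest (simplified sketch)."""
    rings = {u: {} for u in expected}
    for u, (ux, uy) in expected.items():
        others = sorted(
            (v for v in expected if v != u),
            key=lambda v: (expected[v][0] - ux) ** 2 + (expected[v][1] - uy) ** 2,
        )
        for v in others[:m]:
            if v not in rings[u]:                # pairwise key not established yet
                k = secrets.token_bytes(16)      # fresh random 128-bit pairwise key
                rings[u][v] = k
                rings[v][u] = k
    return rings

# Hypothetical expected locations: a, b, c cluster together; d is far away.
locs = {'a': (0, 0), 'b': (1, 0), 'c': (0, 1), 'd': (10, 10)}
rings = predistribute_closest(locs, m=2)
assert rings['a']['b'] == rings['b']['a']        # shared pairwise key
assert 'd' not in rings['a']                     # distant node: no key with a
```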

For the sake of simplicity, we assume that the deployment region is two-dimensional, so that every point in that region is expressed by two coordinates x and y. Let u be a sensor node whose expected deployment location is (ux, uy) and whose actual deployment location is (u′x, u′y). This corresponds to a deployment error of eu = √((u′x – ux)² + (u′y – uy)²). The actual location (u′x, u′y) (or, equivalently, the error eu) is modelled as a continuous random variable that can assume values in ℝ². The probability density function fu of (u′x, u′y) characterizes the pattern of deployment error. One possibility is to assume that (u′x, u′y) is uniformly distributed within a circle with centre at (ux, uy) and of radius ∊, called the maximum deployment error. We then have:

Equation B.11

fu(x, y) = 1/(π∊²) if (x – ux)² + (y – uy)² ≤ ∊², and fu(x, y) = 0 otherwise.
An arguably more realistic strategy is to model (u′x, u′y) as a random variable following the two-dimensional normal (Gaussian) distribution with mean (ux, uy) and variance σ². The corresponding density function is:

fu(x, y) = (1/(2πσ²)) exp(–[(x – ux)² + (y – uy)²]/(2σ²)).

Let u and v be two deployed nodes. We assume that each node has a communication range of ρ. We also make the simplifying assumption that the different nodes are deployed independently, that is, (u′x, u′y) and (v′x, v′y) are independent random variables. The probability that u and v lie in the communication ranges of one another can be expressed as a function of the expected locations (ux, uy) and (vx, vy) as:

p(u, v) = ∫C fu(x1, y1)fv(x2, y2) dx1 dy1 dx2 dy2.

Here, the integral is over the region C of ℝ⁴ defined by (x1 – x2)² + (y1 – y2)² ≤ ρ².

Let n′ denote the number of physical neighbours of u (or of any sensor node). We know that u shares pairwise keys with exactly m nodes. We assume that these key neighbours of u are distributed uniformly in a circle centred at u and of radius ρ′. Since the n′ physical neighbours of u populate a disc of radius ρ, the expected value of ρ′ is:

ρ′ = ρ√(m/n′).

Let v be a key neighbour of u. Averaging p(u, v) over the expected location of v, assumed uniform in a disc of radius ρ′ around (ux, uy), the probability that v lies in the physical neighbourhood of u is given by

p(u) = (1/(πρ′²)) ∫C′ p(u, v) dvx dvy,

where C′ is the region (vx – ux)² + (vy – uy)² ≤ ρ′². Therefore, u is expected to have m × p(u) direct neighbours. Since the size of the physical neighbourhood of u is n′, the local connectivity, that is, the probability that u can establish a pairwise key with a physical neighbour, is given by

p′ = m × p(u)/n′.

In general, it is difficult to compute the above integrals. Liu and Ning [182] compute the probability p′ for the density function given by Equation (B.11) and establish that p′ ≈ 1 for small deployment errors, namely ∊ ≤ ρ. As ∊ increases, p′ gradually reduces to the corresponding probability for the random pairwise scheme.
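
Although the integrals are hard in closed form, they are easy to estimate numerically. The sketch below uses Monte Carlo sampling under the uniform-disc error model of Equation (B.11); the parameter values are illustrative.

```python
import random

def sample_actual(expected, eps, rng):
    """Actual location: expected location plus an error uniform in a disc of
    radius eps (rejection sampling inside the disc)."""
    while True:
        dx = rng.uniform(-eps, eps)
        dy = rng.uniform(-eps, eps)
        if dx * dx + dy * dy <= eps * eps:
            return (expected[0] + dx, expected[1] + dy)

def neighbour_probability(u_exp, v_exp, rho, eps, trials=20000, seed=1):
    """Monte Carlo estimate of the probability that u and v land within
    communication range rho of one another."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ux, uy = sample_actual(u_exp, eps, rng)
        vx, vy = sample_actual(v_exp, eps, rng)
        if (ux - vx) ** 2 + (uy - vy) ** 2 <= rho * rho:
            hits += 1
    return hits / trials

# Nodes expected 30 m apart, range 40 m, maximum deployment error 20 m.
print(neighbour_probability((0, 0), (30, 0), rho=40, eps=20))
```

Increasing rho (or shrinking eps) drives the estimate towards 1, in line with the observation that p′ ≈ 1 for small deployment errors.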

In order to add sensor nodes at a later point of time, the key set-up server again uses deployment knowledge. The key rings of the new nodes are loaded based on the expected deployment locations of these nodes and on the (expected or known) locations of the deployed nodes. Pairwise keys between the new and the deployed nodes are communicated to the deployed nodes over secure channels (routing through uncompromised nodes).

B.8.2. Location-aware Polynomial-pool-based Scheme

Several variants of the closest pairwise keys scheme have been proposed. Liu and Ning themselves propose an extension based on pseudorandom functions [182]. Du et al. propose a variant of the basic (EG) scheme based on a specific model of deployment [80]. We end this section by briefly outlining a location-aware adaptation of the polynomial-pool-based scheme (Section B.6).

For simplicity, let us assume that the deployment region is a rectangular area. This region is partitioned into a 2-dimensional array of rectangular cells. Let the partition consist of R rows and C columns. The cell located at the i-th row and the j-th column is denoted by Ci,j. The neighbours of the cell Ci,j are taken to be the four adjacent cells: Ci–1,j, Ci+1,j, Ci,j–1, Ci,j+1.

The key set-up server first decides on a finite field 𝔽q with q just big enough to accommodate a cryptographic key. The server also chooses R × C random symmetric bivariate polynomials fi,j(X, Y) for 1 ≤ i ≤ R and 1 ≤ j ≤ C. The polynomial fi,j is meant for the cell Ci,j. The degree t (in both X and Y) of each fi,j is so chosen that each sensor node has sufficient memory to store the shares of five such polynomials.

Let u be a node to be deployed and let the expected deployment location of u lie in the cell Ci,j called the home cell of u. The key ring of u is loaded with the shares (evaluated at u) of the five polynomials corresponding to the home cell and its four neighbouring cells. More precisely, u gets the five shares: fi,j(X, u), fi–1,j(X, u), fi+1,j(X, u), fi,j–1(X, u), and fi,j+1(X, u). The set-up server also stores in u’s memory the ID (i, j) of its home cell.

In the direct key establishment phase, each node u broadcasts the ID (i, j) of its home cell (or some messages encrypted by potential pairwise keys). Those physical neighbours whose home cells are either the same as or neighbouring to that of u can establish pairwise keys with u.
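
The share allocation and the direct-key test can be sketched as follows. The cell-indexing conventions are ours; two nodes can establish a direct key whenever their key rings hold a share of at least one common polynomial.

```python
def loaded_cells(i, j):
    """A node with home cell (i, j) holds shares of the polynomials of its
    home cell and the four adjacent cells."""
    return {(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)}

def can_establish_direct_key(cell_u, cell_v):
    """Direct key establishment is possible iff the two key rings contain
    shares of a common polynomial."""
    return bool(loaded_cells(*cell_u) & loaded_cells(*cell_v))

assert can_establish_direct_key((3, 4), (3, 5))      # neighbouring home cells
assert can_establish_direct_key((2, 2), (2, 2))      # same home cell
assert not can_establish_direct_key((3, 4), (7, 9))  # far-apart home cells
```

A full implementation would clip the neighbour indices at the boundary of the R × C array; the sketch ignores that detail.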

An analysis of the performance of this location-aware poly-pool-based scheme can be carried out along similar lines to the closest pairwise scheme. We leave out the details here and refer the reader to Liu and Ning [182].

C. Complexity Theory and Cryptography

C.1 Introduction
C.2 Provably Difficult Computational Problems Are not Suitable
C.3 One-way Functions and the Complexity Class UP

. . . complexity turns out to be most elusive precisely where it would be most welcome.

—C. H. Papadimitriou [229]

Real knowledge is to know the extent of one’s ignorance.

—Confucius

The complex develops out of the simple.

—Colin Wilson

C.1. Introduction

It is worthwhile to ask why public-key cryptography must be based on problems that are only believed to be difficult. Complexity theory suggests concrete examples of provably intractable problems. This appendix provides a brief conceptual explanation of why these provably difficult problems cannot be used for building cryptographic protocols. We may consequently conclude that, at present, we cannot prove any public-key cryptosystem to be secure. That is bad news, but we have to live with it.

Here, we make no attempt to furnish definitions of the formal complexity classes. The excellent books by Papadimitriou [229] and by Sipser [280] can be consulted for that purpose. Here is a list of the complexity classes that we require for our discussion. The relationships between these classes are depicted in Figure C.1. All the containments shown in this figure are conjectured to be proper. With a slight abuse of notation, we identify functional problems with decision problems.

Table C.1. Some complexity classes

Class      Brief description
P          Languages accepted by deterministic polynomial-time Turing machines
NP         Languages accepted by non-deterministic polynomial-time Turing machines
coNP       Complements of languages in NP
UP         Languages accepted by unambiguous polynomial-time Turing machines
PSPACE     Languages accepted by polynomial-space Turing machines
EXPTIME    Languages accepted by deterministic exponential-time Turing machines
EXPSPACE   Languages accepted by exponential-space Turing machines

Figure C.1. Relations between complexity classes


C.2. Provably Difficult Computational Problems Are not Suitable

The P =? NP problem, arguably the deepest unsolved problem in theoretical computer science, may be suspected to have some bearing on public-key cryptography. Under the assumption that P ≠ NP, one may feel tempted to use NP-complete problems for building secure cryptosystems. Unfortunately, this temptation does not prove fruitful. Several cryptosystems based on NP-complete problems have been broken, and that is not really a surprise.

It may be the case that P = NP, and, if so, all NP-complete problems are solvable in polynomial time. One may, therefore, be advised to select problems that lie outside NP, that is, in strictly bigger complexity classes. By the time and space hierarchy theorems, we have P ⊊ EXPTIME and PSPACE ⊊ EXPSPACE. Both EXPTIME and EXPSPACE have complete problems. An EXPTIME-complete problem cannot be solved in polynomial time, whereas an EXPSPACE-complete problem cannot be solved in polynomial space nor, consequently, in polynomial time. How about using these complete problems for designing cryptosystems? The idea may sound interesting, but these provably exponential problems turn out to be even poorer candidates, perhaps irrelevant, for use in cryptography.

Let fe and fd be the encryption and decryption transforms for a public-key cryptosystem. We assume that the set of plaintext messages and the set of ciphertext messages are both finite. (Public-key cryptosystems are like block ciphers in this respect.) Moreover, since a ciphertext c = fe(m, e) is computable in polynomial time, the length of c is bounded by a polynomial in the length of m. An intruder can non-deterministically guess messages m (from the finite space) and check if c = fe(m, e) to validate the correctness of the guess. It, therefore, follows that deciphering a ciphertext message (with no additional information) is a problem in NP. That is the reason why we should not look beyond NP.

However, the full class NP, in particular, the most difficult (that is, complete) problems of NP, may be irrelevant for cryptography, as we argue in the next section. In other words, for building cryptosystems we expect to effectively exploit problems that are believed to be easier than NP-complete. Both the integer factoring and the discrete log problems lie in the class NP ∩ coNP. We have P ⊆ NP ∩ coNP, and it is widely believed that this containment is proper. Also, NP ∩ coNP is not known (nor expected) to have complete problems. Even if P ≠ NP ∩ coNP, the factoring and discrete log problems need not be outside P, since we are unlikely to produce completeness proofs for them. Only historical evidence supports the belief that these two problems are difficult. The situation may change tomorrow. Complexity theory does not offer any formal protection.

Exercise Set C.2

C.1 Prove that the primality testing problem

PRIME := {n ∈ ℕ | n is prime}

is in NP ∩ coNP.

(Remark: The AKS algorithm is a deterministic poly-time primality testing algorithm and therefore PRIME is in P and so trivially in NP ∩ coNP too. It can, however, be independently proved that primes have succinct certificates.)
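
The "succinct certificate" mentioned in the remark can be made concrete with a Pratt-style certificate: a witness g of order p – 1 modulo p, together with the prime factors of p – 1, each certified recursively. A sketch of the verifier follows; the certificate encoding (a dictionary mapping each prime to its witness and factor list) is ours.

```python
def is_certified_prime(p, cert):
    """Recursively check a Pratt certificate.  cert[p] = (g, factors), where
    factors is the list of prime factors of p - 1 with multiplicity.
    g having order exactly p - 1 modulo p proves that p is prime."""
    if p == 2:
        return True
    g, factors = cert[p]
    if pow(g, p - 1, p) != 1:
        return False
    for q in set(factors):
        if pow(g, (p - 1) // q, p) == 1:       # order of g divides (p-1)/q: reject
            return False
        if not is_certified_prime(q, cert):    # each factor is certified in turn
            return False
    prod = 1
    for q in factors:
        prod *= q
    return prod == p - 1                       # factors must multiply to p - 1

# A certificate for 13: 2 is a generator mod 13, and 12 = 2 * 2 * 3.
cert = {13: (2, [2, 2, 3]), 3: (2, [2])}
assert is_certified_prime(13, cert)
```

The certificate has size polynomial in log p, which is exactly what membership of PRIME in NP requires.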

C.2 Consider the decision version of the integer factorization problem: given positive integers n and k, decide whether n has a non-trivial divisor ≤ k. Call this language DIFP.

  1. Prove that DIFP ∈ NP ∩ coNP.

  2. Given a poly-time algorithm for DIFP, design a poly-time algorithm that factors an integer (that is, that solves the functional problem IFP).

C.3 Let G be a finite cyclic multiplicative group with a generator g. Assume that one can compute products in G in polynomial time. Consider the decision version of the discrete log problem in G: given a ∈ G and an integer b ≥ 0, decide whether indg a ≤ b. Call this language DDLP.

Here, indices (indg a) are assumed to lie between 0 and (#G) – 1.

  1. Prove that DDLP ∈ NP ∩ coNP.

  2. Given a poly-time algorithm for DDLP, design a poly-time algorithm that computes indices in G (that is, that solves the functional problem DLP in G).

C.3. One-way Functions and the Complexity Class UP

Any public-key encryption behaves like a one-way function, easy to compute but difficult to invert.

Definition C.1.

Let Σ be an alphabet (a finite set of symbols). One may assume, without loss of generality, that Σ = {0, 1}. Let Σ* denote the set of all strings over Σ. A function f : Σ* → Σ* is called a one-way function, if it satisfies the following properties.

  1. f must be injective, that is, for every β ∈ Σ* the inverse f–1(β), if existent, is unique.

  2. For some real constant k > 0, we have |α|1/k ≤ |f(α)| ≤ |α|k for all α ∈ Σ*. (Here, |α| denotes the length of a string α ∈ Σ*.)

  3. f can be computed in deterministic polynomial time, that is, f ∈ P.

  4. f–1 must not be computable in polynomial time[1], that is, f–1 ∉ P. In view of Property (2), inverting f is a problem in NP. So we require f–1 ∈ NP \ P.

    [1] A stronger (but essential) requirement is that f–1 must not be computable by polynomial-time probabilistic algorithms.

Property (1) ensures unique decryption. Property (2) implies that the length of f(α) is polynomially bounded both above and below by the length of α. Property (3) suggests ease of encryption, whereas Property (4) suggests difficulty of decryption.

We do not know whether there exists a one-way function. The following functions are strongly suspected to be one-way. However, we do not seem to have any clues about how we can prove these functions to be one-way.

Example C.1.
  1. The function that multiplies two primes p, q with p < q is believed to be one-way. Computing its inverse is the RSA integer factoring problem.

  2. The discrete exponentiation function in a finite field 𝔽q, that maps x, 0 ≤ x ≤ q – 2, to g^x for some fixed primitive element g of 𝔽q, is suspected to be one-way. Its inverse is the discrete logarithm function.

  3. The RSA encryption function m ↦ m^e (mod n) for some fixed parameters n, e is alleged to be one-way. Its inverse is RSA decryption.
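
The asymmetry claimed for candidate (2) can be seen even at toy sizes: the forward direction is a single modular exponentiation, while the generic inverse is a search over exponents. The parameters below are illustrative; at realistic sizes (thousands of bits) the search is hopeless.

```python
def dexp(g, x, q):
    """Forward direction: one modular exponentiation (fast even for huge q)."""
    return pow(g, x, q)

def dlog_bruteforce(g, y, q):
    """Inverse direction: exhaustive search over all exponents.  For
    well-chosen groups nothing dramatically better than generic search is known."""
    acc = 1
    for x in range(q - 1):
        if acc == y:
            return x
        acc = (acc * g) % q
    return None

q, g = 1000003, 2       # 1000003 is prime; the base 2 is an illustrative choice
y = dexp(g, 911, q)     # easy
assert dlog_bruteforce(g, y, q) == 911   # feasible only because q is tiny
```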

It is evident that if P = NP, there cannot exist one-way functions. The converse of this is not true, that is, even if P ≠ NP, there may exist no one-way functions.

Definition C.2.

A non-deterministic Turing machine which has at most one accepting branch of computation for every input string is called an unambiguous Turing machine. The class of languages accepted by poly-time unambiguous Turing machines is denoted by UP.

Clearly, P ⊆ UP ⊆ NP. Both the containments are conjectured to be proper. The importance of the class UP stems from the following result:

Theorem C.1.

There exists a one-way function if and only if P ≠ UP.

Therefore, it is the P =? UP question, and not the P =? NP question, that is relevant for cryptography. The class UP is not known (nor expected) to have complete problems. So locating a one-way function may be a difficult task. But at the minimum we are now on the right track.[2] Complexity theory has helped us shift our attention from NP (or bigger classes) to UP.

[2] Well, hopefully!

In order to use a one-way function f for cryptographic purposes, we require additional properties of f. Computing f–1 must be difficult for an intruder, whereas the same computation ought to be easy for the legitimate recipient. Thus, f must support poly-time inversion, provided that some secret piece of information (the trapdoor) is available during the computation of the inverse. A one-way function with a trapdoor is called a trapdoor one-way function.

The first two functions of Example C.1 do not have obvious trapdoors and so cannot be straightaway used for designing cryptosystems. The third function (RSA encryption) has the requisite trapdoor, namely, the decryption exponent d satisfying ed ≡ 1 (mod φ(n)).

The hunt for a theoretical foundation does not end here; it begins. Most of complexity theory deals with worst-case complexities of problems, rather than their average or expected complexities. A one-way function, even if one exists, may be difficult to invert for only a few instances, whereas cryptography demands that the inversion problem be difficult for most instances. A function meeting even this cryptographic demand need not be suitable, since there may be reductions that map hard instances to easy instances. Moreover, the trapdoors themselves may inject vulnerabilities and make room for quick attacks.

There still remains a long way to go!

Exercise Set C.3

C.4 Let f : Σ* → Σ* be a function with the property that f(f(α)) = f(α) for every α ∈ Σ*. Argue that f is not a one-way function.
C.5 Design unambiguous polynomial-time Turing machines for computing the inverses of the functions described in Example C.1.
C.6 Show that if there exists a bijective one-way function, then NP ∩ coNP ≠ P. [H]

D. Hints to Selected Exercises

The greatest thing in family life is to take a hint when a hint is intended and not to take a hint when a hint isn’t intended.

—Robert Frost

Teachers open the door, but you must enter by yourself.

—Chinese Proverb

Imagination grows by exercise, and contrary to common belief, is more powerful in the mature than in the young.

—W. Somerset Maugham

2.11 (a)Apply Theorem 2.3 to the restriction to H of the canonical homomorphism GG/K.
2.11 (b)Apply Theorem 2.3 to the canonical homomorphism G/HG/K, aHaK, .
2.14 (c)Consider the canonical surjection GG/H.
2.17 (a)Let ij and . Then ord g divides both and and so is equal to 1, that is, g = e. Now let hi, and with . But then . Thus #(HiHj) = (#Hi)(#Hj). Generalize this argument to show that #(H1 · · · Hr) = n.
2.18First consider the special case #G = pr for some and . For each , the order ordG g is of the form psg for some sgr. Let s be the maximum of the values sg, . Take any element with ordG h = ps. Then e, h, . . . , hps–1 are all the elements x that satisfy xps = e. But by the choice of s every element satisfies xps = e. Hence we must have s = r. This proves the assertion for the special case. For the general case, use this special case in conjunction with Exercise 2.17.
2.19 (b)Show that , (h1, . . . , hr) ↦ h1 . . . hr, is a group isomorphism.
2.23Use Zorn’s lemma.
2.24 (c)Let be the intersection of all prime ideals of R. First show that . To prove the reverse inclusion take and consider the set S of all non-unit ideals of R such that for all . If f is a non-unit, the set S is non-empty and by Zorn’s lemma has a maximal element, say . Show that is a prime ideal of R.
2.25For , the map RR, bab, is injective and hence surjective by Exercise 2.4.
2.30Apply the isomorphism theorem to the canonical surjection , .
2.33[(1)⇒(2)] Let be an ascending chain of ideals of R. Consider the ideal which is finitely generated by hypothesis.

[(3)⇒(1)] Let be an ideal of R. Consider the set of all finitely generated ideals of R contained in .

2.36Use the pigeon-hole principle: If there are n + 1 pigeons in n holes, then there exists at least one hole containing more than one pigeon.
2.37Consider the integer satisfying 2tn < 2t+1.
2.39 (e)12 ≡ (n – 1)2 (mod n).
2.39 (f)Apply Wilson’s theorem.
2.40Use Fermat’s little theorem.
2.41Use Wilson’s theorem or Euler’s criterion.
2.45Reduce to the case y2 ≡ α (mod p).
2.49 (a)Consider the canonical group homomorphism and the fact that a surjective group homomorphism from a cyclic group G onto G′ implies that G′ is cyclic.
2.49 (b)Let be a primitive element modulo p. The residue class of a in has order k(p – 1) for some . Show that the order of b := p + 1 modulo pe is pe–1. So the order of akb modulo pe is pe–1(p – 1) = φ(pe).
2.50Use the Chinese remainder theorem in conjunction with Exercises 2.20 and 2.49.
2.53Take . The interpolating polynomial is . Use Exercise 2.52 to establish the uniqueness.
2.56 (b) is irreducible in if and only if f(X + 1) is irreducible in .
2.58Use the fundamental theorem of algebra.
2.63Consider the set of all linearly independent subsets of V that contain T. Show that every chain in has an upper bound in . By Zorn’s Lemma, there exists a maximal element . Show that S generates V.
2.64 (b)Use Exercise 2.63.
2.68Let p1, . . . , pn be n distinct primes. Take and ai := a/pi for i = 1, . . . , n.
2.72 (a)If N is the -submodule of generated by ai/bi, i = 1, . . . , n, with gcd(ai, bi) = 1, then for any prime p that does not divide b1 · · · bn we have 1/pN.
2.72 (b)Any two distinct elements of are linearly dependent over . Now use Exercise 2.69.
2.74 (b)Let the conjugates of over F be α1 = α, α2, . . . , αn. Since is injective, it follows from (a) that makes a permutation of α1, . . . , αn. So is surjective.
2.75 (a)Use Exercise 2.61.
2.76 (b)The if part follows from Exercise 2.61. For proving the only if part, take . If the polynomial f(X) := Xpa splits over F, we are done. So suppose that there exists an irreducible divisor of f(X) of degree ≥ 2. By the separability of F, there exist two distinct roots α, β of g(X). Let K := F (α, β). Show that the Frobenius map , , is an endomorphism of K. Also there exists a field isomorphism τ : F (α) → F (β) which fixes F element-wise and takes α ↦ β. But then . Since any field homomorphism is injective, α equals β, a contradiction. Thus no g(X) chosen as above can exist.
2.77 (a)Let be an irreducible polynomial with g(α) = 0 for some . Let β be another root of g. We show that . By Lemma 2.5, there is an isomorphism μ : F(α) → F(β). Clearly, K is the splitting field of f over F(α). Let K′ be the splitting field of μ*(f) over F (β). By Proposition 2.33, KK′. If are the roots of f, then K′ ≅ F (β, γ1, . . . , γd) = K(β). But then KK(β).
2.78 (a)Consider transcendental numbers.
2.78 (b)Let . For , we have , implying that for a, with ab. Now assume for some . Choose a rational number b with . Then , a contradiction. Thus . Similarly .
2.80Use the binomial theorem and induction on n.
2.82Follow the proof of Theorem 2.37.
2.90Example 2.18.
2.91 (b)By the fundamental theorem of Galois theory, # . Now show that are distinct -automorphisms of .
2.92 (a)Assume r > 1. We have the extensions , where is the splitting field of f over and hence over . Consider the minimal polynomial of a root of f over . Conversely, let f be reducible over . Choose an irreducible factor of f with deg h = s < d. Now h has one (and hence all) roots in and, therefore, d|sm.
2.93Use Corollary 2.18.
2.98In each case, the defining polynomial is quadratic in Y (and with coefficients in K[X]). If this polynomial admits a non-trivial factorization, one can reach a contradiction by considering the degrees of X in the coefficients of Y1 and Y0.
2.103For simplicity, consider the case char K ≠ 2, 3. Show that the curves Y2 + Y = X3 and Y2 = X3 + X have j-invariants 0 and 1728 respectively. Finally, if , 1728, then the curve has j-invariant . One must also argue that these are actually elliptic curves, that is, have non-zero discriminants.
2.111Use Theorem 2.51.
2.112 (a)Pair a point with its opposite. This pairing fails for points of orders 1 and 2.
2.112 (c)Consider the elliptic curve E : Y2 = X3 + 3 over . We have , whereas X3 + 3 is irreducible modulo 13.
2.113 (a)Every element of has a unique square root.
2.115 (a)Use Theorem 2.49 or Exercise 2.17.
2.115 (b)Use Theorem 2.50.
2.115 (c)The trace of Frobenius at q is 0 in this case. Now, use Theorem 2.50.
2.123Factor N(G) in .
2.127Let . For each i, write , . But then det , where , δij being the Kronecker delta.
2.128 (b)Use Part (a) and Exercise 2.126(c).
2.128 (c)Let . By Exercise 2.130, is integral over . Let be the ideal generated by in and let and be the ideals of generated respectively by and . Now, use Part (b).
2.133 (b)In a PID, non-zero prime ideals are maximal.
2.137 (a) Since and are maximal, we have , that is, a1 + a2 = 1 for some and . Now use the fact that (a1 + a2)e1 + e2 = 1.
2.137 (b) Use the CRT.
2.138 (a) Since is invertible, for some fractional ideal .
2.140 (a) For , let constitute a complete residue system of modulo . Then also form a complete residue system of modulo .
2.142 (d) Take in Part (b).
2.143 (a) Reduce modulo 4.
2.143 (c) Let divide this gcd. Then divides 2y and . Take norms.
2.144 (b) Look at the expansion of a – 1 in base p. More precisely, let a < pN for some . Then –a = (pN – a) – pN = [(pN – 1) – (a – 1)] – pN.
2.152 (c) First show that .
2.153 Use unique factorization of rationals.
2.154 Show by induction on n that pn+1 divides apn+1 – apn in for all .
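A quick numerical check of this divisibility in Python (the values p = 3 and a = 5 are arbitrary choices for illustration):

```python
# Check that p**(n+1) divides a**(p**(n+1)) - a**(p**n) for small parameters.
p, a = 3, 5
for n in range(4):
    diff = a ** (p ** (n + 1)) - a ** (p ** n)
    assert diff % p ** (n + 1) == 0
```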
2.161 There exists an irreducible polynomial in of every degree .
3.7 The implication is obvious. For the reverse implication, use Proposition 2.5.
3.18 (b) Consider the binary expansion of m.
3.19 If n is a pseudoprime to base a and not a pseudoprime to base b, then n is not a pseudoprime to base ab.
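A minimal Python illustration, using the classic example n = 341 = 11 · 31, which is a Fermat pseudoprime to base 2 (since 2^10 ≡ 1 (mod 341)) but not to base 3:

```python
# (ab)**(n-1) = a**(n-1) * b**(n-1) ≡ b**(n-1) (mod n) when a**(n-1) ≡ 1,
# so a pseudoprime to base a but not to base b cannot be one to base ab.
n = 341
assert pow(2, n - 1, n) == 1   # pseudoprime to base 2
assert pow(3, n - 1, n) != 1   # not a pseudoprime to base 3
assert pow(6, n - 1, n) != 1   # hence not a pseudoprime to base 6 = 2*3
```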
3.20 (a) If p2|n for some , take with ordn(a) = p. If n is square-free, consider a prime divisor p of n and take with and a ≡ 1 (mod n/p).
3.20 (b) If n is an Euler pseudoprime to base a and not an Euler pseudoprime to base b, then n is not an Euler pseudoprime to base ab.
3.21 (a) Let be the prime factorization of n with r and each αi in . Then, . For odd pi, the group is cyclic of order and hence contains an element of order pi – 1.
3.21 (b) ordn(–1) = 2.
3.21 (c) Let vp(n) ≥ 2 for some odd prime p. Construct an element with ordn(a) = p.
3.28 Proceed by induction on i = 1, . . . , r. For 1 ≤ i ≤ r, define νi := n1 · · · ni and let be a solution of the congruences bi ≡ aj (mod nj) for j = 1, . . . , i. If i < r, use the combining formula given in Section 2.5 to find such that bi+1 ≡ bi (mod νi) and bi+1 ≡ ai+1 (mod ni+1).
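The incremental combination can be sketched in Python as follows (a minimal implementation assuming pairwise coprime moduli; the combining identity u·ν + v·n = 1 comes from the extended Euclidean algorithm):

```python
def ext_gcd(a, b):
    # Extended Euclid: returns (g, u, v) with u*a + v*b = g = gcd(a, b).
    if b == 0:
        return a, 1, 0
    g, u, v = ext_gcd(b, a % b)
    return g, v, u - (a // b) * v

def crt(residues, moduli):
    # Incrementally combine x ≡ a_i (mod n_i) for pairwise coprime n_i.
    b, nu = residues[0], moduli[0]
    for a, n in zip(residues[1:], moduli[1:]):
        g, u, v = ext_gcd(nu, n)
        assert g == 1
        # b' ≡ b (mod nu) and b' ≡ a (mod n):
        b = (b * v * n + a * u * nu) % (nu * n)
        nu *= n
    return b

assert crt([2, 3, 2], [3, 5, 7]) == 23   # 23 ≡ 2, 3, 2 (mod 3, 5, 7)
```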
3.31 Apply Newton’s iteration to compute a zero of x2 – n.
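The integer form of this iteration can be sketched in Python (a standard floor-square-root routine; starting the iteration at n guarantees monotone convergence to the floor):

```python
def isqrt(n):
    # Integer square root via Newton's iteration on f(x) = x**2 - n.
    if n < 2:
        return n
    x = n
    while True:
        y = (x + n // x) // 2
        if y >= x:          # first non-decrease: x is the floor root
            return x
        x = y

assert isqrt(144) == 12
assert isqrt(145) == 12
```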
3.32 (a) Apply Newton’s iteration to compute a zero of xk – n.
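A Python sketch of the same idea for general k (the initial value 2^(⌊bitlength/k⌋ + 1) is one convenient upper bound on the root; any starting point above the root works):

```python
def kth_root(n, k):
    # Integer k-th root via Newton's iteration on f(x) = x**k - n.
    if n < 2:
        return n
    x = 1 << (n.bit_length() // k + 1)   # upper bound on the root
    while True:
        y = ((k - 1) * x + n // x ** (k - 1)) // k
        if y >= x:
            return x
        x = y

assert kth_root(2**60, 5) == 2**12
assert kth_root(2**60 - 1, 5) == 2**12 - 1
```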
3.34 (b) The updating d(X) := d(X) – Xisb(X) needs to consider only the non-zero words of b.
3.36 (b) First consider b = 0 and note that the roots of X(q–1)/2 – 1 (resp. X(q–1)/2 + 1) are all the quadratic residues (resp. non-residues) of .
3.36 (c) First consider b = 0.
3.40 For , we have ord(a)|m and for each i = 1, . . . , r the multiplicity vpi (ord(a)) is the smallest of the non-negative integers k satisfying .
3.41 (a) Use the CRT.
3.43 (a) Use the CRT and the fact that for an odd prime r ≡ 3 (mod 4).
4.1 (a) Using the CRT, reduce to the case that n is prime. Then is bijective ⇔ the restriction is bijective. Now, if gcd(a, φ(n)) = 1, the inverse of is given by , where ab ≡ 1 (mod φ(n)). On the other hand, if q is a prime divisor of gcd(a, φ(n)), choose an element with ord(y) = q. But then ya ≡ 1 (mod n), that is, is not injective. This exercise provides the foundation for the RSA cryptosystem.
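The bijectivity criterion is easy to verify numerically. A minimal Python sketch for the toy modulus n = 15 (so φ(n) = 8):

```python
from math import gcd

# The map x -> x**a on the units modulo n is a bijection exactly when
# gcd(a, phi(n)) = 1 -- the fact underlying RSA.
n, phi = 15, 8
units = [x for x in range(1, n) if gcd(x, n) == 1]
for a in range(1, 9):
    image = {pow(x, a, n) for x in units}
    assert (len(image) == len(units)) == (gcd(a, phi) == 1)
```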
4.1 (b) In view of the CRT, reduce to the case n = pα for and α > 1. Then (pα–1)a ≡ 0 (mod n).
4.6 Consider the integral .
4.9 Use the CRT and lifting.
4.10 For proving , let n be an odd composite integer, choose a random and compute a square root x of y2 modulo n. By Exercise 4.9, the probability that x ≡ ±y (mod n) is at most 1/2.
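The factoring step behind this reduction can be seen on a toy example in Python (n = 77 with the congruent squares 9² ≡ 2² chosen by hand for illustration):

```python
from math import gcd

# If x**2 ≡ y**2 (mod n) but x is not ≡ ±y (mod n), then gcd(x - y, n)
# is a non-trivial factor of n.
n, x, y = 77, 9, 2               # 81 ≡ 4 (mod 77)
assert (x * x - y * y) % n == 0
assert x % n not in (y % n, (-y) % n)
d = gcd(x - y, n)
assert 1 < d < n and n % d == 0  # d = 7 splits 77
```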
4.12 (d) Eliminate a from T (a, b, c) using a + b + c = 0. For each fixed c, allow b to vary and use a sieve to find all the values of b for which T (a, b, c) is smooth for the fixed c.
4.13 You may use the prime number theorem and the fact that the sum of the reciprocals of the first t primes asymptotically approaches ln ln t.
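This growth rate (Mertens' theorem) is easy to observe numerically. A Python sketch with a simple sieve (the bound 10^5 and the constant 0.2615, Mertens' constant, are illustrative):

```python
import math

# Sum of reciprocals of the primes up to x tracks ln ln x; the constant
# offset is approximately Mertens' constant 0.2615.
x = 10**5
sieve = bytearray([1]) * (x + 1)
sieve[0:2] = b"\x00\x00"
for p in range(2, int(x**0.5) + 1):
    if sieve[p]:
        sieve[p * p :: p] = bytearray(len(range(p * p, x + 1, p)))
s = sum(1.0 / p for p in range(2, x + 1) if sieve[p])
assert abs(s - math.log(math.log(x)) - 0.2615) < 0.02
```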
4.15 If a < a1 or a > am, then no i exists. So assume that a1 ≤ a ≤ am and let d := ⌊(1 + m)/2⌋. If a = ad, return d, else if a < ad, recursively search a among the elements a1, . . . , ad–1, and if a > ad, recursively search a among the elements ad+1, . . . , am.
4.16 (a) Use Lagrange’s interpolation formula (Exercise 2.53).
4.18 (a) One may precompute the values σi := p rem qi, i = 1, . . . , t. Note that qi|(gα + kp) if and only if ρk,i = 0.
4.19 (a) Use the approximation T (c1, c2) ≈ (c1 + c2)H.
4.21 (c) T (a, b, c) = –b2c(x + cy)b + (zc2x).
4.21 (d) Imitate the second stage of the LSM.
4.23 Let the factor base consist of all irreducible polynomials over of degrees ≤ m together with the polynomials of the form Xk + h(X), , deg h ≤ m. The optimal running time of this algorithm corresponds to .
4.24 (b) is square-free.
4.24 (c) Use the fact Xm – 1 = (Xm/pvp(m) – 1)pvp(m).
4.24 (d) Theorem 2.39.
4.25 (a) Look at the roots of the polynomials on the two sides.
4.25 (c) If ord ω = m, then ord(–ω) = 2m.
4.25 (d) ω, ωq, . . . , ωql–1 are all the roots of the minimal polynomial of ω over .
4.26 (b) Use the Mordell–Weil theorem.
4.26 (c) Use Theorem 4.2.
5.2 (a) Solve the simultaneous congruences x ≡ ci (mod ni), i = 1, . . . , e, and then take the integer e-th root of the solution x, 1 ≤ x ≤ n1 · · · ne.
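This low-exponent attack can be sketched in Python on toy numbers (the pairwise coprime "moduli" 55, 56, 57 and the message m = 9 are illustrative; the point is only that m^3 is smaller than the product of the moduli, so CRT recovers m^3 exactly as an integer):

```python
from functools import reduce

def crt(residues, moduli):
    # Standard CRT combination for pairwise coprime moduli.
    N = reduce(lambda a, b: a * b, moduli)
    x = 0
    for c, n in zip(residues, moduli):
        Ni = N // n
        x += c * Ni * pow(Ni, -1, n)
    return x % N

def icbrt(n):
    # Integer cube root by Newton's iteration, started above the root.
    x = 1 << (n.bit_length() // 3 + 1)
    while True:
        y = (2 * x + n // (x * x)) // 3
        if y >= x:
            return x
        x = y

moduli = [55, 56, 57]                 # pairwise coprime toy moduli
m = 9
cts = [pow(m, 3, n) for n in moduli]  # the same m "encrypted" thrice, e = 3
assert crt(cts, moduli) == m**3
assert icbrt(crt(cts, moduli)) == m   # plaintext recovered
```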
5.2 (b) Append (different) pseudorandom bit strings to m before encryption. This process is often referred to as salting.
5.3 (a) In view of the Chinese remainder theorem, reduce to the case n = pr for some and .
5.4 ue1 + ve2 = 1 for some u, .
5.6 If the same session key is used to generate the ciphertext pairs (r1, s1) and (r2, s2) on two plaintext messages m1 and m2, then m1/m2 = s1/s2.
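A Python sketch of this session-key-reuse weakness for ElGamal encryption, where a ciphertext has the form (g^k, m·y^k) (all the toy parameters below are illustrative):

```python
p, g = 467, 2            # toy ElGamal parameters (p prime)
x = 127                  # private key
y = pow(g, x, p)         # public key
k = 213                  # the SAME session key used twice
m1, m2 = 100, 300
s1 = m1 * pow(y, k, p) % p   # second components of the two ciphertexts
s2 = m2 * pow(y, k, p) % p
# The common mask y**k cancels: s1/s2 = m1/m2 (mod p).
assert s1 * pow(s2, -1, p) % p == m1 * pow(m2, -1, p) % p
```

Thus an eavesdropper who learns one plaintext immediately obtains the other.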
5.7 (c) Let x = (xl–1 . . . x1x0)2. Define x′ := (xl–1 . . . x2x1)2 and y′ := gx′ (mod p). Then, y ≡ y′2gx0 (mod p). Since x0 is easily computable, y′ can be obtained by computing a square root of y modulo p. Argue that a call of the oracle helps us choose the correct square root y′ of y. Now, use recursion.
5.8 Let g′ be any randomly chosen generator of , where q := ph. One computes for i = 0, 1, . . . , p – 1. We then have the equality of the sets

modulo q – 1, where l := indg′ g. But then for each i we have a (yet unknown) j such that . Show that trying all possibilities for i and j one can effectively recover l and hence g = g′l and hence π.

5.9 Let g′, and l be as in Exercise 5.8. Now, we have the equality of the sets

modulo q – 1.

5.11 (mod β) are polynomials with small coefficients.
5.15 (a)If Alice generates the signatures (M1, s1) and (M2, s2) on two messages M1 and M2, then her signature on a message M with H(M) ≡ H(M1)H(M2) (mod n) is s1s2 (mod n). Thus, without knowing the private key of Alice, an intruder can generate a valid signature (M, s1s2) of Alice, provided that such an M can be computed. Of course, here the intruder has little control over the message M. The PKC standards form RSA Laboratories add some redundancy to the hash function output before signing. The product of two hash values with redundancy is, in general, expected not to have the redundancy. This increases the security of the scheme against existential forgeries beyond that provided by the first pre-image resistance of the underlying hash function.
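A Python sketch of the multiplicative forgery on "textbook" RSA, with the message itself playing the role of its hash and the well-known toy key n = 3233 = 61 · 53, e = 17, d = 2753:

```python
n, e, d = 3233, 17, 2753
m1, m2 = 42, 101
s1, s2 = pow(m1, d, n), pow(m2, d, n)   # Alice's two legitimate signatures
m_forged = m1 * m2 % n
s_forged = s1 * s2 % n                  # forged without knowing d
assert pow(s_forged, e, n) == m_forged  # verifies as a valid signature
```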
5.15 (b) For any , a valid signature is (M, s), where H(M) ≡ s2 (mod n).
5.15 (c) Choose random integers u, v with gcd(v, n) = 1 and take d′ := u + dv. Of course, d and hence d′ are unknown to Carol, but she can compute s = gd′ = gu(gd)v and t ≡ –H(s)v–1 (mod n). But then (M, s, t) is a valid ElGamal signature on a message M for which H(M) ≡ tu (mod n).
5.16 Obviously, c itself could be a possible choice, but that is not random and Bob might refuse to sign c. Carol should hide c by cre (mod n) for some randomly chosen r known to her.
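The full blind-signature round trip can be sketched in Python with the same toy key n = 3233 = 61 · 53, e = 17, d = 2753 (the message c and blinder r are arbitrary, with r coprime to n):

```python
n, e, d = 3233, 17, 2753
c, r = 1234, 99                      # message to be signed, random blinder
blinded = c * pow(r, e, n) % n       # what Bob actually sees and signs
signed_blinded = pow(blinded, d, n)  # Bob's signature on the blinded value
s = signed_blinded * pow(r, -1, n) % n   # Carol removes the blinder
assert s == pow(c, d, n)             # = Bob's signature on c itself
```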
5.23 (a) by the CRT.
5.25 (a) Replace the random challenge of the verifier by the hash value of the string obtained by concatenating the message to be signed with the witness.
5.26 (d) Bob finds a random b′ with and sends a := (b′)2 (mod n) to Alice. But then Alice’s response b yields a non-trivial factor gcd(b – b′, n) of n.
7.5 (mod n) and mse (mod n).
7.9 (a) Use Exercise 2.44(b).
7.9 (c) Again use Exercise 2.44(b).
7.9 (d) Use Part (c) in conjunction with the CRT, and separately consider the three cases v2(p – 1) = v2(q – 1), v2(p – 1) > v2(q – 1) and v2(p – 1) < v2(q – 1).
A.2 for all X, J. One does not have to look at the S-boxes for proving this.
A.9 (c) For i = 0, 1, 2, 3, 4Nr, 4Nr + 1, 4Nr + 2, 4Nr + 3, take . For other values of i, take .
A.14 (b) Let DL(X) := XdCL(1/X) = a0 + a1X + a2X2 + · · · + ad–1Xd–1 + Xd. Consider the -algebra , where x := X + 〈DL(X)〉. The -linear transformation λx : A → A defined by g(x) ↦ xg(x) has the matrix ΔL with respect to the polynomial basis (1, x, . . . , xd–1). If is the minimal polynomial of λx, then [f(λx)](1) = f(x) = 0. Now, use the fact that 1, x, . . . , xd–1 are linearly independent over .
A.16 (b) [only if] Take σ ≠ 00 · · · 01. Since σ is non-zero, si = 1 for some . Construct an LFSR with d – 1 stages initialized to s0s1 · · · sd–2 to generate σ.
A.19 Suppose that we want to compute a second pre-image for H2(x). If , any is a second pre-image for H2(x). If , computing a second pre-image for H2(x) is equivalent to computing a second pre-image for H(x). The density of the (finite) set S is 0 in the (infinite) set of all bit strings. Thus, H2 is second pre-image resistant. On the other hand, for any two distinct x, we have a collision (x, x′) for H2.
A.20 Collision resistance of H implies that of H3. On the other hand, for a positive fraction (half) of the (n + 1)-bit strings y, it is easy to compute a pre-image of y under H3.
A.21 If y is a square root of a modulo m, then so is m – y too.
A.22 Use the birthday paradox (Exercise 2.172).
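A small Python experiment showing the birthday effect on a hash truncated to 16 bits (truncated SHA-256 here stands in for any hash; by the birthday paradox a collision typically appears after roughly 2^8 trials, far fewer than the 2^16 needed to invert a given value):

```python
import hashlib

def h16(msg):
    # A deliberately weak 16-bit "hash": the first two bytes of SHA-256.
    return hashlib.sha256(msg).digest()[:2]

seen = {}
i = 0
while True:                      # guaranteed to stop within 2**16 + 1 steps
    digest = h16(str(i).encode())
    if digest in seen:
        x1, x2 = seen[digest], i
        break
    seen[digest] = i
    i += 1

assert x1 != x2 and h16(str(x1).encode()) == h16(str(x2).encode())
```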
A.23 (d) Let L := F1(L′) and R := F1(R′) with both R and R′ non-zero. Then, F1(LR) = F2(L′R′).
A.25 Let h(i) denote the column vector of dimension 160 having the bits of H(i) as its elements and m(i) the column vector of dimension 512 + 160 = 672 having the bits of M(i) and of H(i) as its elements. Show that the modified design of SHA-1 leads to the relation h(i) ≡ Am(i–1) + c (mod 2) for some constant 160 × 672 matrix A over and for some constant vector c. So what then?
C.6 For α, , call α ≤ β if and only if |α| < |β| or |α| = |β| and α is lexicographically smaller than β. This ≤ produces a well-ordering of Σ*. For a one-way function f, look at the language for some with γ ≤ β}.

 

References

If you steal from one author, it’s plagiarism; if you steal from many, it’s research.

—Wilson Mizner

Literature is the question minus the answer.

—Roland Barthes

Everything that can be invented, has been invented.

—Charles H. Duell, 1899

[1] Adkins, W. A. and S. H. Weintraub (1992). Algebra: An Approach via Module Theory. Graduate Texts in Mathematics, 136. New York: Springer.

[2] Adleman, L. M., J. DeMarrais and M.-D. A. Huang (1994). “A Subexponential Algorithm for Discrete Logarithms over the Rational Subgroup of the Jacobians of Large Genus Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-I, Lecture Notes in Computer Science, 877. pp. 28–40. Berlin/Heidelberg: Springer.

[3] Adleman, L. M. and M.-D. A. Huang (1992). “Primality Testing and Two Dimensional Abelian Varieties over Finite Fields”, Lecture Notes in Mathematics, 1512. Berlin: Springer.

[4] Adleman, L. M., C. Pomerance and R. S. Rumely (1983). “On Distinguishing Prime Numbers from Composite Numbers”, Annals of Mathematics, 117: 173–206.

[5] Agrawal, M., N. Kayal and N. Saxena (2002), “Primes Is in P” [online document]. Available at http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf (October 2008).

[6] * Ahlfors, L. V. (1966). Complex Analysis. New York: McGraw-Hill.

[7] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1974). The Design and Analysis of Computer Algorithms. Reading, Massachusetts: Addison-Wesley.

[8] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1983). Data Structures and Algorithms. Reading, Massachusetts: Addison-Wesley.

[9] Aigner, M. and E. Oswald (2007), “Power Analysis Tutorial” [online document]. Available at http://www.iaik.tugraz.at/content/research/implementation_attacks/introduction_to_impa/dpa_tutorial.pdf (October 2008).

[10] Akkar, M.-L., R. Bevan, P. Dischamp and D. Moyart (2000). “Power Analysis, What Is Now Possible”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 489–502. Berlin/Heidelberg: Springer.

[11] Anderson, R. and M. Kuhn (1997). “Low Cost Attacks on Tamper Resistant Devices”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 125–136. Berlin/Heidelberg: Springer.

[12] * Apostol, T. M. (1976). Introduction to Analytic Number Theory. Undergraduate Texts in Mathematics. New York: Springer.

[13] Arnold, V. I. (1999). “Polymathematics: Is Mathematics a Single Science or a Set of Arts?”, in V. Arnold, M. Atiyah, P. Lax and B. Mazur (eds.), Mathematics: Frontiers and Perspectives, pp. 403–416. Providence, Rhode Island: American Mathematical Society.

[14] Atiyah, M. F. and I. G. MacDonald (1969). Introduction to Commutative Algebra. Reading, Massachusetts: Addison-Wesley.

[15] Aumüller, C., P. Bier, W. Fischer, P. Hofreiter and J.-P. Seifert (2002), “Fault Attacks on RSA with CRT: Concrete Results and Practical Countermeasures” [online document]. Available at http://eprint.iacr.org/2002/073 (October 2008).

[16] Balasubramanian, R. and N. Koblitz (1998). “The Improbability that an Elliptic Curve has Subexponential Discrete Log Problem under the Menezes–Okamoto–Vanstone Algorithm”, Journal of Cryptology, 11: 141–145.

[17] Bao, F., R. H. Deng, Y. Han, A. B. Jeng, A. D. Narasimhalu, T.-H. Ngair (1997). “Breaking Public Key Cryptosystems on Tamper Resistant Devices in the Presence of Transient Faults”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 115–124. Berlin/Heidelberg: Springer.

[18] Bellare, M. and P. Rogaway (1995). “Optimal Asymmetric Encryption—How to Encrypt with RSA”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 92–111. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/oaep.html (October 2008).

[19] Bellare, M. and P. Rogaway (1996). “The Exact Security of Digital Signatures: How to Sign with RSA and Rabin”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 399–416. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/exactsigs.html (October 2008).

[20] Bennett, C. H. and G. Brassard (1984). “Quantum Cryptography: Public Key Distribution and Coin Tossing”, pp. 175–179. Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing, Bangalore, India, December.

[21] Berlekamp, E. R. (1968). Algebraic Coding Theory. New York: McGraw-Hill.

[22] Biham, E. and A. Shamir (1997). “Differential Fault Analysis of Secret Key Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 513–528. Berlin/Heidelberg: Springer.

[23] Blake, I. F., R. Fuji-Hara, R. C. Mullin and S. A. Vanstone (1984). “Computing Logarithms in Finite Fields of Characteristic Two”, SIAM Journal on Algebraic and Discrete Methods, 5: 276–285.

[24] Blake, I. F., G. Seroussi and N. P. Smart (1999). Elliptic Curves in Cryptography. Cambridge: Cambridge University Press.

[25] Blom, R. (1985). “An Optimal Class of Symmetric Key Generation Systems”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 335–338. Berlin/Heidelberg: Springer.

[26] Blum, L., M. Blum, and M. Shub (1986). “A Simple Unpredictable Pseudo-Random Number Generator”, SIAM Journal on Computing, 15: 364–383.

[27] Blum, M. and S. Goldwasser (1985). “An Efficient Probabilistic Public Key Encryption Scheme Which Hides All Partial Information”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 289–299. Berlin/Heidelberg: Springer.

[28] Blundo, C., A. De Santis, A. Herzberg, S. Kutten, U. Vaccaro and M. Yung (1993). “Perfectly-Secure Key Distribution for Dynamic Conferences”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 471–486. Berlin/Heidelberg: Springer.

[29] Boneh, D. (1999). “Twenty Years of Attacks on the RSA Cryptosystem”, Notices of the American Mathematical Society, 46 (2): 203–213.

[30] Boneh, D., R. A. DeMillo and R. J. Lipton (1997). “On the Importance of Checking Cryptographic Protocols for Faults”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 37–51. Berlin/Heidelberg: Springer.

[31] Boneh, D., R. A. DeMillo and R. J. Lipton (2001). “On the Importance of Eliminating Errors in Cryptographic Computations”, Journal of Cryptology, 14 (2): 101–119.

[32] Boneh, D. and G. Durfee (1999). “Cryptanalysis of RSA with Private Key d Less Than N0.292”, Advances in Cryptology—EUROCRYPT ’99, Lecture Notes in Computer Science, 1592. pp. 1–11. Berlin/Heidelberg: Springer.

[33] Boneh, D., G. Durfee and Y. Frankel (1998). “Exposing an RSA Private Key Given a Small Fraction of Its Bits”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 25–34. Berlin/Heidelberg: Springer.

[34] Boneh, D. and M. K. Franklin (2001). “Identity-based Encryption from the Weil Pairing”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 213–229. Berlin/Heidelberg: Springer.

[35] Boneh, D. and M. K. Franklin (2003). “Identity-based Encryption from the Weil Pairing”, SIAM Journal on Computing, 32 (3): 586–615.

[36] Bressoud, D. M. (1989). Factorization and Primality Testing. Undergraduate Texts in Mathematics. New York: Springer.

[37] * Buchmann, J. A. (2004). Introduction to Cryptography. Undergraduate Texts in Mathematics. New York: Springer.

[38] Buchmann, J. A. et al. (2004), “The Number Field Cryptography Project” [online document]. Available at http://www.informatik.tu-darmstadt.de/TI/Forschung/nfc.html (October 2008).

[39] Buchmann, J. A. and S. Hamdy (2001). “A Survey on IQ Cryptography”. Technical report TI-4/01, TU Darmstadt, Fachbereich Informatik.

[40] Buchmann, J. A. and D. Weber (2000). “Discrete Logarithms: Recent Progress”, in J. Buchmann, T. Hoeholdt, H. Stichtenoth and H. Tapia-Recillas (eds.), Coding Theory, Cryptography and Related Areas, pp. 42–56. Proceedings of an International Conference on Coding Theory, Cryptography and Related Areas, Guanajuato, Mexico, April 1998.

[41] Buhler, J., H. W. Lenstra and C. Pomerance (1993). “Factoring Integers with the Number Field Sieve”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 50–94. Berlin: Springer.

[42] * Burton, D. M. (1998). Elementary Number Theory, 4th ed. New York: McGraw-Hill.

[43] Cantor, D. G. (1994). “On the Analogue of Division Polynomials for Hyperelliptic Curves”, Journal für die reine und angewandte Mathematik, 447: 91–145.

[44] Chan, H., A. Perrig and D. Song (2003). “Random Key Predistribution Schemes for Sensor Networks”, pp. 197–213. Proceedings of the 24th IEEE Symposium on Research in Security and Privacy, Berkeley, California, 11–14 May.

[45] Chari, S., C. S. Jutla, J. R. Rao, and P. Rohatgi (1999). “Towards Sound Approaches to Counteract Power-Analysis Attacks”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 398–412. Berlin/Heidelberg: Springer.

[46] Charlap, L. S. and R. Coley (1990). “An Elementary Introduction to Elliptic Curves II”, CCR Expository Report 34.

[47] Charlap, L. S. and D. P. Robbins (1988). “An Elementary Introduction to Elliptic Curves”, CRD Expository Report 31.

[48] Chaum, D. (1983). “Blind Signatures for Untraceable Payments”, Advances in Cryptology—CRYPTO ’82. pp. 199–203. New York: Plenum Press.

[49] Chaum, D. (1985). “Security Without Identification: Transaction System to Make Big Brother Obsolete”, Communications of the ACM, 28 (10): 1030–1044.

[50] Chaum, D. (1989). “Privacy Protected Payments: Unconditional Payer and/or Payee Untraceability”, Smart Card 2000: The Future of IC Cards, pp. 69–93. Amsterdam: North-Holland.

[51] Chaum, D. (1990). “Zero-Knowledge Undeniable Signatures”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 473. pp. 458–464. Berlin/Heidelberg: Springer.

[52] Chaum, D. and H. van Antwerpen (1989). “Undeniable Signatures”, Advances in Cryptology—CRYPTO ’89, Lecture Notes in Computer Science, 435. pp. 212–217. Berlin/Heidelberg: Springer.

[53] Chaum, D., E. van Heijst and B. Pfitzmann (1991). “Cryptographically Strong Undeniable Signatures, Unconditionally Secure for the Signer”, Advances in Cryptology—CRYPTO ’91, Lecture Notes in Computer Science, 576. pp. 470–484. Berlin/Heidelberg: Springer.

[54] Chor, B. and R. L. Rivest (1988). “A Knapsack Type Cryptosystem Based on Arithmetic in Finite Fields”, IEEE Transactions on Information Theory, 34: 901–909.

[55] Clavier, C., J.-S. Coron and N. Dabbous (2000). “Differential Power Analysis in the Presence of Hardware Countermeasures”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 252–263. Berlin/Heidelberg: Springer.

[56] Cohen, H. (1993). A Course in Computational Algebraic Number Theory. Graduate Texts in Mathematics, 138. New York: Springer.

[57] Coppersmith, D. (1984). “Fast Evaluation of Logarithms in Fields of Characteristic Two”, IEEE Transactions on Information Theory, 30: 587–594.

[58] Coppersmith, D. (1994). “Solving Homogeneous Equations over GF[2] via Block Wiedemann Algorithm”, Mathematics of Computation, 62: 333–350.

[59] Coppersmith, D., A. M. Odlyzko and R. Schroeppel (1986). “Discrete Logarithms in GF (p)”, Algorithmica, 1: 1–15.

[60] Coppersmith, D. and S. Winograd (1982). “On the Asymptotic Complexity of Matrix Multiplication”, SIAM Journal on Computing, 11 (3): 472–492.

[61] * Cormen, T. H., C. E. Leiserson, R. L. Rivest and C. Stein (2001). Introduction to Algorithms, 2nd ed. Cambridge, Massachusetts: MIT Press.

[62] Coron, J.-S. (1999). “Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1965. pp. 292–302. Berlin/Heidelberg: Springer.

[63] Coron, J.-S., L. Goubin (2000). “On Boolean and Arithmetic Masking Against Differential Power Analysis”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 231–237. Berlin/Heidelberg: Springer.

[64] Coster, M. J., A. Joux, B. A. LaMacchia, A. M. Odlyzko, C. P. Schnorr and J. Stern (1992). “Improved Low-Density Subset Sum Algorithms”, Computational Complexity, 2: 111–128.

[65] Coster, M. J., B. A. LaMacchia, A. M. Odlyzko and C. P. Schnorr (1991). “An Improved Low-Density Subset Sum Algorithm”, Advances in Cryptology—EUROCRYPT ’91, Lecture Notes in Computer Science, 547. pp. 54–67. Berlin/Heidelberg: Springer.

[66] Courtois, N. (2003). “Fast Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 177–194. Berlin/Heidelberg: Springer.

[67] Courtois, N. and W. Meier (2003). “Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 345–359. Berlin/Heidelberg: Springer.

[68] Courtois, N. and J. Pieprzyk (2003). “Cryptanalysis of Block Ciphers with Overdefined Systems of Equations”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 267–287. Berlin/Heidelberg: Springer.

[69] Crandall, R. and C. Pomerance (2001). Prime Numbers: A Computational Perspective. New York: Springer.

[70] Crépeau, C. and A. Slakmon (2003). “Simple Backdoors for RSA Key Generation”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 403–416. Berlin/Heidelberg: Springer.

[71] Daemen, J. and V. Rijmen (2002). The Design of Rijndael: AES—The Advanced Encryption Standard. New York: Springer.

[72] Das, A. (1999). Galois Field Computations: Implementation of a Library and a Study of the Discrete Logarithm Problem [dissertation]. Bangalore, India: Indian Institute of Science.

[73] Das, A. and C. E. Veni Madhavan (1999). “Performance Comparison of Linear Sieve and Cubic Sieve Algorithms for Discrete Logarithms over Prime Fields”, Algorithms and Computation, ISAAC ’99, Lecture Notes in Computer Science, 1741. pp. 295–306. Berlin/Heidelberg: Springer.

[74] * Delfs, H. and H. Knebl (2007). Introduction to Cryptography: Principles and Applications, 2nd ed. Berlin and New York: Springer.

[75] Deutsch, D. (1985). “Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer”. Proceedings of the Royal Society of London, Series A, 400. pp. 97–117.

[76] Deutsch, D. (1998). The Fabric of Reality: The Science of Parallel Universes—and Its Implications. London: Penguin.

[77] Dhem, J.-F., F. Koeune, P.-A. Leroux, P. Mestré, J.-J. Quisquater and J.-L. Willems (2000). “A Practical Implementation of the Timing Attack”, in J.-J. Quisquater and B. Schneier (eds.), Smart Card: Research and Applications, Lecture Notes in Computer Science, 1820. Proceedings of the Third Working Conference on Smart Card Research and Advanced Applications—CARDIS ’98, Louvain-la-Neuve, Belgium, 14–16 September 1998. Springer.

[78] Diffie, W. and M. Hellman (1976). “New Directions in Cryptography”, IEEE Transactions on Information Theory, 22: 644–654.

[79] Du, W., J. Deng, Y. S. Han and P. K. Varshney (2003). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 42–51. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, 27–30 October.

[80] Du, W., J. Deng, Y. S. Han, S. Chen and P. K. Varshney (2004). “A Key Management Scheme for Wireless Sensor Networks Using Deployment Knowledge”. Proceedings of IEEE INFOCOM 2004, Hong Kong, 7–11 March.

[81] * Dummit, D. and R. Foote (2004). Abstract Algebra, 3rd ed. Somerset, New Jersey: John Wiley & Sons.

[82] Durfee, G. and P. Q. Nguyen (2000). “Cryptanalysis of the RSA Schemes with Short Secret Exponent from Asiacrypt ’99”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 30–44. Berlin/Heidelberg: Springer.

[83] Dusart, P. (1999). “The kth Prime Is Greater than k(ln k+ln ln k–1) for k > 2”, Mathematics of Computation, 68: 411–415.

[84] ElGamal, T. (1985). “A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms”, IEEE Transactions on Information Theory, 31: 469–472.

[85] Elkies, N. D. (1998). “Elliptic and Modular Curves over Finite Fields and Related Computational Issues”, AMS/IP Studies in Advanced Mathematics, 7: 21–76.

[86] Enge, A. (1999). “Computing Discrete Logarithms in High-Genus Hyperelliptic Jacobians in Provably Subexponential Time”. Technical report CORR 99-04, University of Waterloo, Canada.

[87] Enge, A. and P. Gaudry (2002). “A General Framework for Subexponential Discrete Logarithm Algorithms”, Acta Arithmetica, 102 (1): 83–103.

[88] Eschenauer, L. and V. D. Gligor (2002). “A Key-Management Scheme for Distributed Sensor Networks”. Proceedings of the 9th ACM Conference on Computer and Communication Security, pp. 41–47. Washington D.C., USA, 18–22 November.

[89] * Esmonde, J. and M. Ram Murty (1999). Problems in Algebraic Number Theory. Graduate Texts in Mathematics, 190. New York: Springer.

[90] Fiat, A. and A. Shamir (1987). “How to Prove Yourself: Practical Solutions to Identification and Signature Problems”, Advances in Cryptology—CRYPTO ’86, Lecture Notes in Computer Science, 263. pp. 186–194. Berlin/Heidelberg: Springer.

[91] Feige, U., A. Fiat, and A. Shamir (1988). “Zero-Knowledge Proofs of Identity”, Journal of Cryptology, 1: 77–94.

[92] * Feller, W. (1966). Introduction to Probability Theory and Its Applications, 3rd ed. New York: John Wiley & Sons.

[93] Ferguson, N., J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner and D. Whiting (2000). “Improved Cryptanalysis of Rijndael”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 213–230. Berlin/Heidelberg: Springer.

[94] Fouquet, M., P. Gaudry and R. Harley (2000). “An Extension of Satoh’s Algorithm and Its Implementation”, Journal of the Ramanujan Mathematical Society, 15: 281–318.

[95] Fouquet, M., P. Gaudry and R. Harley (2001). “Finding Secure Curves with the Satoh-FGH Algorithm and an Early-Abort Strategy”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. Berlin/Heidelberg: Springer.

[96] * Fraleigh, J. B. (1998). A First Course in Abstract Algebra, 6th ed. Reading, Massachusetts: Addison-Wesley.

[97] Fujisaki, E., T. Kobayashi, H. Morita, H. Oguro, T. Okamoto, S. Okazaki, D. Pointcheval and S. Uchiyama (1999). “EPOC: Efficient Probabilistic Public-Key Encryption”, contribution to IEEE P1363a.

[98] Fujisaki, E., T. Okamoto, D. Pointcheval, J. Stern (2001). “RSA-OAEP is Secure under the RSA Assumption”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 260–274. Berlin/Heidelberg: Springer.

[99] Fulton, W. (1969). Algebraic Curves. Mathematics Lecture Notes Series. New York: W. A. Benjamin.

[100] Galbraith, S. D. (2003). “Weil Descent of Jacobians”, Discrete Applied Mathematics, 128 (1): 165–180.

[101] Galbraith, S. D., F. Hess and N. P. Smart (2002). “Extending the GHS Weil Descent Attack”, Advances in Cryptology—EUROCRYPT 2002, Lecture Notes in Computer Science, 2332. pp. 29–44. Berlin/Heidelberg: Springer.

[102] Galbraith, S. D., W. Mao, and K. G. Paterson (2002). “RSA-based Undeniable Signatures for General Moduli”, Topics in Cryptology—CT-RSA 2002, Lecture Notes in Computer Science, 2271. pp. 200–217. Berlin/Heidelberg: Springer.

[103] Gathen, J. von zur and J. Gerhard (1999). Modern Computer Algebra. Cambridge: Cambridge University Press.

[104] Gathen, J. von zur and V. Shoup (1992). “Computing Frobenius Maps and Factoring Polynomials”, pp. 97–105. Proceedings of the 24th Annual ACM Symposium on Theory of Computing, Victoria, British Columbia, Canada.

[105] Gaudry, P. (2000). “An Algorithm for Solving the Discrete Log Problem on Hyperelliptic Curves”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 19–34. Berlin/Heidelberg: Springer.

[106] Gaudry, P. and R. Harley (2000). “Counting Points on Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-IV, Lecture Notes in Computer Science, 1838. pp. 313–332. Berlin/Heidelberg: Springer.

[107] Gaudry, P., F. Hess and N. P. Smart (2002). “Constructive and Destructive Facets of Weil Descent on Elliptic Curves”, Journal of Cryptology, 15 (1): 19–46.

[108] Geddes, K. O., S. R. Czapor and G. Labahn (1992). Algorithms for Computer Algebra. Boston: Kluwer Academic Publishers.

[109] Gennaro, R., H. Krawczyk and T. Rabin (2000). “RSA-based Undeniable Signatures”, Journal of Cryptology, 13 (4): 397–416.

[110] Gentry, C., J. Jonsson, M. Szydlo and J. Stern (2001). “Cryptanalysis of the NTRU Signature Scheme (NSS) from Eurocrypt 2001”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 1–20. Berlin/Heidelberg: Springer.

[111] Gentry, C. and M. Szydlo (2002). “Cryptanalysis of the NTRU Signature Scheme”, Advances in Cryptology—EUROCRYPT ’02, Lecture Notes in Computer Science, 2332. pp. 299–320. Berlin/Heidelberg: Springer.

[112] Gilbert, H. and M. Minier (2000). “A Collision Attack on Seven Rounds of Rijndael”, pp. 230–241. Proceedings of the 3rd AES Conference, NIST, New York, April 2000.

[113] * Goldreich, O. (2001). Foundations of Cryptography, Volume 1: Basic Tools. Cambridge: Cambridge University Press.

[114] * Goldreich, O. (2004). Foundations of Cryptography, Volume 2: Basic Applications. Cambridge: Cambridge University Press.

[115] Goldreich, O., S. Goldwasser and S. Halevi (1997). “Public-key Cryptosystems from Lattice Reduction Problems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 112–131. Berlin/Heidelberg: Springer.

[116] Goldwasser, S. and J. Kilian (1986). “Almost All Primes Can Be Quickly Certified”, pp. 316–329. Proceedings of the 18th Annual ACM Symposium on Theory of Computing, Berkeley, California.

[117] Goldwasser, S. and S. Micali (1984). “Probabilistic Encryption”, Journal of Computer and Systems Sciences, 28: 270–299.

[118] Gordon, D. M. (1985). “Strong Primes are Easy to Find”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 216–223. Berlin/Heidelberg: Springer.

[119] Gordon, D. M. (1993). “Discrete Logarithms in GF (p) Using the Number Field Sieve”, SIAM Journal on Discrete Mathematics, 6: 124–138.

[120] Gordon, D. M. and K. S. McCurley (1992). “Massively Parallel Computation of Discrete Logarithms”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 312–323. Berlin/Heidelberg: Springer.

[121] Grinstead, C. M. and J. L. Snell (1997). Introduction to Probability, 2nd revised ed. Providence, Rhode Island: American Mathematical Society. The book is also available at http://www.dartmouth.edu/~chance/book.html (October 2008).

[122] Guillou, L. C. and J.-J. Quisquater (1988). “A Practical Zero-Knowledge Protocol Fitted to Security Microprocessor Minimizing Both Transmission and Memory”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 123–128. Berlin/Heidelberg: Springer.

[123] Hankerson, D., A. J. Menezes and S. Vanstone (2004). Guide to Elliptic Curve Cryptography. New York: Springer.

[124] Hartshorne, R. (1977). Algebraic Geometry. Graduate Texts in Mathematics, 52. New York, Heidelberg and Berlin: Springer.

[125] * Herstein, I. N. (1975). Topics in Algebra. New York: John Wiley & Sons.

[126] Hess, F., G. Seroussi and N. P. Smart (2000). “Two Topics in Hyperelliptic Cryptography”. HP Labs technical report HPL-2000-118.

[127] * Hoffman, K. and R. Kunze (1971). Linear Algebra. Englewood Cliffs, New Jersey: Prentice-Hall.

[128] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2003). “NTRUSign: Digital Signatures Using the NTRU Lattice”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 122–140. Berlin/Heidelberg: Springer.

[129] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2005). “Performance Improvements and a Baseline Parameter Generation Algorithm for NTRUSign”, Workshop on Mathematical Problems and Techniques in Cryptology, Barcelona, Spain, June 2005. Also available at http://www.ntru.com/cryptolab/articles.htm (October 2008).

[130] Hoffstein, J., J. Pipher and J. H. Silverman (1998). “NTRU: A Ring-Based Public Key Cryptosystem”, Algorithmic Number Theory—ANTS-III, Lecture Notes in Computer Science, 1423. pp. 267–288. Berlin/Heidelberg: Springer.

[131] Hoffstein, J., J. Pipher and J. H. Silverman (2001). “NSS: An NTRU Lattice-Based Signature Scheme”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 211–228. Berlin/Heidelberg: Springer.

[132] Horster, P., M. Michels and H. Petersen (1994). “Meta-ElGamal Signature Schemes”. Technical report TR-94-5-F, Department of Computer Science, Technische Universität Chemnitz-Zwickau.

[133] * Hungerford, T. W. (1974). Algebra, 5th ed. Graduate Texts in Mathematics, 73. Berlin: Springer.

[134] IEEE (2008), “Standard Specifications for Public-Key Cryptography” [online document]. Available at http://grouper.ieee.org/groups/1363/index.html (October 2008).

[135] IETF (2008), “The Internet Engineering Task Force” [online document]. Available at http://www.ietf.org/ (October 2008).

[136] * Ireland, K. and M. Rosen (1990). A Classical Introduction to Modern Number Theory. Graduate Texts in Mathematics, 84. New York: Springer.

[137] Izu, T., B. Möller and T. Takagi (2002). “Improved Elliptic Curve Multiplication Methods Resistant Against Side Channel Attacks”, Progress in Cryptology—INDOCRYPT 2002, Lecture Notes in Computer Science, 2551. pp. 296–313. Berlin/Heidelberg: Springer.

[138] Izu, T. and T. Takagi (2002). “A Fast Parallel Elliptic Curve Multiplication Resistant Against Side Channel Attacks”, Public Key Cryptography—PKC 2002, Lecture Notes in Computer Science, 2274. pp. 280–296. Berlin/Heidelberg: Springer. An improved version of this paper is published as the technical report CORR 2002-03 of the Centre for Applied Cryptographic Research, University of Waterloo, Canada, and is available at http://www.cacr.math.uwaterloo.ca/ (October 2008).

[139] Jacobson, M. J., N. Koblitz, J. H. Silverman, A. Stein and E. Teske (2000). “Analysis of the Xedni Calculus Attack”, Designs, Codes and Cryptography, 20: 41–64.

[140] Janusz, G. J. (1995). Algebraic Number Fields. Providence, Rhode Island: American Mathematical Society.

[141] Johnson, D. and A. Menezes (1999). “The Elliptic Curve Digital Signature Algorithm (ECDSA)”. Technical report CORR 99-34, Department of Combinatorics and Optimization, University of Waterloo, Canada. Also published in International Journal on Information Security (2001), 1: 36–63.

[142] Joye, M., A. K. Lenstra and J.-J. Quisquater (1999). “Chinese Remaindering Based Cryptosystems in the Presence of Faults”, Journal of Cryptology, 12 (4): 241–246.

[143] Kaltofen, E. and V. Shoup (1995). “Subquadratic-Time Factoring of Polynomials over Finite Fields”, pp. 398–406. Proceedings of the 27th Annual ACM Symposium on Theory of Computing, Las Vegas, Nevada.

[144] Kampkötter, W. (1991). Explizite Gleichungen für Jacobische Varietäten hyperelliptischer Kurven [dissertation]. Essen: Gesamthochschule.

[145] Katz, J. and Y. Lindell (2007). Introduction to Modern Cryptography. Boca Raton, Florida; London and New York: CRC Press.

[146] Kaye, P. and C. Zalka (2004), “Optimized Quantum Implementation of Elliptic Curve Arithmetic over Binary Fields” [online document]. Available at http://arxiv.org/abs/quant-ph/0407095 (October 2008).

[147] * Knuth, D. E. (1997). The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Reading, Massachusetts: Addison-Wesley.

[148] Ko, K. H., S. J. Lee, J. H. Cheon, J. W. Han, J. S. Kang and C. S. Park (2000). “New Public-Key Cryptosystem Using Braid Groups”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 166–183. Berlin/Heidelberg: Springer.

[149] Koblitz, N. (1984). p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd ed. Graduate Texts in Mathematics, 58. New York, Heidelberg and Berlin: Springer.

[150] Koblitz, N. (1987). “Elliptic Curve Cryptosystems”, Mathematics of Computation, 48: 203–209.

[151] Koblitz, N. (1989). “Hyperelliptic Cryptosystems”, Journal of Cryptology, 1: 139–150.

[152] Koblitz, N. (1993). Introduction to Elliptic Curves and Modular Forms, 2nd ed. Graduate Texts in Mathematics, 97. Berlin: Springer.

[153] * Koblitz, N. (1994). A Course in Number Theory and Cryptography, 2nd ed. New York: Springer.

[154] Koblitz, N. (1998). Algebraic Aspects of Cryptography. New York: Springer.

[155] Kocher, P. C. (1996). “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 104–113. Berlin/Heidelberg: Springer.

[156] Kocher, P. C., J. Jaffe and B. Jun (1999). “Differential Power Analysis”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 388–397. Berlin/Heidelberg: Springer.

[157] Lagarias, J. C. and A. M. Odlyzko (1985). “Solving Low-Density Subset Sum Problems”, Journal of the ACM, 32: 229–246.

[158] LaMacchia, B. A. and A. M. Odlyzko (1991a). “Computation of Discrete Logarithms in Prime Fields”, Designs, Codes and Cryptography, 1: 46–62.

[159] LaMacchia, B. A. and A. M. Odlyzko (1991b). “Solving Large Sparse Linear Systems over Finite Fields”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 537. pp. 109–133. Berlin/Heidelberg: Springer.

[160] Lang, S. (1994). Algebraic Number Theory. Graduate Texts in Mathematics, 110. New York: Springer.

[161] Law, L., A. Menezes, A. Qu, J. Solinas and S. Vanstone (1998). “An Efficient Protocol for Authenticated Key Agreement”. Technical report CORR 98-05, Department of Combinatorics and Optimization, University of Waterloo, Canada.

[162] Lehmer, D. H. and R. E. Powers (1931). “On Factoring Large Numbers”, Bulletin of the AMS, 37: 770–776.

[163] Lenstra, A. K., E. Tromer, A. Shamir, W. Kortsmit, B. Dodson, J. Hughes and P. Leyland (2003). “Factoring Estimates for a 1024-Bit RSA Modulus”, Advances in Cryptology—ASIACRYPT 2003, Lecture Notes in Computer Science, 2894. pp. 55–74. Berlin/Heidelberg: Springer.

[164] Lenstra, A. K. and H. W. Lenstra (1990). “Algorithms in Number Theory”, in J. van Leeuwen (ed.), Handbook of Theoretical Computer Science, Volume A, pp. 675–715, Amsterdam: Elsevier.

[165] Lenstra, A. K. and H. W. Lenstra (ed.) (1993). The Development of the Number Field Sieve. Lecture Notes in Mathematics, 1554. Berlin: Springer.

[166] Lenstra, A. K., H. W. Lenstra and L. Lovász (1982). “Factoring Polynomials with Rational Coefficients”, Mathematische Annalen, 261: 515–534.

[167] Lenstra, A. K., H. W. Lenstra, M. S. Manasse and J. M. Pollard (1990). “The Number Field Sieve”, pp. 564–572. Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, Baltimore, Maryland, USA, 13–17 May.

[168] Lenstra, A. K. and A. Shamir (2000). “Analysis and Optimization of the TWINKLE Factoring Device”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 35–52. Berlin/Heidelberg: Springer.

[169] Lenstra, A. K., A. Shamir, J. Tomlinson and E. Tromer (2002). “Analysis of Bernstein’s Factorization Circuit”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 1–26. Berlin/Heidelberg: Springer.

[170] Lenstra, A. K. and E. R. Verheul (2000a). “The XTR Public Key System”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 1–20. Berlin/Heidelberg: Springer.

[171] Lenstra, A. K. and E. R. Verheul (2000b). “Key Improvements to XTR”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 220–233. Berlin/Heidelberg: Springer.

[172] Lenstra, A. K. and E. R. Verheul (2001a). “An Overview of the XTR Public Key System”, pp. 151–180. Proceedings of the Public Key Cryptography and Computational Number Theory Conference, Warsaw, Poland, 2000. Berlin: Walter de Gruyter.

[173] Lenstra, A. K. and E. R. Verheul (2001b). “Fast Irreducibility and Subgroup Membership Testing in XTR”, Public Key Cryptography—PKC 2001, Lecture Notes in Computer Science, 1992. pp. 73–86. Berlin/Heidelberg: Springer.

[174] Lenstra, H. W. (1987). “Factoring Integers with Elliptic Curves”, Annals of Mathematics, 126: 649–673.

[175] Lenstra, H. W. and C. Pomerance (2005), “Primality Testing with Gaussian Periods” [online document]. Available at http://www.math.dartmouth.edu/~carlp/PDF/complexity12.pdf (October 2008).

[176] Lercier, R. (1997). “Finding Good Random Elliptic Curves for Cryptosystems Defined over GF(2^n)”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 379–392. Berlin/Heidelberg: Springer.

[177] Lercier, R. and D. Lubicz (2003). “Counting Points on Elliptic Curves over Finite Fields of Small Characteristic in Quasi Quadratic Time”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 360–373. Berlin/Heidelberg: Springer.

[178] Libert, B. and J.-J. Quisquater (2003), “New Identity Based Signcryption Schemes from Pairings” [online document]. Available at http://eprint.iacr.org/2003/023/ (October 2008).

[179] Lidl, R. and H. Niederreiter (1984). Finite Fields, Encyclopedia of Mathematics and Its Applications, 20. Cambridge: Cambridge University Press.

[180] Lidl, R. and H. Niederreiter (1994). Introduction to Finite Fields and Their Applications. Cambridge: Cambridge University Press.

[181] Liu, D. and P. Ning (2003a). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 52–61. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, October 2003.

[182] Liu, D. and P. Ning (2003b). “Location-Based Pairwise Key Establishments for Static Sensor Networks”, pp. 72–82. Proceedings of the 1st ACM Workshop on Security in Ad Hoc and Sensor Networks, Fairfax, Virginia, 31 October 2003.

[183] Liu, D., P. Ning and R. Li (2005). “Establishing Pairwise Keys in Distributed Sensor Networks”, ACM Transactions on Information and System Security, 8 (1): 41–77.

[184] Lucks, S. (2000). “Attacking Seven Rounds of Rijndael Under 192-bit and 256-bit Keys”, pp. 215–229. Proceedings of the 3rd Advanced Encryption Standard Candidate Conference, New York, April 2000.

[185] Malone-Lee, J. (2002), “Identity-Based Signcryption” [online document]. Available at http://eprint.iacr.org/2002/098/ (October 2008).

[186] Mao, W. (2001). “New Zero-Knowledge Undeniable Signatures—Forgery of Signature Equivalent to Factorisation”. Hewlett-Packard technical report HPL-2001-36.

[187] Mao, W. and K. G. Paterson (2000). “Convertible Undeniable Standard RSA Signatures”. Hewlett-Packard technical report HPL-2000-148.

[188] Matsumoto, T. and H. Imai (1988). “Public Quadratic Polynomial-Tuples for Efficient Signature-Verification and Message-Encryption”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 419–453. Berlin/Heidelberg: Springer.

[189] McCurley, K. S. (1990). “The Discrete Logarithm Problem”, in C. Pomerance and S. Goldwasser (eds.), Cryptology and Computational Number Theory: American Mathematical Society Short Course, Boulder, Colorado, 6–7 August 1989. Proceedings of Symposia in Applied Mathematics, 42. pp. 49–74. Providence, Rhode Island: American Mathematical Society.

[190] McEliece, R. J. (1978). “A Public-Key Cryptosystem Based on Algebraic Coding Theory”. DSN progress report 42–44, Jet Propulsion Laboratory, California Institute of Technology, pp. 114–116.

[191] Menezes, A. J. (ed.) (1993). Applications of Finite Fields. Boston: Kluwer Academic Publishers.

[192] Menezes, A. J. (1993). Elliptic Curve Public Key Cryptosystems. The Springer International Series in Engineering and Computer Science, 234. Springer. Available at http://books.google.co.in/books?id=bIb54ShKS68C (October 2008).

[193] Menezes, A. J., T. Okamoto and S. Vanstone (1993). “Reducing Elliptic Curve Logarithms to Logarithms in a Finite Field”, IEEE Transactions on Information Theory, 39: 1639–1646.

[194] Menezes, A. J., P. van Oorschot and S. Vanstone (1997). Handbook of Applied Cryptography. Boca Raton, Florida: CRC Press.

[195] Menezes, A. J., Y. Wu and R. Zuccherato (1996). “An Elementary Introduction to Hyperelliptic Curves”. CACR technical report CORR 96-19, University of Waterloo, Canada.

[196] Merkle, R. C. and M. E. Hellman (1978). “Hiding Information and Signatures in Trapdoor Knapsacks”, IEEE Transactions on Information Theory, 24 (5): 525–530.

[197] Mermin, N. D. (2003). “From Cbits to Qbits: Teaching Computer Scientists Quantum Mechanics”, American Journal of Physics, 71: 23–30.

[198] Mermin, N. D. (2006), “Phys481-681-CS483 Lecture Notes and Homework Assignments” [online document]. Available at http://people.ccmr.cornell.edu/~mermin/qcomp/CS483.html (October 2008).

[199] Messerges, T. S. (2000). “Securing the AES Finalists Against Power Analysis Attacks”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 150–164. Berlin/Heidelberg: Springer.

[200] Messerges, T. S., E. A. Dabbish and R. H. Sloan (1999). “Power Analysis Attacks of Modular Exponentiation in Smartcards”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1717. pp. 144–157. Berlin/Heidelberg: Springer.

[201] Messerges, T. S., E. A. Dabbish and R. H. Sloan (2002). “Examining Smart-Card Security Under the Threat of Power Analysis Attacks”, IEEE Transactions on Computers, 51 (4): 541–552.

[202] Michels, M. and M. Stadler (1997). “Efficient Convertible Undeniable Signature Schemes”, pp. 231–244. Proceedings of the 4th International Workshop on Selected Areas in Cryptography, Ottawa, Canada.

[203] Mignotte, M. (1992). Mathematics for Computer Algebra. New York: Springer.

[204] Miller, G. L. (1976). “Riemann’s Hypothesis and Tests for Primality”, Journal of Computer and System Sciences, 13: 300–317.

[205] Miller, V. (1986). “Use of Elliptic Curves in Cryptography”, Advances in Cryptology—CRYPTO ’85, Lecture Notes in Computer Science, 218. pp. 417–426. Berlin/Heidelberg: Springer.

[206] Möller, B. (2001). “Securing Elliptic Curve Point Multiplication Against Side-Channel Attacks”, Information Security Conference, Lecture Notes in Computer Science, 2200. pp. 324–334. Berlin/Heidelberg: Springer.

[207] Mollin, R. A. (1998). Fundamental Number Theory with Applications. Boca Raton, Florida: Chapman & Hall/CRC.

[208] Mollin, R. A. (1999). Algebraic Number Theory. Boca Raton, Florida: Chapman & Hall/CRC.

[209] Mollin, R. A. (2001). An Introduction to Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.

[210] Montgomery, P. L. (1985). “Modular Multiplication Without Trial Division”, Mathematics of Computation, 44: 519–521.

[211] Montgomery, P. L. (1994). “A Survey of Modern Integer Factorization Algorithms”, CWI Quarterly, 7 (4): 337–366.

[212] Montgomery, P. L. (1995). “A Block Lanczos Algorithm for Finding Dependencies over GF(2)”, Advances in Cryptology—EUROCRYPT ’95, Lecture Notes in Computer Science, 921. pp. 106–120. Berlin/Heidelberg: Springer.

[213] Morrison, M. A. and J. Brillhart (1975). “A Method of Factoring and a Factorization of F7”, Mathematics of Computation, 29: 183–205.

[214] * Motwani, R. and P. Raghavan (1995). Randomized Algorithms. Cambridge: Cambridge University Press.

[215] Muir, J. A. (2001). Techniques of Side Channel Cryptanalysis [dissertation]. Canada: University of Waterloo. Available at http://www.uwspace.uwaterloo.ca/bitstream/10012/1098/1/jamuir2001.pdf (October 2008).

[216] Neukirch, J. (1999). Algebraic Number Theory. Berlin and Heidelberg: Springer.

[217] Nguyen, P. Q. (2006), “A Note on the Security of NTRUSign” [online document]. Available at http://eprint.iacr.org/2006/387 (October 2008).

[218] * Nielsen, M. A. and I. L. Chuang (2000). Quantum Computation and Quantum Information. Cambridge: Cambridge University Press.

[219] NIST (2001), “Advanced Encryption Standard” [online document]. Available at http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf (October 2008).

[220] NIST (2006), “Digital Signature Standard (DSS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_186-3/Draft-FIPS-186-3%20_March2006.pdf (October 2008).

[221] NIST (2007a), “Federal Information Processing Standards” [online document]. Available at http://csrc.nist.gov/publications/PubsFIPS.html (October 2008).

[222] NIST (2007b), “Secure Hash Standard (SHS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_180-3/draft_fips-180-3_June-08-2007.pdf (October 2008).

[223] Nyberg, K. and R. A. Rueppel (1993). “A New Signature Scheme Based on the DSA Giving Message Recovery”, pp. 58–61. Proceedings of the 1st ACM Conference on Computer and Communications Security, Fairfax, Virginia, 3–5 November.

[224] Nyberg, K. and R. A. Rueppel (1995). “Message Recovery for Signature Schemes Based on the Discrete Logarithm Problem”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 182–193. Berlin/Heidelberg: Springer.

[225] Odlyzko, A. M. (1985). “Discrete Logarithms and Their Cryptographic Significance”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 224–314. Berlin/Heidelberg: Springer.

[226] Odlyzko, A. M. (2000). “Discrete Logarithms: The Past and the Future”, Designs, Codes and Cryptography, 19: 129–145.

[227] Okamoto, T. (1992). “Provably Secure and Practical Identification Schemes and Corresponding Signature Schemes”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 31–53. Berlin/Heidelberg: Springer.

[228] Okamoto, T., E. Fujisaki and H. Morita (1998). “TSH-ESIGN: Efficient Digital Signature Scheme Using Trisection Size Hash”, submission to IEEE P1363a.

[229] Papadimitriou, C. H. (1994). Computational Complexity. Reading, Massachusetts: Addison-Wesley.

[230] Park, S., T. Kim, Y. An and D. Won (1995). “A Provably Entrusted Undeniable Signature”, pp. 644–648. IEEE Singapore International Conference on Network/International Conference on Information Engineering (SICON/ICIE ’95).

[231] Patarin, J. (1995). “Cryptanalysis of the Matsumoto and Imai Public Key Scheme of Eurocrypt’88”, Advances in Cryptology—CRYPTO ’95, Lecture Notes in Computer Science, 963. pp. 248–261. Berlin/Heidelberg: Springer.

[232] Patarin, J. (1996). “Hidden Fields Equations (HFE) and Isomorphisms of Polynomials (IP): Two New Families of Asymmetric Algorithms”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 33–48. Berlin/Heidelberg: Springer.

[233] Pirsig, R. M. (1974). Zen and the Art of Motorcycle Maintenance: An Inquiry into Values. London: Bodley Head.

[234] Pohlig, S. and M. Hellman (1978). “An Improved Algorithm for Computing Logarithms over GF (p) and its Cryptographic Significance”, IEEE Transactions on Information Theory, 24: 106–110.

[235] Pohst, M. and H. Zassenhaus (1989). Algorithmic Algebraic Number Theory, Encyclopaedia of Mathematics and Its Applications, 30. Cambridge: Cambridge University Press.

[236] Pointcheval, D. and J. Stern (1996). “Provably Secure Blind Signature Schemes”, Advances in Cryptology—ASIACRYPT ’96, Lecture Notes in Computer Science, 1163. pp. 252–265. Berlin/Heidelberg: Springer.

[237] Pointcheval, D. and J. Stern (2000). “Security Arguments for Digital Signatures and Blind Signatures”, Journal of Cryptology, 13 (3): 361–396.

[238] Pollard, J. M. (1974). “Theorems on Factorization and Primality Testing”, Proceedings of the Cambridge Philosophical Society, 76 (2): 521–528.

[239] Pollard, J. M. (1975). “A Monte Carlo Method for Factorization”, BIT, 15 (3): 331–334.

[240] Pollard, J. M. (1993). “Factoring with Cubic Integers”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 4–10. Berlin: Springer.

[241] Pomerance, C. (1985). “The Quadratic Sieve Factoring Algorithm”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 169–182. Berlin/Heidelberg: Springer.

[242] Pomerance, C. (2008). “Elementary Thoughts on Discrete Logarithms”, pp. 385–396. in J. P. Buhler and P. Stevenhagen (eds.), Surveys in Algorithmic Number Theory, Publications of the Research Institute for Mathematical Sciences, 44. New York: Cambridge University Press.

[243] Preskill, J. (1998). “Quantum Computing: Pro and Con”, Proceedings of the Royal Society of London, A454: 469–486.

[244] Preskill, J. (2007), “Course Information for Quantum Computation” [online document]. Available at http://theory.caltech.edu/people/preskill/ph219/ (October 2008).

[245] Proos, J. and C. Zalka (2004), “Shor’s Discrete Logarithm Quantum Algorithm for Elliptic Curves” [online document]. Available at http://arxiv.org/abs/quant-ph/0301141 (October 2008).

[246] Rabin, M. O. (1979). “Digitalized Signatures and Public-Key Functions as Intractable as Factorization”. Technical report MIT/LCS/TR-212, MIT Laboratory for Computer Science, Massachusetts.

[247] Rabin, M. O. (1980a). “Probabilistic Algorithms in Finite Fields”, SIAM Journal on Computing, 9: 273–280.

[248] Rabin, M. O. (1980b). “Probabilistic Algorithm for Testing Primality”, Journal of Number Theory, 12: 128–138.

[249] Ram Murty, M. (2001). Problems in Analytic Number Theory. New York: Springer.

[250] Raymond, J.-F. and A. Stiglic (2000), “Security Issues in the Diffie-Hellman Key Agreement Protocol” [online document]. Available at http://crypto.cs.mcgill.ca/~stiglic/Papers/dhfull.pdf (October 2008).

[251] Ribenboim, P. (2001). Classical Theory of Algebraic Numbers. Universitext. New York: Springer.

[252] Rivest, R. L., A. Shamir, and L. M. Adleman (1978). “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Communications of the ACM, 21 (2): 120–126.

[253] Rosser, J. and L. Schoenfeld (1962). “Approximate Formulas for Some Functions of Prime Numbers”, Illinois Journal of Mathematics, 6: 64–94.

[254] RSA Security Inc. (2008), “Public-Key Cryptography Standards” [online document]. Available at http://www.rsa.com/rsalabs/node.asp?id=2124 (October 2008).

[255] Sakurai, J. J. (1994). Modern Quantum Mechanics. Revised by San-Fu Tuan, Reading, Massachusetts: Addison-Wesley.

[256] Satoh, T. (2000). “The Canonical Lift of an Ordinary Elliptic Curve over a Finite Field and Its Point Counting”, Journal of the Ramanujan Mathematical Society, 15: 247–270.

[257] Satoh, T. and K. Araki (1998). “Fermat Quotients and the Polynomial Time Discrete Log Algorithm for Anomalous Elliptic Curves”, Commentarii Mathematici Universitatis Sancti Pauli, 47: 81–92.

[258] Schiff, L. I. (1968). Quantum Mechanics, 3rd ed. New York: McGraw-Hill.

[259] Schindler, W., F. Koeune and J.-J. Quisquater (2001). “Unleashing the Full Power of Timing Attack”. Technical report CG-2001/3, Université Catholique de Louvain, Belgium. Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.6622.

[260] Schirokauer, O. (1993). “Discrete Logarithms and Local Units”, Philosophical Transactions of the Royal Society of London, Series A, 345: 409–423.

[261] Schirokauer, O., D. Weber, and T. Denny (1996). “Discrete Logarithms: The Effectiveness of the Index Calculus Method”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.

[262] * Schneier, B. (2006). Applied Cryptography, 2nd ed. New York: John Wiley & Sons.

[263] Schnorr, C. P. (1991). “Efficient Signature Generation for Smart Cards”, Journal of Cryptology, 4: 161–174.

[264] Schoof, R. (1995). “Counting Points on Elliptic Curves over Finite Fields”, Journal de Théorie des Nombres de Bordeaux, 7: 219–254.

[265] Semaev, I. A. (1998). “Evaluation of Discrete Logarithms on Some Elliptic Curves”, Mathematics of Computation, 67: 353–356.

[266] Shamir, A. (1984). “A Polynomial-Time Algorithm for Breaking the Basic Merkle-Hellman Cryptosystem”, IEEE Transactions on Information Theory, 30: 699–704.

[267] Shamir, A. (1984). “Identity-Based Cryptosystems and Signature Schemes”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 47–53. Berlin/Heidelberg: Springer.

[268] Shamir, A. (1997). “How to Check Modular Exponentiation”, presented at the rump session of Advances in Cryptology—EUROCRYPT ’97, May.

[269] Shamir, A. (1999). “Factoring Large Numbers with the TWINKLE Device”, Cryptographic Hardware and Embedded Systems—CHES ’99, Lecture Notes in Computer Science, 1717. pp. 2–12. Berlin/Heidelberg: Springer.

[270] Shamir, A. and E. Tromer (2003). “Factoring Large Numbers with the TWIRL Device”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 1–26. Berlin/Heidelberg: Springer.

[271] Shor, P. W. (1997). “Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer”, SIAM Journal on Computing, 26: 1484–1509.

[272] Shoup, V. (1990). “On the Deterministic Complexity of Factoring Polynomials over Finite Fields”, Information Processing Letters, 33: 261–267.

[273] Shparlinski, I. E. (1991). “On Some Problems in the Theory of Finite Fields”, Russian Mathematical Surveys, 46 (1): 199–240.

[274] Shparlinski, I. E. (1992). Computational and Algorithmic Problems in Finite Fields, Mathematics and its Applications, 88. Kluwer Academic Publishers.

[275] * Silverman, J. H. (1986). The Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 106. Berlin and New York: Springer.

[276] Silverman, J. H. (1994). Advanced Topics in the Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 151. New York: Springer.

[277] Silverman, J. H. (2000). “The Xedni Calculus and the Elliptic Curve Discrete Logarithm Problem”, Design, Codes and Cryptography, 20: 5–40.

[278] Silverman, J. H. and J. Suzuki (1998). “Elliptic Curve Discrete Logarithms and the Index Calculus”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 110–125. Berlin/Heidelberg: Springer.

[279] Silverman, R. D. (1987). “The Multiple Polynomial Quadratic Sieve”, Mathematics of Computation, 48: 329–339.

[280] * Sipser, M. (1997). Introduction to the Theory of Computation, 2nd ed. Boston: PWS Publishing Company.

[281] Skjernaa, B. (2003). “Satoh’s Algorithm in Characteristic 2”, Mathematics of Computation, 72: 477–487.

[282] Smart, N. P. (1999). “The Discrete Logarithm Problem on Elliptic Curves of Trace One”, Journal of Cryptology, 12: 193–196.

[283] Smart, N. P. (2002). Cryptography: An Introduction. New York: McGraw-Hill. The 2nd edition of this book is available online at http://www.cs.bris.ac.uk/~nigel/Crypto_Book/ (October 2008).

[284] Smith, P. J. (1993). “LUC Public-Key Encryption: A Secure Alternative to RSA”, Dr. Dobb’s Journal, 18 (1): 44–49.

[285] Smith, P. J. and M. J. J. Lennon (1993). “LUC: A New Public Key System”, IFIP Transactions, A 37. pp. 103–117. Proceedings of the IFIP TC11, 9th International Conference on Information Security. Computer Security. Amsterdam: North-Holland Co.

[286] Smith, P. J. and C. Skinner (1995). “A Public-Key Cryptosystem and Digital Signature System Based on the Lucas Function Analogue to Discrete Logarithms”, Advances in Cryptology—ASIACRYPT ’94, Lecture Notes in Computer Science, 917. pp. 357–364. Berlin/Heidelberg: Springer.

[287] Solovay, R. and V. Strassen (1977). “A Fast Monte Carlo Test for Primality”, SIAM Journal on Computing, 6: 84–86.

[288] * Stallings, W. (2006). Cryptography and Network Security, 4th ed. Upper Saddle River, New Jersey: Prentice-Hall.

[289] Stam, M. and A. K. Lenstra (2001). “Speeding up XTR”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 125–143. Berlin/Heidelberg: Springer.

[290] Stein, A. and E. Teske (2005). “Optimized Baby Step-Giant Step Methods”, Journal of the Ramanujan Mathematical Society, 20 (1): 27–58.

[291] * Stinson, D. (2005). Cryptography: Theory and Practice, 3rd ed. Boca Raton, Florida: CRC Press.

[292] Strassen, V. (1969). “Gaussian Elimination Is not Optimal”, Numerische Mathematik, 13: 354–356.

[293] Stucki, D., N. Gisin, O. Guinnard, G. Ribordy and H. Zbinden (2002). “Quantum Key Distribution over 67 km with a Plug & Play System”, New Journal of Physics, 4: 41.1–41.8.

[294] Sun, H.-M., W.-C. Yang and C.-S. Laih (1999). “On the Design of RSA with Short Secret Exponent”, Advances in Cryptology—ASIACRYPT ’99, Lecture Notes in Computer Science, 1716. pp. 150–164. Berlin/Heidelberg: Springer.

[295] Swade, D. (2000). The Cogwheel Brain: Charles Babbage and the Quest to Build the First Computer. London: Little, Brown and Company.

[296] Trappe, W. and L. C. Washington (2006). Introduction to Cryptography with Coding Theory, 2nd ed. Upper Saddle River, New Jersey: Prentice-Hall.

[297] Verheul, E. R. (2001). “Evidence that XTR is More Secure than Supersingular Elliptic Curve Cryptosystems”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 195–210. Berlin/Heidelberg: Springer.

[298] Washington, L. C. (2003). Elliptic Curves: Number Theory and Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.

[299] Weber, D. (1996). “Computing Discrete Logarithms with the General Number Field Sieve”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.

[300] Weber, D. (1998). “Computing Discrete Logarithms with Quadratic Number Rings”, Advances in Cryptology—EUROCRYPT ’98, Lecture Notes in Computer Science, 1403. pp. 171–183. Berlin/Heidelberg: Springer.

[301] Weber, D. and T. Denny (1998). “The Solution of McCurley’s Discrete Log Challenge”, Advances in Cryptology—CRYPTO ’98, Lecture Notes in Computer Science, 1462. pp. 458–471. Berlin/Heidelberg: Springer.

[302] Western, A. E. and J. C. P. Miller (1968). “Tables of Indices and Primitive Roots”, Royal Society Mathematical Tables, 9, Cambridge: Cambridge University Press.

[303] Wiedemann, D. H. (1986). “Solving Sparse Linear Equations over Finite Fields”, IEEE Transactions on Information Theory, 32: 54–62.

[304] Wiener, M. J. (1990). “Cryptanalysis of Short RSA Secret Exponents”, IEEE Transactions on Information Theory, 36: 553–558.

[305] Williams, H. C. (1982). “A p + 1 Method for Factoring”, Mathematics of Computation, 39 (159): 225–234.

[306] Yang, L. T. and R. P. Brent (2001). “The Parallel Improved Lanczos Method for Integer Factorization over Finite Fields for Public Key Cryptosystems”, pp. 106–114. Proceedings of the ICPP Workshops 2001, Valencia, Spain, 3–7 September.

[307] Young, A. and M. Yung (1996). “The Dark Side of “Black-Box” Cryptography, or: Should We Trust Capstone?”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 89–103. Berlin/Heidelberg: Springer.

[308] Young, A. and M. Yung (1997a). “Kleptography: Using Cryptography Against Cryptography”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 62–74. Berlin/Heidelberg: Springer.

[309] Young, A. and M. Yung (1997b). “The Prevalence of Kleptographic Attacks on Discrete-Log Based Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 264–276. Berlin/Heidelberg: Springer.

[310] Zheng, Y. (1997). “Digital Signcryption or How to Achieve Cost(Signature & Encryption) << Cost(Signature) + Cost(Encryption)”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 165–179. Berlin/Heidelberg: Springer.

[311] Zheng, Y. (1998a). “Signcryption and Its Applications in Efficient Public Key Solutions”, 1997 Information Security Workshop ISW ’97, Lecture Notes in Computer Science, 1397. pp. 291–312. Berlin/Heidelberg: Springer.

[312] Zheng, Y. (1998b). “Shortened Digital Signature, Signcryption, and Compact and Unforgeable Key Agreement Schemes”, contribution to IEEE P1363 Standard for Public Key Cryptography.

[313] Zheng, Y. and H. Imai (1998a). “Efficient Signcryption Schemes on Elliptic Curves”. Proceedings of the IFIP 14th International Information Security Conference IFIP/SEC ’98, Vienna, Austria, September 1998. Chapman & Hall.

[314] Zheng, Y. and H. Imai (1998b). “How to Construct Efficient Signcryption Schemes on Elliptic Curves”, Information Processing Letters, 68: 227–233.

[315] Zheng, Y. and T. Matsumoto (1996). “Breaking Smartcard Implementations of ElGamal Signatures and Its Variants”, presented at the rump session of Advances in Cryptology—ASIACRYPT ’96. Available at http://www.sis.uncc.edu/~yzheng/publications/ (October 2008).

[316] * Zuckerman, H. S., H. L. Montgomery and I. M. Niven (1991). An Introduction to the Theory of Numbers. New York: John Wiley & Sons.

Books marked by stars have Asian editions (at the time of writing this book).

Index


Preface

I can’t understand why a person will take a year to write a novel when he can easily buy one for a few dollars.

—Fred Allen

The first moral question that we faced (like most authors) is: “Why another book?” Available textbooks on public-key cryptography (or cryptography in general) are many [37, 74, 113, 114, 145, 152, 153, 194, 209, 262, 283, 288, 291, 296]. In the presence of all these books, writing another may sound like a waste of energy and effort.

Fortunately, we have a convincing answer. Most cryptography textbooks today, even many of the celebrated ones, essentially take a narrative approach. While such an approach may be suitable for beginners at an undergraduate level, it misses the finer details of this rapidly growing area of applied mathematics. That public-key cryptography is mathematical is hard to deny, and a mathematical subject is best treated mathematically.

This is precisely the point that this book addresses: it proceeds in a canonically mathematical way while revealing cryptographic concepts. The mathematics is often not so simple (which is why other textbooks do not bother to mention it), but we maintain mathematical rigour as far as possible. A notable feature of this book is that it does not rely on anything other than the readers’ mathematical intuition; it develops all the mathematical abstractions from scratch. Although computer science and mathematics students nowadays do undergo some courses on discrete structures somewhere in their curricula, we do not assume this; instead, we develop the algebra starting at the level of set operations. Simpler structures like groups, rings and fields are followed by more complex concepts like finite fields, algebraic curves, number fields and p-adic numbers. The resulting (long) compilation of abstract mathematical tools relieves cryptography students and researchers from consulting many mathematics books for the background concepts. We are happy to offer this self-sufficient treatment complete with proofs and other details. The only place where we had to be somewhat sketchy is the discussion of elliptic and hyperelliptic curves. The mathematics here is too vast to fit into a few pages, so we opted for a deliberate simplification of these topics.

A big problem with discrete mathematics is that many of its proofs are existential. In order to make things work in a practical environment, however, one must undertake algorithmic studies of algebra and number theory. This is what our book does next. While many algorithmic issues in this area are settled favourably, there remain problems whose best known algorithmic complexities are still poor. Some of these so-called computationally difficult problems are used to build secure public-key cryptosystems. The security of these systems is assumed (rather than proved), and so we deal extensively with the algorithms known to date for solving these difficult problems. It is here that the mathematics developed in the earlier chapters is put to use to the greatest extent.

In Chapter 5, all these mathematical and algorithmic studies culminate in the design of public-key systems for achieving various cryptographic goals. With the theoretical base developed in the earlier chapters, Chapter 5 turns out to be an easy chapter. This is our way of looking at the problem, namely, a formal bottom-up approach. We claim to be different from most textbooks in this regard. Our discussion of mathematics is not for its own sake, but to develop the foundation of cryptographic primitives.

We then turn to purely implementational and practical issues of public-key cryptography. Standards proposed by organizations such as IEEE and RSA Security Inc. promote the interoperable use of cryptographic primitives in Internet applications. We then look at some small applications of the cryptographic basics. Some indirect ways of cryptanalysis are described next. These techniques (side-channel and backdoor attacks) give the book a strong practical flavour in tandem with its otherwise formal appearance.

As an eleventh-hour decision, we added a final chapter to our book, a chapter on quantum computation and its implications for public-key cryptography. Although somewhat theoretical at this point, quantum computation has important ramifications for public-key cryptography. The mathematics behind quantum mechanics and computation is not discussed in the earlier chapters, which highlights the distinctive nature of this chapter; it might well be titled “cryptography in the future”.

This outline perhaps makes it clear that the book is better suited as a graduate-level textbook. A one- or two-semester graduate or advanced undergraduate course can be based on its contents. Self-study is also possible at an advanced graduate or research level, but is expected to be difficult at the undergraduate level. We stress the importance of classroom teaching if an undergraduate course is to be based on this textbook.

We have rated the different items in the book by their level of difficulty and/or mathematical sophistication. Unstarred items can be covered even in undergraduate courses. Items marked by single stars are appropriate for a second course or a second reading. Doubly starred items, on the other hand, are research-level material and can be pursued only in really advanced courses or for carrying out research. The inclusion of a good amount of these advanced topics marks another distinction of this book from other available textbooks.

The book comes with plenty of exercises. We have a two-fold motivation behind them. First, they help readers deepen their understanding of the material discussed in the text. Second, some of the exercises build additional theory that we omit from the text proper. We occasionally make use of these additional topics in proving and/or explaining results in the text. We do not classify the exercises into easy and difficult ones, but we supply hints, some of them quite explicit, for the intellectually challenging parts. We collect the hints in an appendix near the end of the book and leave the marker [H] at the appropriate places in the statements of the exercises. This practice prevents a reader from accidentally seeing a hint; only when stuck need the reader look up the hints at the end. We believe that the exercises, together with our discussion of algorithms and implementation issues, will offer serious students many opportunities to carry out substantial implementation work to further their research and development in cryptography.

Every chapter ends with annotated references for further study. We do not claim to be encyclopaedic in this respect; instead, we mention only those references that, we feel, are directly related to the topics dealt with in the respective chapters.

As a trade-off between bulk and coverage, we had to leave many issues untouched. For example, space constraints prevented us from presenting symmetric-key cryptography in detail. However, in view of its importance today, we include brief discussions of block ciphers, stream ciphers and hash functions in an appendix. Nor do we discuss the formal security of public-key protocols. The issues related to provable security are, at the minimum, theoretically important in the study of cryptography, but are left out here almost entirely. Only a brief discussion of the implications of complexity theory for the security of public-key protocols is included in another appendix. The Handbook of Applied Cryptography [194] by Menezes et al. can supplement this book for learning symmetric techniques, whereas the book by Delfs and Knebl [74] or those by Goldreich [113, 114] can be consulted for formal security issues.

We are indebted to everybody whose criticism, encouragement and support made this project possible. Special thanks go to Bimal Roy, Chandan Mazumdar, C. Pandurangan, Debdeep Mukhopadhyay, Dipanwita Roychowdhury, Gagan Garg, Hartmut Wiebe, H. V. Kumar Swamy, Indranil Sengupta, Kapil Paranjape, Manindra Agarwal, Palash Sarkar, Rajesh Pillai, Rana Barua, R. Balasubramanian, Sanjay Barman, Shailesh, Satrajit Ghosh, Souvik Bhattacherjee, Srihari Vavilapalli, Subhamoy Maitra, Surjyakanta Mohapatro, and Uwe Storch. This book has been tested in postgraduate courses at the Indian Institute of Science, Bangalore, and at the Indian Institute of Technology Kharagpur. We sincerely thank all our students for pointing out many errors and suggesting several improvements. We express our deep gratitude to our family members for their constant understanding and moral support. We are also indebted to our institutes for providing the wonderful intellectual climate for completing this work.

A. D.

C. E. V. M.

Notations

Any time you are stuck on a problem, introduce more notation.

—Chris Skinner [Plenary Lecture, Aug 1997, Topics in Number Theory, Penn State]

General
|a|    absolute value of real number a
min S    minimum of elements of set S
max S    maximum of elements of set S
exp(a)    e^a, where e ≈ 2.71828 is the base of natural logarithms
log x    logarithm of x with respect to some unspecified base (like 10)
ln x    log_e x, the natural logarithm of x
lg x    log_2 x
log^k x    (log x)^k (similarly, ln^k x = (ln x)^k and lg^k x = (lg x)^k)
:=    is defined as (or “is assigned the value” in code snippets)
i    the imaginary unit, square root of –1
z̄    complex conjugate (x – iy) of the complex number z = x + iy
δij    Kronecker delta
(a_s a_(s–1) . . . a_0)_b    b-ary representation of a non-negative integer
binomial coefficient, equals n(n – 1) ··· (n – r + 1)/r!
⌊x⌋    floor of real number x
⌈x⌉    ceiling of real number x
[a, b]    closed interval, that is, the set of real numbers x in the range a ≤ x ≤ b
(a, b)    open interval, that is, the set of real numbers x in the range a < x < b
L(t, α, c)    expression of the form exp((c + o(1))(ln t)^α (ln ln t)^(1–α))
Lt[c]    abbreviation for L(t, 1/2, c) (denoted also as L[c] if t is understood)
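A couple of these conventions can be made concrete in code. The sketch below (plain Python; the helper names `to_base` and `L` are our own) computes the b-ary representation of an integer and evaluates the subexponential expression L(t, α, c) with the o(1) term dropped.

```python
import math

def to_base(n, b):
    """b-ary representation (a_s ... a_0)_b of a non-negative integer n."""
    if n == 0:
        return [0]
    digits = []
    while n > 0:
        digits.append(n % b)
        n //= b
    return digits[::-1]  # most significant digit first

def L(t, alpha, c):
    """exp(c * (ln t)^alpha * (ln ln t)^(1 - alpha)), with the o(1) term dropped."""
    return math.exp(c * math.log(t) ** alpha * math.log(math.log(t)) ** (1 - alpha))

print(to_base(13, 2))   # binary digits of 13: [1, 1, 0, 1]
# alpha = 1 gives fully exponential growth, alpha = 0 polynomial in ln t:
print(L(2 ** 512, 1, 1) > L(2 ** 512, 0.5, 1) > L(2 ** 512, 0, 1))  # True
```

The interpolation property shown in the last line is exactly why L(t, 1/2, c) appears in the running times of the subexponential factoring and discrete-log algorithms discussed later.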
Bit-wise operations (on bit strings a, b)
NAND    negation of AND
NOR    negation of OR
XOR    exclusive OR
a ⊕ b    bit-wise exclusive OR (XOR) of a and b
a AND b    bit-wise AND of a and b
a OR b    bit-wise inclusive OR of a and b
LSk(a)    left shift of a by k bits
RSk(a)    right shift of a by k bits
LRk(a)    left rotate (cyclic left shift) of a by k bits
RRk(a)    right rotate (cyclic right shift) of a by k bits
ā    bit-wise complement of a
a ‖ b    concatenation of a and b
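The shift and rotate operations above act on words of a fixed width; the following sketch fixes an 8-bit width for illustration (the width and the function names are our choices).

```python
W = 8                     # word size in bits (our choice for illustration)
MASK = (1 << W) - 1       # 0xFF: keeps results within the word

def LS(a, k): return (a << k) & MASK                       # left shift by k bits
def RS(a, k): return a >> k                                # right shift by k bits
def LR(a, k): return ((a << k) | (a >> (W - k))) & MASK    # left rotate by k bits
def RR(a, k): return ((a >> k) | (a << (W - k))) & MASK    # right rotate by k bits

a, b = 0b11010010, 0b01100110
print(bin(a ^ b))         # bit-wise XOR
print(bin(a & b), bin(a | b))
print(bin(LR(a, 3)))      # 0b10010110: the three leading bits wrap around
print(bin(~a & MASK))     # bit-wise complement within the word: 0b101101
```

Rotation differs from shifting in that no bits are lost: RRk(LRk(a)) always recovers a.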
Sets
∅    empty set
#A    cardinality of set A
a ∈ A    a is an element of set A
A ⊆ B    set A is contained in set B
A ⊈ B    set A is not contained in set B
A ⊊ B    set A is properly contained in set B
A ∪ B    union of sets A and B
A ⊔ B    disjoint union of sets A and B
A ∩ B    intersection of sets A and B
A \ B    difference of sets A and B
Ā    complement of set A (in a bigger set)
A × B    (Cartesian) product of sets A and B
ℕ    set of all natural numbers, that is, {1, 2, 3, . . .}
set of all non-negative integers, that is, {0, 1, 2, . . .}
ℤ    set of all integers, that is, {. . . , –2, –1, 0, 1, 2, . . .}
ℙ    set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .}
ℚ    set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0}
ℚ*    set of all non-zero rational numbers
ℝ    set of all real numbers
ℝ*    set of all non-zero real numbers
set of all non-negative real numbers
ℂ    set of all complex numbers
ℂ*    set of all non-zero complex numbers
ℤn    ring of integers modulo n, can be represented by the set {0, 1, . . . , n – 1}
ℤn*    group of units in ℤn, can be represented as {a | 0 ≤ a < n, gcd(a, n) = 1}
𝔽q    finite field of cardinality q
𝔽q*    multiplicative group of 𝔽q, that is, 𝔽q \ {0}
O_K    ring of integers of number field K
O_K*    group of units of O_K
ℤp    ring of p-adic integers
ℚp    field of p-adic numbers
Up    group of units of ℤp
Functions and relations
f : A → B    f is a function from set A to set B
f : A ↣ B    f is an injective function from set A to set B
f : A ↠ B    f is a surjective function from set A to set B
a ↦ b    a is mapped to b (by a function)
f ∘ g    composition of functions f and g (applied from right to left)
f^(–1)    inverse of bijective function f
Ker f    kernel of function (homomorphism) f
Im f    image of function f
~    equivalent to
[a]    equivalence class of a
Groups
aH    coset in a multiplicative group
a + H    coset in an additive group
HK    internal direct product of (sub)groups H and K
H × K    external direct product of (sub)groups H and K
[G : H]    index of subgroup H in group G
G/H    quotient group
G1 ≅ G2    groups G1 and G2 are isomorphic
ord G    order (that is, cardinality) of group G
ordG a    order of element a in group G
Exp G    exponent of group G
Z(G)    centre of group G
C(a)    centralizer of group element a
GLn(K)    general linear group over field K (of n × n matrices)
SLn(K)    special linear group over field K (of n × n matrices)
Gtors    torsion subgroup of G
Rings
char A    characteristic of ring A
A × B    direct product of rings A and B
A*    multiplicative group of units of ring A
⟨S⟩    for ring A, ideal generated by S ⊆ A
⟨a⟩    for ring A, principal ideal generated by a ∈ A, also written as aA and Aa
a ≡ b (mod 𝔞)    a is congruent to b modulo ideal 𝔞, that is, a – b ∈ 𝔞
A ≅ B    rings A and B are isomorphic
A/𝔞    quotient ring (modulo ideal 𝔞)
a | b    a divides b (in some ring)
vp(a)    multiplicity of prime p in element a
p^k ∥ a    k = vp(a), that is, p^k divides a but p^(k+1) does not
nilradical of ring A
Ared    reduction of ring A, that is, A modulo its nilradical
gcd(a, b)    greatest common divisor of elements a and b
lcm(a, b)    least common multiple of elements a and b
𝔞 + 𝔟    sum of ideals 𝔞 and 𝔟
𝔞 ∩ 𝔟    intersection of ideals 𝔞 and 𝔟
𝔞𝔟    product of ideals 𝔞 and 𝔟
√𝔞    root (or radical) of ideal 𝔞
Q(A)    total quotient ring of ring A (quotient field of A, if A is an integral domain)
S^(–1)A    localization of ring A at multiplicative set S
A_𝔭    localization of ring A at prime ideal 𝔭
O_K    ring of integers of number field K
N(𝔞)    norm of ideal 𝔞 (in a Dedekind domain)
CRT    Chinese remainder theorem
ED    Euclidean domain
DD    Dedekind domain
DVD (or DVR)    discrete valuation domain (or ring)
PID    principal ideal domain
UFD    unique factorization domain
Fields
char K    characteristic of field K
K*    multiplicative group of units of field K, that is, K \ {0}
K̄    algebraic closure of field K
[K : F]    degree of the field extension F ⊆ K
K[a]    ring generated by a over K, that is, {f(a) | f(X) ∈ K[X]}
K(a)    {f(a)/g(a) | f(X), g(X) ∈ K[X], g(a) ≠ 0}
Aut K    group of automorphisms of field K
AutF K    for field extension F ⊆ K, group of F-automorphisms of K (also Gal(K|F))
FixF H    for field extension F ⊆ K, fixed field of subgroup H of AutF K
𝔽q    finite field of cardinality q
𝔽q*    multiplicative group of units of 𝔽q, that is, 𝔽q \ {0}
Tr    trace function
TrK|F (a)    for field extension F ⊆ K, trace of a ∈ K over F
N    norm function
NK|F (a)    for field extension F ⊆ K, norm of a ∈ K over F
Frobenius automorphism, a ↦ a^q
O_K    ring of integers of number field K
O_K*    group of units of O_K
ΔK    discriminant of number field K
ℤp    ring of p-adic integers
ℚp    field of p-adic numbers
Up    group of units of ℤp
| |p    p-adic norm on ℚp
Integers
a quot b    quotient of Euclidean division of a by b ≠ 0
a rem b    remainder of Euclidean division of a by b ≠ 0
a | b    a divides b in ℤ, that is, b = ca for some c ∈ ℤ
vp(a)    multiplicity of prime p in non-zero integer a
gcd(a, b)    greatest common divisor of integers a and b (not both zero)
lcm(a, b)    least common multiple of integers a and b
a ≡ b (mod n)    a is congruent to b modulo n
a^(–1) (mod n)    multiplicative inverse of a modulo n (given that gcd(a, n) = 1)
φ(n)    Euler’s totient function
(a/n)    Legendre (or Jacobi) symbol
[a]n    coset a + nℤ in ℤn
ordn a    multiplicative order of a modulo n (given that gcd(a, n) = 1)
μ(n)    Möbius function
π(x)    number of primes between 1 and positive real number x
Li(x)    Gauss’ Li function
ψ(x, y)    fraction of positive integers ≤ x that are y-smooth
ζ(s)    Riemann zeta function
RH    Riemann hypothesis
ERH    extended Riemann hypothesis
Mn    2^n – 1 (Mersenne number)
2^32, the standard radix for representation of multiple-precision integers
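Several of these arithmetic notations correspond directly to built-in or easily written operations. The sketch below (naive implementations, function names our own) illustrates quot and rem, modular inverses, φ(n) and ordn a on small numbers.

```python
from math import gcd

def phi(n):
    """Euler's totient: count of 1 <= a <= n with gcd(a, n) = 1 (naive count)."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

def order(a, n):
    """Multiplicative order of a modulo n, assuming gcd(a, n) = 1."""
    k, x = 1, a % n
    while x != 1:
        x = (x * a) % n
        k += 1
    return k

a, b, n = 38, 7, 15
print(a // b, a % b)     # quot and rem of Euclidean division: 5 3
print(pow(2, -1, n))     # 2^(-1) (mod 15) = 8, since 2 * 8 = 16 ≡ 1 (mod 15)
print(phi(n))            # phi(15) = phi(3) * phi(5) = 8
print(order(2, n))       # ord_15(2) = 4, since 2^4 = 16 ≡ 1 (mod 15)
```

Note that ord_15(2) = 4 divides φ(15) = 8, as Euler's theorem guarantees for any a coprime to n.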
Polynomials
A[X1, . . . , Xn]    polynomial ring in indeterminates X1, . . . , Xn over ring A
A(X1, . . . , Xn)    ring of rational functions in indeterminates X1, . . . , Xn over ring A
deg f    degree of polynomial f
lc f    leading coefficient of polynomial f
minpolyα,K(X)    minimal polynomial of α over field K, belongs to K[X]
cont f    content of polynomial f
pp f    primitive part of polynomial f
f′(X)    formal derivative of polynomial f(X)
Δ(f)    discriminant of polynomial f
μm    group of m-th roots of unity
Φm    m-th cyclotomic polynomial
Vector spaces, modules and matrices
dimK V    dimension of vector space V over field K
Span S    span of subset S of a vector space
HomK(V, W)    set of all K-linear transformations V → W
EndK(V)    set of all K-linear transformations V → V
M/N    quotient vector space or module
M ≅ N    vector spaces or modules M and N are isomorphic
∏ Mi    direct product of modules Mi, i ∈ I
⊕ Mi    direct sum of modules Mi, i ∈ I
A^t    transpose of matrix (or vector) A
A^(–1)    inverse of matrix A
Rank T    rank of matrix or linear transformation T
RankA M    rank of A-module M
Null T    nullity of matrix or linear transformation T
(M : N)    for A-module M and submodule N, the ideal {a ∈ A | aM ⊆ N} of A
AnnA(M)    annihilator of A-module M, same as (M : 0)
Tors M    torsion submodule of M
A[S]    A-algebra generated by set S
⟨v, w⟩    inner product of two real vectors v and w
Algebraic curves
𝔸^n    n-dimensional affine space over field K
ℙ^n    n-dimensional projective space over field K
(x0, x1, . . . , xn)    homogeneous coordinates of a point in ℙ^n
[x0, x1, . . . , xn]    projective coordinates of a point in ℙ^n
f(h)    homogenization of polynomial f
C(K)    set of K-rational points on curve C defined over field K
K[C]    ring of polynomial functions on curve C defined over K
K(C)    field of rational functions on curve C defined over K
[P]    point P on a curve, as it appears in formal sums
ordP (r)    order of rational function r at point P
DivK (C)    group of divisors on curve C defined over field K
Div⁰K(C)    group of divisors of degree 0 on curve C defined over field K
DivK(r)    divisor of a rational function r
PrinK(C)    group of principal divisors on curve C defined over field K
JK(C)    Jacobian of curve C defined over field K
PicK(C)    Picard group of curve C (equals DivK(C)/PrinK(C))
Pic⁰K(C)    degree-0 part of the Picard group, same as the Jacobian
point at infinity on an elliptic or a hyperelliptic curve
Δ(E)    discriminant of elliptic curve E
j(E)    j-invariant of elliptic curve E
E(K)    group of points on elliptic curve E defined over field K
P + Q    sum of two points P, Q on an elliptic curve
mP    m-th multiple (that is, m-fold sum) of point P
ψm, , fm    m-th division polynomials
t    trace of Frobenius of elliptic curve
EK[m]    group of m-torsion points in E(K)
E[m]    abbreviation for EK̄[m]
em    Weil pairing (a map E[m] × E[m] → μm)
Div(a, b)    representation of a reduced divisor on a hyperelliptic curve by polynomials a, b
Probability and statistics
Pr(E)    probability of event E
Pr(E1|E2)    conditional probability of event E1 given event E2
E(X)    expectation of random variable X
Var(X)    variance of random variable X
σX    standard deviation of random variable X (equals √Var(X))
Cov(X, Y)    covariance of random variables X, Y
ρX,Y    correlation coefficient of random variables X, Y
Computational complexity
f = O(g)    big-Oh notation: f is of the order of g
f = Ω(g)    big-Omega notation: g is of the order of f
f = Θ(g)    big-Theta notation: f and g have the same order
f = o(g)    small-oh notation: f is of strictly smaller order than g
f = ω(g)    small-omega notation: f is of strictly larger order than g
f = O~(g)    soft-Oh notation: f = O(g log^k g) for real constant k ≥ 0
problem P1 is polynomial-time reducible to problem P2
P1 ≡ P2    problems P1 and P2 are polynomial-time equivalent
Intractable problems
CVP    closest vector problem
DHP    (finite field) Diffie–Hellman problem
DLP    (finite field) discrete logarithm problem
ECDHP    elliptic curve Diffie–Hellman problem
ECDLP    elliptic curve discrete logarithm problem
HECDHP    hyperelliptic curve Diffie–Hellman problem
HECDLP    hyperelliptic curve discrete logarithm problem
GIFP    general integer factorization problem
IFP    integer factorization problem
QRP    quadratic residuosity problem
RSAIFP    RSA integer factorization problem
RSAKIP    RSA key inversion problem
RSAP    RSA problem
SQRTP    modular square root problem
SSP    subset sum problem
SVP    shortest vector problem
Algorithms
ADH    Adleman, DeMarrais and Huang’s algorithm
AES    advanced encryption standard
AKS    Agarwal, Kayal and Saxena’s deterministic primality test
BSGS    Shanks’ baby-step–giant-step method
CBC    cipher-block chaining mode
CFB    cipher feedback mode
CSM    cubic sieve method
CSPRBG    cryptographically strong pseudorandom bit generator
CvA    Chaum and van Antwerpen’s undeniable signature scheme
DDF    distinct-degree factorization
DES    data encryption standard
DH    Diffie–Hellman key exchange
DPA    differential power analysis
DSA    digital signature algorithm
DSS    digital signature standard
ECB    electronic codebook mode
ECDSA    elliptic curve digital signature algorithm
ECM    elliptic curve method
E-D-E    encryption–decryption–encryption scheme of triple encryption
EDF    equal-degree factorization
EG    Eschenauer and Gligor’s scheme
FEAL    fast data encipherment algorithm
FFS    Feige, Fiat and Shamir’s zero-knowledge protocol
GKR    Gennaro, Krawczyk and Rabin’s RSA-based undeniable signature scheme
GNFSM    general number field sieve method
GQ    Guillou and Quisquater’s zero-knowledge protocol
HFE    cryptosystem based on hidden field equations
ICM    index calculus method
IDEA    international data encryption algorithm
KLCHKP    braid group cryptosystem
L3    Lenstra–Lenstra–Lovász algorithm
LFSR    linear feedback shift register
LSM    linear sieve method
LUC    cryptosystem based on Lucas sequences
MOV    Menezes, Okamoto and Vanstone’s reduction
MPQSM    multiple polynomial quadratic sieve method
MQV    Menezes–Qu–Vanstone key exchange
NFSM    number field sieve method
NR    Nyberg–Rueppel signature algorithm
NTRU    Hoffstein, Pipher and Silverman’s encryption algorithm
NTRUSign    NTRU signature algorithm
OAEP    optimal asymmetric encryption padding
OFB    output feedback mode
PAP    pretty awful privacy
PGP    pretty good privacy
PH    Pohlig–Hellman method
PRBG    pseudorandom bit generator
PSS    probabilistic signature scheme
QSM    quadratic sieve method
RSA    Rivest, Shamir and Adleman’s algorithm
SAFER    secure and fast encryption routine
Satoh–FGH    point counting algorithm on elliptic curves over fields of characteristic 2
SDSA    shortened digital signature algorithm
SEA    Schoof, Elkies and Atkin’s algorithm for point counting on elliptic curves
SETUP    secretly embedded trapdoor with universal protection
SFF    square-free factorization
SHA    secure hash algorithm
SmartASS    algorithm for computing discrete logs in anomalous elliptic curves
SNFSM    special number field sieve method
SPA    simple power analysis
TWINKLE    the Weizmann Institute key location engine
TWIRL    the Weizmann Institute relation locator
XCM    xedni calculus method
XSL    extended sparse linearization attack
XTR    efficient and compact subgroup trace representation
ZK    zero-knowledge
Quantum computation
|ψ〉    ket notation for vector ψ
〈φ|ψ〉    inner product of vectors |φ〉 and |ψ〉
‖ψ‖    norm of vector |ψ〉 (equals √〈ψ|ψ〉)
n-dimensional Hilbert space (over ℂ)
|0〉, |1〉, . . . , |n – 1〉    orthonormal basis of the n-dimensional Hilbert space
cbit    classical bit
qubit    quantum bit
⊗    tensor product of Hilbert spaces
F    Fourier transform
H    Hadamard transform
I    Identity transform
X    Exchange transform
Z    Z transform
Computational primitives
ulong    32-bit unsigned integer data type (unsigned long)
ullong    64-bit unsigned integer data type (unsigned long long)
a := b    assignment operator (returns the value assigned)
+, –, ×, /, %    arithmetic operators
++, – –    increment and decrement operators
a ◊= b    a := a ◊ b, where ◊ is an arithmetic operator
=, ≠, >, <, ≥, ≤    comparison operators
1    True as a condition
if    conditional statement: if (condition) ···
if-else    conditional statement: if (condition) ··· , else ···
while    while loop: while (condition) ···
do    do loop: do ··· while (condition)
for    for loop: for (range of values) ···
{···}    block of statements
, or . or new-line    statement terminator
/* ··· */    comment
return    return from this routine
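These conventions form a C-flavoured pseudocode. As a sketch of how a fragment written in them translates into a real language, here is Euclid's gcd loop rendered in Python; the mapping of := to =, of rem to %, and of /* ... */ comments to # comments is ours.

```python
def euclid_gcd(a, b):
    # pseudocode: while (b != 0) { t := a rem b, a := b, b := t }
    while b != 0:          # while loop: while (condition) ...
        a, b = b, a % b    # "a rem b" becomes a % b; ',' separates statements
    return a               # return from this routine

x = 28
x *= 3                     # the pseudocode "a ◊= b" (here ◊ is ×) becomes x *= 3
print(euclid_gcd(x, 60))   # gcd(84, 60) = 12
```

Any of the book's algorithm descriptions can be transcribed in the same mechanical way, which is why the pseudocode deliberately stays close to C syntax.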
Miscellaneous
end of (visible or invisible) proof
end of item (like example, definition, assumption)
[H]    hint available in Appendix D

1. Overview

1.1    Introduction
1.2    Common Cryptographic Primitives
1.3    Public-key Cryptography
1.4    Some Cryptographic Terms
       Chapter Summary

Aller Anfang ist schwer: All beginnings are difficult.

—German proverb

Defendit numerus: There is safety in numbers.

—Anonymous

The ability to quote is a serviceable substitute for wit.

—W. Somerset Maugham

1.1. Introduction

It is rather difficult to give a precise definition of cryptography. Loosely speaking, it is the science (or art or technology) of preventing access to sensitive data by parties who are not authorized to access the data. Secure transmission of messages over a public channel is the first, simplest and oldest example of a cryptographic protocol. For assessing the security of these protocols, one studies their possible weak points, namely the strategies for breaking them. This study is commonly referred to as cryptanalysis. And, finally, the study of both cryptography and cryptanalysis is known as cryptology.

Cryptology = Cryptography + Cryptanalysis

The science of cryptology is rather old; it developed naturally as and when human beings felt the need for privacy and secrecy. The rapid deployment of the Internet in recent years demands that we look at this subject with renewed interest. Newer requirements tailored to Internet applications keep cropping up, and as a result newer methods, protocols and algorithms keep appearing. The most startling discoveries include the key-exchange protocol of Diffie and Hellman in 1976 and the RSA cryptosystem of Rivest, Shamir and Adleman in 1978. They opened up a new branch of cryptology, namely public-key cryptology. Historically, public-key technology came earlier than the Internet, but it is the latter that makes extensive use of the former.

This book is an attempt to introduce to the reader the vast and interesting branch of public-key cryptology. One of its most distinguishing features is that public-key cryptology involves a reasonable amount of abstract mathematics, which often stands in the way of a complete understanding for an uninitiated reader. This book tries to bridge the gap: we develop the required mathematics in necessary and sufficient detail.

This chapter is an overview of the topics that the rest of the book deals with. We start with a description of the most common cryptographic protocols. Then we introduce the public-key paradigm and discuss the source of its security. We use certain mathematical terms and notations throughout this chapter. If the reader is not already familiar with these terms, there is nothing to worry about. As we have just claimed, we will introduce the mathematics in the later chapters. The exposition of this chapter is expected to give the reader an overview of the area of public-key cryptography and also the requisite motivation for learning the mathematical tools that follow.

1.2. Common Cryptographic Primitives

As claimed at the outset of this chapter, it is rather difficult to give a precise definition of the term cryptography. The best way to understand it is by examples. In this section, we briefly describe the common problems that cryptography deals with.

1.2.1. The Classical Problem: Secure Transmission of Messages

To start with, we introduce the legendary figures of cryptography: Alice, Bob and Carol. Alice wants to send a message to Bob over a public communication channel like the Internet and wants to ensure that nobody other than Bob can make out the meaning of the message. A third party like Carol, who has access to the communication channel, can intercept the message. But the message should be wrapped or transformed before transmission in such a way that knowledge of some secret piece of information is needed to unwrap or transform back the message. It is Bob who has this information, but not Carol (nor Dorothy nor Emily nor . . .).

It is expedient to point out here that Alice, Bob and Carol need not be human beings. They can stand for organizations (like banks) or, more precisely, for computers or computer programs run by individuals or organizations. It is, therefore, customary to call them parties, entities or subjects instead of persons or characters. In cryptologic jargon, Carol goes by several interchangeable names: adversary, eavesdropper, opponent, intruder, attacker and enemy are the most common. When a message transmission like the one just described is involved, Alice is called the sender and Bob the receiver of the message.

It is a natural strategy to put the message in a box and lock the box using a key, called the encryption key. A matching decryption key is needed to unlock the box and retrieve the message. The process of putting the message in the box is commonly called encoding, and that of locking the box is called encryption. The reverse processes, namely unlocking the box and taking the message out of it, are respectively called decryption and decoding. This is precisely the classical encryption–decryption protocol of cryptography.[1]

[1] Some people prefer to use the terms enciphering and deciphering in place of the words encryption and decryption respectively.

In the world of electronic communication, a message M is usually a bit string, and encoding, encryption, decryption and decoding are well-defined transformations of bit strings. If we denote by fe the transformation function consisting of encoding and encryption, then we get a new bit string C = fe(M, Ke), where Ke stands for the encryption key. This bit string C is sent over the communication channel. After Bob receives C, he uses the reverse transformation fd (decryption followed by decoding) to get the original message M back; that is, M = fd(C, Kd). Note that the decryption key Kd is needed as an argument to fd. If Carol does not know this, she cannot compute M. We conventionally call M the plaintext message and C the ciphertext message.

The encoding and decoding operations do not make use of keys and can be performed by anybody. (It should not be difficult to put a letter in or take a letter out of an unlocked box!) One might then wonder why it is necessary to do these transformations instead of applying the encryption and decryption operations directly on M and C respectively. With whatever we have discussed so far, we cannot give a full answer to this question. For the answer, we will need to wait until we reach the later chapters. We only mention here that the encryption algorithms often require as input some mathematical entities (like integers or elements of a field) which are logically not bit strings. But that’s not all! As we see later, the additional transformations often add to the security of the protocols. On the other hand, for a general discussion, it is often unnecessary to start from the encoding process and end at the decoding process. As a result, we will assume, unless otherwise stated, that M is the input to the encryption routine and the output of the decryption routine, in which case fe and fd stand for the encryption and decryption functions only.
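The relationship C = fe(M, Ke) and M = fd(C, Kd) can be sketched in a few lines of Python. Here XOR with a random key of the same length as the message (a one-time pad) stands in for a real cipher, and the names f_e and f_d are ours, mirroring the notation above:

```python
import os

def f_e(m: bytes, k_e: bytes) -> bytes:
    """Toy 'encryption': XOR the message with a key of equal length."""
    return bytes(a ^ b for a, b in zip(m, k_e))

def f_d(c: bytes, k_d: bytes) -> bytes:
    """Toy 'decryption': XOR is its own inverse, so f_d coincides with f_e here."""
    return bytes(a ^ b for a, b in zip(c, k_d))

M = b"attack at dawn"
K = os.urandom(len(M))      # one-time random key; here Ke = Kd = K (symmetric)
C = f_e(M, K)               # ciphertext sent over the channel
assert f_d(C, K) == M       # Bob recovers M with the matching key
```

Since XOR is its own inverse, this example is symmetric (Ke = Kd); the asymmetric case, where the two keys differ, is taken up below.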

Symmetric-key or secret-key cryptography

In the simplest form of locking mechanism, one has Ke = Kd. That is, the same key, called the symmetric key or the secret key, is used for both encryption and decryption. Common examples of such symmetric-key algorithms include DES (Data Encryption Standard) together with its various modifications like the Triple DES and DES-X, IDEA (International Data Encryption Algorithm), SAFER (Secure And Fast Encryption Routine), FEAL (Fast Encryption Algorithm), Blowfish, RC5 and AES (Advanced Encryption Standard). We will not describe all these algorithms in this book. Interested readers can look at the abundant literature to know more about them.

Asymmetric-key or public-key cryptography

The biggest disadvantage of using a secret-key system is that Alice and Bob must agree upon the key Ke = Kd secretly, for example by personal contact or over a secure channel. This is a serious limitation and is often not practical or even possible. Another drawback of secret-key systems is that every pair of parties needs a key for communication. Thus, if there are n entities communicating over a net, the number of keys would be of the order of n^2. Also, each entity has to remember O(n) keys for communicating with the other entities. In practice, however, an entity does not communicate with every other entity on the net. Yet the total number of keys to be remembered by an entity could be quite high.

Both these problems can be avoided by using what is called an asymmetric-key or a public-key protocol. In such a protocol, each entity generates a key pair (Ke, Kd), makes the encryption key Ke public and keeps the decryption key Kd secret. Ke is also called the public key and Kd the private key. Anybody who wants to send a message to Bob gets Bob’s public key, encrypts the message with the key, and sends the ciphertext to Bob. Upon receiving the ciphertext, Bob uses his private key to decrypt the message. One may view such a lock as a self-locking padlock. Anybody can lock a box with a self-locking padlock, but opening it requires a key which only Bob possesses.

The source of security of such a system is based on the difficulty of computing the private key Kd given the public key Ke. It is apparent that Ke and Kd are sort of inverses of each other, because the former is used to generate C from M and the latter is used to generate M from C. This is where mathematics comes into the picture. We mention a few possible constructions of key pairs in the next section and the rest of the book deals with an in-depth study of these public-key protocols.

Attractive as they look, public-key protocols have a serious drawback, namely that they are orders of magnitude slower than their secret-key counterparts. This is of concern if huge amounts of data need to be encrypted and decrypted. This shortcoming can be overcome by using both secret-key and public-key protocols in tandem as follows: Alice generates a secret key (say, for AES), encrypts the message by the secret key and the secret key by the public key of Bob, and sends both the encrypted message and the encrypted secret key. Bob first decrypts the encrypted secret key using his private key and uses this decrypted secret key to decrypt the message. Since secret keys are usually short bit strings (most commonly of length 128 bits), the slow performance of the public-key algorithms causes little trouble. But at the same time, Alice and Bob are relieved of having a previous secret meeting or communication for agreeing on the secret key. Moreover, neither Alice nor Bob needs to remember the secret key. During every session of message transmission, a random secret key can be generated and later destroyed when the communication is over.
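The hybrid scheme just described can be sketched as follows. The RSA-style key pair and the hash-based stream cipher below are illustrative stand-ins of our own devising: the primes are far too small for real use, and the keystream function is not a vetted cipher like AES.

```python
import hashlib, os

# Toy RSA key pair for Bob (small Mersenne primes; real keys are far larger).
p = 2**127 - 1
q = 2**89 - 1
n, phi = p * q, (p - 1) * (q - 1)
e = 65537
d = pow(e, -1, phi)          # ed ≡ 1 (mod φ(n)); needs Python 3.8+

def keystream(key: bytes, length: int) -> bytes:
    """Hash-based stream: a stand-in for AES, not a vetted cipher."""
    out, ctr = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:length]

# Alice: fresh 128-bit session key; encrypt the bulk data with it,
# and the session key itself with Bob's public key (e, n).
session_key = os.urandom(16)
message = b"a long message encrypted under the fast symmetric scheme"
ct_message = bytes(a ^ b for a, b in
                   zip(message, keystream(session_key, len(message))))
ct_key = pow(int.from_bytes(session_key, "big"), e, n)

# Bob: unwrap the session key with his private key d, then decrypt the bulk data.
k = pow(ct_key, d, n).to_bytes(16, "big")
assert bytes(a ^ b for a, b in zip(ct_message, keystream(k, len(ct_message)))) == message
```

Only the short session key passes through the slow public-key operation; the long message is handled by the fast symmetric one, exactly as in the tandem scheme above.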

1.2.2. Key Exchange

There is an alternative method by which Alice and Bob can exchange secret information (like AES keys) over a public communication channel. Let us first see how this can be done in the physical lock-and-key scenario. Alice generates a secret, puts it in a box, locks the box with her own key and sends it to Bob. Bob, upon receiving the locked box, adds a second lock to it and sends the doubly locked box back to Alice. Alice then removes her lock and again sends the box to Bob. Finally, Bob uses his key to unlock the box and retrieve the secret. A third party (Carol) that can access the box during the three communications finds it locked by Alice or Bob or both. Since Carol does not possess the keys to these locks, she cannot open the box to discover the secret.

This process can be abstractly described as follows: Alice and Bob first independently generate key pairs (AKe, AKd) and (BKe, BKd) respectively. Alice then sends AKe to Bob and Bob sends BKe to Alice. The private keys AKd and BKd are not disclosed. They also agree upon a function g with which Alice computes gA = g(AKd, BKe) and Bob computes gB = g(BKd, AKe). If gA = gB, then this common value can be used as a shared secret between Alice and Bob.

Our intruder Carol knows g and taps the values of AKe and BKe. So the function g should be such that a knowledge of these values alone does not suffice for the computation of gA = gB. One of the private keys AKd or BKd is needed for the computation. Since (AKe, AKd) and (BKe, BKd) are key pairs, it is assumed that private keys are difficult to compute from the knowledge of the corresponding public keys.
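A concrete realization of this abstraction is the Diffie–Hellman protocol, sketched below over the multiplicative group modulo a prime p. The parameter sizes and the base g = 2 are chosen for illustration only; here AKe = g^AKd mod p and g(AKd, BKe) = BKe^AKd mod p.

```python
import secrets

# Public parameters: a prime p and base g (illustrative, not vetted for real use).
p = 2**255 - 19     # a well-known prime; any large prime works for the demo
g = 2

# Each party keeps its private key and publishes only g^priv mod p.
a_priv = secrets.randbelow(p - 2) + 1          # Alice's AKd
b_priv = secrets.randbelow(p - 2) + 1          # Bob's BKd
a_pub = pow(g, a_priv, p)                      # AKe, sent to Bob
b_pub = pow(g, b_priv, p)                      # BKe, sent to Alice

# g(priv, other's pub) = pub^priv mod p: both sides compute g^(ab) mod p.
g_A = pow(b_pub, a_priv, p)
g_B = pow(a_pub, b_priv, p)
assert g_A == g_B      # the shared secret; Carol sees only p, g, a_pub, b_pub
```

Carol, seeing only p, g, a_pub and b_pub, would have to compute a discrete logarithm to recover either private key.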

Such a technique of exchanging secret values over an insecure channel is called a key-exchange or a key-agreement protocol. It is important to point out here that such a protocol is usually based on the public-key paradigm; that is to say, we do not know secret-key counterparts for a key-exchange protocol. Since a shared secret between the communicating parties is usually short, the low speed of public-key algorithms is really not a concern in this case.

1.2.3. Digital Signatures

A digital signature is yet another application of the public-key paradigm. Suppose Alice wants to sign a message M in such a way that the signature S can be verified by anybody but nobody other than Alice would be able to generate the signature S on the message M. This can be achieved as follows: Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. She now uses the decryption function fd to generate the signature, that is, S = fd(M, Kd). The signature S is then made public. Anybody who has access to Alice’s public key Ke applies the reverse transformation fe to get back the message M = fe(S, Ke).

If Carol signs the message M with a different key K′d, then she generates the signature S′ = fd(M, K′d). Now, since K′d and Ke are not matching keys, verification using Ke gives M′ = fe(S′, Ke), which is different from M. If we assume that M is a message written in a human-readable language (like English), then M′ would generally look like a meaningless sequence of characters which is neither English nor any sensible string to a human reader. The signature verifier would then immediately conclude that this is a case of forged signature.

Such a scheme of generating digital signatures is called a signature scheme with message recovery. It is obvious that this is the same as our encrypt–decrypt scheme with the sequence of encryption and decryption steps reversed. If the message M to be signed is quite long, using this algorithm calls for a large execution time both for signature generation and for verification. It is, therefore, customary to use another variant of signature schemes called signature schemes with appendix that we describe now.

Instead of applying the decryption transform directly on M, Alice first computes a short representative H(M) of her message M. Her signature now becomes the pair S = (M, σ), where σ = fd(H(M), Kd). Typically, a hash function (see Section 1.2.6) is used to compute the representative H(M) from M and is assumed to be public knowledge. Now anybody can verify the signature by checking if the equality H(M) = fe(σ, Ke) holds. If a key different from Kd is used to generate the signature, one would (in general) get a value σ′ ≠ σ and the signature forgery will be detected by observing that H(M) ≠ fe(σ′, Ke).
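A sketch of a signature scheme with appendix, using a toy RSA key pair for (fe, fd) and SHA-256 reduced modulo n as the hash H; all parameter choices here are illustrative, not a real signature standard:

```python
import hashlib

# Alice's toy RSA pair (small Mersenne primes; real keys are far larger).
p, q = 2**127 - 1, 2**89 - 1
n, phi = p * q, (p - 1) * (q - 1)
K_e = 65537
K_d = pow(K_e, -1, phi)            # needs Python 3.8+

def H(msg: bytes) -> int:
    """Hash M to an integer below n (SHA-256 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % n

M = b"pay Bob 100 rupees"
sigma = pow(H(M), K_d, n)          # sign: sigma = f_d(H(M), K_d)
assert H(M) == pow(sigma, K_e, n)  # verify: H(M) = f_e(sigma, K_e)

# A signature made with a wrong private key fails verification.
bad = pow(H(M), K_d + 2, n)
assert H(M) != pow(bad, K_e, n)
```

Note that only the short digest H(M), never the long message M, goes through the slow private-key operation.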

1.2.4. Entity Authentication

By entity authentication, we mean a process in which one entity called the claimant proves its identity to another entity called the verifier. Entity-authentication techniques thus tend to prevent impersonation of an entity by an intruder. Both secret-key and public-key techniques are used for entity-authentication schemes.

The simplest example of an entity-authentication scheme is the use of passwords, as when a user (the claimant) tries to gain access to some resources in a computer (the verifier) by proving its identity using a password. Password schemes are mostly based on secret-key techniques. For example, the UNIX password system is based on encrypting the zero message (a string of 64 zero bits) using a repeated application of a variant of the DES algorithm with 64 bits of the user input (the password) as the key. Password-based authentication schemes are fixed and time-invariant and are often called weak authentication schemes.

We see applications of public-key techniques in challenge–response authentication schemes (also called strong authentication schemes). Assume that an entity, Alice, wants to prove her identity to another entity, Bob. Alice generates a key pair (Ke, Kd), makes Ke public and keeps Kd secret. Now, Bob chooses a random message M, encrypts M using Alice’s public key—that is, computes C = fe(M, Ke)—and sends C to Alice. Alice, upon reception of C, decrypts it using her private key Kd; that is, she regenerates M = fd(C, Kd) and sends M to Bob. Bob compares this value of M with the one he generated, and if a match occurs, Bob becomes sure that the entity who is claiming to be Alice possesses the knowledge of Alice’s private key. If Carol uses any private key other than Kd for the decryption, she gets a message M′ different from M and thereby cannot prove to Bob her identity as Alice. This is how this scheme prevents impersonation of Alice by Carol.
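The challenge–response exchange can be sketched with a toy RSA setup (illustrative parameters; a deployed protocol would additionally pad and format the challenge):

```python
import secrets

# Alice's toy key pair; Bob knows only (K_e, n).
p, q = 2**127 - 1, 2**89 - 1
n = p * q
K_e = 65537
K_d = pow(K_e, -1, (p - 1) * (q - 1))   # needs Python 3.8+

# Bob: random challenge M, sent encrypted under Alice's public key.
M = secrets.randbelow(n - 2) + 2
C = pow(M, K_e, n)

# Alice: only the holder of K_d can recover M from C.
response = pow(C, K_d, n)
assert response == M          # Bob accepts: the claimant knows K_d
```

An impostor without K_d cannot recover M from C, so her response fails Bob's comparison.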

Entity authentication is often carried out using another interesting technique called zero-knowledge proof. In such a protocol, the verifier (or any third party listening to the conversation) gains no knowledge regarding the secret possessed by the claimant, but develops the desired confidence regarding the claimant’s claim of possession of the secret. We provide here an informal example explaining zero-knowledge proofs.

Let us think of a circular cave as shown in Figure 1.1. The cave has two exits, left and right, denoted by L and R respectively. The cave also has a door inside it, which is invisible outside the cave. Alice (A) wants to prove to Bob (B) that she possesses a key to this door without showing him the key or the process of unlocking the door with the key. Bob stations himself somewhere outside the exits of the cave. Alice enters the cave and randomly chooses the left or right wing of the cave (and goes there). She does not disclose this choice to Bob, because Bob must not learn the session secrets either. Once Alice is placed in the cave, Bob makes a random choice from L and R and asks Alice (using cell phones or by shouting loudly) to come out of the cave via that chosen exit. Suppose Bob challenges Alice to use L. If Alice is in the left wing, she can come out of the cave using L. If Alice is in the right wing, she must use her secret key to open the central door to come to the left wing and then go out using exit L. If Alice does not possess the secret key, she can succeed in obeying Bob’s directive only with probability half. If this procedure is repeated t times, then the probability that Alice succeeds on all occasions without possessing the secret key is (1/2)^t = 1/2^t. By choosing t appropriately, Bob can make the probability of accepting a false claim arbitrarily small. For example, if t = 20, then the chance is less than one in a million that Alice can establish a false claim.

Figure 1.1. Zero-knowledge proofs


Thus, if Alice succeeds every time, Bob gains the desired confidence that Alice actually possesses the secret. However, during this entire process, Bob can obtain no information regarding Alice’s secrets (the key and the choices of wings). Another important aspect of this interaction is that Alice has no way of predicting Bob’s questions, preventing impostors (of Alice) from fooling Bob.
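The probability argument above can be checked empirically: a simulated Alice without the key passes a round only when Bob's random challenge happens to match her random choice of wing. The function name cheat_passes is ours.

```python
import random

def cheat_passes(t: int) -> bool:
    """A keyless Alice passes round i only if Bob's random challenge
    happens to name the wing she happened to enter."""
    return all(random.choice("LR") == random.choice("LR") for _ in range(t))

# Empirical check for a small t: the success rate is about (1/2)^t.
t, trials = 3, 100_000
rate = sum(cheat_passes(t) for _ in range(trials)) / trials
print(rate)            # close to (1/2)^3 = 0.125

# For t = 20 the cheating probability is already below one in a million.
assert 0.5**20 < 1e-6
```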

1.2.5. Secret Sharing

Suppose that a secret piece of information is to be distributed among n entities in such a way that n – 1 (or fewer) entities are unable to reconstruct the secret. All of the n entities must participate to reveal the secret. As usual, let us assume that the secret is an l-bit string. A simple strategy would be to break the string into n parts and provide each entity with a part. This method is, however, not really attractive, because each part gives partial information about the secret. Thus, for example, if a 256-bit string is to be distributed equally among 16 entities, any 15 of them working together can reconstruct the secret by trying only 2^16 = 65536 possibilities for the unknown 16 bits.

We now describe an alternative strategy that does not suffer from this drawback. Once again, we break the secret string into n parts and consider the parts as integers a_0, . . . , a_{n-1}. We construct the polynomial f(x) = x^n + a_{n-1}x^{n-1} + · · · + a_1x + a_0 and give the integers f(1), f(2), . . . , f(n) to the entities. When all of the entities cooperate, the linear system of equations f(i) = i^n + a_{n-1}i^{n-1} + · · · + a_1i + a_0, 1 ≤ i ≤ n, can be solved to find out the unknown coefficients a_0, . . . , a_{n-1} which, in turn, reveal the secret. On the other hand, if n – 1 or fewer entities cooperate, they get an underdetermined system of equations in n unknowns, from which the actual solution is not readily available.
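The scheme above can be sketched directly: build f, hand out the shares f(1), . . . , f(n), and recover the coefficients by solving the linear system, here by exact Gaussian elimination over the rationals. The concrete parts [17, 203, 9] are an arbitrary illustrative secret.

```python
from fractions import Fraction

# The secret, split into n integer parts a_0, ..., a_{n-1}.
a = [17, 203, 9]                  # n = 3 parts (illustrative values)
n = len(a)

def f(x: int) -> int:
    """f(x) = x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0."""
    return x**n + sum(a[j] * x**j for j in range(n))

shares = [f(i) for i in range(1, n + 1)]    # entity i receives f(i)

# Recovery: all n entities solve f(i) - i^n = sum_j a_j i^j for the a_j
# by Gauss-Jordan elimination over the rationals (the matrix is Vandermonde,
# hence invertible).
A = [[Fraction(i**j) for j in range(n)] for i in range(1, n + 1)]
b = [Fraction(shares[i - 1] - i**n) for i in range(1, n + 1)]
for col in range(n):
    piv = next(r for r in range(col, n) if A[r][col] != 0)
    A[col], A[piv], b[col], b[piv] = A[piv], A[col], b[piv], b[col]
    for r in range(n):
        if r != col and A[r][col] != 0:
            factor = A[r][col] / A[col][col]
            A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
            b[r] = b[r] - factor * b[col]

recovered = [int(b[i] / A[i][i]) for i in range(n)]
assert recovered == a             # all n shares together reveal the secret
```

With only n – 1 shares the system has one equation too few, so every value of the missing coefficient remains consistent with the shares the coalition holds.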

The secret-sharing problem can be generalized in the following way: distribute a secret among n parties in such a way that any m or more of the parties can reconstruct the secret (for some m ≤ n), whereas any m – 1 or fewer parties cannot do the same. A polynomial of degree m as in the above example readily adapts to this generalized situation.

1.2.6. Hashing

A function which converts bit strings of arbitrary lengths to bit strings of a fixed (finite) length is called a hash function. Hash functions play a crucial role in cryptography. We have already seen an application of one in designing a digital signature scheme with appendix. If H is a hash function, a pair of input values (strings) x1 and x2 for which H(x1) = H(x2) is called a collision for H. For any hash function H, collisions must exist, since H is a map from an infinite set to a finite set. However, for cryptographic purposes we require that collisions be difficult to find. More specifically, a cryptographic hash function H should satisfy the following desirable properties:

First pre-image resistance

Except for a small set of hash values y, it should be difficult to find an input x with H(x) = y. We exclude a small set of values, because an adversary might prepare (and maintain) a list of pairs (x, H(x)) for certain values of x of her choice. If the given value of y is the second coordinate of a pair in her list, she can produce the corresponding input value x easily.

Second pre-image resistance

Given a pair (x, H(x)), it should be difficult to find an input x′ different from x with H(x) = H(x′).

Collision resistance

It should be difficult to find two different input strings x, x′ with H(x) = H(x′).

The output of a hash function is also called a message digest, and hash functions can be used with a secret key. Popular examples of unkeyed hash functions are SHA-1, MD5 and MD2, whereas keyed hash functions include HMAC and CBC-MAC.
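Python's standard library provides both unkeyed and keyed hash functions, which illustrate the definitions above. (Collisions have since been found for MD5 and SHA-1, so SHA-256 is used for the keyed example here.)

```python
import hashlib, hmac

msg = b"the quick brown fox"

# An unkeyed hash: a fixed-length digest of an arbitrary-length input.
print(hashlib.sha1(msg).hexdigest())        # 40 hex chars = 160 bits

# A one-bit change in the input scrambles the digest completely.
assert hashlib.sha1(msg).digest() != hashlib.sha1(b"The quick brown fox").digest()

# A keyed hash (HMAC): only holders of the key can produce or verify the tag.
tag = hmac.new(b"secret key", msg, hashlib.sha256).digest()
assert hmac.compare_digest(tag, hmac.new(b"secret key", msg, hashlib.sha256).digest())
```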

1.2.7. Certification

So far we have seen several protocols which are based on the use of public keys of remote entities, but have never questioned the authenticity of public keys. In other words, it is necessary to ascertain that a public key is really owned by a remote entity. Public-key certificates are used to that effect. These are data structures that bind public-key values to entities. This binding is achieved by having a trusted certification authority digitally sign each certificate.

Typically a certificate is issued for a period of validity. However, it is possible that a certificate becomes invalid before its date of expiry for several reasons, like possible or suspected compromise of the private key. Under such circumstances it is necessary that the certification authority revokes the certificate and maintains a list called certificate revocation list (CRL) of revoked certificates. When Alice verifies the authenticity of Bob’s public-key certificate by verifying the digital signature of the authority and does not find the certificate in the CRL, she gains the desired confidence in using Bob’s public key.

The X.509 public-key infrastructure specifies Internet standards for certificates and CRLs.

1.3. Public-key Cryptography

In this section, we give a short introduction to the realization of public-key cryptosystems. More specifically, we list some of the computationally intensive mathematical problems and describe how the (apparent) intractability of these problems can be used for designing key pairs. We use some mathematical terms that we will introduce later in this book.

1.3.1. The Mathematical Problems

The security of the public-key cryptosystems is based on the presumed difficulty of solving certain mathematical problems.

The integer factorization problem (IFP)

Given the product n = pq of two distinct prime integers p and q, find p and q.

The discrete logarithm problem (DLP)

Let G be a finite cyclic (multiplicatively written) group with cardinality n and a generator g. Given an element a ∈ G, find an integer x (or the integer x with 0 ≤ x ≤ n – 1) such that a = g^x in G. Three different types of groups are commonly used for cryptographic applications: the multiplicative group of a finite field, the group of rational points on an elliptic curve over a finite field and the Jacobian of a hyperelliptic curve over a finite field. By an abuse of notation, we often denote the DLP over finite fields as simply DLP, whereas the DLP in elliptic curves and hyperelliptic curves is referred to as the elliptic curve discrete logarithm problem (ECDLP) and the hyperelliptic curve discrete logarithm problem (HECDLP) respectively.

The Diffie–Hellman problem (DHP)

Let G and g be as above. Given the elements g^a and g^b of G (but not the exponents a and b), compute the element g^{ab}. As in the case of the DLP, the DHP can be posed in the multiplicative group of a finite field, the group of rational points on an elliptic curve and the Jacobian of a hyperelliptic curve.

We show in the next section how (the intractability of) these problems can be exploited to create key pairs for various cryptosystems. These computational problems are termed difficult, intractable, infeasible or intensive in the sense that there are no known algorithms to solve these problems in time polynomially bounded by the input size. The best-known algorithms are subexponential or even fully exponential in some cases. This means that if the input size is chosen to be sufficiently large, then it is infeasible to compute the private key from a knowledge of the public key in a reasonable amount of time. This, in turn, implies (not provably, but as the current state of the art stands) that encryption or signature verification can be done rather quickly (in polynomial time), but the converse process of decryption or signature generation cannot be done in feasible time, unless one knows the private key. As a result, encryption (or signature verification) is called a trapdoor one-way function, that is, a function which is easy to compute but for which the inverse is computationally infeasible, unless some additional information (the trapdoor) is available.

It is, however, not known that these problems are really computationally infeasible, that is, there is no proof of the fact that these problems cannot be solved in polynomial time. As a result, the public-key cryptographic systems based on these problems are not provably secure.

1.3.2. Realization of Key Pairs

In RSA and similar cryptosystems, one generates two (distinct) suitably large primes p and q and computes the product n = pq. Then φ(n) = (p – 1)(q – 1), where φ denotes Euler’s totient function. One then chooses a random integer e with gcd(e, φ(n)) = 1. There exists an integer d such that ed ≡ 1 (mod φ(n)). The integer e is used as the public key, whereas the integer d is used as the private key.

If the IFP can be solved fast, one can also compute φ(n) easily, and subsequently d can be computed from e using the (polynomial-time) extended GCD algorithm. This is why[2] we say that the RSA cryptosystem derives its security from the intractability of the IFP.

[2] The problem of factoring n = pq is polynomial-time equivalent to computing φ(n) = (p – 1)(q – 1).

In order to see how RSA encryption and decryption work, let the plaintext message be encoded as an integer m with 2 ≤ m < n. The ciphertext message is generated (as an integer) as c = m^e (mod n). Decryption is analogous, that is, m = c^d (mod n). The correctness of the algorithm follows from the fact that ed ≡ 1 (mod φ(n)). It is, however, not proved that one has to know d or φ(n) or the factorization of n in order to decrypt an RSA-encrypted message. But at present no better methods are known.
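The whole construction can be traced with classic toy parameters p = 61, q = 53; real moduli are of course vastly larger, and real encryption pads m before exponentiation.

```python
p, q = 61, 53                     # toy primes; real moduli are 1024+ bits
n = p * q                         # 3233
phi = (p - 1) * (q - 1)           # 3120
e = 17                            # public key: gcd(17, 3120) = 1
d = pow(e, -1, phi)               # private key: 2753, since 17*2753 ≡ 1 (mod 3120)

m = 65                            # plaintext, 2 <= m < n
c = pow(m, e, n)                  # encryption: c = m^e mod n = 2790
assert pow(c, d, n) == m          # decryption: c^d mod n recovers m
```

Anyone who factors n = 3233 into 61 · 53 can recompute φ(n) and hence d, which is exactly the dependence on the IFP described above.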

Let us now consider the discrete logarithm problem. Let G be a finite cyclic multiplicative group (like those mentioned above) in which it is easy to multiply two elements, but difficult to compute discrete logarithms. Let g be a generator of G. In order to set up a random key pair over such a group, one chooses the private key as a random integer d, 2 ≤ d < n, where n is the cardinality of G. The public key e is then computed as the element e = g^d of G.

Applications of encryption–decryption schemes based on the key pair (g^d, d) are given in Chapter 5. For now, we only remark that many such schemes (like the ElGamal scheme) derive their security from the DHP instead of the DLP, whereas other schemes (like the Nyberg–Rueppel scheme) do so from the DLP. It is assumed that these two problems are computationally equivalent (at least for the groups of our interest). Obviously, if one assumes availability of a solution of the DLP, one has a solution for the DHP too (recover b from g^b and compute g^{ab} = (g^a)^b). The reverse implication is not clear.

1.3.3. Public-key Cryptanalysis

As we pointed out earlier, (most of) the public-key cryptosystems are not provably secure in the sense that they are based on the apparent difficulty of solving certain computational problems. It is expedient to know how difficult these problems are. No non-trivial complexity-theoretic statements are available for these problems, and as such it is worthwhile to study the algorithms known to date for solving them. Unfortunately, however, many of the algorithms of this kind are much more complicated than the algorithms for building the corresponding cryptographic systems. One needs to acquire more mathematical machinery in order to understand (and augment) these cryptanalytic algorithms. We devote Chapter 4 to a detailed discussion of these algorithms.

In specific situations, one need not always use these computationally intensive algorithms. Access to a party’s decryption equipment may allow an adversary to gain partial or complete information about the private key by watching a decryption process. For example, an adversary (say, the superuser) might have the capability to read the contents of the memory holding a private key during some decryption process. For another possibility, think of RSA decryption which involves a modular exponentiation. If the standard square-and-multiply algorithm (Algorithm 3.9) is used for this purpose and the adversary can tap some hardware details (like machine cycles or power fluctuations) during a decryption process, she can guess a significant number of the bits in the private key. Such attacks, often called side-channel attacks, are particularly relevant for cryptographic applications based on smart cards.

A cryptographic system is (believed to be) strong if and only if there are no good known mechanisms to break it. It is, therefore, for the sake of security that we must study cryptanalysis. Cryptography and cryptanalysis are deeply intertwined and a complete study of one must involve the other.

1.4. Some Cryptographic Terms

In cryptology, there are different models of attacks or attackers.

1.4.1. Models of Attacks

So far we have assumed that an adversary can only read messages during transmission over a channel. Such an adversary is called a passive adversary. An active adversary, on the other hand, can mutilate or delete messages during transmission and/or generate false messages. An attack mounted by an active (resp.[3] a passive) adversary is called an active (resp. a passive) attack. In this book, we will mostly concentrate on passive attacks.

[3] Throughout the book, resp. stands for respectively.

1.4.2. Models of Passive Attacks

A two-party communication involves transmission of ciphertext messages over a communication channel. A passive attacker can read these ciphertext messages. In practice, however, an attacker might have more control over the choice of ciphertext and/or plaintext messages. Based on these capabilities of the attacker we have the following types of attacks.

Ciphertext-only attack

This is the weakest model of the adversary. Here the attacker has no control over the ciphertext messages that flow in the channel, nor over the corresponding plaintext messages. Using only these ciphertext messages, the attacker has to obtain a private key and/or a plaintext message corresponding to a new ciphertext message.

Known-pair attack

In this kind of attack (also called a known-plaintext or known-ciphertext attack), the attacker uses her knowledge of some plaintext–ciphertext pairs. If many such pairs are available to the attacker, she can use them to deduce a pattern, based on which she can subsequently gain some information on a new plaintext for which the ciphertext is available. In a public-key scheme, the adversary can generate as many such pairs as she wants, because generating such a pair requires only a knowledge of the receiver’s public key. Thus a public-key encryption scheme must provide sufficient security against known-plaintext attacks.

Chosen-plaintext attack

In this kind of attack, the attacker knows some plaintext–ciphertext pairs in which the plaintexts are chosen by the attacker. As discussed earlier, such an attack is easily mountable for a public-key encryption scheme.

Adaptive chosen-plaintext attack

This is similar to the chosen-plaintext attack with the additional possibility that the attacker chooses the plaintexts in the known plaintext–ciphertext pairs sequentially and adaptively based on the knowledge of the previous pairs. This kind of attack can be easily mounted on public-key encryption systems.

Chosen-ciphertext attack

The attacker has knowledge of some plaintext–ciphertext pairs in which the ciphertexts are chosen by the attacker. Such an attack is not directly mountable on a public-key scheme, since obtaining a plaintext from a chosen ciphertext requires knowledge of the private key. However, if the attacker has access to the receiver’s decryption equipment, the machine can divulge the plaintexts corresponding to the ciphertexts that the attacker supplies to the machine. In this context, we assume that the machine does not reveal the private key itself, that is, it has the key stored secretly somewhere in its hardware which the attacker cannot directly access. However, the attacker can run the machine to know the plaintexts corresponding to the ciphertexts of her choice. Later (when the attacker no longer has access to the decryption equipment) the known pairs may be exploited to obtain information about the plaintext corresponding to a new ciphertext.

Adaptive chosen-ciphertext attack

This is similar to the chosen-ciphertext attack with the additional possibility that the attacker chooses the ciphertexts in the known pairs sequentially and adaptively based on her knowledge of the previously generated plaintext–ciphertext pairs. This attack is mountable in a scenario described in connection with chosen-ciphertext attacks.

For a digital signature scheme, there are equivalent names for these types of attacks. The attacker is assumed to have access to the public key of the signer, because this key is used for signature verification. An attempt to forge signatures based only on the knowledge of this verification key is called a key-only attack. The adversary may additionally possess knowledge of some message–signature pairs. An attack based on this knowledge is called a known-pair or known-message or known-signature attack. If the messages are chosen by the adversary, we call the attack a chosen-message attack. If the adversary generates the sequence of messages in a chosen-message attack adaptively (based on the previously generated message–signature pairs), we have an adaptive chosen-message attack. An (adaptive or non-adaptive) chosen-message attack can be mounted, if the attacker gains access to the signer’s signature generation equipment, or if the signer is willing to sign arbitrary messages provided by the adversary.

The attacker can choose some signatures and generate the corresponding messages by encrypting them with the signer’s public key. The private-key operation on these messages generates the signatures chosen by the attacker. This gives chosen-signature and adaptive chosen-signature attacks on a digital signature scheme. Now the adversary cannot directly control the messages to sign. On the other hand, such an attack is easily mountable, because it utilizes only some public knowledge (the signer’s public key). Indeed, one may treat chosen-signature attacks as variants of key-only attacks.

1.4.3. Public Versus Private Algorithms

So far, we have assumed that all the parties connected to a network know the algorithms used in a cryptographic scheme. The security of the scheme is based on the difficulty of obtaining some secret information (the secret or private key).

It, however, remains possible that two parties communicate using an algorithm unknown to other entities. Top-secret communications (for example, during wars or diplomatic transactions) often use private cryptographic algorithms. In this book, we will not deal with such techniques. Our attention is focused mostly on Internet applications in which public knowledge of the algorithms is of paramount importance (for the sake of universal applicability and convenience).

In short, this book is going to deal with a world in which only public public-key algorithms are deployed and in which adversaries are usually passive. A restricted model of the world though it may be, it is general and useful enough to concentrate on. Let us begin our journey!

Chapter Summary

This chapter provides an overview of the problems that cryptology deals with. The first and oldest cryptographic primitive is encryption for secure transmission of messages. Some other primitives are key exchange, digital signature, authentication, secret sharing, hashing, and digital certificates. We then highlight the difference between symmetric (secret-key) and asymmetric (public-key) cryptography. The relevance of some computationally intractable mathematical problems in public-key cryptography is discussed next, and the working of a prototype public-key cryptosystem (RSA) is explained. We finally discuss different models of attacks on cryptosystems.

Not uncommonly, some people think that cryptology also deals with intrusion, viruses and Trojan horses. We emphasize that this is not the case. Data and network security is the branch that deals with these topics. Cryptography is a part of this branch, but not the other way round. Imagine that your house is to be secured against theft. First, you need a good lock—that is cryptography. However, a lock can do nothing to prevent a thief from entering the house after breaking the window panes. A bad butler who leaks secrets of the house to the outside world also does not come under the jurisdiction of the lock. Securing your house requires adopting sufficient guards against all these possibilities of theft. In this book, we study only the technology of manufacturing and breaking locks.

2. Mathematical Concepts

2.1 Introduction
2.2 Sets, Relations and Functions
2.3 Groups
2.4 Rings
2.5 Integers
2.6 Polynomials
2.7 Vector Spaces and Modules
2.8 Fields
2.9 Finite Fields
2.10 Affine and Projective Curves
2.11 Elliptic Curves
2.12 Hyperelliptic Curves
2.13 Number Fields
2.14 p-adic Numbers
2.15 Statistical Methods
 Chapter Summary
 Suggestions for Further Reading

Young man, in mathematics you don’t understand things, you just get used to them.

—John von Neumann

Mathematics contains much that will neither hurt one if one does not know it nor help one if one does know it.

—J. B. Mencken

Mathematics is the Queen of Science but she isn’t very pure; she keeps having babies by handsome young upstarts and various frog princes.

—Donald Kingsbury

2.1. Introduction

In this chapter, we introduce the basic mathematical concepts that one should know in order to understand the public-key cryptographic protocols and the corresponding cryptanalytic algorithms described in the later chapters. If the reader is already familiar with these concepts, she may quickly browse through the chapter in order to know about our notations and conventions.

This chapter is meant for cryptology students and as such does not describe the mathematical topics in their full generality. It is our intention only to state (and, if possible, prove) the relevant results that would be useful for the rest of the book. For further study, we urge the reader to consult the books suggested at the end of this chapter.

2.2. Sets, Relations and Functions

Sets are absolutely basic entities used throughout the present-day study of mathematics. Unfortunately, however, we cannot define sets. Loosely speaking, a set is an (unordered) collection of objects. But we run into difficulty with this definition for collections that are too big. Of course, infinite sets like the set of all integers or real numbers are not too big. However, a collection of all sets is too big to be called a set. (Also see Exercise 2.6.) It is, therefore, customary to have an axiomatic definition of sets. That is to say, a collection qualifies to be a set if it satisfies certain axioms. We do not go into the details of this axiomatic definition, but tell the axioms as properties of sets. Luckily enough, we won’t have a chance in the rest of this book to deal with collections that are not sets. So the reader can, for the time being, have faith in the above (wrong) identification of a set as a collection.

An object in a set A is commonly called an element of A. By the notation a ∈ A, we mean that a is an element of the set A. Often a set A can be represented explicitly by writing down its elements within curly brackets or braces. For example, A = {2, 3, 5, 7} denotes the set consisting of the elements 2, 3, 5, 7, which are incidentally all the (positive) prime numbers less than 10. We often use the ellipsis sign (. . .) to denote an infinite (or even a finite) set. For example, ℙ = {2, 3, 5, 7, 11, . . .} would denote the set of all (positive) prime numbers. (We prove later that ℙ is an infinite set.) Alternatively, we often describe a set by mentioning the properties of its elements. For example, the set A above can also be described as A = {p ∈ ℙ | p < 10}.

Some frequently occurring sets are denoted by special symbols. We list a few of them here.

ℕ   The set of all natural numbers, that is, {1, 2, 3, . . .}
ℕ0   The set of all non-negative integers, that is, {0, 1, 2, . . .}
ℤ   The set of all integers, that is, {. . . , –2, –1, 0, 1, 2, . . .}
ℙ   The set of all (positive) prime numbers, that is, {2, 3, 5, 7, . . .}
ℚ   The set of all rational numbers, that is, {a/b | a, b ∈ ℤ, b ≠ 0}
ℚ*   The set of all non-zero rational numbers
ℝ   The set of all real numbers
ℝ*   The set of all non-zero real numbers
ℂ   The set of all complex numbers
ℂ*   The set of all non-zero complex numbers
∅   The empty set

The cardinality of a set A is the number of elements in A. We use the symbol #A to denote the cardinality of A. If #A is finite, we call A a finite set. Otherwise A is said to be infinite. The empty set has cardinality zero.

2.2.1. Set Operations

Let A and B be two sets. We say that A is a subset of B, denoted A ⊆ B, if all elements of A are in B. Two sets A and B are equal (that is, A = B) if and only if A ⊆ B and B ⊆ A. A is said to be a proper subset of B (denoted A ⊊ B), if A ⊆ B and A ≠ B (that is, B ⊈ A).

The union of A and B is the set whose elements are either in A or in B (or both). This set is denoted by A ∪ B. The intersection of A and B is the set consisting of the elements common to A and B and is denoted by A ∩ B. If A ∩ B = ∅, then we say that A and B are disjoint. In that case, the union A ∪ B is also called a disjoint union and is denoted by A ⊔ B. (For a generalization, see Exercise 2.7.) The difference of A and B, denoted A \ B, is the set whose elements are in A but not in B. If A is understood from the context and B ⊆ A, then we denote A \ B by B̄ and refer to B̄ as the complement of B (in A). The product A × B of two sets A and B is the set of all ordered pairs (a, b) with a ∈ A and b ∈ B.
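As a minimal illustration, Python's built-in set type supports all of these operations directly; the following sketch uses the example set A = {2, 3, 5, 7} from above:

```python
from itertools import product

A = {2, 3, 5, 7}
B = {1, 2, 3, 4}

union = A | B               # elements in A or B (or both)
intersection = A & B        # elements common to A and B
difference = A - B          # elements in A but not in B
pairs = set(product(A, B))  # the product A x B: all ordered pairs (a, b)
```

Since #A = #B = 4, the product A × B contains 4 · 4 = 16 ordered pairs.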

The notions of union, intersection and product of sets can be readily extended to an arbitrary family of sets. Let Ai, i ∈ I, be a family of sets indexed by I. In this case, we denote the union and intersection of Ai, i ∈ I, by ∪i∈I Ai and ∩i∈I Ai respectively. The product of Ai, i ∈ I, is denoted by ∏i∈I Ai. When Ai = A for all i ∈ I, we denote the product also as AI. If, in addition, I is a finite set of cardinality n, then the product AI is also written as An.

2.2.2. Relations

A relation ρ on a set A is a subset of A × A. For (a, b) ∈ ρ, we usually write a ρ b, implying that a is related by ρ to b. Common examples are the standard relations =, ≠, ≤, <, ≥, > on ℤ (or ℚ or ℝ).

A relation ρ on a set A is called reflexive, if a ρ a for all a ∈ A. For example, =, ≤ and ≥ are reflexive relations on ℤ, but the relations ≠, <, > are not.

A relation ρ on A is called symmetric, if a ρ b implies b ρ a. On the other hand, ρ is called anti-symmetric if a ρ b and b ρ a imply a = b. For example, = is symmetric and anti-symmetric; <, ≤, > and ≥ are anti-symmetric but not symmetric; ≠ is symmetric but not anti-symmetric.

A relation ρ on A is called transitive if a ρ b and b ρ c imply a ρ c. For example, =, <, ≤, >, ≥ are all transitive, but ≠ is not transitive.

An equivalence relation is one which is reflexive, symmetric and transitive. For example, = is an equivalence relation on ℤ, but none of the other relations mentioned above (≠, <, ≥ and so on) is an equivalence relation on ℤ.

A partition of a set A is a collection of pairwise disjoint subsets Ai, i ∈ I, of A whose union is A, that is, A = ∪i∈I Ai and Ai ∩ Aj = ∅ for all i, j ∈ I, i ≠ j. The following theorem establishes an important connection between equivalence relations and partitions.

Theorem 2.1.

An equivalence relation on a set A produces a partition of A. Conversely, every partition of a set A corresponds to an equivalence relation on A.

Proof

Let ρ be an equivalence relation on a set A. For a ∈ A, let us denote [a] = {b ∈ A | a ρ b}. Clearly, [a] ≠ ∅, since a ∈ [a] (by reflexivity). Now we show that for a, b ∈ A, either [a] = [b] or [a] ∩ [b] = ∅. Assume that [a] ∩ [b] ≠ ∅. Choose c ∈ [a]. By construction, a ρ c. Now choose d ∈ [a] ∩ [b]. Then a ρ d and b ρ d. By symmetry, d ρ b, so that by transitivity a ρ b, that is, b ρ a. But a ρ c. Hence, once again by transitivity, b ρ c, that is, c ∈ [b]. Thus [a] ⊆ [b]. Similarly [b] ⊆ [a].

Conversely, let Ai, i ∈ I, be a partition of A. Define a relation ρ on A such that a ρ b if and only if a and b are in the same subset Ai for some i ∈ I. It is easy to see that ρ is an equivalence relation on A.

The subset [a] of A defined in the proof of the above theorem is called the equivalence class of a with respect to the equivalence relation ρ.
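The proof of Theorem 2.1 is constructive: scanning the elements of A and placing each one into the class of some element it is already related to recovers the partition. A minimal Python sketch of this procedure (the helper name equivalence_classes is ours), applied to congruence modulo 3 on {0, . . . , 9}:

```python
def equivalence_classes(A, related):
    # Place each element into the class of a related representative;
    # by Theorem 2.1 the resulting classes partition A.
    classes = []
    for a in A:
        for cls in classes:
            if related(a, next(iter(cls))):
                cls.add(a)
                break
        else:
            classes.append({a})
    return classes

# Congruence modulo 3 is an equivalence relation on {0, ..., 9}.
classes = equivalence_classes(range(10), lambda a, b: (a - b) % 3 == 0)
```

Because ρ is an equivalence relation, any member of a class serves equally well as its representative, so the single comparison per class suffices.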

An anti-symmetric and transitive relation is called a partial order (or simply an order). All of the relations =, ≤, <, ≥, > are partial orders on ℤ (but ≠ is not). A partial order ρ on A is called a total order or a linear order or a simple order, if for every a, b ∈ A, a ≠ b, either a ρ b or b ρ a. For example, if we take A = {1, 2, 3} and the relation ρ = {(1, 2), (1, 3)}, then ρ is a partial order but not a total order (because it does not specify a relation between 2 and 3). On the other hand, ρ′ = {(1, 2), (1, 3), (2, 3)} is a total order. A set with a partial (resp. total) order is often called a partially ordered (resp. totally ordered or linearly ordered or simply ordered) set.

2.2.3. Functions

Let A and B be two sets (not necessarily distinct). A function or a map f from A to B, denoted f : A → B, assigns to each a ∈ A some element b ∈ B. In this case, we write b = f(a) or f : a ↦ b and say that b is the image of a (under f). For example, if A = B = ℝ, then the assignment a ↦ a² is a function. On the other hand, the assignment a ↦ √a (the non-negative square root) is not a function from ℝ to ℝ, because it is not defined for negative values of a. However, if A = ℝ and B = ℂ, then the assignment a ↦ √a (with non-negative real and imaginary parts) is a function.

The function f : A → A assigning a ↦ a for all a ∈ A is called the identity map on A and is usually denoted by idA. On the other hand, if f : A → B maps all the elements of A to a fixed element of B, then f is said to be a constant function. A function which is not constant is called a non-constant function.

A function f : A → B that maps different elements of A to different elements of B is called injective or one-one. In other words, f is injective if and only if f(a) = f(a′) implies a = a′. The function f : ℝ → ℝ given by a ↦ a² is not injective, since f(–a) = f(a) for all a ∈ ℝ. On the other hand, the function ℝ → ℝ given by a ↦ 2a is injective. An injective map f : A → B is sometimes denoted by the special symbol f : A ↪ B.

The image of a function f : A → B is defined to be the subset {f(a) | a ∈ A} of B. It is denoted by f(A) or by Im f. The function f is said to be surjective or onto or a surjection, if Im f = B, that is, every element b of B has at least one preimage a ∈ A (which means f(a) = b). As an example, the function ℤ → ℤ given by a ↦ a/2 (if a is even) and by a ↦ (a – 1)/2 (if a is odd) is surjective, whereas the function ℤ → ℤ that maps a ↦ |a| (the absolute value) is not surjective. A surjective map f : A → B is sometimes denoted by the special symbol f : A ↠ B.

A map f : A → B is called bijective or a bijection, if it is both injective and surjective. For example, the identity map on a set is bijective. Another example of a bijective function is ℕ → ℙ that maps a to the a-th prime.

Let f : A → B and g : B → C be functions. The composition of f and g is the function from A to C that takes a ↦ g(f(a)). It is denoted by g ο f, that is, (g ο f)(a) = g(f(a)). Note that in the notation g ο f one applies f first and then g. The notion of composition of functions can be extended to more than two functions. In particular, if f : A → B, g : B → C and h : C → D are functions, then (h ο g) ο f and h ο (g ο f) are the same function from A to D, so that we can unambiguously write this as h ο g ο f.
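A small Python sketch (the helper compose is ours) makes the order of application in g ο f concrete:

```python
def compose(g, f):
    # (g o f)(a) = g(f(a)): apply f first, then g
    return lambda a: g(f(a))

f = lambda a: a + 1   # f : Z -> Z
g = lambda a: 2 * a   # g : Z -> Z

gof = compose(g, f)   # a -> 2(a + 1)
fog = compose(f, g)   # a -> 2a + 1
```

The two compositions differ as functions (for example at a = 3), while associativity guarantees that (h ο g) ο f and h ο (g ο f) always agree.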

2.2.4. The Axioms of Mathematics

The study of mathematics is based on certain axioms. We state four of these axioms. It is not possible to prove the axioms independently, but it can be shown that they are equivalent in the sense that each of them can be proved, if any of the others is assumed to be true.

Let A be a partially ordered set under the relation ≤. An element a ∈ A is called maximal (resp. minimal), if there is no element b ∈ A, b ≠ a, that satisfies a ≤ b (resp. b ≤ a). Let B be a non-empty subset of A. Then an upper bound (resp. a lower bound) for B is an element a ∈ A such that b ≤ a (resp. a ≤ b) for all b ∈ B. If an upper bound (resp. a lower bound) a of B is an element of B, then a is called a last element or a largest element or a maximum element (resp. a first element or a least element or a smallest element or a minimum element) of B. By antisymmetry, it follows that a first (resp. last) element of B, if existent, is unique. A chain of A is a totally ordered (under ≤) subset of A.

Consider the sets ℕ, ℤ and ℚ with the natural order ≤. None of these sets contains a maximal element. ℕ contains a minimal element, namely 1, but ℤ and ℚ do not contain minimal elements. The subset B ⊆ ℕ of even natural numbers has two lower bounds (in ℕ), namely 1 and 2, of which 2 is the first element of B.

A totally ordered set A is said to be well ordered (and the relation is called a well order), if every non-empty subset B of A contains a first element.

Axiom 2.1. Zermelo’s well-ordering principle

Every set A can be well ordered, that is, there is a relation which well orders A.

The set ℕ is well ordered under the natural relation ≤. The set ℤ can be well ordered by the relation ≼ defined as 0 ≼ 1 ≼ –1 ≼ 2 ≼ –2 ≼ · · ·. A well ordering of ℝ is not known.

Axiom 2.2. Zorn’s lemma

Let A be a partially ordered set. If every chain of A has an upper bound (in A), then A has at least one maximal element.

To illustrate Zorn’s lemma, consider any non-empty set A and define 𝒫(A) to be the set of all subsets of A. 𝒫(A) is called the power set of A and is partially ordered under containment ⊆. A chain of 𝒫(A) is a set of subsets Ai of A such that for any two members Ai and Aj, either Ai ⊆ Aj or Aj ⊆ Ai. Clearly, the union ∪i Ai is an upper bound of the chain. Then Zorn’s lemma guarantees that 𝒫(A) has at least one maximal element. In this case, the maximal element, namely A, is unique. If A is finite, then for the set of all proper subsets of A, a maximal element (under the partial order ⊆) exists by Zorn’s lemma, but is not unique, if #A > 1.

Axiom 2.3. Hausdorff’s maximal principle

Let ≤ be a partial order on a set A. Then there is a maximal chain B of A, that is, if C is any chain with B ⊆ C ⊆ A, then C = B.

Finally, let A be a set and 𝒫*(A) = 𝒫(A) \ {∅}, that is, 𝒫*(A) is the set of all non-empty subsets of A. A choice function of A is a function f : 𝒫*(A) → A such that for every B ∈ 𝒫*(A) we have f(B) ∈ B.

Axiom 2.4. Axiom of choice

Every set has a choice function.

Exercise Set 2.2

2.1
  1. Let G = (V, E) be an undirected graph. Define a relation ρ on the vertex set V of G by: u ρ v if and only if there is a path from u to v. Show that ρ is an equivalence relation on V. What are the equivalence classes for this relation?

  2. Let G = (V, E) be a directed acyclic graph. Define the relation ρ on V as in Part 1. Show that ρ is a partial order on V. When is ρ a total order?

2.2 Let f : A → B and g : B → A be functions. Show that if f ο g = idB, then g is injective and f is surjective. In particular, f (and also g) is bijective, if f ο g = idB and g ο f = idA. In this case, we call g the inverse of f and denote this as g = f–1. Show by examples that both the conditions f ο g = idB and g ο f = idA are necessary for f to be bijective.
2.3 Let f : A → B be a map from a finite set A to a finite set B. Prove that
  1. #A ≤ #B, if f is injective,

  2. #A ≥ #B, if f is surjective, and

  3. #A = #B, if f is bijective.

2.4 Let A be a finite set and let f : A → A be a map. Show that the following conditions are equivalent.
  1. f is injective.

  2. f is surjective.

  3. f is bijective.

Show by examples that this equivalence need not hold, if A is an infinite set.

2.5 Let A and B be two arbitrary sets, f : A → B a map, A′ ⊆ A and B′ ⊆ B. We define f(A′) = {f(a) | a ∈ A′} and f–1(B′) = {a ∈ A | f(a) ∈ B′}. Show that:
  1. If A′ ⊆ A″ ⊆ A, then f(A′) ⊆ f(A″).

  2. If B′ ⊆ B″ ⊆ B, then f–1(B′) ⊆ f–1(B″).

  3. f–1(f(A′)) ⊇ A′.

  4. f(f–1(B′)) ⊆ B′.

  5. f(f–1(f(A′))) = f(A′).

  6. f–1(f(f–1(B′))) = f–1(B′).

2.6

Russell’s paradox A collection C is called ordinary, if C is not a member of C. A collection which is not ordinary is called extraordinary. Show that the collection of all ordinary collections is neither ordinary nor extraordinary.

2.7 Let Ai, i ∈ I, be a family of sets (not necessarily pairwise disjoint). For each i ∈ I, consider the set Bi = Ai × {i}. Show that the sets Bi, i ∈ I, are pairwise disjoint. The union ∪i∈I Bi is called the disjoint union of Ai, i ∈ I.

2.3. Groups

So far, we have studied sets as unordered collections. Things start getting interesting, however, once we define one or more binary operations on sets. Such operations define structures on sets, and we compare different sets in the light of their respective structures. Groups are the first (and simplest) examples of sets with binary operations.

Definition 2.1.

A binary operation on a set A is a map from A × A to A. If ◊ is a binary operation on A, it is customary to write a ◊ a′ to denote the image of (a, a′) (under ◊).

For example, addition, subtraction and multiplication are all binary operations on ℤ (or ℚ or ℝ). Subtraction is not a binary operation on ℕ, since, for example, 2 – 3 is not an element of ℕ. Division is not a binary operation on ℚ, since division by zero is not defined. Division is a binary operation on ℚ*.

2.3.1. Definition and Basic Properties

Definition 2.2.

A group[1] (G, ◊) is a set G together with a binary operation ◊ on G satisfying the following three conditions:

[1] In binary operations and algebras generally there is a morass of terminology which reflects on the literacy of the promulgators. Starting for example with a poor choice, namely “group”, we now have “semigroup” (why?), “loop” (why?), “groupoid”, and “partial groupoid”. . . .Among other poor choices are “ring”, “field”, “ideal”, “category theory”, and “universal algebra”. “Ideal” was used by Dedekind in a sense which made sense to mathematicians of that day but it does not today. “Field” can best be labeled as ridiculous. As to categories of category theory, the concept of category is too broad for that reduction. It is not good taste to take such a term and place it in restricted surroundings.

—Preston C. Hammer

  1. Associativity (a ◊ b) ◊ c = a ◊ (b ◊ c) for all a, b, c ∈ G.

  2. Identity element There exists a (unique) element e ∈ G such that e ◊ a = a ◊ e = a for all a ∈ G. The element e is called the identity of G.

  3. Inverse For each a ∈ G, there exists a (unique) element b ∈ G such that a ◊ b = b ◊ a = e. The element b is called the inverse of a.

    If, in addition, we assume that

  4. Commutativity a ◊ b = b ◊ a for all a, b ∈ G,

    then G is called a commutative or an Abelian group.

A group (G, ◊) is also written in short as G, when the operation ◊ is understood from the context. More often than not, the operation ◊ is either addition (+) or multiplication (·), in which cases we also say that G is respectively an additive or a multiplicative group. For a multiplicative group, we often omit the multiplication sign and denote a · b simply as ab. The identity in an additive group is usually denoted by 0, whereas that in a multiplicative group by 1. The inverse of an element a is denoted in these cases by –a and a–1 respectively. Groups written additively are usually Abelian, but groups written multiplicatively need not be so.

Note that associativity allows us to write a ◊ b ◊ c unambiguously to represent (a ◊ b) ◊ c = a ◊ (b ◊ c). More generally, if a1, . . . , an ∈ G, then a1 ◊ ··· ◊ an represents a unique element of the group irrespective of how we insert brackets to compute it.

Example 2.1.
  1. The set ℤ is an Abelian group under addition. The identity is 0 and the inverse of a is –a. Note, however, that ℤ is not a group under multiplication: though it contains the multiplicative identity 1, no element of ℤ other than ±1 has a multiplicative inverse in ℤ.

  2. The set ℚ* of non-zero rational numbers is a group under multiplication. The identity is 1 = 1/1 and the inverse of a/b is b/a.

  3. For a set A, the set of all bijective functions A → A is a group under composition of functions. The identity element is idA and the inverse of f is denoted by f–1. (See also Exercise 2.2.) This group is not Abelian in general.

  4. The set of all m × n matrices with entries from ℝ is a group under matrix addition. On the other hand, the set GLn(ℝ) of all n × n invertible matrices over ℝ is a group under matrix multiplication and is called the general linear group. Note that GLn(ℝ) is another example of a group that is not Abelian (for n > 1).

  5. A group G is called finite, if G as a set consists of (only) finitely many elements. Finite groups play an extremely important role in cryptography. Here is our first example of finite groups: Let n be an integer ≥ 2. The set

    ℤn = {0, 1, . . . , n – 1}

    is a group under addition modulo n (that is, add (and subtract) two elements of ℤn as integers and, if the result is not in ℤn, take the remainder of division by n). For this group, the identity element is 0, and –a = n – a for a ≠ 0 and –0 = 0. (See Example 2.3 for a formal definition of ℤn.)

  6. For an integer n ≥ 2, define the set

    ℤn* = {a ∈ ℤn | gcd(a, n) = 1}.

    If n is prime, then ℤn* = {1, 2, . . . , n – 1}. The set ℤn* is a group under multiplication modulo n with identity 1. We need a little more machinery than introduced so far in order to prove that every element of ℤn* has a multiplicative inverse modulo n. The other group axioms are easy to check.
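For small n, the group axioms for ℤn* can be checked exhaustively. The following Python sketch (the helper names are ours) verifies closure, identity and inverses for n = 15:

```python
from math import gcd

def units_mod(n):
    # Z_n^* : residues in {1, ..., n-1} coprime to n
    return {a for a in range(1, n) if gcd(a, n) == 1}

def is_group_mod_mul(U, n):
    closed = all((a * b) % n in U for a in U for b in U)
    has_identity = 1 in U
    has_inverses = all(any((a * b) % n == 1 for b in U) for a in U)
    return closed and has_identity and has_inverses

U15 = units_mod(15)  # {1, 2, 4, 7, 8, 11, 13, 14}
```

For prime n (say n = 7), units_mod returns all of {1, . . . , n – 1}, matching the remark above.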

Proposition 2.1.

Let (G, ◊) be a group and let a, b, c ∈ G. Then a ◊ b = a ◊ c implies b = c. Similarly, a ◊ c = b ◊ c implies a = b. These statements are commonly known as the (left and right) cancellation laws.

Proof

We prove only the left cancellation law. The proof of the other law is similar. Let e denote the identity of G and d the inverse of a. Then b = e ◊ b = (d ◊ a) ◊ b = d ◊ (a ◊ b) = d ◊ (a ◊ c) = (d ◊ a) ◊ c = e ◊ c = c.

2.3.2. Subgroups, Cosets and Quotient Groups

Definition 2.3.

Let (G, ◊) be a group. Then a subset H of G is called a subgroup of G, if H is a group under the operation ◊ inherited from G. For a subset H of G to be a subgroup, it is necessary and sufficient that H is closed under the operation ◊ and under inverse. Any subgroup of an Abelian group is also Abelian.

Example 2.2.
  1. For any group G with identity element e, the subsets {e} and G are subgroups of G. They are called the trivial subgroups of G.

  2. For an integer n ≥ 2, the set of all integral multiples of n is an additive subgroup of ℤ and is denoted by nℤ.

  3. The set SLn(ℝ) consisting of all n × n real matrices of determinant 1 is a subgroup of GLn(ℝ) and is commonly referred to as the special linear group.

  4. Note that though ℤn in Example 2.1 is a subset of ℤ, it is not a subgroup of ℤ, since it is not closed under the addition of ℤ. It is a group under addition modulo n, which is not the same as integer addition.

Let (G, ◊) be a group. For subsets A and B of G, we denote by A ◊ B the set {a ◊ b | a ∈ A, b ∈ B}. In particular, if A = {a} (resp. B = {b}), then A ◊ B is denoted by a ◊ B (resp. A ◊ b). Note that the sets A ◊ B and B ◊ A are not necessarily equal. If G is Abelian, then A ◊ B = B ◊ A.

Definition 2.4.

Let (G, ◊) be a group, H a subgroup of G and a ∈ G. The set a ◊ H is called the left coset of a with respect to H and the set H ◊ a is called the right coset of a with respect to H. If G is Abelian, then a left coset is naturally a right coset and vice versa. In that case, we call a ◊ H (or H ◊ a) simply a coset.

From now onward, we consider left cosets only and call them cosets. If the underlying group is Abelian, then they are the same thing. The theory of right cosets can be parallelly developed, but we choose to omit that here. For simplicity, we also assume that the group G is a multiplicative group, so that the operation ◊ would be replaced by · (or by mere juxtaposition).

Proposition 2.2.

Let G be a (multiplicative) group and H a subgroup of G. Then, the cosets aH, a ∈ G, partition G. Two cosets aH and bH are equal if and only if a–1b ∈ H. There is a bijective map from aH to bH for every a, b ∈ G.

Proof

We define a relation ~ on G such that a ~ b if and only if a–1b ∈ H. Clearly, a ~ a. Now a ~ b implies a–1b ∈ H, so that b–1a = (a–1b)–1 ∈ H (see Exercise 2.8), that is, b ~ a. Finally, a ~ b and b ~ c imply a ~ c, since a–1c = (a–1b)(b–1c). Thus ~ is an equivalence relation on G and hence by Theorem 2.1 produces a partition of G. We now show that the equivalence class [a] of a ∈ G is the coset aH. This follows from the fact that b ∈ [a] if and only if a–1b = h for some h ∈ H, that is, if and only if b = ah for some h ∈ H, that is, if and only if b ∈ aH.

Now we define a map φ : aH → bH by ah ↦ bh for every h ∈ H. The map φ is clearly surjective. Injectivity of φ follows from the left cancellation law (Proposition 2.1). Hence φ is bijective.

The following theorem is an important corollary to the last proposition.

Theorem 2.2. Lagrange’s theorem

Let G be a finite group and H a subgroup of G. Then, the cardinality of G is an integral multiple of the cardinality of H.

Proof

From Proposition 2.2, the cosets form a partition of G and there is a bijective map from one coset to another. Hence by Exercise 2.3 all cosets have the same cardinality. Finally, note that H is the coset of the identity element.
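Lagrange's theorem can be observed directly on a small example. The sketch below (the helper name cosets is ours) lists the cosets of the subgroup {0, 4, 8} in the additive group ℤ12:

```python
def cosets(G, H, op):
    # Collect the distinct cosets aH = {a ◊ h | h in H} for a in G.
    seen = []
    for a in G:
        aH = frozenset(op(a, h) for h in H)
        if aH not in seen:
            seen.append(aH)
    return seen

n = 12
G = set(range(n))
H = {0, 4, 8}  # the subgroup of multiples of 4 in Z_12
cs = cosets(G, H, lambda a, b: (a + b) % n)
# 12 = (number of cosets) * (size of H), as Lagrange's theorem asserts
```

The cosets all have the same size as H and together cover G, so #G is an integral multiple of #H.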

Definition 2.5.

Let G be a group and H a subgroup of G. The number of distinct cosets of H in G is called the index of H in G and is denoted by [G : H]. If G is finite, then [G : H] = #G/#H.

Definition 2.6.

Let H be a subgroup of a (multiplicative) group G. Then H is called a normal subgroup of G, if (aH)(bH) = (ab)H for all a, b ∈ G. It is clear that any subgroup H of an Abelian group G satisfies this condition and hence is normal.

If H is a normal subgroup of a group G, then the cosets aH, a ∈ G, form a group with multiplication defined by (aH)(bH) = (ab)H. This group is called the quotient group of G with respect to H and is denoted by G/H.

Example 2.3.
  1. Let n be an integer ≥ 2. The subgroup nℤ of (ℤ, +) (Example 2.2) is normal, since ℤ is Abelian. The coset of a ∈ ℤ is the set a + nℤ = {a + kn | k ∈ ℤ}. The quotient group ℤ/nℤ is denoted as ℤn and is essentially the same as the group {0, 1, . . . , n – 1} with the operation of addition modulo n (Example 2.1).

  2. For any group G with identity e, the trivial subgroups G and {e} are normal. G/G is a group with a single element, whereas G/{e} is essentially the same as the group G.

2.3.3. Homomorphisms

Definition 2.7.

Let (G, ◊) and (G′, ⊙) be groups. A function f : G → G′ is called a homomorphism (of groups), if f(a ◊ b) = f(a) ⊙ f(b) for all a, b ∈ G, that is, if f commutes with the group operations of G and G′.

A group homomorphism f : G → G′ is called an isomorphism, if there exists a group homomorphism g : G′ → G such that g ο f = idG and f ο g = idG′. It can be easily seen that a homomorphism f : G → G′ is an isomorphism if and only if f is bijective as a function.[2] If there exists an isomorphism f : G → G′, we say that the groups G and G′ are isomorphic and write G ≅ G′.

[2] If f : GG′ is a bijective homomorphism, its inverse f–1 : G′ → G is bijective as a function. However, it is not obvious that f–1 has to be a group homomorphism. We are lucky here; f–1 is.

A homomorphism f from G to itself is called an endomorphism (of G). An endomorphism which is also an isomorphism is called an automorphism. The set of all automorphisms of a group G is a group under function composition. We denote this group by Aut G.

Example 2.4.
  1. The canonical inclusion a ↦ a/1 is a group homomorphism from (ℤ, +) to (ℚ, +). More generally, if H is a subgroup of G, then the map h ↦ h for all h ∈ H is a group homomorphism from H to G. In particular, the identity map on any group G is an automorphism of G (and is the identity element of the group Aut G).

  2. For a (multiplicative) group G and a normal subgroup H, the map G → G/H that takes a ∈ G to its coset aH is a surjective group homomorphism. It is called the canonical surjection of G onto G/H. For example, the map that takes a to its remainder of division by n (≥ 2) is a canonical surjection from the additive group ℤ to the quotient group ℤn. (Also see Examples 2.1, 2.2 and 2.3.)

  3. The map that takes a complex number z = a + ib to its conjugate z̄ = a – ib is a group automorphism of both (ℂ, +) and (ℂ*, ·).
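The canonical surjection of Example 2.4(2) is easy to test numerically; the following Python sketch checks the homomorphism property f(a + b) = f(a) + f(b) for reduction modulo n = 6 over a range of integers:

```python
n = 6
f = lambda a: a % n  # canonical surjection from (Z, +) onto Z_6

# f(a + b) must equal f(a) + f(b) computed in Z_6
is_homomorphism = all(
    f(a + b) == (f(a) + f(b)) % n
    for a in range(-20, 21)
    for b in range(-20, 21)
)
```

Consistently with Proposition 2.3 below, f maps the identity 0 of ℤ to the identity 0 of ℤ6, and the inverse –a to the inverse of f(a).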

Proposition 2.3.

Let f be a group homomorphism from (G, ◊) to (G′, ⊙). Let e and e′ denote the identity elements of G and G′ respectively. Then f(e) = e′. If a, b ∈ G and c, d ∈ G′ satisfy a ◊ b = e, c ⊙ d = e′ and f(a) = c, then f(b) = d.

Proof

We have e′ ⊙ f(e) = f(e) = f(e ◊ e) = f(e) ⊙ f(e), so that by right cancellation f(e) = e′. To prove the second assertion, we note that c ⊙ d = e′ = f(e) = f(a ◊ b) = f(a) ⊙ f(b) = c ⊙ f(b). Thus, by left cancellation, f(b) = d.

Definition 2.8.

With the notations of the last proposition we define the kernel of f to be the following subset of G:

Ker f = {a ∈ G | f(a) = e′}.

We also define the image of f to be the subset

Im f = {f(a) | a ∈ G}

of G′. Then we have the following important theorem.

Theorem 2.3. Isomorphism theorem

Ker f is a normal subgroup of G, Im f is a subgroup of G′, and G/ Ker f ≅ Im f.

Proof

In order to simplify notations, let us assume that G and G′ are multiplicatively written groups. For u, v ∈ Ker f, we have f(uv–1) = f(u)(f(v))–1 = e′, that is, uv–1 ∈ Ker f. By Exercise 2.8, Ker f is a subgroup of G. We now show that it is normal. Note that for a ∈ G and u ∈ Ker f we have f(aua–1) = f(a)f(u)f(a–1) = e′, since f(u) = e′ and f(a–1) = f(a)–1; that is, aua–1 ∈ Ker f. By Exercise 2.10, Ker f is a normal subgroup of G. Now let a′ = f(a) and b′ = f(b) be arbitrary elements of Im f. Then f(ab–1) = a′(b′)–1, that is, a′(b′)–1 ∈ Im f. Thus, by Exercise 2.8, Im f is a subgroup of G′.

Now define a map φ : G/Ker f → Im f that takes a Ker f ↦ f(a). Let a Ker f = b Ker f. Then by Proposition 2.2, a–1b ∈ Ker f, that is, b = au for some u ∈ Ker f. But then f(b) = f(au) = f(a)f(u) = f(a)e′ = f(a). This shows that the map φ is well-defined. It is easy to check that φ is a group homomorphism. Now φ(a Ker f) = φ(b Ker f) implies f(a) = f(b), that is, f(a–1b) = e′, that is, a–1b ∈ Ker f, that is, a Ker f = b Ker f. Thus φ is injective. It is clearly surjective. Thus φ is bijective and hence an isomorphism from G/Ker f to Im f.
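The isomorphism theorem can likewise be observed on a small example. For the reduction map from ℤ12 to ℤ4 (a sketch under our own naming), the kernel is {0, 4, 8}, and the number of its cosets equals the size of the image:

```python
n, m = 12, 4
G = list(range(n))
f = lambda a: a % m  # homomorphism from (Z_12, +) to (Z_4, +)

kernel = [a for a in G if f(a) == 0]
image = sorted({f(a) for a in G})
num_cosets = n // len(kernel)  # #(G / Ker f), the index of Ker f
```

Since #(G/Ker f) = #Im f here (both equal 4), the sizes are consistent with G/Ker f ≅ Im f.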

2.3.4. Generators and Orders

Definition 2.9.

Let G be a group. In this section, we assume, unless otherwise stated, that G is multiplicatively written and has identity e. Let ai, i ∈ I, be a family of elements of G. Consider the subset H of G defined as

H = {ai1^(±1) · · · air^(±1) | r ≥ 0 and i1, . . . , ir ∈ I},

with the empty product (corresponding to r = 0) being treated as e. It is easy to check that H is a subgroup of G and contains all ai, i ∈ I. We call H the subgroup generated by ai, i ∈ I, or say that the elements ai, i ∈ I, generate H. H is called finitely generated, if it is generated by finitely many elements. In particular, H is called cyclic, if it is generated by a single element. If H is cyclic and generated by g ∈ H, then g is called a generator or a primitive element of H. Note that, in general, a cyclic subgroup has more than one generator (Exercise 2.47).

Example 2.5.
  1. The additive groups ℤ and ℤn are generated by 1 and hence are cyclic. The multiplicative group ℤn* is cyclic if and only if n is 2, 4, p^r or 2p^r, where p is an odd prime and r ∈ ℕ (see Exercise 2.50). A generator of ℤn* for such an n is often called a primitive root modulo n.

  2. The group (Q*, ·) is generated by the “primes” p/1, p a prime integer, and –1.

  3. Let G be a multiplicative group (not necessarily Abelian) with identity e and let a ∈ G. Then the subgroup H generated by a is the set of elements of the form ar, r ∈ Z, and is always Abelian. If H is finite, then the elements ar, r ∈ Z, cannot all be distinct, that is, as = at for some s, t ∈ Z, s > t. Then as–t = e, where s – t > 0. Now a–1 = as–t–1 and, more generally, a–k = ak(s–t–1). Thus we may consider H to consist of non-negative powers of a only. Let n be the smallest positive integer for which an = e. It is easy to see that H = {ar | r = 0, . . . , n – 1}.
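In a group like Z*n these powers are easy to tabulate. A small Python sketch (the helper name is ours; it assumes gcd(a, n) = 1, so that a lies in Z*n):

```python
def cyclic_subgroup(a, n):
    """List the subgroup of Z*_n generated by a: the powers
    a^0, a^1, a^2, ... until the identity 1 recurs.
    Assumes gcd(a, n) = 1 (otherwise the powers never return to 1)."""
    h, powers = 1, []
    while True:
        powers.append(h)
        h = (h * a) % n
        if h == 1:
            return powers

# In G = Z*_7: the element 3 generates all of Z*_7 (3 is a primitive
# root modulo 7), while 2 generates a proper cyclic subgroup.
print(cyclic_subgroup(3, 7))   # [1, 3, 2, 6, 4, 5]
print(cyclic_subgroup(2, 7))   # [1, 2, 4]
```

The length of the returned list is exactly the n of the example above, the smallest positive integer with an = e.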

Definition 2.10.

Let G be a finite group with identity e. The order of G is defined to be the cardinality of the set G and is denoted by ord G. The order of an element a ∈ G is the cardinality of the subgroup of G generated by a and is denoted by ordG a or simply by ord a, when G is understood from the context.

With these notations we prove the following important proposition.

Proposition 2.4.

The order m := ordG a of a ∈ G is the smallest of the positive integers r for which ar = e. If n = ord G, then n is an integral multiple of m. In particular, an = e.

Proof

Let H be the (cyclic) subgroup of G generated by a. Then by Example 2.5, H = {ar | r = 0, . . . , m – 1} and m is the smallest of the positive integers r for which ar = e. By Lagrange’s theorem (Theorem 2.2), n is an integral multiple of m, that is, n = km for some k ∈ N. But then an = (am)k = ek = e.

Lemma 2.1.

Let G be a finite cyclic group. Then any subgroup of G is also cyclic.

Proof

Let G be generated by g and ord G = n. Then G = {gr | r = 0, . . . , n – 1}. The subgroup {e} of G is clearly cyclic. For an arbitrary subgroup H ≠ {e} of G, define k to be the smallest positive integer for which gk ∈ H. Now take any gr ∈ H and write r = qk + δ, where q and δ are respectively the quotient and remainder of division of r by k with 0 ≤ δ < k. Then gr = (gk)qgδ and so gδ = gr(gk)–q ∈ H. The minimality of k implies that δ = 0, that is, gr = (gk)q. Thus H is generated by gk and is hence cyclic.

Proposition 2.5.

Let G be a finite cyclic multiplicative group with identity e and let H be a subgroup of order m. Then an element a ∈ G belongs to H if and only if am = e.

Proof

If a ∈ H, then am = e by Proposition 2.4. Conversely, assume that am = e, but a ∉ H. Let K be the subgroup of G generated by the elements of H and by a. By Lemma 2.1, K is cyclic. By assumption, K contains more than m elements (since H ∪ {a} ⊆ K). But K is Abelian and is generated by elements x satisfying xm = e (namely a and, by Proposition 2.4, the elements of H), so every element of K has order dividing m. A cyclic group with more than m elements contains an element of order exceeding m, a contradiction.

Finite cyclic groups play a crucial role in public-key cryptography. To see how, let G be a group which is finite, cyclic with generator g and multiplicatively written. Given r ∈ N, one can compute gr using at most 2 lg r + 2 group multiplications (See Algorithms 3.9 and 3.10). This means that if it is easy to multiply elements of G, then it is also easy to compute gr. On the other hand, there are certain groups for which it is very difficult to recover the integer r from the knowledge of g and gr, even when one is certain that such an integer exists. This is the basic source of security in many cryptographic protocols, like those based on finite fields, elliptic and hyperelliptic curves.
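The repeated-squaring idea behind the 2 lg r + 2 bound can be sketched in Python. The function name and the generic `mul` callback below are our own illustration (not the book’s Algorithms 3.9 and 3.10, which are presented later):

```python
def fast_pow(g, r, mul, e):
    """Left-to-right square-and-multiply: computes g^r using at most
    2 lg r + 2 applications of the group law `mul`, with identity `e`."""
    result = e
    for bit in bin(r)[2:]:          # binary digits of r, most significant first
        result = mul(result, result)        # square
        if bit == '1':
            result = mul(result, g)         # multiply
    return result

# Example in the multiplicative group Z*_p for the prime p = 1000003:
p = 1000003
mul = lambda x, y: (x * y) % p
print(fast_pow(5, 123456, mul, 1))  # agrees with Python's pow(5, 123456, p)
```

Computing gr this way is cheap; recovering r from g and gr (the discrete logarithm problem) is the hard direction the text refers to.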

*2.3.5. Sylow’s Theorem

Sylow’s theorem is a powerful tool for studying the structure of finite groups. Recall that if G is a finite group of order n and if H is a subgroup of G of order m, then by Lagrange’s theorem m divides n. But given any divisor m′ of n, there need not exist a subgroup of G of order m′. However, for certain special values of m′, we can prove the existence of subgroups of order m′. Sylow’s theorem considers the case that m′ is a power of a prime.

Definition 2.11.

Let G be a finite group of cardinality n and let p be a prime. If n = pr for some integer r ≥ 0, we call G a p-group. More generally, let p be a prime divisor of n. Then a p-subgroup of G is a subgroup H of G such that H is a p-group. If H is a p-subgroup of G with cardinality pr for some r, then pr divides n. Moreover, if pr+1 does not divide n, then H is called a p-Sylow subgroup of G.

We shortly prove that p-Sylow subgroups always exist. Before doing that, we prove a simpler result.

Theorem 2.4. Cauchy’s theorem

Let G be a finite group and p a prime dividing ord G. Then G has a subgroup of order p.

Proof

Let n := ord G. Note that if we can find an element a ∈ G such that ord a = p, then the subgroup generated by a is the desired subgroup. To do that, consider the set S consisting of all p-tuples (a1, . . . , ap) with ai ∈ G such that a1 · · · ap = e. S consists of np–1 elements, since we can choose a1, . . . , ap–1 arbitrarily and independently from G, and for each such choice of a1, . . . , ap–1 the value of ap = (a1 · · · ap–1)–1 gets fixed. Since p divides n, it follows that p divides #S too. Now we define a relation ~ on S by (a1, . . . , ap) ~ (b1, . . . , bp) if and only if (b1, . . . , bp) = (ai, . . . , ap, a1, . . . , ai–1) for some i (that is, (b1, . . . , bp) is a cyclic shift of (a1, . . . , ap)). It is easy to see that ~ is an equivalence relation on S. The equivalence class of (a1, . . . , ap) contains 1 or p elements depending on whether a1 = · · · = ap or not. Let r and s be the number of equivalence classes containing 1 and p elements of S respectively. Then r + sp = np–1, so that p divides r. Since the equivalence class of (e, . . . , e) contains only one element, we must have r ≥ 1 and consequently r ≥ p. This, in turn, proves the existence of an element a ∈ G, a ≠ e, such that (a, . . . , a) ∈ S. But then ap = e, and since p is prime and a ≠ e, ord a = p.
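The counting in this proof can be verified by brute force on a small group. The sketch below (our own code) takes G = (Z6, +), where the group operation is addition and e = 0, so the tuples of S are those summing to 0 modulo 6:

```python
from itertools import product

def cauchy_count(n, p):
    """In G = (Z_n, +), count the p-tuples (a1, ..., ap) with
    a1 + ... + ap = 0 (the set S of the proof), and among them the
    constant tuples a1 = ... = ap (the one-element ~ classes)."""
    S = [t for t in product(range(n), repeat=p) if sum(t) % n == 0]
    constant = [t for t in S if len(set(t)) == 1]
    return len(S), len(constant)

total, r = cauchy_count(6, 3)   # p = 3 divides n = 6
print(total, r)                  # 36 = 6^(3-1) tuples; r = 3 constant tuples
```

Here #S = 36 = n^(p–1) and r = 3 is divisible by p = 3: besides (0, 0, 0), the constant tuples come from a = 2 and a = 4, each an element of order 3, exactly as the theorem promises.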

Now we are in a position to prove the general theorem.

Theorem 2.5. Sylow’s theorem

Let G be a finite group of order n and let p be a prime dividing n. Then there exists a p-Sylow subgroup of G.

Proof

We proceed by induction on n. If n = p, then G itself is a p-Sylow subgroup of G. So we assume n > p and write n = prm, where p does not divide m. If r = 1, then the theorem follows from Cauchy’s theorem (Theorem 2.4). So we assume r > 1 and consider the class equation of G, namely, #G = #Z(G) + ∑[G : C(a)], where the sum is over a set of pairwise non-conjugate elements a ∉ Z(G) (See Exercise 2.16). If p does not divide [G : C(a)] for some a ∉ Z(G), then #C(a) = #G/[G : C(a)] = prm′ < #G for some m′ < m. By induction, C(a) has a p-Sylow subgroup, which is also a p-Sylow subgroup of G. On the other hand, if p divides [G : C(a)] for all a ∉ Z(G), then p divides #Z(G), as can be easily seen from the class equation. We apply Cauchy’s theorem to Z(G) to obtain a subgroup H of Z(G) with #H = p. By Exercise 2.16(b), H is a normal subgroup of G, and we consider the canonical surjection μ : G → G/H. Since #(G/H) = pr–1m < n and r > 1, by induction G/H has a p-Sylow subgroup, say K. But then μ–1(K) is a p-Sylow subgroup of G.

Note that if H is a p-Sylow subgroup of G and g ∈ G, then gHg–1 is also a p-Sylow subgroup of G. The converse is also true, that is, if H and H′ are two p-Sylow subgroups of G, then there exists a g ∈ G such that H′ = gHg–1. We do not prove this assertion here, but mention the following important consequence of it. If G is Abelian, then H′ = gHg–1 = gg–1H = H, that is, there is only one p-Sylow subgroup of G. If G is Abelian and ord G = p1^r1 · · · pt^rt with pairwise distinct primes pi and with ri ∈ N, then G is the internal direct product of its pi-Sylow subgroups, i = 1, . . . , t (Exercises 2.17 and 2.19).

Exercise Set 2.3

2.8Let G be a multiplicatively written group (not necessarily Abelian). Prove the following assertions.
  1. For all elements a, b ∈ G, we have (ab)–1 = b–1a–1 and (a–1)–1 = a.

  2. A subset H of G is a subgroup of G if and only if H is non-empty and ab–1 ∈ H for all a, b ∈ H.

2.9Let G be a multiplicatively written group and let H and K be subgroups of G. Show that:
  1. H ∩ K is a subgroup of G.

  2. H ∪ K is a subgroup of G if and only if H ⊆ K or K ⊆ H.

  3. HK := {hk | h ∈ H, k ∈ K} is a subgroup of G if and only if HK = KH. In particular, if K is normal in G, then HK is a subgroup of G.

  4. G × G is a group and H × K is a subgroup of G × G.

  5. If g ∈ G, then gHg–1 is a subgroup of G.

2.10
  1. Let G be a multiplicatively written group and H a subgroup of G. Show that the following conditions are equivalent:

    1. H is a normal subgroup of G.

    2. ghg–1 ∈ H for all g ∈ G and h ∈ H.

    3. gHg–1 = H for all g ∈ G.

    4. gH = Hg for all g ∈ G.

  2. Show that if [G : H] = 2, then H is normal.

2.11Let G be a (multiplicative) group.
  1. Second isomorphism theorem Let H and K be subgroups of G and let K be normal in G. Show that H/(H ∩ K) ≅ (HK)/K. [H]

  2. Third isomorphism theorem Let H and K be normal subgroups of G with H ⊆ K. Show that G/K ≅ (G/H)/(K/H) (where K/H denotes the image of K under the canonical surjection G → G/H). [H]

2.12
  1. Show that the only automorphisms of the group (Z, +) are the identity map and the map that sends a ↦ –a.

  2. Show that the group of automorphisms of (Zn, +) is isomorphic to (Z*n, ·).

2.13Let H be the subgroup of G generated by ai, i ∈ I. Show that H is the smallest subgroup of G that contains all of ai, i ∈ I.
2.14Let f : G → G′ be a homomorphism of (multiplicative) groups. Show that:
  1. If H is a subgroup of G, then H′ := f(H) is a subgroup of G′. If f is surjective and H is normal, then H′ is also normal.

  2. If H′ is a subgroup of G′, then H := f–1(H′) is a subgroup of G. If H′ is normal, then H is also normal.

  3. Correspondence theorem Let H be a normal subgroup of G. Then the subgroups (resp. normal subgroups) of G/H are in one-to-one correspondence with the subgroups (resp. normal subgroups) of G that contain H. [H]

2.15Let G be a cyclic group. Show that G is isomorphic to Z or to Zn for some n ∈ N, depending on whether G is infinite or finite.
2.16Let G be a finite (multiplicative) group (not necessarily Abelian).
  1. We define the centre of G to be the set Z(G) := {a ∈ G | ag = ga for all g ∈ G}. Show that Z(G) is a subgroup of G.

  2. If H ⊆ Z(G) is a subgroup of G, show that H is a normal subgroup of G.

  3. The centralizer of a ∈ G is defined to be the set C(a) := {g ∈ G | ag = ga}. Show that C(a) is a subgroup of G. Show also that C(a) = G if and only if a ∈ Z(G).

  4. Define a relation ~ on G by a ~ b if and only if b = gag–1 for some g ∈ G. Show that ~ is an equivalence relation on G. We say that the elements a and b of G are conjugate, if the equivalence classes [a] and [b] are the same. The equivalence classes are called the conjugacy classes of G.

  5. Show that the cardinality of the conjugacy class of a ∈ G is equal to the index [G : C(a)].

  6. Deduce the class equation of G, that is, #G = #Z(G) + ∑[G : C(a)], where the sum is over a set of pairwise non-conjugate elements a ∉ Z(G).

2.17Let G be a (multiplicative) Abelian group with identity e and order n = p1^e1 · · · pr^er, where the pi are distinct primes and ei ∈ N. For each i, let Hi be the pi-Sylow subgroup of G. Show that:
  1. G = H1 · · · Hr. [H]

  2. Every element g ∈ G can be written uniquely as g = h1 · · · hr with hi ∈ Hi. Moreover, in that case we have ordG g = (ordH1 h1) · · · (ordHr hr).

  3. G is cyclic if and only if all of H1, . . . , Hr are cyclic.

2.18Let G be a finite (multiplicative) Abelian group with identity e. Assume that for every n ∈ N there are at most n elements x of G satisfying xn = e. Show that G is cyclic. [H]
2.19Let G be a (multiplicative) group and let H1, . . . , Hr be normal subgroups of G. If G = H1 · · · Hr and every element g ∈ G can be written uniquely as g = h1 · · · hr with hi ∈ Hi, then G is called the internal direct product of H1, . . . , Hr. (For example, if G is finite and Abelian, then by Exercise 2.17 it is the internal direct product of its Sylow subgroups.) Show that:
  1. If G is finite, it is the internal direct product of normal subgroups H1, . . . , Hr if and only if G = H1 · · · Hr and Hi ∩ Hj = {e} for all i, j, i ≠ j.

  2. If G is the internal direct product of the normal subgroups H1, . . . , Hr, then G is isomorphic to the (external) direct product H1 × · · · × Hr. [H]

2.20Let Hi, i = 1, . . . , r, be finite Abelian groups of orders mi and let H := H1 × · · · × Hr be their direct product. Show that H is cyclic if and only if each Hi is cyclic and m1, . . . , mr are pairwise coprime.
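Exercise 2.20 can be checked by brute force for small additive groups Zm1 × · · · × Zmr. The following Python sketch (our own helper, not part of the exercise) searches for an element whose order equals the order of the whole product:

```python
from itertools import product
from math import gcd

def is_cyclic_product(ms):
    """H = Z_{m1} x ... x Z_{mr} (additive) is cyclic iff some element
    generates it, i.e. has order m1 * ... * mr."""
    m = 1
    for mi in ms:
        m *= mi
    for g in product(*(range(mi) for mi in ms)):
        # The order of a tuple is the lcm of its components' orders;
        # in Z_{mi}, the element gi has order mi / gcd(gi, mi).
        order = 1
        for gi, mi in zip(g, ms):
            oi = mi // gcd(gi, mi)
            order = order * oi // gcd(order, oi)   # lcm(order, oi)
        if order == m:
            return True
    return False

print(is_cyclic_product([2, 3]))   # True:  gcd(2, 3) = 1, so Z_2 x Z_3 = Z_6
print(is_cyclic_product([2, 4]))   # False: gcd(2, 4) > 1
```

The pairwise-coprimality condition of the exercise is exactly what makes an element of maximal order, such as (1, . . . , 1), reach order m1 · · · mr.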

2.4. Rings

So far we have studied algebraic structures with only one operation. Now we study rings, which are sets with two (compatible) binary operations. Unlike in groups, these two operations are usually denoted by + and ·. One can, of course, adopt more general notations for these operations. However, that generality does not pay much and only complicates matters. We stick to the conventions.

2.4.1. Definition and Basic Properties

Definition 2.12.

A ring (R, +, ·) (or R in short) is a set R together with two binary operations + and · on R such that the following conditions are satisfied. As in the case of multiplicative groups we write ab for a · b.

  1. Additive group The set R is an Abelian group under +. The additive identity is denoted by 0.

  2. · is associative (ab)c = a(bc) for every a, b, c ∈ R.

  3. · is commutative ab = ba for every a, b ∈ R.

  4. Multiplicative identity There is an element (denoted by 1) in R such that a · 1 = 1 · a = a for every a ∈ R. The element 1 is called the identity of R.

  5. Distributivity The operation · is distributive over +, that is, a(b + c) = ab + ac and (a + b)c = ac + bc for every a, b, c ∈ R.

Notice that it is more conventional to define a ring as an algebraic structure (R, +, ·) that satisfies conditions (1), (2) and (5) only. A ring (by the conventional definition) is called a commutative ring (resp. a ring with identity), if it (additionally) satisfies condition (3) (resp. (4)). As per our definition, a ring is always a commutative ring with identity. Rings that are not commutative or that do not contain the identity element are not used in the rest of the book. So let us be happy with our unconventional definition of a ring.[3]

[3] Cool! But what’s circular in a ring? Historically, such algebraic structures were introduced by Hilbert to designate a Zahlring (a number ring, see Section 2.13). If α is an algebraic integer (Definition 2.95) and we take a Zahlring of the form Z[α] and consider the powers α, α2, α3, . . . , we eventually get a power αd which can be expressed as a linear combination of the previous (that is, smaller) powers of α. This is perhaps the reason that prompted Hilbert to call such structures “rings”. Also see Footnote 1.

We do not rule out the possibility that 0 = 1 in R. In that case, for any a ∈ R, we have a = a · 1 = a · 0 = 0 (See Proposition 2.6), that is to say, the set R consists of the single element 0. In this case, R is called the zero ring and is denoted (by an abuse of notation) by 0.

Finally, note that R is, in general, not a group under multiplication. This is because we do not expect a ring R to contain the multiplicative inverse of every element of R. Indeed the multiplicative inverse of the element 0 exists if and only if R = 0.

Example 2.6.
  1. The sets Z, Q, R and C are all rings under usual addition and multiplication. Each of Q, R and C contains the multiplicative inverse of every non-zero element, whereas the only elements in Z that have multiplicative inverses are ±1.

  2. Let Zn denote the set {0, 1, . . . , n – 1} for an integer n ≥ 2. Then Zn is a ring under addition and multiplication modulo n. The additive identity is 0 and the multiplicative identity is 1. Later we see a more formal definition of this ring. Recall from Example 2.1 how we have defined the groups Zn and Z*n under addition and multiplication modulo n. These groups have a connection with the ring Zn, as we will shortly see.

  3. Let R be a ring and S a set. The set of all functions S → R is a ring under pointwise addition and multiplication of functions (that is, if f and g are two such functions, then we define (f + g)(a) := f(a) + g(a) and (f g)(a) := f(a)g(a) for every a ∈ S). The additive (resp. multiplicative) identity in this ring is the constant function 0 (resp. 1).

  4. Let R be a ring. The set R[X] of all polynomials in one indeterminate X and with coefficients from R is a ring. The identity elements in R[X] are the constant polynomials 0 and 1. The addition and multiplication operations in R[X] are the standard ones on polynomials. For a non-zero polynomial f ∈ R[X], the largest non-negative integer d for which the coefficient of Xd is non-zero is called the degree of the polynomial f and is denoted by deg f. The coefficient of Xdeg f in f is called the leading coefficient of f and is denoted by lc(f). The degree of the zero polynomial is conventionally taken to be –∞. A non-zero polynomial with leading coefficient 1 is called a monic polynomial.

    More generally, for n ∈ N one can define the ring R[X1, . . . , Xn] of multivariate polynomials over R. Polynomial rings are of paramount importance in algebra and number theory. We devote Section 2.6 to a study of these rings.

    We also define the ring R(X) of rational functions over R, which consists of elements of the form f/g with f, g ∈ R[X], g ≠ 0. More generally, the set of elements f/g with f, g ∈ R[X1, . . . , Xn], g ≠ 0, is a ring denoted R(X1, . . . , Xn).

  5. Let Ri, i ∈ I, be a family of rings, and let R := ∏i∈I Ri be the product of the sets Ri, i ∈ I, that is, the set of all ordered tuples (ai)i∈I with ai ∈ Ri, indexed by I. For tuples (ai)i∈I and (bi)i∈I, define the sum (ai)i∈I + (bi)i∈I := (ai + bi)i∈I and the product (ai)i∈I(bi)i∈I := (aibi)i∈I. It is easy to see that R is a ring with identity elements 0 = (0)i∈I and 1 = (1)i∈I. It is called the direct product of the rings Ri, i ∈ I. If I is of finite cardinality n and if Ri = A for all i ∈ I, then R is denoted in short by An.

Proposition 2.6.

Let R be a ring. For all a, b ∈ R, we have:

  1. a · 0 = 0 · a = 0

  2. a(–b) = (–a)b = –ab

  3. (–a)(–b) = ab

Proof

  1. a · 0 = a · (0 + 0) = a · 0 + a · 0, so that a · 0 = 0. Similarly, 0 · a = 0.

  2. By (1), 0 = a · 0 = a(b + (–b)) = ab + a(–b), that is, a(–b) = –ab. Similarly, (–a)b = –ab.

  3. (–a)(–b) = –(a(–b)) = –(–ab) = ab.

Definition 2.13.

Let R be a ring.

  1. An element a ∈ R is called a zero-divisor of R, if ab = 0 for some b ∈ R, b ≠ 0. By this definition, 0 is a zero-divisor of R, unless R = 0. The elements 0, 3, 5, 6, 9, 10 and 12 are all the zero-divisors of Z15.

  2. An element a ∈ R is called a unit of R, if there exists an element b ∈ R such that ab = 1. The elements 1 and –1 are units in any ring. It is easy to see that an element cannot be simultaneously a zero-divisor and a unit. The set of all units in a ring R is denoted by R* and is a group under the multiplication of the ring R (See Exercise 2.21), called the multiplicative group or the group of units of R. The multiplicative group of the ring Zn (Example 2.6) is Z*n.

  3. An element a ∈ R is called nilpotent, if ak = 0 for some k ∈ N. By this definition, 0 is a nilpotent element in any ring. It is also evident that every nilpotent element in a non-zero ring is a zero-divisor. An example of a non-zero nilpotent element in a ring is 2 in Z4 (since 2 · 2 = 4 = 0 in Z4).

  4. An element a ∈ R is called idempotent, if a2 = a. In every ring, 0 and 1 are idempotent. The element 6 is idempotent in Z15 (since 6 · 6 = 36 ≡ 6 (mod 15)). It is easy to check that 0 is the only element in a ring that is both nilpotent and idempotent.
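All four notions of this definition can be computed directly in Zn. A small Python sketch (the function name is ours), which reproduces the list of zero-divisors of Z15 quoted above:

```python
def classify(n):
    """Classify the elements of the ring Z_n (n >= 2) as in Definition 2.13."""
    elems = range(n)
    zero_divisors = [a for a in elems
                     if any(a * b % n == 0 for b in elems if b != 0)]
    units = [a for a in elems if any(a * b % n == 1 for b in elems)]
    # If a is nilpotent in Z_n, then already a^n = 0, so k <= n suffices.
    nilpotents = [a for a in elems
                  if any(pow(a, k, n) == 0 for k in range(1, n + 1))]
    idempotents = [a for a in elems if a * a % n == a]
    return zero_divisors, units, nilpotents, idempotents

zd, u, nil, idem = classify(15)
print(zd)    # [0, 3, 5, 6, 9, 10, 12] -- the list quoted in the text
print(idem)  # [0, 1, 6, 10]           -- includes the idempotent 6
```

Note that the units and the zero-divisors of Z15 are disjoint, and that 0 is the only nilpotent element here (15 is squarefree), in line with the remarks above.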

Definition 2.14.

Let R be a ring.

  1. R is called an integral domain (or simply a domain), if R ≠ 0 and if R contains no non-zero zero-divisors. Examples of integral domains: Z, Q, R, C and Zp for a prime p. On the other hand, 3 · 5 = 0 in Z15, so Z15 is not an integral domain.

  2. R is called a field, if R ≠ 0 and if R* = R \ {0}, that is, if every non-zero element of R is a unit. This means that in a field one can divide any element by any non-zero element. The most common fields are Q, R and C. Note that Z is not a field, since, for example, 2 does not have a multiplicative inverse in Z.

  3. A field R with #R finite is called a finite field. The simplest examples of finite fields are the fields Zp for prime integers p. In fact, it is easy to see that Zn is a field if and only if n is a prime. Finite fields are widely applied for building various cryptographic protocols. See Section 2.9 for a detailed study of finite fields.
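The criterion "Zn is a field if and only if n is prime" is easy to test exhaustively for small n. A brute-force Python check (our own sketch; it simply searches for a multiplicative inverse of every non-zero element):

```python
def is_field_Zn(n):
    """Z_n is a field iff every non-zero element has a
    multiplicative inverse modulo n."""
    return all(any(a * b % n == 1 for b in range(1, n))
               for a in range(1, n))

print([n for n in range(2, 20) if is_field_Zn(n)])
# [2, 3, 5, 7, 11, 13, 17, 19] -- exactly the primes below 20
```

For composite n = n1n2 with 1 < n1, n2 < n, the element n1 is a zero-divisor and hence has no inverse, which is why the search fails.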

Corollary 2.1.

A field is an integral domain.

Proof

Recall from Definition 2.13 that an element in a ring cannot be simultaneously a unit and a zero-divisor.

Definition 2.15.

Let R be a non-zero ring. The characteristic of R, denoted char R, is the smallest positive integer n such that 1 + 1 + · · · + 1 (n times) = 0. If no such integer exists, then we take char R = 0.

Z, Q, R and C are rings of characteristic zero. If R is a non-zero finite ring, then the elements 1, 1 + 1, 1 + 1 + 1, . . . cannot all be distinct. This shows that there are positive integers m and n, m < n, such that 1 + 1 + · · · + 1 (n times) = 1 + 1 + · · · + 1 (m times). But then 1 + 1 + · · · + 1 (n – m times) = 0. Thus any non-zero finite ring has positive (that is, non-zero) characteristic. If char R = t is positive, then for any a ∈ R one has a + a + · · · + a (t times) = (1 + 1 + · · · + 1)a = 0.
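For the rings Zn this definition can be evaluated mechanically; the helper below (our own name) just adds 1 to itself until it reaches 0:

```python
def char_of_Zn(n):
    """Characteristic of Z_n: the smallest t >= 1 with
    1 + 1 + ... + 1 (t times) = 0 in Z_n."""
    s, t = 1 % n, 1
    while s != 0:
        s = (s + 1) % n
        t += 1
    return t

print([char_of_Zn(n) for n in range(2, 8)])  # [2, 3, 4, 5, 6, 7]
```

As expected, char Zn = n, illustrating that a non-zero finite ring always has positive characteristic.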

In what follows, we will often denote by n the element 1 + 1 + · · · + 1 (n times) of any ring. One should not confuse this with the integer n. One can similarly identify a negative integer –n with the ring element –(1 + 1 + · · · + 1)(n times) = (–1) + (–1) + · · · + (–1)(n times).

Proposition 2.7.

Let R be an integral domain of positive characteristic p. Then p is a prime.

Proof

If p is composite, then we can write p = mn with 1 < m < p and 1 < n < p. But then p = mn = 0 (in R). Since R is an integral domain, we must have m = 0 or n = 0 (in R). This contradicts the minimality of p.

2.4.2. Subrings, Ideals and Quotient Rings

Just as we studied subgroups of groups, it is now time to study subrings of rings. It turns out, however, that subrings are not as important for the study of rings as the subsets called ideals. In fact, it is ideals (and not subrings) that help us construct quotient rings. This does not mean that ideals are “normal” subrings! In fact, ideals are, in general, not subrings at all, and conversely. The formal definitions are waiting!

Definition 2.16.

Let R be a ring. A subset S of R is called a subring of R, if S is a ring under the ring operations of R. In this case, one calls R a superring or a ring extension of S.

If R and S are both fields, then S is often called a subfield of R and R a field extension (or simply an extension) of S. In that case, one also says that SR is a field extension or that R is an extension over S.

Z is a subring of Q, R and C, whereas Q ⊆ R and R ⊆ C are field extensions.

We demand that a ring always contains the multiplicative identity (Definition 2.12). This implies that if S is a subring of R, then for all integers n, the elements n · 1 (that is, the sums 1 + 1 + · · · + 1 and their negatives) are also in S (though they need not be pairwise distinct). Similarly, if R and S are fields, then S contains all the elements of the form mn–1 for m, n ∈ Z with n · 1 ≠ 0 (cf. Exercise 2.26). Thus 2Z, the set of all even integers, is not a subring of Z, though it is a subgroup of (Z, +) (Example 2.2).

Definition 2.17.

Let R be a ring. A subset 𝔞 of R is called an ideal of R, if 𝔞 is an additive subgroup of (R, +) and if ra ∈ 𝔞 for all r ∈ R and a ∈ 𝔞.[4]

[4] Kummer introduced the concept of ideal numbers. Later Dedekind reformulated Kummer’s notion of ideal numbers to define what we now know as ideals.

In this book, we will use Gothic letters (usually lower case) like 𝔞, 𝔟, 𝔠, 𝔭, 𝔪 to denote ideals.[5]

[5] Mathematicians always run out of symbols. Many believe if it is Gothic, it is just ideal!

The condition for being an ideal is in one sense more stringent than that for being a subring: an ideal has to be closed under multiplication by every element of the entire ring. On the other hand, we do not demand that an ideal necessarily contain the identity element 1. For example, 2Z is an ideal of Z but not a subring of Z. Conversely, Z is a subring of Q but not an ideal of Q. Subrings and ideals are different things.

Example 2.7.
  1. Let R be any ring. The subset {0} is an ideal of R, called the zero ideal and denoted also by 0. Similarly, the entire ring R is an ideal of R and is called the unit ideal. Note that if an ideal 𝔞 contains a unit u of R, then 1 = u–1u is also in 𝔞 and so a = a · 1 ∈ 𝔞 for every a ∈ R. It follows that an ideal 𝔞 of R is the unit ideal if and only if 𝔞 contains a unit, a justification for the name.

  2. The integral multiples of an integer n form an ideal of Z denoted by nZ. More generally, for any ring R and for any a ∈ R, the set {ra | r ∈ R} is an ideal of R and is denoted by Ra or aR or 〈a〉. Such an ideal is called a principal ideal. (See also Definition 2.18.)

  3. Let R be a ring and let 𝔞i, i ∈ I, be a family of ideals of R. The intersection ∩i∈I 𝔞i is an ideal of R. The set of finite sums of the form ai1 + · · · + air (where r ∈ N0 and aij ∈ 𝔞ij) is an ideal of R. It is called the sum of the ideals 𝔞i, i ∈ I, and is denoted by ∑i∈I 𝔞i. The union ∪i∈I 𝔞i is, in general, not an ideal of R. In fact, the sum ∑i∈I 𝔞i is the smallest ideal that contains (the set) ∪i∈I 𝔞i.

Proposition 2.8.

The only ideals of a field are the zero ideal and the unit ideal.

Proof

By definition, every non-zero element of a field is a unit. Hence any non-zero ideal of a field contains a unit and is therefore the unit ideal (Example 2.7).

Definition 2.18.

Let R be a ring and ai, i ∈ I, a family of elements of R. The ideal 𝔞 generated by ai, i ∈ I, is defined to be the sum of the principal ideals Rai, that is, 𝔞 = ∑i∈I Rai. In this case, we also say that 𝔞 is generated by ai, i ∈ I. If I is finite, then we say that 𝔞 is finitely generated. In particular, if #I = 1, then 𝔞 is a principal ideal (See Example 2.7).

An integral domain every ideal of which is principal is called a principal ideal domain or PID in short. A ring every ideal of which is finitely generated is called Noetherian. Thus principal ideal domains are Noetherian.

Note that an ideal may have different generating sets of varying cardinalities. For example, the unit ideal in any ring is principal, since it is generated by 1. The integers 2 and 3 generate the unit ideal of Z, since 1 = (–1) · 2 + 1 · 3. However, neither 2 nor 3 individually generates the unit ideal of Z. Indeed, using Bézout’s relation (Proposition 2.16) one can show that for every n ∈ N there is a (minimal) generating set of the unit ideal of Z that contains exactly n integers. Interested readers may try to construct such generating sets as an (easy) exercise.
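A Bézout relation 1 = ax + by for coprime a and b can be produced by the extended Euclidean algorithm. A short Python sketch (our own implementation; the book's formal treatment of Bézout's relation comes with Proposition 2.16):

```python
def ext_gcd(a, b):
    """Extended Euclidean algorithm: returns (g, x, y) with
    g = gcd(a, b) and g = a*x + b*y (a Bezout relation)."""
    if b == 0:
        return (a, 1, 0)
    g, x, y = ext_gcd(b, a % b)
    # gcd(a, b) = gcd(b, a mod b); back-substitute the coefficients.
    return (g, y, x - (a // b) * y)

# 2 and 3 generate the unit ideal of Z:
g, x, y = ext_gcd(2, 3)
print(g, 2 * x + 3 * y)   # g = 1 and 2*x + 3*y = 1
```

In ideal language: 2Z + 3Z = gcd(2, 3)Z = Z, whereas 2Z alone (or 3Z alone) is a proper ideal.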

Theorem 2.6.

is a principal ideal domain.

Proof

The zero ideal is generated by 0. Let 𝔞 be a non-zero ideal of Z and let a be the smallest positive integer contained in 𝔞. We claim that 𝔞 = aZ. Clearly, aZ ⊆ 𝔞. For the converse, take b ∈ 𝔞. We can write b = aq + r, where q and r are the quotient and the remainder of (Euclidean) division of b by a. Now r = b – aq ∈ 𝔞, and since 0 ≤ r < a, by the choice of a we must have r = 0, so that b = aq ∈ aZ.

A very similar argument proves the following theorem. The details are left to the reader. Also see Exercise 2.31.

Theorem 2.7.

If K is a field, then K[X] is a principal ideal domain.

We now prove a very important theorem:

Theorem 2.8. Hilbert’s basis theorem

If R is a Noetherian ring, then so is the polynomial ring R[X1, . . . , Xn] for n ∈ N. In particular, the polynomial rings Z[X1, . . . , Xn] and K[X1, . . . , Xn] are Noetherian, where K is a field.

Proof

Using induction on n, we can reduce to the case n = 1. So we prove that if R is Noetherian, then R[X] is also Noetherian. Let 𝔞 be a non-zero ideal of R[X]. Assume that 𝔞 is not finitely generated. Then we can inductively choose non-zero polynomials f1, f2, f3, . . . from 𝔞 such that for each i ∈ N the polynomial fi is one having the smallest degree in 𝔞 \ 〈f1, . . . , fi–1〉. Let di := deg fi. Then d1 ≤ d2 ≤ d3 ≤ · · ·. Let ai denote the leading coefficient of fi. Consider the ideal 𝔟 := 〈a1, a2, a3, . . .〉 in R. By hypothesis, 𝔟 is finitely generated, say, 𝔟 = 〈a1, . . . , ar〉. This, in particular, implies that ar+1 = u1a1 + · · · + urar for some u1, . . . , ur ∈ R. But then the polynomial fr+1 – (u1X^(dr+1 – d1)f1 + · · · + urX^(dr+1 – dr)fr) belongs to 𝔞 \ 〈f1, . . . , fr〉, is non-zero and has degree < dr+1 (the leading terms cancel), a contradiction to the choice of fr+1. Thus 𝔞 must be finitely generated.

Two particular types of ideals are very important in algebra.

Definition 2.19.

Let R be a ring.

  1. An ideal 𝔭 of R is called a prime ideal, if 𝔭 ≠ R and if ab ∈ 𝔭 implies a ∈ 𝔭 or b ∈ 𝔭 for a, b ∈ R. The second condition is equivalent to saying that if a ∉ 𝔭 and b ∉ 𝔭, then the product ab ∉ 𝔭. For a prime integer p, the principal ideal pZ of Z is prime. On the other hand, for a composite integer n the ideal nZ of Z is not prime. For example, 2 ∉ 6Z and 3 ∉ 6Z, but the product 2 · 3 = 6 ∈ 6Z.

  2. An ideal 𝔪 of R is called a maximal ideal, if 𝔪 ≠ R and if for any ideal 𝔞 satisfying 𝔪 ⊆ 𝔞 ⊆ R we have 𝔞 = 𝔪 or 𝔞 = R. This means that there are no non-unit ideals of R properly containing 𝔪. All the ideals pZ of Z for prime integers p are maximal ideals (Corollary 2.3). Next consider the polynomial ring R := Z[X] and the principal ideal 〈X〉 of R. It is easy to see that 〈X〉 ⊊ 〈X, 2〉 ⊊ R. Thus 〈X〉 is not maximal.

Prime and maximal ideals can be characterized by some nice equivalent criteria. See Proposition 2.9.

Definition 2.20.

Let R be a ring and 𝔞 an ideal of R. Then 𝔞 is a subgroup of the group (R, +). Since (R, +) is Abelian, 𝔞 is a normal subgroup (Definition 2.6). Thus the cosets a + 𝔞, a ∈ R, form an additive Abelian group. We define multiplication on these cosets as (a + 𝔞)(b + 𝔞) := ab + 𝔞. It is easy to check that this multiplication is well-defined. Furthermore, the set of these cosets, denoted R/𝔞, becomes a ring under this addition and multiplication. The ring R/𝔞 is called the quotient ring of R with respect to 𝔞.

We say that two elements a, b ∈ R are congruent modulo an ideal 𝔞 (of R) and write a ≡ b (mod 𝔞), if a – b ∈ 𝔞. Thus a ≡ b (mod 𝔞) if and only if a and b lie in the same coset of 𝔞, that is, a + 𝔞 = b + 𝔞.

Example 2.8.
  1. For any ring R, the quotient ring R/0 is essentially the same as R and the quotient ring R/R is the zero ring.

  2. The ring Zn of Example 2.6 is formally defined to be the quotient ring Z/nZ. Convince yourself that both these definitions are equivalent.

Proposition 2.9.

Let R be a ring and 𝔞 an ideal of R.

  1. 𝔞 is a prime ideal of R if and only if R/𝔞 is an integral domain.

  2. 𝔞 is a maximal ideal of R if and only if R/𝔞 is a field.

Proof

  1. Let a, b ∈ R be arbitrary. Then 𝔞 is prime ⇔ ab ∈ 𝔞 implies a ∈ 𝔞 or b ∈ 𝔞 ⇔ (a + 𝔞)(b + 𝔞) = 0 in R/𝔞 implies a + 𝔞 = 0 or b + 𝔞 = 0 ⇔ R/𝔞 is an integral domain.

  2. Let 𝔞 be a maximal ideal. Choose b ∈ R \ 𝔞. Then b + 𝔞 ≠ 0 in R/𝔞. Consider the ideal 𝔞 + Rb. Since 𝔞 is maximal, we must have 𝔞 + Rb = R. This means that a + cb = 1 for some a ∈ 𝔞 and c ∈ R. Then (c + 𝔞)(b + 𝔞) = cb + 𝔞 = (1 – a) + 𝔞 = 1 + 𝔞, which implies that b + 𝔞 is a unit in R/𝔞. That is, R/𝔞 is a field.

    Conversely, let R/𝔞 be a field. Consider any ideal 𝔟 of R with 𝔞 ⊊ 𝔟 ⊆ R. Choose any b ∈ 𝔟 \ 𝔞. Then b + 𝔞 ≠ 0 in R/𝔞. By hypothesis, there exists c ∈ R such that (c + 𝔞)(b + 𝔞) = 1 + 𝔞, that is, 1 – cb ∈ 𝔞 ⊆ 𝔟. Hence 1 = (1 – cb) + cb ∈ 𝔟, that is, 𝔟 = R.

The last proposition in conjunction with Corollary 2.1 indicates:

Corollary 2.2.

Maximal ideals are prime.

Corollary 2.3.

For every prime p, the quotient ring Z/pZ is a field. In particular, pZ is a maximal ideal of Z.

Proof

Since pZ is a prime ideal of Z, the quotient Z/pZ is an integral domain. But Z/pZ is finite, so by Exercise 2.25 it is a field.

2.4.3. Homomorphisms

Recall how we have defined homomorphisms of groups. In a similar manner, we define homomorphisms of rings. A ring homomorphism is a map from one ring to another, which respects addition, multiplication and the identity element. More precisely:

Definition 2.21.

Let R and S be rings. A map f : R → S is called a (ring) homomorphism, if f(a + b) = f(a) + f(b) and f(ab) = f(a)f(b) for all a, b ∈ R and if f(1) = 1. A homomorphism f : R → S is called an isomorphism, if there exists a homomorphism g : S → R such that g ∘ f = idR and f ∘ g = idS. As in the case of groups, bijectivity of f as a function is both necessary and sufficient for a homomorphism f : R → S to be an isomorphism. If f : R → S is an isomorphism, we write R ≅ S and say that R is isomorphic to S or that R and S are isomorphic.

A homomorphism f : RR is called an endomorphism of R. An automorphism is a bijective endomorphism.

Example 2.9.
  1. For any ring extension R ⊆ S, the canonical inclusion a ↦ a is a homomorphism from R to S. In particular, the identity map on any ring is an automorphism.

  2. Let R be a ring and 𝔞 an ideal of R. The canonical surjection R → R/𝔞 that takes a ↦ a + 𝔞 is a ring homomorphism.

  3. Let R be a ring and let a ∈ R. The map R[X] → R that takes f(X) ↦ f(a) is a ring homomorphism and is called the substitution homomorphism.

  4. The map Z → Z taking n ↦ –n is not a ring homomorphism, since it maps 1 to –1 (and does not satisfy f(ab) = f(a)f(b) for all a, b ∈ Z).

  5. The map C → C that maps z = a + ib to its conjugate z̄ = a – ib is an automorphism of the field C.

Proposition 2.10.

Let f : RS be a ring homomorphism.

  1. If a ∈ R is a unit, then f(a) is a unit in S and f(a–1) = (f(a))–1.

  2. Let 𝔟 be an ideal in S. Then 𝔞 := f–1(𝔟) is an ideal in R. If 𝔟 is prime, then 𝔞 is also prime.

Proof

  1. If ab = 1, then f(a)f(b) = f(ab) = f(1) = 1.

  2. Let 𝔞 := f–1(𝔟). For a, a′ ∈ 𝔞 and r ∈ R, we have f(a – a′) = f(a) – f(a′) ∈ 𝔟 and f(ra) = f(r)f(a) ∈ 𝔟, that is, a – a′ ∈ 𝔞 and ra ∈ 𝔞. Thus 𝔞 is an ideal of R. If 1 ∈ 𝔞, then 1 = f(1) ∈ 𝔟; hence if 𝔟 is proper, so is 𝔞. Now let 𝔟 be prime (in which case 𝔟 and 𝔞 are proper ideals of S and R respectively). If aa′ ∈ 𝔞, then f(a)f(a′) = f(aa′) ∈ 𝔟, so that f(a) ∈ 𝔟 or f(a′) ∈ 𝔟. But then a ∈ 𝔞 or a′ ∈ 𝔞.

The ideal 𝔞 = f–1(𝔟) of the above proposition is called the contraction of 𝔟 and is often denoted by 𝔟c. If R ⊆ S and f is the inclusion homomorphism, then 𝔟c = 𝔟 ∩ R.

Definition 2.22.

Let f : RS be a ring homomorphism. The set is called the kernel of f and is denoted by Ker f. The set is called the image of f and is denoted by f(R) or Im f.

Theorem 2.9. Isomorphism theorem

With the notations of the last definition, Ker f is an ideal of R, Im f is a subring of S and R/ Ker f ≅ Im f.

Proof

Consider the map φ : R/Ker f → Im f that takes a + Ker f ↦ f(a). It is easy to verify that φ is a well-defined ring homomorphism and is bijective. The details are left to the reader. Also see Theorem 2.3.

Definition 2.23.

Two ideals I and J of a ring R are called relatively prime or coprime if I + J = R, that is, if there exist a ∈ I and b ∈ J with a + b = 1.

Theorem 2.10. Chinese remainder theorem (CRT)

Let R be a ring and n ∈ ℕ. Let I1, . . . , In be ideals in R such that for all i, j, i ≠ j, the ideals Ii and Ij are relatively prime. Then R/(I1 ∩ · · · ∩ In) is isomorphic to the direct product R/I1 × · · · × R/In.

Proof

The assertion is obvious for n = 1. So assume that n ≥ 2 and define the map φ : R/(I1 ∩ · · · ∩ In) → R/I1 × · · · × R/In by φ(a + (I1 ∩ · · · ∩ In)) := (a + I1, . . . , a + In) for all a ∈ R. Since I1 ∩ · · · ∩ In ⊆ Ii for all i, the map φ is well-defined. It is easy to see that φ is a ring homomorphism. In order to show that φ is injective, we let φ(a + (I1 ∩ · · · ∩ In)) = (0 + I1, . . . , 0 + In). This means that a + Ii = Ii, that is, a ∈ Ii for all i. Then a ∈ I1 ∩ · · · ∩ In, that is, a + (I1 ∩ · · · ∩ In) is the zero element of R/(I1 ∩ · · · ∩ In). The trickier part is to prove that φ is surjective. Let (a1 + I1, . . . , an + In) ∈ R/I1 × · · · × R/In. Let us consider the ideal Ii + Ji for each i, where Ji := ∏j≠i Ij. For a given i, there exist for each j ≠ i elements αj ∈ Ii and βj ∈ Ij with αj + βj = 1. Multiplying these n – 1 equations shows that we have a δi ∈ Ji such that γi + δi = 1, where γi ∈ Ii. (This shows that Ii + Ji = R for all i.) Now consider the element a := a1δ1 + · · · + anδn. Since δj ∈ Ii for all j ≠ i and δi = 1 – γi, it follows that a ≡ aiδi ≡ ai (mod Ii) for all i, that is, φ(a + (I1 ∩ · · · ∩ In)) = (a1 + I1, . . . , an + In).

In Section 2.5, we will see an interesting application of this theorem. Notice that the injectivity of the map in the last proof does not require the coprimality of the ideals I1, . . . , In; only its surjectivity requires this condition.

2.4.4. Factorization in Rings

Now we introduce the concept of divisibility in a ring. We also discuss an important type of ring known as a unique factorization domain. This study is a natural generalization of that of the rings ℤ and K[X], K a field.

Definition 2.24.

Let R be a ring, a, b ∈ R and p ∈ R. Also let K be a field.

  1. We say that a divides b and write a|b, if there exists an element c ∈ R such that b = ac. If a does not divide b, we write a∤b. In ℤ, for example, –31|899, since 899 = (–31) · (–29). By this definition, any element divides 0, whereas 0 divides no element other than 0.

  2. It is easy to see that a|b and b|a if and only if b = ca for some unit c ∈ R. In that case, we say that a and b are associates of each other. The relation of being associate is an equivalence relation on R (or R \ {0}), as can be easily verified. The only associates of a ∈ ℤ, a ≠ 0, are ±a, since ±1 are the only units in ℤ. Two non-zero polynomials f and g of K[X] are associates if and only if f = αg for some α ∈ K \ {0}.

  3. A non-zero non-unit p ∈ R is called a prime, if p|ab implies either p|a or p|b. One can check easily that p is prime if and only if the principal ideal 〈p〉 = pR is a prime ideal.

  4. A non-zero non-unit p ∈ R is called irreducible, if p = ab implies that either a or b is a unit.

Note that for ℤ the concepts of prime and irreducible elements are the same. This is indeed true for any PID (Proposition 2.12). Thus our conventional definition of a prime integer p > 0 as one which has only 1 and p as (positive) divisors tallies with the definition of irreducible elements above. For the ring K[X], on the other hand, it is more customary to talk about irreducible polynomials instead of prime polynomials; they are the same thing anyway.

Proposition 2.11.

Let R be an integral domain and p ∈ R a prime. Then p is irreducible.

Proof

Let p = ab. Then p|(ab), so that by hypothesis p|a or p|b. If p|a, then a = up for some u ∈ R. Hence p = ab = upb, that is, (1 – ub)p = 0. Since R is an integral domain and p ≠ 0, we have 1 – ub = 0, that is, ub = 1, that is, b is a unit. Similarly, p|b implies that a is a unit.

Proposition 2.12.

Let R be a PID. An element p ∈ R is prime if and only if p is irreducible.

Proof

[if] Let p be irreducible, but not prime. Then there are a, b ∈ R such that a ∉ 〈p〉 and b ∉ 〈p〉, but ab ∈ 〈p〉. Consider the ideal 〈p〉 + 〈a〉 = 〈α〉 (R being a PID). Since p ∈ 〈α〉, we have p = cα for some c ∈ R. By hypothesis, p is irreducible, so that either c or α is a unit. If c is a unit, 〈p〉 = 〈α〉 = 〈p〉 + 〈a〉, that is, a ∈ 〈p〉, a contradiction. So α is a unit. Then 〈p〉 + 〈a〉 = R, which implies that there are elements u, v ∈ R such that up + va = 1. Similarly, there are elements u′, v′ ∈ R such that u′p + v′b = 1. Multiplying these two equations gives (uu′p + uv′b + u′va)p + (vv′)ab = 1. Now ab ∈ 〈p〉, so that ab = wp for some w ∈ R. But then (uu′p + uv′b + u′va + vv′w)p = 1, which shows that p is a unit, a contradiction.

[only if] Immediate from Proposition 2.11.

Definition 2.25.

An integral domain R is called a unique factorization domain, or a UFD in short, if every non-zero element a ∈ R can be written as a product a = up1 · · · pr, where u ∈ R is a unit, r ≥ 0, and p1, . . . , pr are prime elements (not necessarily distinct) of R. Moreover, such a factorization is unique up to permutation of the primes p1, . . . , pr and up to multiplication of the primes by units. This factorization can also be written as a = uq1^(α1) · · · qs^(αs), where u is a unit, q1, . . . , qs are pairwise non-associate primes and αi > 0 for i = 1, . . . , s. Some authors also use the term factorial ring or factorial domain in order to describe a UFD.

If p ∈ R is a prime and a ∈ R, a ≠ 0, then the multiplicity of p in a is the non-negative integer v such that p^v|a, but p^(v+1)∤a. This integer v is denoted by vp(a). It is clear from the definition that for every a ∈ R, a ≠ 0, there exist only finitely many non-associate primes p for which vp(a) > 0.

Proposition 2.13.

Let R be a UFD. An element p ∈ R is prime if and only if p is irreducible.

Proof

The only if part is immediate from Proposition 2.11. For proving the if part, let p = up1 · · · pr (u a unit and the pi primes in R) be irreducible. If r = 0, then p is a unit, a contradiction. If r > 1, then p can be written as the product of the two non-units up1 · · · pr–1 and pr, again a contradiction. So r = 1, that is, p = up1 is a prime.

A classical example of an integral domain that is not a UFD is ℤ[√–5]. In this ring, we have two essentially different factorizations of 6 into irreducible elements: 6 = 2 · 3 = (1 + √–5)(1 – √–5). The failure of irreducible elements to be primes in such rings is a serious defect that is not easy to patch up!

Theorem 2.11.

A PID is a UFD.

Proof

Let R be a PID and a ∈ R \ {0}. We show that a has a factorization of the form a = up1 · · · pr, where u is a unit and p1, . . . , pr are prime elements of R. If a is a unit, we are done. So assume that a =: a0 is a non-unit, so that 〈a0〉 ≠ R. Since 〈a0〉 ≠ R, there is a maximal ideal 〈p1〉 containing 〈a0〉 (Exercise 2.23). Then p1 is a prime that divides a0. Let a0 = a1p1. We have 〈a0〉 ⊊ 〈a1〉. If 〈a1〉 is the unit ideal, we are done. Otherwise we choose as before a prime p2 dividing a1 and with a1 = a2p2 get the ideal 〈a2〉 properly containing 〈a1〉. Repeating this process we can generate a strictly ascending chain 〈a0〉 ⊊ 〈a1〉 ⊊ 〈a2〉 ⊊ · · · of ideals of R. Since R is a PID and hence Noetherian, this process must stop after finitely many steps (Exercise 2.33).

The converse of the above theorem is not necessarily true. For example, the polynomial ring K[X1, . . . , Xn] over a field K is a UFD for every n ∈ ℕ, but not a PID for n ≥ 2.

Divisibility in a UFD can be rephrased in terms of prime factorizations. Let R be a UFD and let the non-zero elements a, b ∈ R have the prime factorizations a = up1^(α1) · · · pr^(αr) and b = u′p1^(β1) · · · pr^(βr) with units u, u′, pairwise non-associate primes p1, . . . , pr and with αi ≥ 0 and βi ≥ 0. Then a|b if and only if αi ≤ βi for all i = 1, . . . , r. This notion leads to the following definitions.

Definition 2.26.

Let R be a UFD and let a, b ∈ R \ {0} have prime factorizations as in the last paragraph. Any associate of p1^(min(α1,β1)) · · · pr^(min(αr,βr)) is called a greatest common divisor of a and b and is denoted by gcd(a, b). Clearly, gcd(a, b) is unique up to multiplication by units of R. Similarly, any associate of p1^(max(α1,β1)) · · · pr^(max(αr,βr)) is called a least common multiple of a and b and is denoted by lcm(a, b). lcm(a, b) is again unique up to multiplication by units of R. The gcd of a ≠ 0 and 0 is taken to be an associate of a, whereas gcd(0, 0) is undefined. On the other hand, lcm(a, 0) is defined to be 0 for any a ∈ R.

It is clear that these definitions of gcd and lcm can be readily generalized for any arbitrary finite number of elements.

Corollary 2.4.

Let R be a UFD and a, b ∈ R, not both zero. Then gcd(a, b) · lcm(a, b) is an associate of ab.

Proof

Immediate from the definitions.

Corollary 2.5.

Let R be a UFD and a, b, c ∈ R with a|bc. If gcd(a, c) = 1, then a|b.

Proof

Consider the prime factorizations of a, b and c.

For a PID, the gcd and lcm have equivalent characterizations.

Proposition 2.14.

Let R be a PID and a, b be non-zero elements of R. Let d be a gcd of a and b. Then 〈d〉 = 〈a〉 + 〈b〉. If f is an lcm of a and b, then 〈f〉 = 〈a〉 ∩ 〈b〉.

Proof

Let 〈a〉 + 〈b〉 = 〈c〉. We show that c and d are associates. There exist u, v ∈ R such that ua + vb = c. Since d|a and d|b, we have d|c. On the other hand, 〈a〉 ⊆ 〈c〉, so that c|a. Similarly, c|b. Considering the prime factorizations of a and b one can then readily verify that c|d. The proof for the second part is similar and is left to the reader.

A direct corollary to the last proposition is the following.

Corollary 2.6.

Let R be a PID, a, b ∈ R (not both zero) and d a gcd of a and b. Then there are elements u, v ∈ R such that ua + vb = d. In particular, the ideals 〈a〉 and 〈b〉 are relatively prime if and only if gcd(a, b) is a unit. In that case, we also say that the elements a and b are relatively prime or coprime.

This completes our short survey of factorization in rings. Note that ℤ and K[X] (for a field K) are PIDs and hence UFDs. Thus all the results we have proved in this section apply equally well to both these rings. It is because of this (and not mere coincidence) that these two rings enjoy many common properties. Thus our abstract treatment saves us from the duplicate effort of proving the same results once for integers (Section 2.5) and once more for polynomials (Section 2.6).

Exercise Set 2.4

2.21For a non-zero ring R, prove the following assertions:
  1. A unit of R is not a zero-divisor.

  2. The product of two units of R is again a unit.

  3. The product of two non-units of R is again a non-unit.

  4. The element 0 is not a unit in R.

  5. The element 1 is always a unit in R.

  6. If a is a unit and ab = ac, then b = c.

Let K be a field. What are the units in the polynomial ring K[X]? In K[X1, . . . , Xn]? In the ring K(X) of rational functions? In K(X1, . . . , Xn)?

2.22

Binomial theorem Let R be a ring, a, b ∈ R and n ∈ ℕ. Show that

(a + b)^n = C(n, 0)a^n + C(n, 1)a^(n–1)b + · · · + C(n, n)b^n,

where

C(n, i) = n!/(i!(n – i)!), 0 ≤ i ≤ n,

are the binomial coefficients.

2.23Show that every non-zero ring has a maximal (and hence prime) ideal. More generally, show that every non-unit ideal of a non-zero ring is contained in a maximal ideal. [H]
2.24Let R be a ring.
  1. Show that the set of all nilpotent elements of R is an ideal of R. This ideal, denoted here by nil(R), is called the nilradical of R.

  2. Show that the quotient ring R/nil(R) has no non-zero nilpotent elements. (The ring R/nil(R) is called the reduction of R and is often written as Rred. If nil(R) = 0, then we say that R is reduced. Thus Rred is always reduced.)

  3. Show that the nilradical of R is the intersection of the prime ideals of R. [H]

2.25Show that a finite integral domain R is a field. [H]
2.26Let R be a ring of characteristic 0. Show that:
  1. R contains infinitely many elements.

  2. If R is an integral domain, then R contains as subring an isomorphic copy of ℤ.

  3. If R is a field, then R contains as subfield an isomorphic copy of ℚ.

2.27Let f : R → S be a ring homomorphism and let I and J be ideals in R and S respectively. Find examples to corroborate the following statements.
  1. Let a ∈ R be such that f(a) is a unit in S. Then a need not be a unit in R.

  2. The set f(I) need not be an ideal of S.

  3. Even if f is injective and J is maximal, the contraction f⁻¹(J) need not be maximal.

2.28Let K be a field.
  1. Show that a homomorphism from K to any non-zero ring is injective.

  2. Let L be another field and let f : KL and g : LK be homomorphisms such that g ο f = idK. Show that f and g are isomorphisms.

2.29
  1. Show that a ring R is an integral domain if and only if 0 is a prime ideal of R.

  2. Give an example of a reduced ring that is not an integral domain. (Note that an integral domain is always reduced.)

2.30Let R be a ring and let I and J be ideals of R with I ⊆ J. Show that J/I is an ideal of R/I and that (R/I)/(J/I) ≅ R/J. [H]
2.31An integral domain R is called a Euclidean domain (ED) if there is a map ν : R \ {0} → ℕ ∪ {0} satisfying the following two conditions:
  1. ν(a) ≤ ν(ab) for all a, b ∈ R \ {0}.

  2. For every a, b ∈ R with b ≠ 0, there exist (not necessarily unique) q, r ∈ R such that a = qb + r with r = 0 or ν(r) < ν(b).

Show that:

  1. ℤ is a Euclidean domain with ν(a) = |a| for a ≠ 0.

  2. The polynomial ring K[X] over a field K is a Euclidean domain with ν(a) = deg a for a ≠ 0.

  3. For d = –2, –1, 2, 3, the ring

    ℤ[√d] := {a + b√d | a, b ∈ ℤ}

    is a Euclidean domain with ν(a + b√d) := |a² – db²|, a, b ∈ ℤ, not both 0.

  4. A Euclidean domain is a PID (and hence a UFD).

2.32Let R be a ring and I an ideal. Consider the set

√I := {a ∈ R | a^n ∈ I for some n ∈ ℕ}.

Show that √I is an ideal of R. It is called the radical or root of I. If √I = I, then I is called a radical or a root ideal. For arbitrary ideals I and J of R, prove the following assertions.

  1. I ⊆ √I.

  2. √(√I) = √I.

  3. If I ⊆ J, then √I ⊆ √J.

  4. If I is a prime ideal, then √I = I.

  5. √I = R if and only if I = R.

  6. √(I ∩ J) = √I ∩ √J.

  7. √(I + J) = √(√I + √J).

  8. The nilradical nil(R) = √〈0〉.

2.33Let R be a ring. An ascending chain of ideals is a sequence I1 ⊆ I2 ⊆ I3 ⊆ · · · of ideals of R. The ascending chain is called stationary, if there is some n0 ∈ ℕ such that In = In0 for all n ≥ n0. Show that the following conditions are equivalent. [H]
  1. R is Noetherian (that is, every ideal of R is finitely generated).

  2. Every ascending chain of ideals in R is stationary.

  3. Every non-empty set of ideals of R has a maximal element.

2.34
  1. Let R be an integral domain. Define the set S := R × (R \ {0}). Define a relation ~ on S as (a, b) ~ (c, d) if and only if ad = bc. Show that ~ is an equivalence relation on S. Let us denote the equivalence class of (a, b) ∈ S by a/b and the set of all equivalence classes of S under ~ by K.

  2. Now define (a/b)+(c/d) := (ad+bc)/(bd) and (a/b)·(c/d) := (ac)/(bd). Show that these definitions make K a field. This field is called the quotient field of R and is denoted as Q(R). This process resembles the formation of rational numbers from the integers. Indeed, Q(ℤ) = ℚ.

2.5. Integers

The set ℤ of integers is the main object of study in this section. We use many results from the previous sections to derive properties of integers. Recall that ℤ is a PID and hence a UFD.

2.5.1. Divisibility

The notions of divisibility, prime and relatively prime integers, gcd and lcm of integers are essentially the same as discussed in connection with a PID or a UFD. We avoid repeating the definitions here, but concentrate on other useful properties of integers, not covered so far. We only mention that whenever we talk about a prime integer, or the gcd or lcm of two or more integers, we will usually refer to a non-negative integer. This convention makes primes, gcds and lcms unique.

Theorem 2.12.

There are infinitely many prime integers.

Proof

Let n ∈ ℕ be arbitrary and let p1, p2, . . . , pn be n distinct primes. The (non-zero non-unit) integer q := p1p2 · · · pn + 1 is divisible by none of p1, . . . , pn and hence must have a prime divisor pn+1 different from p1, . . . , pn. The result then follows by induction on n (and from the fact that the set of primes is non-empty).

Theorem 2.13.

For an integer a and an integer b ≠ 0, there exist unique integers q and r such that a = qb + r with 0 ≤ r < |b|.

Proof

Let r be the smallest non-negative element in the set {a – cb | c ∈ ℤ} and let q be the corresponding value of c. Then these integers q and r satisfy the desired properties. To prove the uniqueness, let a = q1b + r1 = q2b + r2, where 0 ≤ r1 < |b| and 0 ≤ r2 < |b|. But then (q2 – q1)b = r1 – r2 with –|b| < r1 – r2 < |b|. Since b|(r1 – r2), we must then have r1 – r2 = 0, that is, r1 = r2, which, in turn, implies that q1 = q2.
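The existence half of this proof is effectively an algorithm. A small Python sketch (the function name and the reliance on Python's divmod are our own choices, not from the text) that computes the quotient and remainder with 0 ≤ r < |b| for any non-zero divisor b:

```python
def euclidean_division(a, b):
    """Return (q, r) with a == q*b + r and 0 <= r < |b| (Theorem 2.13)."""
    assert b != 0
    q, r = divmod(a, b)     # Python: r has the sign of b, so r may be negative
    if r < 0:               # happens only for b < 0; shift r into [0, |b|)
        q, r = q + 1, r - b
    return q, r
```

For b > 0 this coincides with Python's built-in divmod; for b < 0 the remainder is shifted to stay non-negative, matching the normalization 0 ≤ a rem b < |b| in the theorem.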

The integers q and r in the above theorem are respectively called the quotient and the remainder of Euclidean division of a by b and are denoted respectively by a quot b and a rem b. Do not confuse Euclidean division with division in the sense of the inverse of multiplication. Euclidean division is the basis of the Euclidean gcd algorithm. More specifically:

Proposition 2.15.

For integers a, b with b ≠ 0, let r be the remainder of Euclidean division of a by b. Then gcd(a, b) = gcd(b, r).

Proof

Clearly, 〈a〉 + 〈b〉 = 〈r〉 + 〈b〉. Now use Proposition 2.14.

Proposition 2.16.

Let a and b be two integers, not both zero, and let d be the (positive) gcd of a and b. Then there are integers u and v such that d = ua + vb. (Such an equality is called a Bézout relation.) Furthermore, if a and b are both non-zero and (|a|, |b|) ≠ (1, 1), then u and v can be so chosen that |u| < |b| and |v| < |a|.

Proof

The existence of u and v follows immediately from Proposition 2.14. If a = qb, then u = 0 and v = 1 is a suitable choice. So assume that a∤b and b∤a, in which case d < |a| and d < |b|. We may assume, without loss of generality, that a and b are positive. First note that if (u, v) satisfies the Bézout relation, then for any k ∈ ℤ the pair (u + kb, v – ka) also satisfies the same relation. So we may replace v by its remainder of Euclidean division by a and may assume |v| < a. But then |ua| – b < |ua| – d ≤ |ua – d| = |vb| ≤ (a – 1)b, which implies |u| < b.
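Proposition 2.16 is usually made effective by the extended Euclidean algorithm, which computes d, u and v simultaneously. A minimal Python sketch (names are ours); it maintains the stated invariant throughout the loop:

```python
def extended_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) >= 0 and u*a + v*b == d
    (a Bezout relation, as in Proposition 2.16)."""
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        # Invariant: old_u*a + old_v*b == old_r and u*a + v*b == r.
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    if old_r < 0:           # normalize so the reported gcd is non-negative
        old_r, old_u, old_v = -old_r, -old_u, -old_v
    return old_r, old_u, old_v
```

Each iteration performs one Euclidean division, so the run time is that of the ordinary Euclidean gcd algorithm.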

The notions of the gcd and of the Bézout relation can be generalized to any finite number of integers a1, . . . , an as

gcd(a1, . . . , an) = gcd(· · · (gcd(gcd(a1, a2), a3) · · ·), an) = u1a1 + · · · + unan

for some integers u1, . . . , un (provided that all the gcds mentioned are defined).

2.5.2. Congruences

Since ℤ is a PID, congruence modulo a non-zero ideal nℤ of ℤ can be rephrased in terms of congruence modulo a positive integer n as follows.

Definition 2.27.

Let n ∈ ℕ. Two integers a and b are said to be congruent modulo n, denoted a ≡ b (mod n), if n|(a – b), that is, if the remainders of Euclidean division of a and b by n are the same. In terms of ideals, this is the same as a ≡ b (mod 〈n〉) (See Definition 2.20). Congruence is an equivalence relation on ℤ, the equivalence classes being the cosets of the ideal nℤ = 〈n〉 of ℤ.

By an abuse of notation, we often denote the equivalence class [a] of a ∈ ℤ simply by a. The following are some basic properties of congruent integers.

Proposition 2.17.

Let n ∈ ℕ, a ≡ b (mod n) and c ≡ d (mod n). Then:

  1. a ± cb ± d (mod n).

  2. acbd (mod n).

  3. For any polynomial f(X) ∈ ℤ[X], we have f(a) ≡ f(b) (mod n).

  4. If n′|n, then ab (mod n′).

  5. If m|a and m|b, then a/mb/m (mod n/ gcd(n, m)).

Proof

(1) and (2) follow from the consideration of the quotient ring ℤ/nℤ. (3) follows from repeated applications of (1) and (2). For the proof of (4), consider a – b = kn and n = k′n′ for k, k′ ∈ ℤ. For proving (5), take a – b = kn = lm. Then m/gcd(n, m) divides k(n/gcd(n, m)). Since m/gcd(n, m) and n/gcd(n, m) are coprime, by Corollary 2.5 l′ := k/(m/gcd(n, m)) is an integer and we have a/m – b/m = l = kn/m = l′(n/gcd(n, m)).

Let n1, . . . , nr ∈ ℕ with gcd(ni, nj) = 1 for i ≠ j. Then lcm(n1, . . . , nr) = n1 · · · nr, and by the Chinese remainder theorem (Theorem 2.10), we have

ℤ/(n1 · · · nr)ℤ ≅ (ℤ/n1ℤ) × · · · × (ℤ/nrℤ).
This implies that, given integers a1, . . . , ar, there exists an integer x unique modulo n1 · · · nr such that x satisfies the following congruences simultaneously:

xa1 (mod n1)
xa2 (mod n2)
  
xar (mod nr)

We now give a procedure for constructing the integer x explicitly. Define N := n1 · · · nr and Ni := N/ni for 1 ≤ i ≤ r. Then for each i we have gcd(ni, Ni) = 1 and, therefore, there are integers ui and vi with uini + viNi = 1. Then x ≡ a1v1N1 + · · · + arvrNr (mod N) is the desired solution.
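This construction translates directly into code. A Python sketch (names are ours) that follows the recipe above, using Python's built-in modular inverse in place of an explicit Bézout computation for the coefficients vi:

```python
from math import gcd

def crt(residues, moduli):
    """Given pairwise coprime moduli n_i and targets a_i, return the unique
    x modulo N = n_1 * ... * n_r with x ≡ a_i (mod n_i).  Follows the text:
    N_i = N/n_i and v_i*N_i ≡ 1 (mod n_i) give x = sum(a_i*v_i*N_i) mod N."""
    assert all(gcd(m, n) == 1
               for i, m in enumerate(moduli) for n in moduli[i + 1:])
    N = 1
    for n in moduli:
        N *= n
    x = 0
    for a, n in zip(residues, moduli):
        Ni = N // n
        vi = pow(Ni, -1, n)          # modular inverse (Python 3.8+)
        x += a * vi * Ni
    return x % N
```

The coprimality assertion mirrors the hypothesis of Theorem 2.10; without it the system may be unsolvable or the solution non-unique.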

Let n ∈ ℕ. We now study the multiplicative group ℤn* of the ring ℤn = ℤ/nℤ. We say that an integer a has a multiplicative inverse modulo n, if [a] ∈ ℤn*, or, equivalently, if there is an integer b with ab ≡ 1 (mod n). The following proposition is an important characterization of the elements of ℤn*.

Proposition 2.18.

(The equivalence class of) an integer a belongs to ℤn* if and only if gcd(a, n) = 1.

Proof

[if] By Proposition 2.16, there exist integers u and v such that ua + vn = 1. But then ua ≡ 1 (mod n).

[only if] If ua ≡ 1 (mod n) for some integer u, then ua + vn = 1 for some integer v, which implies that the gcd of a and n divides 1 and hence is equal to 1.

Definition 2.28.

The cardinality of ℤn* is denoted by φ(n). By Proposition 2.18, φ(n) is equal to the number of integers between 0 and n – 1 (both inclusive), which are relatively prime to n. The function φ : ℕ → ℕ is called Euler’s totient function. For example, for a prime p we have ℤp* = ℤp \ {0}, so φ(p) = p – 1.

The following two theorems are immediate consequences of Proposition 2.4.

Theorem 2.14. Euler’s theorem

Let n ∈ ℕ and a ∈ ℤ with gcd(a, n) = 1. Then

a^φ(n) ≡ 1 (mod n).

Theorem 2.15. Fermat’s little theorem

Let p be a prime and a ∈ ℤ with gcd(a, p) = 1. Then

a^(p–1) ≡ 1 (mod p).

For any integer b ∈ ℤ, one has b^p ≡ b (mod p).
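Both congruences are easy to verify numerically with fast modular exponentiation (Python's three-argument pow); the brute-force totient helper and the sample values below are our own choices, not from the text:

```python
from math import gcd

def phi(n):
    """Brute-force Euler totient: count residues in [0, n) coprime to n."""
    return sum(1 for a in range(n) if gcd(a, n) == 1)

# Euler's theorem: a^phi(n) ≡ 1 (mod n) whenever gcd(a, n) = 1.
n, a = 15, 7
assert gcd(a, n) == 1 and pow(a, phi(n), n) == 1   # phi(15) = 8

# Fermat's little theorem: b^p ≡ b (mod p) for every integer b.
p = 13
assert all(pow(b, p, p) == b % p for b in range(-3, 20))
```

This brute-force phi is only for checking small cases; an efficient formula follows from Proposition 2.20 below.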

Theorem 2.16. Wilson’s theorem

For every prime p, we have (p – 1)! ≡ –1 (mod p).

Proof

The result holds for p = 2. So assume that p is an odd prime. Since ℤp is a field, Fermat’s little theorem gives the factorization

Equation 2.1

X^(p–1) – 1 ≡ (X – 1)(X – 2) · · · (X – (p – 1)) (mod p).

Looking at the constant terms on the two sides proves Wilson’s theorem.

The structure of the group ℤp*, p a prime, can be easily deduced from Fermat’s little theorem. This gives us the following important result.

Proposition 2.19.

For a prime p, the group ℤp* is cyclic.

Proof

For every divisor d of p – 1, we have X^(p–1) – 1 = (X^d – 1)f(X) for some f(X) ∈ ℤ[X] with deg f = p – 1 – d. By Congruence 2.1, X^(p–1) – 1 has p – 1 roots modulo p. Since ℤp is a field, f(X) (mod p) cannot have more than p – 1 – d roots (Proposition 2.25) and it follows that X^d – 1 has exactly d roots modulo p. In particular, if d = q^e for some prime q and e ∈ ℕ, then there exist exactly q^e elements of ℤp* of orders dividing q^e and exactly q^(e–1) elements of ℤp* of orders dividing q^(e–1), that is, there are q^e – q^(e–1) > 0 elements of ℤp* of order q^e. If p – 1 = q1^(e1) · · · qr^(er) is the canonical prime factorization of p – 1 (with each ei ≥ 1), by the above argument there exists an element ai of order qi^(ei) for each i = 1, . . . , r. It is now easy to check that a1 · · · ar has order q1^(e1) · · · qr^(er) = p – 1.

Euler’s totient function plays an extremely important role in number theory (and cryptology). We now describe a method for computing it.

Lemma 2.2.

If n and n′ are relatively prime positive integers, then φ(nn′) = φ(n)φ(n′).

Proof

If a is invertible modulo nn′, then clearly it is invertible modulo both n and n′. Conversely, if ua ≡ 1 (mod n) and u′a′ ≡ 1 (mod n′), then by the Chinese remainder theorem there are integers x and α, unique modulo nn′, satisfying x ≡ u (mod n), x ≡ u′ (mod n′), α ≡ a (mod n) and α ≡ a′ (mod n′). But then xα ≡ 1 (mod nn′). Therefore, ℤnn′* ≅ ℤn* × ℤn′*, whence the lemma follows.

Lemma 2.3.

If p is a prime and e ∈ ℕ, then φ(p^e) = p^e – p^(e–1) = p^e(1 – 1/p).

Proof

The integers between 0 and p^e – 1 that are relatively prime to p^e are precisely those that are not multiples of p, and there are p^e – p^(e–1) of them.

Proposition 2.20.

Let n = p1^(e1) · · · pr^(er) be the prime factorization of a positive integer n, with pairwise distinct primes p1, . . . , pr and with ei > 0. Then

φ(n) = n(1 – 1/p1) · · · (1 – 1/pr).
Proof

Immediate from Lemmas 2.2 and 2.3.
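Proposition 2.20 yields the standard way to compute φ(n) once the factorization of n is known. The sketch below (name ours) finds the prime factors by trial division, which is fine for small n but infeasible at cryptographic sizes:

```python
def euler_phi(n):
    """Compute phi(n) via Proposition 2.20: phi(n) = n * prod(1 - 1/p)
    over the distinct primes p dividing n, found by trial division."""
    result = n
    p = 2
    while p * p <= n:
        if n % p == 0:
            result -= result // p      # multiply result by (1 - 1/p) exactly
            while n % p == 0:          # strip all copies of this prime
                n //= p
        p += 1
    if n > 1:                          # leftover prime factor > sqrt(original n)
        result -= result // n
    return result
```

For example, 360 = 2³ · 3² · 5 gives euler_phi(360) = 360 · (1/2)(2/3)(4/5) = 96. The hardness of computing φ(n) without knowing the factors of n is what RSA-style systems rely on.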

By Proposition 2.18, the linear congruence ax ≡ 1 (mod n) is solvable for x if and only if gcd(a, n) = 1. In such a case, the solution is unique modulo n. Now, let us concentrate on the solutions of the general linear congruence:

axb (mod n).

Theorem 2.17 characterizes the solutions of this congruence.

Theorem 2.17.

Let d := gcd(a, n). Then the congruence axb (mod n) is solvable for x if and only if d|b. A solution of the congruence, if existent, is unique modulo n/d.

Proof

[if] By Proposition 2.17, (a/d)xb/d (mod n/d). Since gcd(a/d, n/d) = 1, the congruence (a/d)x′ ≡ 1 (mod n/d) is solvable for x′. Then a solution for x is x ≡ (b/d)x′ (mod n/d).

[only if] There exists an integer k such that ax + kn = b. This shows that d|b.

To prove the uniqueness let x and x′ be two integers satisfying the given congruence. But then a(xx′) ≡ 0 (mod n), that is, (a/d)(xx′) ≡ 0 (mod n/d), that is, xx′ ≡ 0 (mod n/d), since gcd(a/d, n/d) = 1.

The last theorem implies that if d|b, then the congruence axb (mod n) has d solutions modulo n. These solutions are given by ξ + r(n/d), r = 0, . . . , d – 1, where ξ is the solution modulo n/d of the congruence (a/d)ξ ≡ b/d (mod n/d).
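Theorem 2.17 and this remark can be packaged into a small solver. A Python sketch (name ours), assuming Python 3.8+ for the modular inverse via pow:

```python
from math import gcd

def solve_linear_congruence(a, b, n):
    """All solutions of a*x ≡ b (mod n) in {0, ..., n-1} (Theorem 2.17):
    no solution unless d = gcd(a, n) divides b, else exactly d solutions."""
    d = gcd(a, n)
    if b % d != 0:
        return []
    a0, b0, m = a // d, b // d, n // d
    xi = (pow(a0, -1, m) * b0) % m        # unique solution modulo n/d
    return [xi + r * m for r in range(d)] # the d solutions modulo n
```

For instance, 6x ≡ 4 (mod 10) has d = gcd(6, 10) = 2 solutions, x ≡ 4 and x ≡ 9 (mod 10).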

2.5.3. Quadratic Residues

In this section, we consider quadratic congruences, that is, congruences of the form ax² + bx + c ≡ 0 (mod n). We start with the simple case n = p, a prime. We assume further that p is odd, so that 2 has a multiplicative inverse mod p. Since we are considering quadratic congruences, we are interested only in those integers a for which gcd(a, p) = 1. In that case, a also has a multiplicative inverse mod p and the above congruence can be written as y² ≡ α (mod p), where y ≡ x + b(2a)⁻¹ (mod p) and α ≡ b²(4a²)⁻¹ – ca⁻¹ (mod p). This motivates us to provide Definition 2.29.

Definition 2.29.

Let p be an odd prime and a an integer with gcd(a, p) = 1. We say that a is a quadratic residue modulo p, if the congruence x2a (mod p) has a solution (for x). Otherwise we say that a is a quadratic non-residue modulo p.

If a is a quadratic residue modulo an odd prime p, then the equation x2a (mod p) has exactly two solutions. If ξ is one solution, the other solution is p – ξ. It is, therefore, evident that there are exactly (p – 1)/2 quadratic residues and exactly (p – 1)/2 quadratic non-residues modulo p. For example, the quadratic residues modulo p = 11 are 1 = 12 = 102, 3 = 52 = 62, 4 = 22 = 92, 5 = 42 = 72 and 9 = 32 = 82. The quadratic non-residues modulo 11 are, therefore, 2, 6, 7, 8 and 10. We treat 0 neither as a quadratic residue nor as a quadratic non-residue.

Definition 2.30.

Let p be an odd prime and a an integer with gcd(a, p) = 1. The Legendre symbol, written here inline as (a/p), is defined as:

(a/p) := +1, if a is a quadratic residue modulo p, and (a/p) := –1, if a is a quadratic non-residue modulo p.
Proposition 2.21.

Let p be an odd prime and a and b integers coprime to p.

  1. Euler’s criterion: (a/p) ≡ a^((p–1)/2) (mod p).

  2. (ab/p) = (a/p)(b/p).

  3. (1/p) = 1, (a²/p) = 1, and (–1/p) = (–1)^((p–1)/2).

  4. If a ≡ b (mod p), then (a/p) = (b/p). In particular, if r is the remainder of Euclidean division of a by p, then (a/p) = (r/p).

Proof

If a is a quadratic residue modulo p, then a ≡ b² (mod p) for some integer b (coprime to p) and by Fermat’s little theorem we have a^((p–1)/2) ≡ b^(p–1) ≡ 1 (mod p). Conversely, the polynomial X^(p–1) – 1 = (X^((p–1)/2) – 1)(X^((p–1)/2) + 1) has p – 1 (distinct) roots mod p (again by Fermat’s little theorem). We have just seen that no quadratic residue is a root of X^((p–1)/2) + 1. Since ℤp is a field, the (p – 1)/2 roots of X^((p–1)/2) – 1 are precisely all the quadratic residues modulo p. This proves Euler’s criterion. The other statements are immediate consequences of this.

Euler’s criterion gives us a nice way to check if a given integer is a quadratic residue modulo an odd prime. While this is much faster than the brute-force strategy of enumerating all the quadratic residues, it is still not the best solution, because it involves a modular exponentiation. We can, however, employ a gcd-like procedure for a faster computation. The development of this method demands further results which are otherwise interesting in themselves as well. The first important result is known as the law of quadratic reciprocity (Theorem 2.18 below). Gauss was the first to prove it and he deemed the result so important that he gave eight proofs for it. At present about two hundred published proofs of this law exist in the literature. We go in the classical way, that is, the Gaussian way, because the proof, though somewhat long, is elementary.
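As a baseline, Euler's criterion itself is a one-line computation with fast modular exponentiation; a Python sketch (name ours):

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p via Euler's criterion:
    a^((p-1)/2) mod p equals 1 for residues and p-1 for non-residues."""
    t = pow(a, (p - 1) // 2, p)
    return -1 if t == p - 1 else t   # returns 0 when p divides a
```

For p = 11 this reproduces the list of quadratic residues {1, 3, 4, 5, 9} given earlier. The gcd-like procedure developed below avoids the modular exponentiation altogether.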

Lemma 2.4. Gauss

Let p be an odd prime and a an integer with gcd(a, p) = 1. Let us denote t := (p – 1)/2. For an integer i, 1 ≤ i ≤ t, let ri be the unique integer with ri ≡ ia (mod p) and –t ≤ ri ≤ t. Let n be the number of i, 1 ≤ i ≤ t, for which ri is negative. Then (a/p) = (–1)^n.

Proof

It is easy to check that ri ≢ ±rj (mod p) for all i ≠ j with 1 ≤ i, j ≤ t. Thus |ri|, i = 1, . . . , t, are precisely (a permuted version of) the integers 1, . . . , t. Thus a^t · t! ≡ (a)(2a) · · · (ta) ≡ r1r2 · · · rt ≡ (–1)^n t! (mod p). Canceling t! and using Proposition 2.21(1) gives the desired result.

Definition 2.31.

Let x ∈ ℝ. The largest integer smaller than or equal to x is called the floor of x and is denoted by ⌊x⌋. Similarly, the smallest integer larger than or equal to x is called the ceiling of x and is denoted by ⌈x⌉.

Corollary 2.7.

With the notations of Lemma 2.4 we have n ≡ ⌊2a/p⌋ + ⌊4a/p⌋ + · · · + ⌊2ta/p⌋ (mod 2). If a is odd, then n ≡ ⌊a/p⌋ + ⌊2a/p⌋ + · · · + ⌊ta/p⌋ (mod 2). In particular, (2/p) = (–1)^((p²–1)/8), that is, 2 is a quadratic residue mod p if and only if p ≡ ±1 (mod 8).

Proof

For 1 ≤ j ≤ t, write ja = p⌊ja/p⌋ + sj with 0 < sj < p, so that rj = sj if sj < p/2 and rj = sj – p otherwise. Then ⌊2ja/p⌋ = 2⌊ja/p⌋ + ⌊2sj/p⌋. Since 2⌊ja/p⌋ is even, it follows that if rj > 0, then ⌊2ja/p⌋ is even, and if rj < 0, then ⌊2ja/p⌋ is odd. Therefore, n ≡ ⌊2a/p⌋ + ⌊4a/p⌋ + · · · + ⌊2ta/p⌋ (mod 2).

If a is odd, p + a is even. Also 4 is a quadratic residue modulo p. So, using 4 · ((p + a)/2) ≡ 2a (mod p), we get (2/p)(a/p) = (4/p)(((p + a)/2)/p) = (–1)^e, where, by the first part, e ≡ ⌊(p + a)/p⌋ + ⌊2(p + a)/p⌋ + · · · + ⌊t(p + a)/p⌋ = (1 + ⌊a/p⌋) + (2 + ⌊2a/p⌋) + · · · + (t + ⌊ta/p⌋) ≡ t(t + 1)/2 + (⌊a/p⌋ + · · · + ⌊ta/p⌋) (mod 2). Putting a = 1 gives (2/p) = (–1)^(t(t+1)/2) = (–1)^((p²–1)/8) and, therefore, (a/p) = (–1)^(⌊a/p⌋ + · · · + ⌊ta/p⌋), that is, n ≡ ⌊a/p⌋ + ⌊2a/p⌋ + · · · + ⌊ta/p⌋ (mod 2).

Theorem 2.18. Law of quadratic reciprocity

Let p and q be distinct odd primes. Then (p/q)(q/p) = (–1)^(((p–1)/2)((q–1)/2)).

Proof

By Corollary 2.7, (q/p) = (–1)^n and (p/q) = (–1)^m, where n := ⌊q/p⌋ + ⌊2q/p⌋ + · · · + ⌊tq/p⌋, m := ⌊p/q⌋ + ⌊2p/q⌋ + · · · + ⌊sp/q⌋, s = (q – 1)/2 and t = (p – 1)/2. So we are done, if we can show that m + n = st. Consider the set S := {(x, y) | 1 ≤ x ≤ s, 1 ≤ y ≤ t} of cardinality st. Now S is the disjoint union of S1 and S2, where S1 := {(x, y) ∈ S | qy < px} and S2 := {(x, y) ∈ S | qy > px}. (Note that we cannot have px = qy.) It is easy to see that #S1 = m and #S2 = n.

To demonstrate how we can use the results deduced so far, let us compute (360/997). Since 360 = 2³ · 3² · 5, we have

(360/997) = (2/997)³(3/997)²(5/997) = (2/997)(5/997) = (–1)(–1) = 1,

since 997 ≡ 5 (mod 8) gives (2/997) = –1, and quadratic reciprocity gives (5/997) = (997/5) = (2/5) = –1.

Thus 360 is a quadratic residue modulo 997. The apparent attractiveness of this method is offset by the fact that it demands the factorization of several integers and as such does not lead to a practical algorithm. We indeed need further machinery in order to obtain an efficient algorithm. First, we define a generalization of the Legendre symbol.

Definition 2.32.

Let a, b be integers with b > 0 and odd. We define the Jacobi symbol, written here inline as (a/b), as

(a/b) := 0 if gcd(a, b) > 1, (a/b) := 1 if b = 1, and (a/b) := (a/p1)(a/p2) · · · (a/pt) otherwise,

where, in the last case, p1, . . . , pt are all the prime factors of b (not necessarily all distinct), that is, b = p1 · · · pt, and each (a/pi) is a Legendre symbol.

Note that if (a/b) = –1, then a is not a quadratic residue mod b. However, the converse is not always true, that is, (a/b) = 1 does not necessarily imply that a is a quadratic residue modulo b (Example: a = 2 and b = 9). Of course, if b is an odd prime and if gcd(a, b) = 1, the Legendre and Jacobi symbols coincide in value and meaning.

The Jacobi symbol enjoys many properties similar to the Legendre symbol.

Proposition 2.22.

For integers a, a′ and positive odd integers b, b′, we have:

  1. (aa′/b) = (a/b)(a′/b),

  2. (a/bb′) = (a/b)(a/b′), and

  3. if a ≡ a′ (mod b), then (a/b) = (a′/b). In particular, if r is the remainder of Euclidean division of a by b, then (a/b) = (r/b).

Proof

Immediate from the definition and Proposition 2.21.

Theorem 2.19.
  1. For an odd positive integer b,

    (–1/b) = (–1)^((b–1)/2) and (2/b) = (–1)^((b²–1)/8).

  2. If a is another odd positive integer with gcd(a, b) = 1, then

    (a/b)(b/a) = (–1)^(((a–1)/2)((b–1)/2)).
Proof

  1. Let b = p1 · · · ps, where the pi are odd primes (not necessarily distinct). Then by definition (–1/b) = (–1/p1) · · · (–1/ps) = (–1)^m, where m = (p1 – 1)/2 + · · · + (ps – 1)/2. Now for odd integers x and y one has (xy – 1)/2 ≡ (x – 1)/2 + (y – 1)/2 (mod 2). Repeated applications of this prove that m ≡ (b – 1)/2 (mod 2). To prove that (2/b) = (–1)^((b²–1)/8), we proceed in a similar manner and note that for odd integers x and y one has ((xy)² – 1)/8 ≡ (x² – 1)/8 + (y² – 1)/8 (mod 2).

  2. If a = q1 · · · qr and b = p1 · · · ps with odd primes qi and pj, then by definition

    (a/b)(b/a) = ∏i ∏j (qi/pj)(pj/qi),

    where from Theorem 2.18 it follows that

    (a/b)(b/a) = (–1)^e with e = Σi Σj ((qi – 1)/2)((pj – 1)/2) = (Σi (qi – 1)/2)(Σj (pj – 1)/2) ≡ ((a – 1)/2)((b – 1)/2) (mod 2), by the congruence used in the first part.
Now, we can calculate the Jacobi symbol (a/b) without factoring b, as follows.
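The standard procedure combines Proposition 2.22, Theorem 2.19 and remainder steps into a gcd-like loop. A Python sketch (name ours) of this well-known algorithm; it never factors b:

```python
def jacobi(a, b):
    """Jacobi symbol (a/b) for odd b > 0, computed without factoring b:
    strip factors of 2 using (2/b), then swap via quadratic reciprocity."""
    assert b > 0 and b % 2 == 1
    a %= b
    result = 1
    while a != 0:
        while a % 2 == 0:                 # pull out factors of 2 from a
            a //= 2
            if b % 8 in (3, 5):           # (2/b) = -1 iff b ≡ ±3 (mod 8)
                result = -result
        a, b = b, a                       # reciprocity: replace (a/b) by (b/a)
        if a % 4 == 3 and b % 4 == 3:     # sign flips iff both ≡ 3 (mod 4)
            result = -result
        a %= b
    return result if b == 1 else 0        # gcd(a, b) > 1 gives symbol 0
```

Like the Euclidean algorithm, this runs in time polynomial in the bit lengths of a and b, which is what makes the Jacobi symbol usable in practice (for example, in the Solovay–Strassen primality test).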

2.5.4. Some Assorted Topics

So far, we have studied some elementary properties of integers. Number theory is, however, one of the oldest and widest branches of mathematics. Various complex-analytic and algebraic tools have been employed to derive more complicated properties of integers. In Section 2.13, we give a short introductory exposition to algebraic number theory. Here, we mention a collection of useful results from analytic number theory. The proofs of these analytic results would lead us too far away and hence are omitted here. Inquisitive (and/or cynical) readers may consult textbooks on analytic number theory for the details missing here.

The prime number theorem

The famous prime number theorem gives an asymptotic estimate of the density of primes smaller than or equal to a positive real number. Gauss conjectured this result in 1791. Many mathematicians tried to prove it during the 19th century and came up with partial results. Riemann made reasonable progress towards proving the theorem, but could not furnish a complete proof before he died in 1866. It is interesting to mention here that a good portion of the theory of analytic functions (also called holomorphic functions) in complex analysis was developed during these attempts to prove the prime number theorem. The first complete proof of the theorem (based mostly on the ideas of Riemann and Chebyshev) was given independently by the French mathematician Hadamard and by the Belgian mathematician de la Vallée Poussin in 1896. Their proof is regarded as one of the major achievements of modern mathematics. People started believing that any proof of the prime number theorem has to be analytic. Erdős and Selberg destroyed this belief by independently providing the first elementary proof of the theorem in 1949. Here (and elsewhere in mathematics), the adjective elementary refers to something which does not depend on results from analysis or algebra. Caution: Elementary is not synonymous with easy!

Theorem 2.20. Prime Number Theorem

Let π(x) denote the number of primes less than or equal to a real number x > 0. Then, as x → ∞, π(x) ~ x/ln x (that is, the ratio π(x)/(x/ln x) → 1). In particular, the density π(n)/n of primes among the natural numbers ≤ n asymptotically approaches 1/ln n as n → ∞. It also follows that the n-th prime is approximately equal to n ln n.

Though the prime number theorem provides only an asymptotic estimate (that is, one for x → ∞), it gives good approximations to π(x) even for finite values of x (for example, for values of x in the cryptographic range). Table 2.1 lists π(x) against the rounded values of x/ln x for x equal to small powers of 10.

Table 2.1. Approximations to π(x)
x       π(x)        x/ln x      x/(ln x – 1)   Li(x)
10^3    168         145         169            178
10^4    1229        1086        1218           1246
10^5    9592        8686        9512           9630
10^6    78,498      72,382      78,030         78,628
10^7    664,579     620,421     661,458        664,918
10^8    5,761,455   5,428,681   5,740,304      5,762,209

Given the prime number theorem, it follows that π(x) is asymptotic to x/(ln x – ξ) for any fixed real ξ. It turns out that ξ = 1 is the best choice. Gauss’ Li function is also an asymptotic estimate for π(x), where for real x > 0 one defines:

Li(x) := ∫₂ˣ dt / ln t.
Gauss conjectured that Li(x) asymptotically equals π(x). The prime number theorem is, in fact, equivalent to this conjecture. Furthermore, de la Vallée Poussin proved that Li(x) is a better approximation to π(x) than x/(ln x – ξ) for any real ξ. Table 2.1 also lists x/(ln x – 1) and Li(x) against the actual values of π(x).

The asymptotic formula gives no explicit bounds on the error π(x) – (x/ln x) for finite x. It has been shown by Dusart [83] that (x/ln x) + 0.992(x/ln² x) ≤ π(x) ≤ (x/ln x) + 1.2762(x/ln² x) for all x > 598.
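The entries of Table 2.1 are easy to reproduce. The following Python sketch (the sieve is my own helper, not from the book) counts π(10⁶) exactly and evaluates the two elementary estimates:

```python
import math

def prime_pi(x):
    # Count primes <= x with a sieve of Eratosthenes.
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return sum(sieve)

x = 10 ** 6
print(prime_pi(x))                    # 78498, the Table 2.1 entry for 10^6
print(round(x / math.log(x)))         # 72382
print(round(x / (math.log(x) - 1)))   # 78030
```

As the table suggests, x/(ln x – 1) is already much closer to π(x) than x/ln x.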

Density of smooth integers

Integers having only small prime divisors play an interesting role in cryptography and in number theory in general.

Definition 2.33.

Let y ∈ ℕ. An integer x is called y-smooth (or simply smooth, if y is understood from the context), if all the prime divisors of x are ≤ y. We denote by ψ(x, y) the fraction of positive integers ≤ x that are y-smooth.

The following theorem gives an asymptotic estimate for ψ(x, y).

Theorem 2.21.

Let x, y ∈ ℕ with x > y, and let u := ln x/ln y. For u → ∞ and y ≥ ln² x we have the asymptotic formula:

ψ(x, y) → u^(–u+o(u)) = e^(–(1+o(1))u ln u).

In Theorem 2.21, the notation g(u) = o(f(u)) implies that the ratio g(u)/f(u) tends to 0 as u approaches ∞. See Definition 3.1 for more details. An interesting special case of the formula for ψ(x, y) will be used quite often in this book and is given as Corollary 4.1 in Chapter 4.

Like the prime number theorem, Theorem 2.21 gives only asymptotic estimates, but is indeed a good approximation for finite values of x, y and u (that is, for the values of practical interest). The most important implication of this theorem is that the density of y-smooth integers in the set {1, . . . , x} is a very sensitive function of u = ln x/ln y and decreases very rapidly as x increases. For example, if y = 15,485,863, the millionth prime, then a random integer ≤ 2²⁵⁰ is y-smooth with probability approximately 2.12 × 10⁻¹¹, whereas a random integer ≤ 2⁵⁰⁰ is y-smooth with probability approximately 2.23 × 10⁻²⁸. (These figures are computed neglecting the o(u) term in the expression of ψ(x, y).) In other words, smaller integers have higher probability of being smooth (that is, y-smooth for a given y).
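Neglecting the o(u) term as the text does, ψ(x, y) ≈ u^(–u) can be evaluated directly; the following sketch reproduces the two probabilities quoted above:

```python
import math

def smooth_density(log2_x, y):
    # Estimate of psi(x, y) for x = 2^log2_x, using psi ~ u^(-u)
    # with u = ln x / ln y; the o(u) term of Theorem 2.21 is dropped.
    u = log2_x * math.log(2) / math.log(y)
    return math.exp(-u * math.log(u))

y = 15_485_863                     # the millionth prime
print(smooth_density(250, y))      # ≈ 2.12e-11
print(smooth_density(500, y))      # ≈ 2.23e-28
```

Doubling the bit length of x squares the (already tiny) smoothness probability's exponent, which is the sensitivity the theorem describes.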

The extended Riemann hypothesis

The Riemann hypothesis (RH) is one of the deepest unsolved problems in mathematics. An extended version of this hypothesis has important bearings on the solvability of certain computational problems in polynomial time.

Definition 2.34.

The Euler zeta function ζ(s) is defined for a complex variable s with Re s ≥ 1 as

ζ(s) := Σn≥1 1/nˢ = 1 + 1/2ˢ + 1/3ˢ + · · · .
The reader may already be familiar with the results: ζ(1) = ∞, ζ(2) = π²/6 and ζ(4) = π⁴/90. Riemann (analytically) extended the Euler zeta function to all complex values of s (except at s = 1, where the function has a simple pole). This extended function, called the Riemann zeta function, is known to have zeros at s = –2, –4, –6, . . . . These are called the trivial zeros of ζ(s). It can be proved that all non-trivial zeros of ζ(s) must lie in the so-called critical strip: 0 ≤ Re s ≤ 1, and are symmetric about the critical line: Re s = 1/2.

Conjecture 2.1. Riemann hypothesis (RH)

All non-trivial zeros of ζ(s) lie on the critical line.

In 1900, Hilbert asserted that proving or disproving the RH is one of the most important problems confronting 20th-century mathematicians. The problem remains just as important to the mathematicians of the 21st century.

In 1901, von Koch proved that the RH is equivalent to the formula:

Conjecture 2.2. An equivalent form of the Riemann hypothesis

π(x) = Li(x) + O(√x ln x)

Here the order notation f(x) = O(g(x)) means that |f(x)/g(x)| is less than a constant for all sufficiently large x (See Definition 3.1).

Hadamard and de la Vallée Poussin proved that

π(x) = Li(x) + O(x e^(–α√(ln x)))

for some positive constant α. While this estimate was sufficient to prove the prime number theorem, the tighter bound of Conjecture 2.2 continues to remain unproved.

Theorem 2.22. Dirichlet’s theorem on primes in arithmetic progression

Let a, b ∈ ℕ be coprime. The set {a + kb | k = 0, 1, 2, . . .} contains an infinite number of primes.

Dirichlet’s theorem is a powerful generalization of Theorem 2.12 (which corresponds to a = b = 1). One can accordingly generalize the notation π(x) as follows:

Definition 2.35.

Let a, b ∈ ℕ with gcd(a, b) = 1. By πa,b(x), we denote the number of primes in the set {a + kb | k = 0, 1, 2, . . .} that are ≤ x.

The prime number theorem gives the estimate:

πa,b(x) ~ (1/φ(b)) (x/ln x),

where φ is Euler’s totient function. The RH now generalizes to:

Conjecture 2.3. Extended Riemann hypothesis (ERH)

For a, b ∈ ℕ with gcd(a, b) = 1,

πa,b(x) = (1/φ(b)) Li(x) + O(√x ln x).
Some authors use the expression Generalized Riemann hypothesis (GRH) in place of ERH. Taking b = 1 demonstrates that the ERH implies the RH. The ERH also implies the following:

Conjecture 2.4.

The smallest positive quadratic non-residue modulo a prime p is < 2 ln² p.
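The conjectured bound is easy to test numerically. The sketch below (my own helper) finds the smallest positive quadratic non-residue with Euler’s criterion and checks it against 2 ln² p for a few primes:

```python
import math

def smallest_qnr(p):
    # Euler's criterion: a is a quadratic residue modulo an odd prime p
    # exactly when a^((p-1)/2) ≡ 1 (mod p).
    a = 2
    while pow(a, (p - 1) // 2, p) == 1:
        a += 1
    return a

for p in (101, 1009, 10007, 104729):
    assert smallest_qnr(p) < 2 * math.log(p) ** 2   # the bound of Conjecture 2.4
```

In practice the smallest non-residue is far below the bound; the point of the ERH is that the bound holds for every prime.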

Exercise Set 2.5

2.35
  1. Show that any integer n ≥ 3 satisfies n² = a² – b² for some a, b ∈ ℕ.

  2. Show that for any integer n ≥ 2 the integer n⁴ + 4ⁿ is composite.

2.36Let n ∈ ℕ and S a subset of {1, 2, ..., 2n} of cardinality n + 1. Show that: [H]
  1. There exist x, y ∈ S such that x – y = 1.

  2. There exist x, y ∈ S such that x – y = n.

  3. There exist distinct x, y ∈ S such that x is a multiple of y.

  4. There exist distinct x, y ∈ S such that x is relatively prime to y.

2.37Show that for any n ∈ ℕ, n > 1, the rational number 1 + 1/2 + 1/3 + · · · + 1/n is not an integer. [H]
2.38
  1. Show that the Mersenne number Mn := 2ⁿ – 1 is prime only if n is prime.

  2. Show that the Fermat number 2ⁿ + 1 is prime only if n = 2ᵗ for some integer t ≥ 0.
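The converse of part 1 fails: n prime does not force Mn to be prime. A quick check (naive trial division, my own helper):

```python
def is_prime(m):
    # naive trial division, adequate for small m
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

assert is_prime(2**13 - 1)        # M13 = 8191 is prime
assert not is_prime(2**11 - 1)    # M11 = 2047 = 23 * 89: 11 is prime, M11 is not
```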

2.39Let n ≥ 2 be a natural number. A complete residue system modulo n is a set of n integers a1, . . . , an such that ai ≢ aj (mod n) for i ≠ j. Similarly, a reduced residue system modulo n is a set of φ(n) integers b1, . . . , bφ(n) such that gcd(bi, n) = 1 for all i = 1, . . . , φ(n) and bi ≢ bj (mod n) for i ≠ j. Show that:
  1. If {a1, . . . , an} is a complete residue system modulo n, the equivalence classes of a1, . . . , an (modulo the ideal 〈n〉) constitute the set ℤn. In other words, given any integer a, there exists a unique i, 1 ≤ i ≤ n, for which a ≡ ai (mod n).

  2. If {b1, . . . , bφ(n)} is a reduced residue system modulo n, then the equivalence classes of b1, . . . , bφ(n) constitute the set ℤn*. In other words, given any integer b coprime to n, there exists a unique i, 1 ≤ i ≤ φ(n), for which b ≡ bi (mod n).

  3. If {a1, . . . , an} is a complete residue system modulo n, then for any integer a coprime to n, the integers aa1, . . . , aan constitute a complete residue system modulo n. For example, if n is odd, then {2, 4, 6, . . . , 2n} is a complete residue system modulo n.

  4. If {b1, . . . , bφ(n)} is a reduced residue system modulo n, then for any integer b coprime to n, the integers bb1, . . . , bbφ(n) constitute a reduced residue system modulo n.

  5. For n > 2, the integers 1², 2², . . . , n² do not constitute a complete residue system modulo n. [H]

  6. If p is an odd prime and if {a1, . . . , ap} and {b1, . . . , bp} are two complete residue systems modulo p, then {a1b1, . . . , apbp} is not a complete residue system modulo p. [H]

2.40Prove that the decimal expansion of any rational number a/b is recurring, that is, (eventually) periodic. (A terminating expansion may be viewed as one with recurring 0.) [H]
2.41Let p be an odd prime. Show that the congruence x2 ≡ –1 (mod p) is solvable if and only if p ≡ 1 (mod 4). [H]
2.42Let n ∈ ℕ.
  1. Show that if n > 2, then φ(n) is even.

  2. Show that if n is odd, then φ(n) = φ(2n).

  3. Find all the values of n for which φ(n) = 12.

2.43For n ∈ ℕ, show that Σd|n φ(d) = n.
2.44Let n > 2 and gcd(a, n) = 1. Let h be the multiplicative order of a modulo n (that is, the order of a in the group ℤn*). Show that:
  1. aⁱ ≡ aʲ (mod n) if and only if i ≡ j (mod h).

  2. The multiplicative order of a^l modulo n is h/gcd(h, l).

  3. If a is a primitive element of ℤn* (that is, if h = φ(n)), then 1, a, a², . . . , a^(h–1) is a reduced residue system modulo n.

  4. If gcd(b, n) = 1 and b has multiplicative order k modulo n and if gcd(h, k) = 1, then the multiplicative order of ab modulo n is hk.

2.45Devise a criterion for the solvability of ax² + bx + c ≡ 0 (mod p), where p is an odd prime and gcd(a, p) = 1. [H]
2.46Let p be a prime and r ∈ ℕ. An integer a with gcd(a, p) = 1 is called an r-th power residue modulo p, if the congruence x^r ≡ a (mod p) has a solution. Show that a is an r-th power residue modulo p if and only if a^((p–1)/gcd(r, p–1)) ≡ 1 (mod p). This is a generalization of Euler’s criterion for quadratic residues.
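For a small prime the criterion of this exercise can be confirmed by brute force; the values p = 31, r = 6 below are an arbitrary choice of mine:

```python
import math

# a is an r-th power residue mod p  iff  a^((p-1)/gcd(r, p-1)) ≡ 1 (mod p)
p, r = 31, 6
for a in range(1, p):
    brute = any(pow(x, r, p) == a for x in range(1, p))
    criterion = pow(a, (p - 1) // math.gcd(r, p - 1), p) == 1
    assert brute == criterion
```

Taking r = 2 recovers Euler’s criterion for quadratic residues.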
2.47Let G be a finite cyclic group of cardinality n. Show that G ≅ ℤn and that there are exactly φ(n) generators (that is, primitive elements) of G.
2.48Let m, n ∈ ℕ with m|n. Show that the canonical (surjective) ring homomorphism ℤn → ℤm induces a surjective group homomorphism ℤn* → ℤm* of the respective groups of units. (Note that every ring homomorphism g : A → B induces a group homomorphism g* : A* → B*, where A* and B* are the groups of units of A and B respectively. Even when g is surjective, g* need not be surjective, in general. As an example consider the canonical surjection ℤ → ℤp for a prime p > 3.)
2.49In this exercise, we investigate which of the groups ℤp^e* is cyclic for a prime p and e ∈ ℕ.
  1. Show that ℤ2* and ℤ4* are cyclic, but ℤ8* is not cyclic. Conclude that ℤ2^e* is not cyclic for e ≥ 3. [H] More specifically, show that for e ≥ 3 the multiplicative group ℤ2^e* is the direct product of two cyclic subgroups generated by (the classes of) –1 and 5 respectively.

  2. Show that if p is an odd prime and e ∈ ℕ, then ℤp^e* is cyclic. [H]

2.50Show that the multiplicative group ℤn*, n ≥ 2, is cyclic if and only if n = 2, 4, p^e, 2p^e, where p is an odd prime and e ∈ ℕ. [H]

2.6. Polynomials

Unless otherwise stated, in this section we denote by K an arbitrary field and by K[X] the ring of polynomials in one indeterminate X and with coefficients from K. Since K[X] is a PID, it enjoys many properties similar to those of ℤ. To start with, we take a look at these properties. Then we introduce the concept of algebraic elements and discuss how irreducible polynomials can be used to construct (algebraic) extensions of fields. When no confusions are likely, we denote a polynomial f(X) by f only.

2.6.1. Elementary Properties

Since K[X] is a PID and hence a UFD, every polynomial in K[X] can be written essentially uniquely as a product of prime polynomials. Conventionally, prime polynomials are more commonly referred to as irreducible polynomials. Similar to the case of ℤ, the ring K[X] contains an infinite number of irreducible elements: if K is infinite, then {X – a | a ∈ K} is an infinite set of irreducible polynomials of K[X], and if K is finite, then, as we will see later, there is an irreducible polynomial of degree d in K[X] for every d ∈ ℕ.

It is important to note here that the concept of irreducibility of a polynomial is very much dependent on the field K. If K ⊆ L is a field extension, then a polynomial in K[X] is naturally an element of L[X] also. A polynomial which is irreducible over K need not continue to remain so over L. For example, the polynomial X² – 2 is irreducible over ℚ, but reducible over ℝ, since X² – 2 = (X – √2)(X + √2), √2 being a real number but not a rational number. As a second example, the polynomial X² + 1 is irreducible over both ℚ and ℝ but not over ℂ. In fact, we will show shortly that an irreducible polynomial in K[X] of degree > 1 becomes reducible over a suitable extension of K.

For polynomials f(X), g(X) ∈ K[X] with g(X) ≠ 0, there exist unique polynomials q(X) and r(X) in K[X] such that f(X) = q(X)g(X) + r(X) with r(X) = 0 or deg r(X) < deg g(X). The polynomials q(X) and r(X) are respectively called the quotient and remainder of polynomial division of f(X) by g(X) and can be obtained by the so-called long division procedure. We use the notations: q(X) = f(X) quot g(X) and r(X) = f(X) rem g(X).

Whenever we talk about the gcd of two non-zero polynomials, we usually refer to the monic gcd, that is, a polynomial with leading coefficient 1. This makes the gcd of two polynomials unique. We have gcd(f(X), g(X)) = gcd(g(X), r(X)), where r(X) = f(X) rem g(X). This gives rise to an algorithm (similar to the Euclidean gcd algorithm for integers) for computing the gcd of two polynomials. Bézout relations also hold for polynomials. More specifically:
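As a concrete sketch of this Euclidean algorithm (the helper names and the choice K = ℤ7 are mine, not the book’s), coefficients are kept as lists running from degree 0 upward:

```python
P = 7   # coefficient field K = Z_7; a polynomial is its coefficient
        # list from degree 0 upward, e.g. [6, 0, 1] = X^2 + 6

def trim(f):
    # drop leading zero coefficients
    while f and f[-1] % P == 0:
        f.pop()
    return f

def poly_rem(f, g):
    # remainder of f divided by g (g != 0), by long division over Z_P
    f = trim([c % P for c in f])
    g = trim([c % P for c in g])
    ginv = pow(g[-1], P - 2, P)        # inverse of the leading coefficient
    while len(f) >= len(g):
        q = f[-1] * ginv % P
        shift = len(f) - len(g)
        for i, c in enumerate(g):
            f[i + shift] = (f[i + shift] - q * c) % P
        f = trim(f)
    return f

def poly_gcd(f, g):
    # gcd(f, g) = gcd(g, f rem g); the result is normalized to be monic
    f = trim([c % P for c in f])
    g = trim([c % P for c in g])
    while g:
        f, g = g, poly_rem(f, g)
    inv = pow(f[-1], P - 2, P)
    return [c * inv % P for c in f]

# gcd(X^2 - 1, (X - 1)(X + 2)) = X - 1, i.e. X + 6 over Z_7:
print(poly_gcd([6, 0, 1], [5, 1, 1]))    # [6, 1]
```

Keeping track of the quotients in the same loop yields the Bézout cofactors of Proposition 2.23.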

Proposition 2.23.

Let f(X), g(X) ∈ K[X], not both zero, and d(X) the (monic) gcd of f(X) and g(X). Then there are polynomials u(X), v(X) ∈ K[X] such that d(X) = u(X)f(X) + v(X)g(X). (Such an equality is called a Bézout relation.) Furthermore, if f(X) and g(X) are non-zero and not both constant, then u(X) and v(X) can be so chosen that deg u(X) < deg g(X) and deg v(X) < deg f(X).[6]

[6] Recall that the degree of the zero polynomial is taken to be –∞.

Proof

Similar to the proof of Proposition 2.16.

The concept of congruence can be extended to polynomials, namely, for a non-zero f(X) ∈ K[X], two polynomials g(X), h(X) ∈ K[X] are said to be congruent modulo f(X), denoted g(X) ≡ h(X) (mod f(X)), if f(X)|(g(X) – h(X)), that is, if there exists u(X) ∈ K[X] with g(X) – h(X) = u(X)f(X), or equivalently, if g(X) rem f(X) = h(X) rem f(X).

The principal ideals 〈f(X)〉 of K[X] play an important role (as do the ideals 〈n〉 of ℤ). Let us investigate the structure of the quotient ring R := K[X]/〈f(X)〉 for a non-constant polynomial f(X) ∈ K[X]. If r(X) denotes the remainder of division of g(X) ∈ K[X] by f(X), then it is clear that the residue classes of g(X) and r(X) are the same in R. On the other hand, two polynomials g(X), h(X) ∈ K[X] with deg g(X) < deg f(X) and deg h(X) < deg f(X) represent the same residue class in R if and only if g(X) = h(X). Thus elements of R are uniquely representable as polynomials of degrees < deg f(X). In other words, we may represent the ring R as the set {g(X) ∈ K[X] | deg g(X) < deg f(X)} together with addition and multiplication modulo the polynomial f(X). The ring R contains all the constant polynomials a ∈ K, that is, the field K is canonically embedded in R. In general, R is not a field. The next theorem gives the criterion for R to be a field.

Theorem 2.23.

For a non-constant polynomial f(X) ∈ K[X], the ring K[X]/〈f(X)〉 is a field if and only if f(X) is irreducible in K[X].

Proof

If f(X) is reducible over K, then we can write f(X) = g(X)h(X) for some polynomials g(X), h(X) ∈ K[X] with 1 ≤ deg g < deg f and 1 ≤ deg h < deg f. Then both g and h represent non-zero elements in K[X]/〈f(X)〉, whose product is 0, that is, K[X]/〈f(X)〉 has non-zero zero divisors.

Conversely, if f(X) is irreducible over K and if g(X) is a non-zero polynomial of degree < deg f(X), then gcd(f(X), g(X)) = 1, so that by Proposition 2.23 there exist polynomials u(X), v(X) ∈ K[X] with u(X)f(X) + v(X)g(X) = 1 and deg v(X) < deg f(X). Thus we see that v(X)g(X) ≡ 1 (mod f(X)), that is, g(X) has a multiplicative inverse modulo f(X).

Let L := K[X]/〈f(X)〉 with f(X) irreducible over K. Then K ⊆ L is a field extension. If deg f(X) = 1, then L is isomorphic to K. If deg f(X) ≥ 2, then L is a proper extension of K. This gives us a useful and important way of representing the extension field L, given a representation for K. (For example, see Section 2.9.)
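For instance, taking K = ℤ2 and the irreducible polynomial X³ + X + 1 gives a field of 8 elements. The sketch below (the bit-packed encoding is my own, not the book’s) multiplies modulo this polynomial and confirms that every non-zero element is invertible, as Theorem 2.23 guarantees:

```python
# Arithmetic in Z_2[X]/<X^3 + X + 1>, a field with 8 elements.
# The polynomial a2*X^2 + a1*X + a0 over Z_2 is packed into the bits a2 a1 a0.
MOD = 0b1011                  # X^3 + X + 1, irreducible over Z_2

def gf8_mul(a, b):
    # schoolbook multiplication with reduction modulo MOD
    r = 0
    while b:
        if b & 1:
            r ^= a            # add a copy of a (addition over Z_2 is XOR)
        a <<= 1               # multiply a by X
        if a & 0b1000:        # degree 3 appeared: subtract (XOR) the modulus
            a ^= MOD
        b >>= 1
    return r

# Every non-zero residue class has a multiplicative inverse:
for a in range(1, 8):
    assert any(gf8_mul(a, b) == 1 for b in range(1, 8))
```

This is exactly the construction used for the fields GF(2ⁿ) that appear later in the book.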

2.6.2. Roots of Polynomials

The study of the roots of a polynomial is the central objective in algebra. We now derive some elementary properties of roots of polynomials.

Definition 2.36.

Let f ∈ K[X]. An element a ∈ K is said to be a root of f, if f(a) = 0.

Proposition 2.24.

Let f(X) ∈ K[X] and a ∈ K. Then f(X) = (X – a)q(X) + f(a) for some q(X) ∈ K[X]. In particular, a is a root of f(X) if and only if X – a divides f(X).

Proof

Polynomial division of f(X) by X – a gives f(X) = (X – a)q(X) + r(X) with deg r(X) < deg(X – a) = 1. Thus r(X) is a constant polynomial. Let us denote r(X) by r ∈ K. Substituting X = a gives f(a) = r.

Proposition 2.25.

A non-zero polynomial f ∈ K[X] with d := deg f can have at most d roots in K.

Proof

We proceed by induction on d. The result clearly holds for d = 0. So assume that d ≥ 1 and that the result holds for all polynomials of degree d – 1. If f has no roots in K, we are done. So assume that f has a root, say, a ∈ K. By Proposition 2.24, we have f(X) = (X – a)g(X) for some g(X) ∈ K[X]. Clearly, deg g = d – 1 and so by the induction hypothesis g has at most d – 1 roots. Since K is a field (and hence does not contain non-zero zero divisors), it follows that the roots of f are precisely a and the roots of g. This establishes the induction step.

In the last proof, the only result we have used to exploit the fact that K is a field is that K contains no non-zero zero divisors. This is, however, true for every integral domain. Thus Proposition 2.25 continues to hold if K is any integral domain (not necessarily a field). However, over a ring R that is not an integral domain, the proposition is not necessarily true. For example, if ab = 0 for non-zero a, b ∈ R with a ≠ b, then the polynomial X² + (b – a)X has at least three roots: 0, a and a – b.
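The failure over a ring with zero divisors is easy to see computationally; in ℤ8, for example, the degree-2 polynomial X² – 1 has four roots:

```python
# Roots of X^2 - 1 in Z_8, a ring with zero divisors:
roots = [x for x in range(8) if (x * x - 1) % 8 == 0]
print(roots)   # [1, 3, 5, 7] — four roots, exceeding the degree
```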

For a field extension K ⊆ L and for a polynomial f ∈ K[X], we may think of the roots of f in L, since f ∈ L[X] too. Clearly, all the roots of f in K are also roots of f in L. However, the converse is not true in general. For example, the only roots of X⁴ – 1 in ℝ are ±1, whereas the roots of the same polynomial in ℂ are ±1, ±i. Indeed we have the following important result.

Proposition 2.26.

For any non-constant polynomial f ∈ K[X], there exists a field extension K′ of K such that f has a root in K′.

Proof

If f has a root in K, taking K′ = K proves the proposition. So we assume that f has no root in K (which implies that deg f ≥ 2). In principle, we do not require f to be irreducible. But if we consider a non-constant factor g of f, irreducible over K, we see that the roots of g in any extension L of K are roots of f in L too. Thus we may replace f by g and assume, without loss of generality, that f is irreducible. We construct the field extension K′ := K[X]/〈f〉 of K and denote the equivalence class of X in K′ by α. (One also writes x, X̄ or [X] to denote this equivalence class.) It is clear that f(α) = 0, that is, α is a root of f(X) in K′.

We say that the field K′ in the proof of the last proposition is obtained by adjoining the root α of f and denote this as K′ = K(α). We can write f(X) = (X – α)f1(X), where f1(X) ∈ K′[X] and deg f1 = (deg f) – 1. Now there is a field extension K″ of K′, where f1 has a root. Proceeding in this way we prove the following result.

Proposition 2.27.

A non-constant polynomial f in K[X] with deg f = d has d roots (not necessarily all distinct) in some field extension L of K.

If a polynomial f ∈ K[X] of degree d ≥ 1 has all its roots α1, . . . , αd in L, then f(X) = a(X – α1) · · · (X – αd) for some a ∈ L (actually the leading coefficient a ∈ K of f). In this case, we say that f splits (completely or into linear factors) over L.

Definition 2.37.

Let f ∈ K[X] be a non-constant polynomial. A minimal (with respect to inclusion) field extension of K, over which f splits completely, is called a splitting field of f over K.[7] This is a minimal field which contains K and all the roots of f.

[7] It is necessary to use the phrase “over K” in this definition. X² + 1, treated as a polynomial in ℚ[X], has the splitting field ℚ(i), whereas the same polynomial, treated as an element of ℝ[X], has the splitting field ℂ (see Equation (2.3) on p 74).

Every non-constant polynomial f ∈ K[X] has a splitting field L over K. Quite importantly, this field L is unique in some sense. This allows us to call L the splitting field of f instead of a splitting field of f. We discuss these topics further in Section 2.8.

Definition 2.38.

Let f be a non-constant polynomial in K[X] and let α be a root of f (in some extension of K). The largest natural number n for which (X – α)ⁿ|f(X) is called the multiplicity of the root α (in f). If n = 1 (resp. n > 1), then α is called a simple (resp. multiple) root of f. If all the roots of f are simple, then we call f a square-free polynomial. It is easy to see that f is square-free only if f is not divisible by the square of a non-constant polynomial in K[X]. The reverse implication also holds, if char K = 0 or if K is a finite field (or, more generally, if K is a perfect field—see Exercise 2.76).

The notion of multiplicity can be extended to a non-root β of f by setting the multiplicity of β to zero.

2.6.3. Algebraic Elements and Extensions

Here we assume, unless otherwise stated, that K ⊆ L is a field extension.

Definition 2.39.

An element α ∈ L is said to be algebraic over K, if there exists a non-constant polynomial f ∈ K[X] with f(α) = 0. If an element α ∈ L is not algebraic over K, we say that α is transcendental over K. Thus a transcendental (over K) element is a root of no non-constant polynomial in K[X]. A field extension K ⊆ L is called an algebraic extension, if every element of L is algebraic over K. A non-algebraic extension is also called a transcendental extension. If K ⊆ L is a transcendental extension, there exists at least one element α ∈ L which is transcendental (that is, not algebraic) over K.

Example 2.10.
  1. Every element a ∈ K is algebraic over K, since it is a root of the non-constant polynomial X – a ∈ K[X].

  2. The element α := √2 ∈ ℝ is algebraic over ℚ, since α is a root of the polynomial X² – 2 ∈ ℚ[X].

  3. The well-known real numbers e and π are transcendental over ℚ. (We are not going to prove this.) Of course, the concept of algebraic and transcendental elements is heavily dependent on the field K. For example, e and π, being elements of ℝ, are algebraic over ℝ.

  4. A complex number z = a + ib, where i = √–1 and a, b ∈ ℝ, is a root of the polynomial (X – z)(X – z̄) = X² – 2aX + (a² + b²) ∈ ℝ[X] and hence is algebraic over ℝ. Therefore, the field extension ℝ ⊆ ℂ is algebraic.

  5. The extension ℚ ⊆ ℝ is transcendental, since ℝ contains elements (like e and π) that are transcendental over ℚ.

Definition 2.40.

Let α ∈ L be algebraic over K. A non-constant polynomial f ∈ K[X] of least positive degree with f(α) = 0 is called a minimal polynomial of α over K.

Proposition 2.28.

Let α ∈ L be algebraic over K. A minimal polynomial f of α over K is irreducible over K. If h ∈ K[X] is a polynomial with h(α) = 0, then f|h. In particular, any two minimal polynomials f and g of α satisfy g(X) = cf(X) for some non-zero c ∈ K.

Proof

Let f = f1f2 for some non-constant polynomials f1, f2 ∈ K[X]. Since K is a field and 0 = f(α) = f1(α)f2(α), we have f1(α) = 0 or f2(α) = 0. But deg f1 < deg f and deg f2 < deg f, a contradiction to the choice of f.

Using polynomial division one can write h(X) = q(X)f(X) + r(X) for some polynomials q, r ∈ K[X] with r = 0 or deg r < deg f. Now h(α) = 0 implies r(α) = 0. Since deg r < deg f, by the choice of f we must then have r(X) = 0, that is, f|h.

Finally, if f and g are two minimal polynomials of α over K, then f|g and g|f and it follows that g(X) = cf(X) for some unit c of K[X]. But the only units of K[X] are the non-zero elements of K.

By Proposition 2.28, a monic minimal polynomial f of α over K is uniquely determined by α and K. It is, therefore, customary to define the minimal polynomial of α over K to be this (unique) monic polynomial. Unless otherwise stated, we will stick to this revised definition and write f(X) = minpolyα, K(X).

Example 2.11.
  1. For α ∈ K, we have minpolyα, K(X) = X – α.

  2. A complex number z = a + ib, a, b ∈ ℝ, b ≠ 0, is not a root of a linear polynomial over ℝ, but is a root of the quadratic polynomial f(X) := X² – 2aX + (a² + b²) ∈ ℝ[X]. Therefore, minpolyz, ℝ(X) = f(X), that is, f is irreducible over ℝ.

Proposition 2.29.

For a field K, the following conditions are equivalent.

  1. Every proper field extension K ⊆ L is transcendental (that is, K has no algebraic extensions other than itself).

  2. Every non-constant polynomial in K[X] has a root in K.

  3. Every non-constant polynomial in K[X] splits in K.

  4. Every non-constant irreducible polynomial in K[X] is of degree 1.

Proof

[(a)⇒(b)] Consider a non-constant irreducible polynomial f ∈ K[X] and the field extension L = K[X]/〈f〉 of K. We have seen that L contains a root of f. We will prove in Section 2.8 that such an extension is algebraic (Corollary 2.11). Hence (a) implies that L = K, that is, K contains a root of f.

[(b)⇒(c)] Let f ∈ K[X] be a non-constant polynomial. By (b), f has a root, say, α1 ∈ K. Thus f(X) = (X – α1)f1(X) for some f1 ∈ K[X] with deg f1 = (deg f) – 1. If f1 is a constant polynomial, we are done. Otherwise, we find as above a root α2 ∈ K of f1 and f2 ∈ K[X] with f1(X) = (X – α2)f2(X) and with deg f2 = (deg f) – 2. Proceeding in this way proves (c).

[(c)⇒(d)] Obvious.

[(d)⇒(a)] Let α ∈ L be algebraic over K and let f := minpolyα, K(X). Since f is irreducible, by (d) deg f = 1, that is, f(X) = X – α, so that α ∈ K.

Definition 2.41.

A field K satisfying the equivalent conditions of Proposition 2.29 is called an algebraically closed field. For an arbitrary field K, a minimal algebraically closed field containing K is called an algebraic closure of K.

We will see in Section 2.8 that an algebraic closure of every field exists and is unique in some sense. The algebraic closure of an algebraically closed field K is K itself. We end this section with the following well-known theorem. We will not prove the theorem in this book, because every known proof of it uses some kind of complex analysis which this book does not deal with.

Theorem 2.24. Fundamental theorem of algebra

The field ℂ of complex numbers is algebraically closed.

ℝ is not algebraically closed, since the proper extension ℝ ⊆ ℂ is algebraic (see Example 2.10). Indeed, ℂ is the algebraic closure of ℝ.

Exercise Set 2.6

2.51Let R be a ring and f, g ∈ R[X]. Show that:
  1. deg(f + g) ≤ max(deg f, deg g) with equality holding, if deg f ≠ deg g.

  2. deg(f g) ≤ deg f + deg g with equality holding, if R is an integral domain.

  3. If R is an integral domain, then R[X] is an integral domain too. More generally, if R is an integral domain, then R[X1, . . . , Xn] is also an integral domain for all n ∈ ℕ.

2.52Let f, g ∈ R[X], where R is an integral domain. Show that if f(ai) = g(ai) for i = 1, . . . , n, where n > max(deg f, deg g) and where a1, . . . , an are distinct elements of R, then f = g. In particular, if f(a) = g(a) for an infinite number of a ∈ R, then f = g.
2.53

Lagrange’s interpolation formula Let K be a field and let a0, . . . , an be distinct elements of K. Show that for b0, . . . , bn ∈ K (not necessarily all distinct), there exists a unique polynomial f ∈ K[X] of degree ≤ n such that f(ai) = bi for all i = 0, . . . , n. [H]
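Over a finite field the interpolating polynomial can be evaluated directly from Lagrange’s formula; the sketch below (my own helper, with the arbitrary choice K = ℤ101) recovers the value at 5 of the polynomial through three given points:

```python
P = 101   # a prime, so Z_101 is a field

def lagrange_interpolate(pts, x):
    # Evaluate at x the unique polynomial of degree <= n passing through
    # the n+1 points pts = [(a_i, b_i)] with distinct a_i, over Z_P.
    total = 0
    for i, (ai, bi) in enumerate(pts):
        num, den = 1, 1
        for j, (aj, _) in enumerate(pts):
            if i != j:
                num = num * (x - aj) % P
                den = den * (ai - aj) % P
        total = (total + bi * num * pow(den, P - 2, P)) % P
    return total

pts = [(1, 4), (2, 9), (3, 16)]        # samples of (X + 1)^2
print(lagrange_interpolate(pts, 5))    # 36 = (5 + 1)^2
```

This computation is the basis of Shamir-style secret sharing, where a secret is stored as the value of an unknown polynomial.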

2.54

Polynomials over a UFD Let R be a UFD. For a non-zero polynomial f ∈ R[X], a gcd of the coefficients of f is called a content of f and is denoted by cont f. One can then write f = (cont f)f1, where f1 ∈ R[X] with cont f1 a unit of R. f1 is called a primitive part of f and is often denoted as pp f. It is clear that cont f and pp f are unique up to multiplication by units of R. If for a non-zero polynomial f ∈ R[X] the content cont f is a unit (or, equivalently, if f and pp f are associates), then f is called a primitive polynomial. Show that for two non-zero polynomials f, g ∈ R[X] the elements cont(fg) and (cont f)(cont g) are associates in R. In particular, the product of two primitive polynomials is again primitive.

2.55Let R be a UFD. Show that a non-constant polynomial f ∈ R[X] is irreducible over R if and only if f is irreducible over Q(R), where Q(R) denotes the quotient field of R (see Exercise 2.34).
2.56
  1. Eisenstein’s criterion Let R be a UFD and f(X) = anXⁿ + an–1X^(n–1) + · · · + a1X + a0 ∈ R[X] with an ≠ 0. Suppose that there is a prime p ∈ R such that p does not divide an, p divides ai for all i, 0 ≤ i ≤ n – 1, and p² does not divide a0. Show that f is irreducible over R.

  2. As an application of Eisenstein’s criterion show that for a prime p ∈ ℕ the polynomial X^(p–1) + X^(p–2) + · · · + X + 1 is irreducible in ℚ[X]. [H]

2.57Let K ⊆ L be a field extension and f1, . . . , fn non-constant polynomials in K[X]. Show that each fi, i = 1, . . . , n, splits over L if and only if the product f1 · · · fn splits over L.
2.58Show that the irreducible polynomials in ℝ[X] have degrees ≤ 2. [H]
2.59Show that a finite field (that is, a field with finite cardinality) is not algebraically closed. In particular, the algebraic closure of a finite field is infinite.
2.60A complex number z is called an algebraic number, if z is algebraic over ℚ. An algebraic number z is called an algebraic integer, if z is a root of a monic polynomial in ℤ[X]. Show that:
  1. If z is an algebraic number, then mz is an algebraic integer for some m ∈ ℕ.

  2. If z ∈ ℚ is an algebraic integer, then z ∈ ℤ.

  3. If z ∈ ℂ is an algebraic integer, then for any integer n ∈ ℤ the complex numbers nz and z + n are algebraic integers.

2.61Let K be a field and f(X) = adXᵈ + ad–1X^(d–1) + · · · + a1X + a0 ∈ K[X]. The formal derivative f′ of f is defined to be the polynomial f′(X) := dadX^(d–1) + (d – 1)ad–1X^(d–2) + · · · + 2a2X + a1. Show that:
  1. (f + g)′ = f′ + g′ and (fg)′ = f′g + fg′ for any f, g ∈ K[X].

  2. If char K = 0, then f′ = 0 if and only if f ∈ K (that is, f is constant).

  3. If char K = p > 0, then f′ = 0 if and only if f(X) = g(Xᵖ) for some g ∈ K[X].

  4. f (≠ 0) has no multiple roots (in any extension field of K), that is, f is square-free, if and only if gcd(f, f′) = 1.

  5. Let f be a (non-constant) irreducible polynomial over K. Show that if char K = 0, then f has no multiple roots. On the other hand, if char K = p > 0, show that f has multiple roots if and only if f(X) = g(Xᵖ) for some g ∈ K[X]. (However, if K = ℤp, then by Fermat’s little theorem g(Xᵖ) = g(X)ᵖ, which contradicts the fact that f(X) is irreducible. Therefore, in this case f cannot have multiple roots.)

2.62Let f ∈ K[X] be a non-constant polynomial of degree d and let α1, . . . , αd be the roots of f (in some extension field of K). The quantity Δ(f) := Π1≤i<j≤d (αi – αj)² is called the discriminant of f. Prove the following assertions:
  1. Δ(f) = 0 if and only if f has a multiple root.

  2. .

  3. Δ(X² + aX + b) = a² – 4b.

  4. Δ(X³ + aX + b) = –(4a³ + 27b²).

2.7. Vector Spaces and Modules

Vector spaces and linear transformations between them are the central objects of study in linear algebra. In this section, we investigate the basic properties of vector spaces. We also generalize the concept of vector spaces to get another useful class of objects called modules. A module which also carries a (compatible) ring structure is referred to as an algebra. Study of algebras over fields (or more generally over rings) is of importance in commutative algebra, algebraic geometry and algebraic number theory.

2.7.1. Vector Spaces

Unless otherwise specified, K denotes a field in this section.

Definition 2.42.

A vector space V over a field K (or a K-vector space, in short) is an (additively written) Abelian group V together with a multiplication map · : K × V → V called the scalar multiplication map, such that the following properties are satisfied by every a, b ∈ K and x, y ∈ V.

  1. a · (x + y) = a · x + a · y,

  2. (a + b) · x = a · x + b · x,

  3. 1 · x = x,

  4. a · (b · x) = (ab) · x,

where ab denotes the product of a and b in the field K. When no confusions are likely, we omit the scalar multiplication sign · and write a · x simply as ax.

Example 2.12.
  1. Any field K is trivially a K-vector space with the scalar multiplication being the same as the field multiplication. More generally, if K ⊆ L is a field extension, then L is a K-vector space.

  2. For n ∈ ℕ, the product Kⁿ = K × · · · × K (n factors) is a K-vector space under the scalar multiplication map a(x1, . . . , xn) := (ax1, . . . , axn). For arbitrary K-vector spaces V1, . . . , Vn, we can analogously define the product V1 × · · · × Vn.

  3. The polynomial ring K[X] (or K[X1, . . . , Xn]) is a K-vector space (with the natural scalar multiplication).

Corollary 2.8.

Let V be a K-vector space. For every a ∈ K and x ∈ V, we have:

  1. 0 · x = 0.

  2. a · 0 = 0.

  3. (–a) · x = a · (–x) = –(a · x).

Proof

Easy verification.

Definition 2.43.

Let V be a vector space over K and S a subset of V. We say that S is a generating set or a set of generators of V (over K), or that S generates V (over K), if every element x ∈ V can be written as a finite linear combination x = a1x1 + · · · + anxn for some n ∈ ℕ (depending on x) and with ai ∈ K and xi ∈ S for 1 ≤ i ≤ n. A generating set S of V is called minimal, if no proper subset of S generates V. If V has a finite generating set, then V is called finitely generated or finite-dimensional.

Example 2.13.
  1. Consider the field extension L := K[X]/〈f(X)〉 of K, where f is an irreducible polynomial in K[X] of degree n. If α denotes the equivalence class of X in L, then every element of L can be written as an–1αn–1 + · · · + a1α + a0 with ai ∈ K for 0 ≤ i ≤ n – 1. Thus {1, α, . . . , αn–1} is a generating set of L over K. In particular, L is finitely generated over K.

  2. The K-vector space Kn is generated by the unit vectors ei, 1 ≤ in, defined as ei := (0, . . . , 0, 1, 0, . . . , 0) (1 in the i-th position). Thus Kn is also finitely generated over K.

  3. {1, X, X2, · · ·} is an infinite generating set of the polynomial ring K[X] regarded as a K-vector space. K[X] is not finitely generated over K.

    It is not difficult to show that the generating sets discussed in these examples are minimal.

Definition 2.44.

A subset S of a K-vector space V is called linearly independent (over K), if whenever a1x1 + · · · + anxn = 0 for some n ∈ ℕ, ai ∈ K and distinct xi ∈ S, 1 ≤ i ≤ n, we have a1 = · · · = an = 0. If S is not linearly independent, it is called linearly dependent. If S is linearly independent (resp. dependent), then we also say that the elements of S are linearly independent (resp. dependent). A maximal linearly independent subset of V is a linearly independent subset S ⊆ V with the property that S ∪ {x} is linearly dependent for any x ∈ V \ S.

If 0 ∈ S, then S is linearly dependent, since a · 0 = 0 for any a ∈ K. One can easily check that all the generating sets of Example 2.13 are linearly independent too. This is, however, not a mere coincidence, as the following result demonstrates.

Theorem 2.25.

A subset S of a K-vector space V is a minimal generating set for V if and only if S is a maximal linearly independent set of V.

Proof

[if] Given a maximal linearly independent subset S of V, we first show that S is a generating set for V. Take any non-zero x ∈ V. If x ∈ S, there is nothing to prove, so assume x ∉ S. By the maximality of S, the set S ∪ {x} is linearly dependent, that is, there exists a linear relation of the form a0x + a1x1 + · · · + anxn = 0, ai ∈ K, xi ∈ S, with some ai ≠ 0. The linear independence of S forces a0 ≠ 0, and so x = (–1/a0)(a1x1 + · · · + anxn) is a finite linear combination of elements of S. Thus S generates V. Now, we show that S is minimal. Assume otherwise, that is, S′ := S \ {y} generates V for some y ∈ S. Since S is linearly independent, y ≠ 0. For some m ∈ ℕ, bj ∈ K and yj ∈ S′, we then have y = b1y1 + · · · + bmym, a contradiction to the linear independence of S.

[only if] Given a minimal generating set S of V, we first show that S is linearly independent. Assume not, that is, a1x1 + · · · + anxn = 0 for some ai ∈ K and distinct xi ∈ S with some ai, say a1, non-zero. But then x1 = (–1/a1)(a2x2 + · · · + anxn) and, therefore, S \ {x1} also generates V, a contradiction to the minimality of S. Thus S is linearly independent. Now choose a non-zero y ∈ V \ S. Since S generates V, we can write y = b1y1 + · · · + bmym, bj ∈ K and yj ∈ S, that is, 1 · y – b1y1 – · · · – bmym = 0, that is, S ∪ {y} is linearly dependent.

Definition 2.45.

Let V be a K-vector space. A minimal generating set S of V is called a basis of V over K (or a K-basis of V). By Theorem 2.25, S is a basis of V if and only if S is a maximal linearly independent subset of V. Equivalently, S is a basis of V if and only if S is a generating set of V and is linearly independent.

Any element of a vector space can be written uniquely as a finite linear combination of elements of a basis, since two different ways of writing the same element contradict the linear independence of the basis elements.

A K-vector space V may have many K-bases. For example, the elements 1, aX + b, (aX + b)2, · · · form a K-basis of K[X] for any a, b ∈ K, a ≠ 0. However, what is unique in any basis of a given K-vector space V is the cardinality[8] of the basis, as shown in Theorem 2.26.

[8] Two sets (finite or not) S1 and S2 are said to be of the same cardinality, if there exists a bijective map S1S2.

For the sake of simplicity, we sometimes assume that V is a finitely generated K-vector space. This assumption simplifies certain proofs greatly. But it is important to highlight here that, unless otherwise stated, all the results continue to remain valid without the assumption. For example, it is a fact that every vector space has a basis. For finitely generated vector spaces, this is a trivial statement to prove, whereas without our assumption we need to use arguments that are not so simple. (A possible proof follows from Exercise 2.63 with U = {0}.)

Theorem 2.26.

Let V be a K-vector space. Then any K-basis of V has the same cardinality.

Proof

We assume that V is finitely generated. Let S = {x1, . . . , xn} be a minimal finite generating set, that is, a basis, of V. Let T be another basis of V. Assume that m := #T > n. (We might even have m = ∞.) We can choose distinct elements y1, . . . , yn ∈ T. Note that the xi and yj are non-zero. Now we can write y1 = a1x1 + · · · + anxn for some (unique) ai ∈ K, with some ai ≠ 0. Renumbering x1, . . . , xn, if necessary, we may assume that a1 ≠ 0. Then x1 = (1/a1)(y1 – a2x2 – · · · – anxn). It follows that y1, x2, . . . , xn generate V. In particular, we can write y2 = b1y1 + b2x2 + · · · + bnxn, bi ∈ K. If b2 = · · · = bn = 0, then y1, y2 are linearly dependent, a contradiction. So bi ≠ 0 for some i, 2 ≤ i ≤ n. Again we may renumber x2, . . . , xn, if necessary, to assume that b2 ≠ 0. Then x2 = (1/b2)(y2 – b1y1 – b3x3 – · · · – bnxn), that is, y1, y2, x3, . . . , xn generate V. Proceeding in this way we can show that y1, . . . , yn generate V, a contradiction to the minimality of T as a generating set. Thus we must have m ≤ n. In particular, m is finite. Now reversing the roles of S and T we can likewise prove that n ≤ m.

Theorem 2.26 holds even when V is not finitely generated. We omit the proof for this case here.

Definition 2.46.

Let V be a K-vector space. The cardinality of any K-basis of V is called the dimension of V over K and is denoted by dimK V (or by dim V, if K is understood from the context). We call V finite-dimensional (resp. infinite-dimensional), if dimK V is finite (resp. infinite).

For example, dimK Kn = n for every n ∈ ℕ, and dimK K[X] = ∞.

Definition 2.47.

Let V be a K-vector space. A subgroup U of V, which is closed under the scalar multiplication of V, is again a K-vector space and is called a (vector) subspace of V. In this case, we have dimK U ≤ dimK V (Exercise 2.63).

Example 2.14.

Let V be a vector space over K.

  1. The subsets {0} and V are trivially subspaces of V.

  2. Let S be any subset of V (not necessarily linearly independent). Then the set U of all finite linear combinations of elements of S is a vector subspace of V. We say that U is spanned or generated by S, or that S generates or spans U, or that U is the span of S. This is often denoted by U = 〈S〉 or by U = Span S. If S is linearly independent, then S is a basis of U.

Definition 2.48.

Let V and W be K-vector spaces. A map f : V → W is called a homomorphism (of vector spaces) or a linear transformation or a linear map over K, if

f(ax + by) = af(x) + bf(y)

for all a, b ∈ K and x, y ∈ V. Equivalently, f is a linear map over K if and only if f(x + y) = f(x) + f(y) and f(ax) = af(x) for all a ∈ K and x, y ∈ V. The set of all K-linear maps V → W is denoted by HomK(V, W). HomK(V, W) is a K-vector space under the definitions (f + g)(x) := f(x) + g(x) and (af)(x) := af(x) for all f, g ∈ HomK(V, W), a ∈ K and x ∈ V. A K-linear transformation V → V is called a K-endomorphism of V. The set of all K-endomorphisms of V is denoted by EndK V. A bijective[9] homomorphism (resp. endomorphism) is called an isomorphism (resp. automorphism).

[9] As in Footnote 2, we continue to be lucky here: The inverse of a bijective linear transformation is again a linear transformation.

Theorem 2.27.

Let V and W be K-vector spaces. Then V and W are isomorphic if and only if dimK V = dimK W.

Proof

If dimK V = dimK W and S and T are bases of V and W respectively, then there exists a bijection f : S → T. One can extend f to a linear map f̃ : V → W as f̃(a1x1 + · · · + anxn) := a1f(x1) + · · · + anf(xn), for ai ∈ K and xi ∈ S. One can readily verify that f̃ is an isomorphism. Conversely, if g : V → W is an isomorphism and S is any basis of V, then g(S) is clearly a basis of W.

Corollary 2.9.

A K-vector space V with n := dimK V < ∞ is isomorphic to Kn.

Let V be a K-vector space and U a subspace. As in Section 2.3 we construct the quotient group V/U. This group can be given a K-vector space structure under the scalar multiplication map a(x + U) := ax + U, a ∈ K, x ∈ V. If T ⊆ V is such that the residue classes of the elements of T form a K-basis of V/U and if S is a K-basis of U, then it is easy to see that S ∪ T is a K-basis of V. In particular,

Equation 2.2

dimK V = dimK U + dimK (V/U)
For f ∈ HomK(V, W), the set {x ∈ V | f(x) = 0} is called the kernel Ker f of f, and the set {f(x) | x ∈ V} is called the image Im f of f. We have the isomorphism theorem for vector spaces:

Theorem 2.28. Isomorphism theorem

Ker f is a subspace of V, Im f is a subspace of W, and V/Ker f ≅ Im f.

Proof

Similar to Theorem 2.3 and Theorem 2.9.

Definition 2.49.

For f ∈ HomK(V, W), the dimension of Im f is called the rank of f and is denoted by Rank f, whereas the dimension of Ker f is called the nullity of f and is denoted by Null f. An immediate consequence of the isomorphism theorem and of Equation (2.2) is the following important result.

Theorem 2.29.

Rank f + Null f = dimK V for any f ∈ HomK(V, W).
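Theorem 2.29 can be checked computationally. A linear map f : K4 → K3 is given by a matrix over K; the rank is the number of pivots found by Gaussian elimination over K, and the nullity is then dimK V minus the rank. The sketch below is our own illustration for K = ℤ7 (the function row_reduce and the sample matrix are assumptions, not from the text).

```python
# Sketch: verify Rank f + Null f = dim V for a linear map f : K^4 -> K^3
# over K = Z_7, represented by a 3x4 matrix. Helper names are ours.
P = 7

def row_reduce(rows):
    """Gaussian elimination over Z_P; returns the number of pivots (the rank)."""
    rows = [list(r) for r in rows]
    rank, ncols = 0, len(rows[0])
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col] % P), None)
        if pivot is None:
            continue                                    # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], P - 2, P)            # inverse via Fermat's little theorem
        rows[rank] = [(v * inv) % P for v in rows[rank]]
        for i in range(len(rows)):                      # clear the column elsewhere
            if i != rank and rows[i][col] % P:
                c = rows[i][col]
                rows[i] = [(v - c * w) % P for v, w in zip(rows[i], rows[rank])]
        rank += 1
    return rank

# Sample matrix of f (second row is 2 times the first modulo 7).
M = [[1, 2, 3, 4],
     [2, 4, 6, 1],
     [0, 0, 0, 0]]
dim_V = 4
rank = row_reduce(M)          # dim Im f
null = dim_V - rank           # dim Ker f, by Theorem 2.29
assert rank + null == dim_V
```

Here rank = 1 and null = 3, since both non-trivial rows of M are proportional over ℤ7.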

*2.7.2. Modules

If we remove the restriction that K is a field and assume that K is any ring, then a vector space over K is called a K-module. More specifically, we have:

Definition 2.50.

Let R be a ring. A module over R (or an R-module) is an (additively written) Abelian group M together with a multiplication map · : R × M → M called the scalar multiplication map, such that for every a, b ∈ R and x, y ∈ M we have a · (x + y) = a · x + a · y, (a + b) · x = a · x + b · x, 1 · x = x, and a · (b · x) = (ab) · x, where ab denotes the product of a and b in the ring R. When no confusions are likely, we omit the scalar multiplication sign · and write a · x as ax.

Example 2.15.
  1. Vector spaces are special cases of modules, when the underlying ring is a field.

  2. Ideals of R are modules over R with the ring multiplication map taken as the scalar multiplication.

  3. Every Abelian group G is a ℤ-module under the scalar multiplication n · x := x + · · · + x (n summands) for n ≥ 0, and n · x := –((–n) · x) for n < 0.

  4. The polynomial rings R[X] and R[X1, . . . , Xn] are modules over R.

  5. Let Mi, i ∈ I, be a family of R-modules. The direct product ∏i∈I Mi is defined as the set of all tuples (ai)i∈I with ai ∈ Mi, indexed by I. The direct sum ⊕i∈I Mi is the subset of the Cartesian product consisting only of the tuples for which ai = 0 except for a finite number of i ∈ I. Both the direct product and the direct sum are R-modules under component-wise addition and scalar multiplication. When I is finite, they are naturally the same.

Modules are a powerful generalization of vector spaces. Any result we prove for modules is equally valid for vector spaces, ideals and Abelian groups. On the other hand, since we do not demand that the ring R be necessarily a field, certain results for vector spaces are not applicable for all modules.

It is easy to see that Corollary 2.8 continues to hold for modules. An R-submodule of an R-module M is a subgroup of M that is closed under the scalar multiplication of M. For a subset S ⊆ M, the set of all finite linear combinations of the form a1x1 + · · · + anxn, n ∈ ℕ, ai ∈ R, xi ∈ S, is an R-submodule N of M, denoted by RS or 〈S〉. We say that N is generated by S (or by the elements of S). If S is finite, then N is said to be finitely generated. A (sub)module generated by a single element is called cyclic. It is important to note that unlike vector spaces the cardinality of a minimal generating set of a module is not necessarily unique. (See Exercise 2.68 for an example.) It is also true that given a minimal generating set S of M, there may be more than one way of writing an element of M as a finite linear combination of elements of S. For example, if M = ℤ and S = {2, 3}, then 1 = (–1)·2 + 1·3 = 2·2 + (–1)·3. The nice theory of dimensions developed in connection with vector spaces does not apply to modules.
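The failure of uniqueness in the ℤ-module example above runs deeper: since gcd(2, 3) = 1, the element 1 has infinitely many representations over the minimal generating set S = {2, 3}, one for each integer t. A quick check (our own illustration, using the standard parametrization of Bézout coefficients):

```python
# The text's Z-module example with S = {2, 3}: representations of 1 over a
# minimal generating set are far from unique. The general solution of
# a*2 + b*3 = 1 is (a, b) = (-1 + 3t, 1 - 2t) for t in Z.
for t in range(-3, 4):
    a, b = -1 + 3 * t, 1 - 2 * t
    assert a * 2 + b * 3 == 1
# t = 0 and t = 1 give exactly the two representations quoted in the text.
assert (-1) * 2 + 1 * 3 == 1
assert 2 * 2 + (-1) * 3 == 1
```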

For an R-submodule N of M, the Abelian group M/N is given an R-module structure by the scalar multiplication map a(x + N) := ax + N. This module is called the quotient module of M by N.

For R-modules M and N, an R-linear map or an R-module homomorphism (from M to N) is defined as a map f : M → N with f(ax + by) = af(x) + bf(y) for all a, b ∈ R and x, y ∈ M (or equivalently with f(x + y) = f(x) + f(y) and f(ax) = af(x) for all a ∈ R and x, y ∈ M). An isomorphism, an endomorphism and an automorphism are defined in ways analogous to the case of vector spaces. The set of all (R-module) homomorphisms M → N is denoted by HomR(M, N) and the set of all (R-module) endomorphisms of M is denoted by EndR M. These sets are again R-modules under the definitions: (f + g)(x) := f(x) + g(x) and (af)(x) := af(x) for all a ∈ R and x ∈ M (and f, g in HomR(M, N) or EndR M).

The kernel and image of an R-linear map f : M → N are defined as the sets Ker f := {x ∈ M | f(x) = 0} and Im f := {f(x) | x ∈ M}. With these notations we have the isomorphism theorem for modules:

Theorem 2.30. Isomorphism theorem

Ker f and Im f are submodules of M and N respectively and M / Ker f ≅ Im f.

For an R-module M and an ideal 𝔞 of R, the set 𝔞M consisting of all finite linear combinations a1x1 + · · · + anxn with ai ∈ 𝔞 and xi ∈ M is a submodule of M. On the other hand, for a submodule N of M the set (M : N) := {a ∈ R | aM ⊆ N} is an ideal of R. In particular, the ideal (M : 0) is called the annihilator of M and is denoted as AnnR M (or as Ann M). For any ideal 𝔞 ⊆ AnnR M, one can view M as an R/𝔞-module under the map (a + 𝔞) · x := ax. One can easily check that this map is well-defined, that is, the product is independent of the choice of the representative a of the equivalence class a + 𝔞.

Definition 2.51.

A free module M over a ring R is defined to be a direct sum ⊕i∈I Mi of R-modules Mi with each Mi ≅ R as an R-module. If I is of finite cardinality n, then M is isomorphic to Rn.

Any vector space is a free module (Theorem 2.27 and Corollary 2.9). The Abelian groups ℤn, n ≥ 2, are not free ℤ-modules.

Theorem 2.31. Structure theorem for finitely generated modules

M is a finitely generated R-module if and only if M is a quotient of a free module Rn for some n ∈ ℕ.

Proof

[if] The free module Rn has a canonical generating set ei, 1 ≤ i ≤ n, where

ei = (0, . . . , 0, 1, 0, . . . , 0) (1 in the i-th position).

If M = Rn/N, then the equivalence classes ei + N, i = 1, ..., n, constitute a finite set of generators of M.

[only if] If x1, ..., xn generate M, then the R-linear map f : RnM defined by (a1, ..., an) ↦ a1x1 + · · · + anxn is surjective. Hence by the isomorphism theorem MRn / Ker f.

*2.7.3. Algebras

Let φ : R → A be a homomorphism of rings. The ring A can be given an R-module structure with the scalar multiplication map a · x := φ(a)x for a ∈ R and x ∈ A. This R-module structure of A is compatible with the ring structure of A in the sense that for every a, b ∈ R and x, y ∈ A one has (ax)(by) = (ab)(xy).

Conversely, if a ring A has an R-module structure with (ax)(by) = (ab)(xy) for every a, b ∈ R and x, y ∈ A, then there is a unique ring homomorphism φ : R → A taking a ↦ a · 1 (where 1 denotes the identity of A). This motivates us to define the following.

Definition 2.52.

Let R be a ring. An algebra over R or an R-algebra is a ring A together with a ring homomorphism φ : R → A. The homomorphism φ is called the structure homomorphism of the R-algebra A. If A and B are R-algebras with structure homomorphisms φ : R → A and ψ : R → B, then an R-algebra homomorphism (from A to B) is a ring homomorphism η : A → B such that η ∘ φ = ψ.

Example 2.16.

Let R be a ring.

  1. The polynomial ring R[X1, . . . , Xn] is an R-algebra with the canonical inclusion as the structure homomorphism and is called a polynomial algebra over R.

  2. For an ideal 𝔞 of R, the canonical surjection R → R/𝔞 makes R/𝔞 an R-algebra.

  3. If A is an R-algebra with structure homomorphism φ : R → A and if B is an A-algebra with structure homomorphism ψ : A → B, then B is an R-algebra with structure homomorphism ψ ∘ φ.

  4. Combining (2) and (3) implies that if A is an R-algebra and 𝔞 an ideal of A, then the ring A/𝔞 is again an R-algebra, called the quotient algebra of A by 𝔞.

An R-algebra A is an R-module with the added property that multiplication of elements of A is now legal. Exploiting this new feature leads to the following concept of algebra generators.

Definition 2.53.

Let A be an R-algebra with the structure homomorphism φ : R → A. A subset S of A is said to generate A as an R-algebra, if every element x ∈ A can be written as a polynomial expression in (finitely many) elements of S with coefficients from R (that is, from φ(R)). We write this as A = R[S]. If S = {x1, . . . , xn} is finite, we also write R[x1, . . . , xn] in place of R[S] and say that A is finitely generated as an R-algebra or that the homomorphism φ is of finite type.

Example 2.17.
  1. The polynomial algebra R[X1, . . . , Xn], n ≥ 1, over R is not finitely generated as an R-module, but is finitely generated as an R-algebra.

  2. For an ideal 𝔞 of R[X1, . . . , Xn], the ring A := R[X1, . . . , Xn]/𝔞 is generated as an R-algebra by the equivalence classes xi of Xi, 1 ≤ i ≤ n, that is, A = R[x1, . . . , xn]. If 𝔞 is not the zero ideal, then A is not a polynomial algebra, because x1, . . . , xn are not indeterminates in the sense that they satisfy (non-zero) polynomial equations f(x1, . . . , xn) = 0 for every non-zero f ∈ 𝔞. (In this case, we also say that x1, . . . , xn are algebraically dependent.) The notation R[. . .] is a generalization of the notation for polynomial algebras. In what follows, we usually denote polynomial algebras by R[X1, . . . , Xn] with upper-case algebra generators, whereas for an arbitrary finitely generated R-algebra we use lower-case symbols for the algebra generators as in R[x1, . . . , xn].

One may proceed to define kernels and images of R-algebra homomorphisms and frame and prove the isomorphism theorem for R-algebras. We leave the details to the reader. We only note that algebra homomorphisms are essentially ring homomorphisms with the added condition of commutativity with the structure homomorphisms.

Theorem 2.32.

A ring A is a finitely generated R-algebra if and only if A is a quotient of a polynomial algebra (over R).

Proof

[if] Immediate from Example 2.17.

[only if] Let A := R[x1, . . . , xn]. The map η : R[X1, . . . , Xn] → A that takes f(X1, . . . , Xn) ↦ f(x1, . . . , xn) is a surjective R-algebra homomorphism. By the isomorphism theorem, one has the isomorphism AR[X1, . . . , Xn]/Ker η of R-algebras.

This theorem suggests that for the study of finitely generated algebras it suffices to investigate only the polynomial algebras and their quotients.

Exercise Set 2.7

2.63 Let V be a K-vector space, U a subspace of V, and T an arbitrary K-basis of U. Show that there is a K-basis of V that contains T. [H]
2.64
  1. Let V be a K-vector space, and U1, U2 subspaces of V. Show that the set U1 + U2 := {x1 + x2 | x1 ∈ U1, x2 ∈ U2} is a K-subspace of V. If U1 ∩ U2 = {0}, we say that U := U1 + U2 is the direct sum of U1 and U2 and write U = U1 ⊕ U2.

  2. Let V be a K-vector space and W a subspace of V. Show that there exists a subspace W′ of V such that V = WW′. This space W′ is called the complement subspace of W in V. [H]

2.65 Let V and W be K-vector spaces and f : V → W a K-linear map. Show that f is uniquely determined by the images f(x), x ∈ S, where S is a basis of V.
2.66 Let V and W be K-vector spaces. Check that HomK(V, W) is a vector space over K. Show that dimK(HomK(V, W)) = (dimK V)(dimK W). In particular, if W = K, then HomK(V, K) is isomorphic to V. The space HomK(V, K) is called the dual space of V.
2.67 Let V and W be m- and n-dimensional K-vector spaces, S = {x1, . . . , xm} a K-basis of V, T = {y1, . . . , yn} a K-basis of W, and f : V → W a K-linear map. For each i = 1, . . . , m, write f(xi) = ai1y1 + · · · + ainyn, aij ∈ K. The m × n matrix Mf := (aij) is called the transformation matrix of f (with respect to the bases S and T). We have:

Let V1, V2, V3 be K-vector spaces, f, f1, f2 ∈ HomK(V1, V2) and g ∈ HomK(V2, V3). Prove the following assertions:

  1. Mf1+f2 = Mf1 + Mf2 and Maf = aMf for all a ∈ K.

  2. Mg∘f = Mf Mg.

  3. f is invertible (as a map) if and only if Mf is invertible (as a matrix).

(Remark: This exercise shows that linear transformations of finite-dimensional vector spaces can be described in terms of matrices.)

2.68 Show that for every n ∈ ℕ there are integers a1, . . . , an that constitute a minimal set of generators for the unit ideal in ℤ. [H]
2.69 Let M be an R-module. A subset S of M is called a basis of M, if S generates M and is linearly independent over R in the sense that a1x1 + · · · + anxn = 0, n ∈ ℕ, ai ∈ R, distinct xi ∈ S, implies a1 = · · · = an = 0. Show that M has a basis if and only if M is a free R-module.
2.70 We define the rank of a finitely generated R-module M as

RankR M := min{#S | M is generated by S}.

If N is a submodule of M, show that RankR M ≤ RankR N + RankR(M/N). Give an example where the strict inequality holds.

2.71 Let M be an R-module. An element x ∈ M is called a torsion element of M, if Ann Rx ≠ 0, that is, if there is a non-zero a ∈ R with ax = 0. The set of all torsion elements of M is denoted by Tors M. M is called torsion-free if Tors M = {0}, and a torsion module if Tors M = M.
  1. Show that Tors M is a submodule of M.

  2. Show that Tors M is a torsion module (called the torsion submodule of M) and that the module M/Tors M is torsion-free.

  3. If R is an integral domain, show that every free module over R is torsion-free. In particular, every vector space is torsion-free.

2.72 Show that:
  1. ℚ is not finitely generated as a ℤ-module. [H]

  2. ℚ is not a free ℤ-module. [H]

  3. ℚ is a torsion-free ℤ-module.

This shows that the converse of Exercise 2.71(c) is not true in general.

2.8. Fields

In this section, we study some important properties of field extensions. We also give an introduction to Galois theory. Unless otherwise stated, the letters F, K and L stand for fields in this section.

2.8.1. Properties of Field Extensions

We have seen that if F ⊆ K is a field extension, then K is a vector space over F. This observation leads to the following very useful definitions.

Definition 2.54.

For a field extension F ⊆ K, the cardinality of any F-basis of K is called the degree of the extension F ⊆ K and is denoted by [K : F]. If [K : F] is finite, K is called a finite extension of F. Otherwise, K is called an infinite extension of F.

Proposition 2.30.

Let F ⊆ K ⊆ L be a tower of field extensions. Then [L : F] = [L : K] [K : F]. In particular, the extension F ⊆ L is finite if and only if the extensions F ⊆ K and K ⊆ L are finite. In that case, [L : K] | [L : F] and [K : F] | [L : F].

Proof

One can easily check that if S is an F-basis of K and S′ a K-basis of L, then the set {ss′ | s ∈ S, s′ ∈ S′} is an F-basis of L.

Recall the definitions of the rings F[X] of polynomials and F(X) of rational functions in one indeterminate X. These notations are now generalized. For a field extension F ⊆ K and for a ∈ K, we define:

F[a] := {f(a) | f(X) ∈ F[X]}

and

Equation 2.3

F(a) := {f(a)/g(a) | f(X), g(X) ∈ F[X], g(a) ≠ 0}
It is easy to see that F[a] is the smallest (with respect to inclusion) of the integral domains that contain F and a. Similarly F(a) is the smallest of the fields that contain F and a. We also have F[a] ⊆ F(a). Now we state the following important characterization of algebraic elements.

Theorem 2.33.

For a field extension F ⊆ K and an element a ∈ K, the following conditions are equivalent:

  1. The element a is algebraic over F.

  2. The extension F(a) is finite over F.

  3. F(a) = F[a].

Proof

[(a)⇒(b)] Let h(X) ∈ F[X] be the minimal polynomial of a over F, of degree d. Consider the ring homomorphism Φ : F[X] → K that takes f(X) ↦ f(a). From Proposition 2.28, Ker Φ = 〈h〉, and by the isomorphism theorem F[X]/〈h〉 ≅ Im Φ. Since h is irreducible over F, F[X]/〈h〉 and so Im Φ are fields. Since Im Φ contains F and a (note that Φ(X) = a), we have F(a) ⊆ Im Φ ⊆ F(a), that is, F(a) = Im Φ ≅ F[X]/〈h〉. Finally, notice that [F[X]/〈h〉 : F] = d.

[(b)⇒(c)] Let d := [F(a) : F]. Since the elements 1, a, a2, . . . , ad are linearly dependent over F, there exist α0, α1, . . . , αd ∈ F, not all 0, such that α0 + α1a + · · · + αdad = 0. This, in turn, implies that there is an irreducible polynomial h(X) ∈ F[X] with h(a) = 0. Now consider any g(X) ∈ F[X] with g(a) ≠ 0. Clearly, h ∤ g (because otherwise g(a) = 0). Since h is irreducible, gcd(g, h) = 1, that is, there exist polynomials u(X), v(X) ∈ F[X] with u(X)g(X) + v(X)h(X) = 1, that is, with u(a)g(a) = 1. But then 1/g(a) = u(a) ∈ F[a], so that F(a) ⊆ F[a] ⊆ F(a), that is, F(a) = F[a].

[(c)⇒(a)] Clearly, the element 0 is algebraic over F. So assume a ≠ 0. Since 1/a ∈ F(a) = F[a], by hypothesis there is a polynomial f(X) ∈ F[X] such that 1/a = f(a). But then a is a root of the non-constant polynomial Xf(X) – 1.
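The step (b)⇒(c) is effective: the inverse 1/g(a) is u(a), where the extended Euclidean algorithm in F[X] produces u(X)g(X) + v(X)h(X) = 1. The sketch below is our own illustration for F = ℤ2 and the irreducible h(X) = X3 + X + 1, so that F[a] ≅ F[X]/〈h〉 is the field with 8 elements; the bit-mask encoding and helper names are assumptions, not from the text.

```python
# Sketch of (b) => (c): invert g(a) in F[a] = F[X]/<h> with F = Z_2 and
# h(X) = X^3 + X + 1 irreducible. A polynomial over Z_2 is encoded as a
# bit mask: bit i holds the coefficient of X^i.

def pmul(f, g):                      # multiplication in Z_2[X]
    r = 0
    while g:
        if g & 1:
            r ^= f
        f <<= 1
        g >>= 1
    return r

def pdivmod(f, g):                   # division with remainder in Z_2[X]
    q = 0
    while f.bit_length() >= g.bit_length():
        shift = f.bit_length() - g.bit_length()
        q ^= 1 << shift
        f ^= g << shift
    return q, f

def inverse_mod(g, h):
    """u with u*g = 1 (mod h), by the extended Euclidean algorithm;
    assumes gcd(g, h) = 1, which holds since h is irreducible and h does
    not divide g."""
    r0, r1, u0, u1 = h, g, 0, 1
    while r1:
        q, r = pdivmod(r0, r1)
        r0, r1 = r1, r
        u0, u1 = u1, u0 ^ pmul(q, u1)
    return u0                        # here r0 = gcd(g, h) = 1

H = 0b1011                           # h(X) = X^3 + X + 1
g = 0b110                            # g(a) = a^2 + a, a non-zero element of F[a]
u = inverse_mod(g, H)
assert pdivmod(pmul(u, g), H)[1] == 1    # u(a) * g(a) = 1 in F[X]/<h>
```

This is exactly the inversion routine used for arithmetic in the finite field GF(8), a case of the theorem's statement F(a) = F[a].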

Corollary 2.10.

For a field extension F ⊆ K, the set of elements in K that are algebraic over F is a field.

Proof

It is sufficient to show that if a, b ∈ K are algebraic over F, then the elements a ± b, ab and a/b (if b ≠ 0) are also algebraic over F. By Theorem 2.33, [F(a) : F] is finite. Since b is algebraic over F, it is also algebraic over F(a). In particular, [F(a)(b) : F(a)] is finite. But then the extension F(a)(b) is also finite over F and contains a ± b, ab and a/b (if b ≠ 0).

The field F(a)(b) in the proof of the last corollary is also denoted as F(a, b). It is the smallest subfield of K that contains F, a and b, and it follows that F(a, b) = F(b, a). More generally, for a field extension F ⊆ K and for a1, . . . , an ∈ K, each algebraic over F, the field F(a1, . . . , an) is defined as F(a1)(a2) . . . (an) and is independent of the order in which the ai are adjoined.

Corollary 2.11.

Let F ⊆ K be a finite extension. Then K is algebraic over F.

Proof

For any a ∈ K, we have F ⊆ F(a) ⊆ K. Now use Proposition 2.30 and Theorem 2.33.

The converse of the last corollary is not true, that is, it is possible that an algebraic extension has infinite extension degree. Exercise 2.59 gives an example.

Corollary 2.12.

If F ⊆ K and K ⊆ L are algebraic field extensions, then F ⊆ L is also algebraic.

Proof

Take an arbitrary a ∈ L. Since K ⊆ L is algebraic, there is a non-zero polynomial f(X) = αnXn + · · · + α1X + α0 ∈ K[X] such that f(a) = 0. It then follows that a is algebraic over F(α0, . . . , αn). Since each αi is algebraic over F, the degree [F(α0, . . . , αn) : F] is finite. Therefore, [F(α0, . . . , αn)(a) : F] = [F(α0, . . . , αn)(a) : F(α0, . . . , αn)] [F(α0, . . . , αn) : F] is also finite and hence F(α0, . . . , αn)(a) and, in particular, a are algebraic over F.

Definition 2.55.

A field extension F ⊆ K is called simple, if K = F(a) for some a ∈ K.

Proposition 2.31.

Let F be a field of characteristic 0 and let a, b (belonging to some extension of F) be algebraic over F. Then the extension F(a, b) of F is simple.

Proof

Let p(X) and q(X) be the minimal polynomials (over F) of a and b respectively. Let d := deg p and d′ := deg q. The polynomials p and q are irreducible over F and hence by Exercise 2.61 have no multiple roots. Let a1, . . . , ad be the roots of p and b1, . . . , bd′ the roots of q with a = a1 and b = b1. For each i, j with j ≠ 1, the equation ai + λbj = a + λb has a unique solution for λ (not necessarily in F). Since F is infinite, we can choose μ ∈ F which is not a solution of any of the equations just mentioned. Define c := a + μb, so that cai + μbj for all i, j with j ≠ 1. Clearly, F(c) ⊆ F(a, b). To prove the reverse inclusion, note that by hypothesis q(b) = 0. Also if we define f(X) := p(c – μX) ∈ F(c)[X], we see that f(b) = p(c – μb) = p(a) = 0. By the choice of c, we have f(bj) ≠ 0 for j ≠ 1. Finally, since q is square-free, we have gcd(f, q) = X – b, computed in F(c)[X]. This implies that b ∈ F(c) and so a = c – μb ∈ F(c) too.
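The construction in the proof can be illustrated numerically. The example below is our own, not the book's: for a = √2 and b = √3 over F = ℚ, already μ = 1 avoids all the forbidden equations, and c = √2 + √3 generates ℚ(√2, √3). The floating-point checks stand in for exact algebraic identities.

```python
# Our numerical illustration of Proposition 2.31: c = sqrt(2) + sqrt(3)
# is a primitive element of Q(sqrt(2), sqrt(3)) over Q (here mu = 1).
import math

a, b = math.sqrt(2), math.sqrt(3)
c = a + b

# Both generators are polynomial expressions in c with rational coefficients:
# a = (c^3 - 9c)/2 and b = (11c - c^3)/2, so F(a, b) = F(c).
assert abs((c**3 - 9*c) / 2 - a) < 1e-9
assert abs((11*c - c**3) / 2 - b) < 1e-9
# c itself is algebraic of degree 4: it is a root of X^4 - 10X^2 + 1.
assert abs(c**4 - 10*c**2 + 1) < 1e-9
```

These identities follow from c2 = 5 + 2√6, and the degree 4 matches [ℚ(√2, √3) : ℚ] = 2 · 2 from Proposition 2.30.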

Corollary 2.13.

A finite extension F ⊆ K of fields of characteristic 0 is simple.

Proof

We proceed by induction on d := [K : F]. The result vacuously holds for d = 1. So let us assume that d > 1 and that the result holds for all smaller values of d. Choose an element a ∈ K \ F. Then [F(a) : F] > 1 and divides d. If [F(a) : F] = d, we are done. So assume [F(a) : F] < d. Since [K : F(a)] < d, by the induction hypothesis the extension F(a) ⊆ K is simple, say K = F(a)(b) = F(a, b). The result now follows immediately from the previous proposition.

2.8.2. Splitting Fields and Algebraic Closure

Let f(X) be a non-constant polynomial of degree d in F[X]. Assume that f does not split over F. Consider an irreducible (in F[X]) factor f′ of f of degree d′ > 1. F′ := F[X]/〈f′〉 is a field extension of F. Furthermore, if α1 denotes the equivalence class of X in F′, the elements 1, α1, . . . , α1d′–1 constitute a basis of F′ over F. In particular, [F′ : F] = d′ ≤ d. Now, one can write f(X) = (X – α1)g(X) for some g(X) ∈ F′[X]. If g splits over F′, so does f too. Otherwise, choose any irreducible (in F′[X]) factor g′ of g with deg g′ > 1 and consider the field extension F″ := F′[X]/〈g′〉. Then [F″ : F′] = deg g′ ≤ deg g = d – 1, so that [F″ : F] ≤ d(d – 1). Moreover, if α2 denotes the equivalence class of X in F″, then f(X) = (X – α1)(X – α2)h(X) for some h(X) ∈ F″[X]. Proceeding in this way we get:

Proposition 2.32.

For a polynomial f(X) ∈ F[X] of degree d ≥ 1, there is a field extension K of F with [K : F] ≤ d!, such that f splits over K.
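The first step of the construction above can be made concrete; the following is our own illustration, not from the text. For F = ℤ2 and the irreducible f(X) = X2 + X + 1, the quotient K = F[X]/〈f〉 is a field with 4 elements, and f already splits over K. Elements of K are encoded as pairs (c0, c1) standing for c0 + c1α, where α is the class of X; the encoding and helper names are assumptions.

```python
# Sketch for F = Z_2, f(X) = X^2 + X + 1 (irreducible over F):
# in K = F[X]/<f>, a field with 4 elements, f splits into linear factors.
# (c0, c1) encodes c0 + c1*alpha with alpha^2 = alpha + 1.

def kmul(x, y):
    """Multiply c0 + c1*alpha by d0 + d1*alpha, reducing alpha^2 to alpha + 1."""
    c0, c1 = x
    d0, d1 = y
    e0 = (c0 * d0 + c1 * d1) % 2            # alpha^2 contributes 1
    e1 = (c0 * d1 + c1 * d0 + c1 * d1) % 2  # alpha^2 contributes alpha
    return (e0, e1)

def kadd(x, y):
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

def f_of(x):
    """Evaluate f(x) = x^2 + x + 1 in K."""
    return kadd(kadd(kmul(x, x), x), (1, 0))

K = [(0, 0), (1, 0), (0, 1), (1, 1)]
roots = [x for x in K if f_of(x) == (0, 0)]
# f has no root in F = {(0,0), (1,0)}, but two roots alpha and alpha + 1 in K,
# so f(X) = (X - alpha)(X - alpha - 1) splits over K with [K : F] = 2 <= 2!.
assert roots == [(0, 1), (1, 1)]
```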

We now establish the uniqueness of the splitting field of a polynomial f(X) ∈ F[X]. To start with, we set up certain notations. An isomorphism μ : F → F′ of fields induces an isomorphism μ* : F[X] → F′[Y] of polynomial rings, defined by adXd + ad–1Xd–1 + · · · + a0 ↦ μ(ad)Yd + μ(ad–1)Yd–1 + · · · + μ(a0). We have μ*(a) = μ(a) for all a ∈ F. Note also that f ∈ F[X] is irreducible over F if and only if μ*(f) is irreducible over F′. With these notations we state the following important lemma.

Lemma 2.5.

Let the non-constant polynomial f(X) ∈ F[X] be irreducible over F. Let α and β be roots of f and μ*(f) respectively. Then there is an isomorphism τ : F(α) → F′(β) of fields such that τ(a) = μ(a) for all a ∈ F and τ(α) = β.

Proof

Since F(α) = F[α] and F′(β) = F′[β], we can define the map τ : F[α] → F′[β] by g(α) ↦ (μ*(g))(β) for each g(X) ∈ F[X]. It is now an easy check that τ is a well-defined isomorphism of fields with the desired properties.

Roots of an irreducible polynomial are called conjugates (of each other). If α and β are two roots of an irreducible polynomial f(X) ∈ F[X], the last lemma guarantees the existence of an isomorphism τ : F(α) → F(β) that fixes all the elements of F and that maps α ↦ β.

Proposition 2.33.

We use the maps μ : F → F′ and μ* : F[X] → F′[Y] as defined above. Let f(X) ∈ F[X] be a non-constant polynomial and let K and K′ be splitting fields of f and μ*(f) over F and F′ respectively. Then there is an isomorphism τ : K → K′ of fields, such that τ(a) = μ(a) for all a ∈ F.

Proof

We proceed by induction on n := [K : F]. (By Proposition 2.32 n is finite.) If n = 1, then K = F, that is, the polynomial f splits over F itself and so does μ*(f) over F′, that is, K′ = F′. Thus τ = μ is the desired isomorphism.

Now assume that n > 1 and that the result holds for all fields L and for all polynomials in L[X] with splitting fields (over L) of extension degrees less than n. Consider an irreducible factor g of f with 1 < deg g ≤ deg f. Note that g also splits over K. We take any root α ∈ K of g and consider the tower of field extensions F ⊆ F(α) ⊆ K. Similarly, let β ∈ K′ be a root of μ*(g) and consider F′ ⊆ F′(β) ⊆ K′. By Lemma 2.5 there is an isomorphism ν : F(α) → F′(β) with ν(a) = μ(a) for all a ∈ F and ν(α) = β. Now [K : F(α)] = [K : F]/[F(α) : F] = [K : F]/deg g < n. It is evident that K and K′ are splitting fields of f and μ*(f) over F(α) and F′(β) respectively. Hence by the induction hypothesis there is an isomorphism τ : K → K′ with τ(a) = ν(a) for all a ∈ F(α). In particular, τ(a) = μ(a) for all a ∈ F.

The results pertaining to the splitting field of a polynomial can be generalized in the following way. Let S be a non-empty subset of F[X]. A splitting field of S over F is a minimal field K containing F such that each polynomial f ∈ S splits in K. If S = {f1, . . . , fr} is a finite set, the splitting field of S is the same as the splitting field of f = f1 · · · fr (Exercise 2.57). But the situation is different, if S is infinite. Of particular interest is the set S consisting of all irreducible polynomials in F[X]. In this case, the splitting field of S is an algebraic closure of F.

We give a sketch of the proof that even when S is infinite, a splitting field for S can be constructed. This, in particular, establishes the existence of an algebraic closure of any field. We may assume that S comprises non-constant polynomials only. For each f ∈ S, we define an indeterminate Xf and consider the polynomial ring A := F[Xf | f ∈ S] and the ideal 𝔞 of A generated by f(Xf) for all f ∈ S. We have 𝔞 ≠ A and, therefore, there is a maximal ideal 𝔪 of A containing 𝔞 (Exercise 2.23). Consider the field F1 := A/𝔪 containing F. Every polynomial f ∈ S has at least one root in F1. Now we replace F by F1 and as above get another field F2 containing F1 (and hence F), such that every polynomial in S (of degree ≥ 2) has at least two roots in F2. We continue this procedure (infinitely often, if necessary) and obtain a sequence of fields F ⊆ F1 ⊆ F2 ⊆ F3 ⊆ · · ·. Define K to be the field consisting of all elements of ∪i≥1 Fi that are algebraic over F. Each polynomial in S splits in K, but in no proper subfield of K, that is, K is a splitting field of S.

It turns out that the splitting field of S is unique up to isomorphisms that fix elements of F. In particular, the algebraic closure of F is unique up to isomorphisms that fix elements of F, and is denoted by F̄.

*2.8.3. Elements of Galois Theory

For a field K, the set Aut K of all automorphisms of K is a group under (functional) composition. We extend this concept now. Let F ⊆ K be an extension of fields.

Definition 2.56.

An automorphism σ ∈ Aut K is called an F-automorphism of K, if σ fixes all the elements of F (which means that σ(a) = a for all a ∈ F). The set of all F-automorphisms of K is denoted by AutF K or by Gal(K|F) and is a subgroup of Aut K. The Galois group of a polynomial f ∈ F[X] is defined to be the group AutF K, where K is the splitting field of f over F.

Conversely, for a subgroup H of AutF K the set of elements of K that are fixed by all the automorphisms of H, that is, the set of all a ∈ K with σ(a) = a for every σ ∈ H, is a subfield of K, called the fixed field of H (over F) and denoted as FixF H. Clearly, F ⊆ FixF H ⊆ K.

For every intermediate field L (that is, a field L with F ⊆ L ⊆ K), we have a subgroup AutL K of AutF K. Conversely, given a subgroup H of AutF K, we have the intermediate fixed field FixF H. It is a relevant question to ask if there is any relationship between the subgroups of AutF K and the intermediate fields. A nice correspondence exists for a particular type of extension that we define now.

Definition 2.57.

A field extension F ⊆ K is said to be a Galois extension (or K is said to be a Galois extension over F), if FixF (AutF K) = F. Thus K is Galois over F if and only if for every a ∈ K \ F there is a σ ∈ AutF K with σ(a) ≠ a.

Example 2.18.

Let K be the splitting field of a non-constant polynomial f ∈ F[X]. By Exercise 2.77, the extension F ⊆ K is normal. Assume that F ⊆ K is a separable extension (Exercise 2.75). Consider an element α ∈ K \ F and let g be the minimal polynomial of α over F. Then deg g > 1 and g splits in K[X]. By the assumption (of separability), there is a root β of g with β ≠ α. Lemma 2.5 shows that there is a τ ∈ AutF K such that τ(α) = β ≠ α. Thus, K is Galois over F. In particular, if char F = 0 or if F is a finite field, then F ⊆ K is separable and so Galois. For example, the splitting field of any non-constant polynomial in ℚ[X] is a Galois extension of ℚ.

The following theorem establishes the correspondence we are looking for.

Theorem 2.34. Fundamental theorem of Galois theory

For a finite Galois extension F ⊆ K, there is a bijective correspondence between the set of all intermediate fields and the set of all subgroups of AutF K (given by L ↦ AutL K and H ↦ FixF H) such that the following assertions hold:

  1. AutFixF H K = H for every subgroup H of AutF K.

  2. FixF (AutL K) = L for every field L with F ⊆ L ⊆ K.

  3. For field extensions F ⊆ L ⊆ L′ ⊆ K, the extension degree [L′ : L] is the same as the index [AutL K : AutL′ K]. In particular, the order of AutF K is [K : F].

  4. For every intermediate field L, one has:

  1. K is Galois over L.

  2. L is Galois over F if and only if AutL K is a normal subgroup of AutF K. In this case, AutF L ≅ AutF K/AutL K.

A proof of this theorem is rather long and uses many auxiliary results which we would not need otherwise. We, therefore, choose to omit the proof here.

Exercise Set 2.8

2.73 Let α be transcendental over F. Show that the domain F[α] and the field F(α) are respectively isomorphic to the polynomial ring F[X] and the field F(X) of rational functions in one indeterminate X. Generalize the result to an arbitrary family αi, i ∈ I, of elements each of which is transcendental over F.
2.74 Let F ⊆ K be a field extension and let σ be an endomorphism of K with σ(a) = a for every a ∈ F.
  1. If a non-constant polynomial f ∈ F[X] has a root α ∈ K, show that σ(α) is also a root of f. For example, if F = ℝ, K = ℂ, and σ is the automorphism mapping z to its (complex) conjugate z̄, then we conclude that if a complex number z is a root of f ∈ ℝ[X], then z̄ is also a root of f. A similar result holds for the extension ℚ ⊆ ℚ(√m), where m is a non-square rational number.

  2. If K is algebraic over F, show that σ is an automorphism of K. [H]

2.75 Let F ⊆ K be a field extension.
  1. An irreducible polynomial f ∈ F[X] is said to be separable over F, if f has no multiple roots. An algebraic element α ∈ K is said to be separable over F, if the minimal polynomial of α over F is separable. K is called a separable extension of F, if every element of K is (algebraic and) separable over F. Show that if char F = 0 or if F is a finite field, and if K is an algebraic extension of F, then K is separable over F. [H]

  2. An algebraic element α ∈ K is called purely inseparable over F, if the minimal polynomial of α over F factors in K[X] as (X − α)^n for some n ∈ ℕ. If every element of K is (algebraic and) purely inseparable over F, then K is called a purely inseparable extension of F. Show that α ∈ K is both separable and purely inseparable over F if and only if α ∈ F. Thus, if char F = 0 or if F is a finite field, then F has no purely inseparable extension other than itself.

  3. If p := char F > 0, then an element α ∈ K is purely inseparable over F if and only if minpoly_{α,F}(X) = X^{p^r} + a for some r ≥ 0 and a ∈ F. In particular, show that if K is a finite purely inseparable extension of F, then [K : F] = p^s for some s ≥ 0.

2.76 F is called a perfect field, if every irreducible polynomial in F[X] is separable over F.
  1. Show that F is a perfect field if and only if every algebraic extension of F is separable over F. In particular, the fields of characteristic 0 and the finite fields Fq are perfect.

  2. Let p := char F > 0. Show that F is perfect if and only if every element of F has a p-th root in F. [H]

2.77 A field extension F ⊆ K is called normal, if every irreducible polynomial in F[X], that has a root in K, splits in K[X].
  1. If K is the splitting field of a polynomial over F, show that K is a normal extension of F. [H]

  2. If [K : F] = 2, show that F ⊆ K is a normal extension.

  3. Consider the tower of field extensions ℚ ⊆ ℚ(√2) ⊆ ℚ(2^{1/4}) to conclude that if F ⊆ K and K ⊆ L are normal extensions, then F ⊆ L need not be normal.

2.78 Prove the following assertions:
  1. The algebraic closure ℚ̄ of ℚ is an infinite extension of ℚ. [H]

  2. . [H]

2.79 Let F ⊆ K be a field extension and let L be the fixed field of AutF K over F. Show that K is a Galois extension of L.

2.9. Finite Fields

Finite fields are arguably the most important type of fields used in cryptography. They enjoy certain nice properties that infinite fields (in particular, the well-known fields ℚ, ℝ and ℂ) do not. We concentrate on some properties of finite fields in this section. As we will see later, arithmetic over a finite field K is fast when char K = 2 or when #K is a prime. As a result, these two classes of fields are the most common ones employed in cryptography. However, in this section, we do not restrict ourselves to these specific fields only, but provide a general treatment valid for all finite fields. As in the previous section, we continue to use the letters F, K, L to denote fields. In addition, we use the letter p to denote a prime number and q a power of p, that is, q = p^n for some n ∈ ℕ.

2.9.1. Existence and Uniqueness of Finite Fields

Let K be a finite field of cardinality q. Then p := char K > 0. By Proposition 2.7, p is a prime, that is, K contains an isomorphic copy of the field Fp. If n := [K : Fp], we have q = p^n. Therefore, we have proved the first statement of the following important result.

Theorem 2.35.

The cardinality of a finite field is a power p^n, n ∈ ℕ, of a prime number p. Conversely, given a prime p and n ∈ ℕ, there exists a finite field of cardinality p^n.

Proof

In order to construct a finite field of cardinality q := p^n, we start with F := Fp and consider the splitting field K of the polynomial f(X) := X^q − X over F. Since f′(X) = −1 ≠ 0, the roots of f are distinct (Exercise 2.61). Therefore, the set E := {a ∈ K | a^q = a} of roots of f has cardinality q. By Exercise 2.80, E is a field. Since F ⊆ E ⊆ K and f splits over E, by the definition of splitting fields we have K = E, that is, #K = #E = q.

Theorem 2.36. Fermat’s little theorem for finite fields

Let K be a finite field of cardinality q. Then every a ∈ K satisfies a^q = a.

Proof

Clearly, 0^q = 0. Take a ≠ 0. K* being a group of order q − 1, by Proposition 2.4 ord_{K*}(a) divides q − 1. In particular, a^{q−1} = 1, that is, a^q = a.
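For the prime fields Fp this is just Fermat's little theorem from elementary number theory, and it is easy to check mechanically. A quick sketch in Python (the choice of primes is arbitrary):

```python
# Theorem 2.36 for the prime fields F_p: every a in F_p satisfies a^p = a.
for p in (2, 3, 5, 7, 11, 13):
    assert all(pow(a, p, p) == a for a in range(p))
```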

Theorem 2.37.

Let K be a finite field of cardinality q = p^n and let F be the subfield of K isomorphic to Fp. Then K is the splitting field of the polynomial X^q − X over F. In particular, K is unique up to F-isomorphisms (that is, isomorphisms fixing the elements of F).

Proof

By Theorem 2.36, each of the q elements of K is a root of f(X) := X^q − X and consequently K is the splitting field of f. The last assertion of the theorem follows from the uniqueness of splitting fields (Proposition 2.33).

This uniqueness allows us to talk about the finite field of cardinality q (rather than a finite field of cardinality q). We denote this (unique) field by Fq.

The results proved so far can be generalized to arbitrary extensions Fq ⊆ F_{q^m}, where q = p^n and n, m ∈ ℕ. We leave the details to the reader (Exercise 2.82). It is important to point out here that since F_{q^m} is the splitting field of X^{q^m} − X over Fq, by Exercise 2.77 we have:

Corollary 2.14.

Every finite extension of finite fields is normal.

This implies that an irreducible polynomial f ∈ Fq[X] has either none or all of its roots in F_{q^m}. Also, if α ∈ Fq with q = p^n, then α^q = α^{p^n} = α. Therefore, α^{p^{n−1}} is a p-th root of α. By Exercise 2.76(b), we then conclude:

Corollary 2.15.

Every finite field is perfect.

Proposition 2.34.

Consider the extension Fq ⊆ K := F_{q^m}, m ∈ ℕ. There is a unique intermediate field with q^d elements, d ∈ ℕ, if and only if d|m. Furthermore, if d|m, then α ∈ K belongs to the (unique intermediate) field with q^d elements if and only if α^{q^d} = α.

Proof

For d|m, we have (X^{q^d} − X)|(X^{q^m} − X). The q^d roots of X^{q^d} − X in K constitute an intermediate field L. If L′ ≠ L is another intermediate field with q^d elements, then by Theorem 2.36 there are more than q^d elements of K that are roots of X^{q^d} − X, a contradiction. Conversely, an intermediate field L contains q^d elements, where d := [L : Fq]. Since q^m = #K = (q^d)^{[K : L]}, we have d|m. The last assertion of the proposition follows immediately from the above argument.

Corollary 2.16.

Let f ∈ Fq[X] be irreducible and let f have a root in F_{q^m}. Then deg f divides m.

Proof

Consider the extension Fq ⊆ Fq(α) = F_{q^d} generated by a root α ∈ F_{q^m} of f, where d := deg f, and use the fact that Fq ⊆ F_{q^m} is a normal extension.

Now we will prove a very important result concerning the multiplicative group Fq* = Fq \ {0}.

Theorem 2.38.

Fq* is a cyclic group for any finite field Fq.

Proof

Modify the proof of Proposition 2.19 or use the following more general result.

Theorem 2.39.

Let K be a field (not necessarily finite). Then any finite subgroup G of the multiplicative group K* is cyclic.

Proof

Since K is a field, for any n ∈ ℕ the polynomial X^n − 1 has at most n roots in K and hence in G. The theorem then follows immediately from Exercise 2.18.

Corollary 2.17.

Every finite extension Fq ⊆ F_{q^m} is simple. In particular, Fq[X] contains an irreducible polynomial of degree m (for any q and m).

Proof

Let α be a generator of the cyclic group F_{q^m}*. Then m is the smallest of the positive integers s for which α^{q^s} = α. Let f := minpoly_{α,Fq} with d := deg f, so that Fq(α) = F_{q^d}. If d < m, then α^{q^d} = α, a contradiction. Thus d = m, that is, F_{q^m} = Fq(α).

2.9.2. Polynomials over Finite Fields

In this section, we study some useful properties of polynomials over finite fields. We concentrate on polynomials in Fq[X] for an arbitrary q = p^n, p prime, n ∈ ℕ. We have seen how the polynomials X^{q^m} − X proved to be important for understanding the structure of finite fields. But that is not all; these polynomials indeed have further roles to play, and they will appear repeatedly in what follows.

Let Fq ⊆ F_{q^m} be a finite extension of finite fields and let α ∈ F_{q^m} be a root of a polynomial f ∈ Fq[X]. Since each coefficient a of f satisfies a^q = a, we have f(α^q) = f(α)^q = 0 (Exercise 2.80). Therefore, α^q is also a root of f. More generally, for each r = 0, 1, 2, · · · the element α^{q^r} is a root of f(X). This gives us a nice procedure for computing the minimal polynomial of α, as the following corollary suggests.

Corollary 2.18.

The minimal polynomial of α ∈ F_{q^m} over Fq is (X − α)(X − α^q) · · · (X − α^{q^{d−1}}), where d is the smallest of the positive integers s for which α^{q^s} = α.

Proof

Let f_α := minpoly_{α,Fq} have degree δ. So F_{q^δ} = Fq(α) is the smallest field containing (Fq and) α and hence all the roots of f_α, that is, α^{q^s} = α for s = δ and for no smaller positive integer values of s. Therefore, δ = d and the conjugates of α are precisely α, α^q, . . . , α^{q^{d−1}}.
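Corollary 2.18 can be checked directly on a small field. The sketch below realizes F_8 as F_2[x]/(x³ + x + 1) (this defining polynomial and all helper names are our choices for illustration, not the text's), multiplies out (X − α)(X − α²)(X − α⁴) for α the residue class of x, and recovers x³ + x + 1:

```python
# Minimal polynomial of alpha in F_8 = F_2[x]/(x^3 + x + 1) via Corollary 2.18.
# Field elements are 3-bit integers; bit i is the coefficient of x^i.

def gf8_mul(a, b):
    """Multiply two elements of F_2[x]/(x^3 + x + 1)."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011          # reduce: x^3 = x + 1
    return r

def polmul(f, g):
    """Multiply polynomials with coefficients in F_8 (lists, lowest degree first)."""
    r = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            r[i + j] ^= gf8_mul(a, b)    # addition in characteristic 2 is XOR
    return r

alpha = 0b010                            # the residue class of x
conjugates = [alpha]
for _ in range(2):                       # alpha^2 and alpha^4 by repeated squaring
    conjugates.append(gf8_mul(conjugates[-1], conjugates[-1]))

minpoly = [1]                            # the constant polynomial 1
for c in conjugates:
    minpoly = polmul(minpoly, [c, 1])    # times (X + c); note -c = c in char 2

assert minpoly == [1, 1, 0, 1]           # X^3 + X + 1, lowest degree first
```

The three conjugates collapse into a polynomial with coefficients in F_2, exactly as the corollary predicts.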

We now prove a theorem which has important consequences.

Theorem 2.40.

X^{q^m} − X is the product of all monic irreducible polynomials in Fq[X], whose degrees divide m.

Proof

We have X^{q^m} − X = ∏_{α ∈ F_{q^m}} (X − α). By Corollary 2.18, the minimal polynomial f_α(X) of each α ∈ F_{q^m} over Fq divides X^{q^m} − X. By Corollary 2.16, deg f_α divides m. Finally, since f_α(X) = f_β(X) or gcd(f_α(X), f_β(X)) = 1 depending on whether α and β are conjugates or not, X^{q^m} − X is a product of monic irreducible polynomials of Fq[X], whose degrees divide m. In order to show that X^{q^m} − X is the product of all such polynomials, let us consider an arbitrary polynomial g ∈ Fq[X] which is monic and irreducible over Fq and has degree d|m. The polynomial g splits over F_{q^d} (with no multiple roots, finite fields being perfect). Since d|m, by Proposition 2.34 F_{q^d} ⊆ F_{q^m}. Thus g splits over F_{q^m} as well and, in particular, divides X^{q^m} − X.

The first consequence of Theorem 2.40 is that it leads to a procedure for checking the irreducibility of a polynomial f ∈ Fq[X]. Let d := deg f. If f(X) is reducible, it admits an irreducible factor of degree ≤ ⌊d/2⌋. Since gcd(f(X), X^{q^i} − X) is the product of all distinct irreducible factors of f with degrees dividing i, we compute the gcds g_i := gcd(f(X), X^{q^i} − X) for i = 1, . . . , ⌊d/2⌋. If all these gcds are 1, we conclude that f is irreducible. Otherwise f is reducible. We will see an optimized implementation of this procedure in Chapter 3. Besides irreducibility testing, the above theorem also leads to algorithms for finding random irreducible polynomials and for factorizing polynomials, as we will also discuss in Chapter 3.
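The gcd-based test can be sketched for prime fields Fp as follows. This is a plain reference version, not the optimized implementation of Chapter 3; a polynomial is a coefficient list (lowest degree first), and all function names are ours:

```python
# Straightforward gcd-based irreducibility test over F_p (p a small prime).

def polmod(a, f, p):
    """Remainder of a modulo f in F_p[X]."""
    a = [c % p for c in a]
    while a and a[-1] == 0:
        a.pop()
    finv = pow(f[-1], p - 2, p)          # inverse of the leading coefficient
    while len(a) >= len(f):
        c = a[-1] * finv % p
        s = len(a) - len(f)
        for i, fc in enumerate(f):
            a[s + i] = (a[s + i] - c * fc) % p
        while a and a[-1] == 0:
            a.pop()
    return a

def polsub(a, b, p):
    n = max(len(a), len(b))
    r = [((a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0)) % p
         for i in range(n)]
    while r and r[-1] == 0:
        r.pop()
    return r

def polmulmod(a, b, f, p):
    r = [0] * max(len(a) + len(b) - 1, 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return polmod(r, f, p)

def polpowmod(a, e, f, p):
    result, base = [1], polmod(a, f, p)
    while e:
        if e & 1:
            result = polmulmod(result, base, f, p)
        base = polmulmod(base, base, f, p)
        e >>= 1
    return result

def polgcd(a, b, p):
    while b:
        a, b = b, polmod(a, b, p)
    if a:                                 # normalize to a monic gcd
        inv = pow(a[-1], p - 2, p)
        a = [c * inv % p for c in a]
    return a

def is_irreducible(f, p):
    """Test irreducibility of f in F_p[X] (deg f >= 1) via Theorem 2.40."""
    d = len(f) - 1
    x = [0, 1]
    xq = x
    for i in range(1, d // 2 + 1):
        xq = polpowmod(xq, p, f, p)       # X^(p^i) mod f
        # gcd(f, X^(p^i) - X) collects the irreducible factors of f
        # whose degrees divide i
        if len(polgcd(f, polsub(xq, x, p), p)) > 1:
            return False
    return True

assert is_irreducible([1, 1, 0, 1], 2)        # X^3 + X + 1 over F_2
assert not is_irreducible([1, 0, 1], 2)       # X^2 + 1 = (X + 1)^2 over F_2
assert is_irreducible([1, 0, 1], 3)           # X^2 + 1 over F_3
```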

The second consequence of Theorem 2.40 is that it gives us a formula for calculating the number of monic irreducible polynomials of a given degree over a given finite field. First we need to define a function on the set ℕ of positive integers.

Definition 2.58.

The Möbius function μ : ℕ → {−1, 0, 1} is defined as

μ(n) := 1, if n = 1; μ(n) := (−1)^r, if n is a product of r pairwise distinct primes; μ(n) := 0, if n is divisible by the square of a prime.

It follows that μ(n) ≠ 0 if and only if n is square-free.

Lemma 2.6.

For n ∈ ℕ, we have

Σ_{d|n} μ(d) = 1 for n = 1, and Σ_{d|n} μ(d) = 0 for n > 1,

where Σ_{d|n} denotes summation over all positive divisors d of n.

Proof

The result follows immediately for n = 1. For n > 1, write n = p1^{a1} · · · pr^{ar}, where p1, . . . , pr are r ≥ 1 distinct primes and a1, . . . , ar ≥ 1. The only non-zero terms in the sum Σ_{d|n} μ(d) are those corresponding to d = p_{i1} · · · p_{is} for pairwise distinct choices of i1, . . . , is ∈ {1, . . . , r} (with d = 1 for s = 0). From the definition of μ, it then follows that Σ_{d|n} μ(d) = Σ_{s=0}^{r} (r choose s)(−1)^s = (1 − 1)^r = 0.
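The definition of μ and the divisor-sum identity of Lemma 2.6 are easy to verify mechanically. A small sketch (the function name and trial-division approach are ours):

```python
def mobius(n):
    """The Moebius function, by trial division; returns 0 on a squared prime factor."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0               # n is not square-free
            result = -result
        p += 1
    if n > 1:                          # one remaining prime factor
        result = -result
    return result

assert [mobius(n) for n in range(1, 11)] == [1, -1, -1, 0, -1, 1, -1, 0, 0, 1]

# Lemma 2.6: the divisor sum of mu vanishes for every n > 1
for n in range(2, 200):
    assert sum(mobius(d) for d in range(1, n + 1) if n % d == 0) == 0
```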

Lemma 2.7. Möbius inversion formula

Let f and g be maps from ℕ to an Abelian group G.

  1. If G is additive and g(n) = Σ_{d|n} f(d) for all n ∈ ℕ, then f(n) = Σ_{d|n} μ(d) g(n/d) = Σ_{d|n} μ(n/d) g(d) for all n ∈ ℕ.

  2. If G is multiplicative and g(n) = ∏_{d|n} f(d) for all n ∈ ℕ, then f(n) = ∏_{d|n} g(n/d)^{μ(d)} = ∏_{d|n} g(d)^{μ(n/d)} for all n ∈ ℕ.

Proof

To prove the additive formula, we note that

Σ_{d|n} μ(d) g(n/d) = Σ_{d|n} μ(d) Σ_{c|(n/d)} f(c) = Σ_{c|n} f(c) Σ_{d|(n/c)} μ(d) = f(n),

where the last equality follows from Lemma 2.6 (the inner sum vanishes unless c = n). The multiplicative formula can be proved similarly.

Let us denote by ν_{q,m} the number of monic irreducible polynomials in Fq[X] of degree m and by I_{q,m}(X) the product of all monic irreducible polynomials in Fq[X] of degree m. By Theorem 2.40, we have q^m = Σ_{d|m} d ν_{q,d} and X^{q^m} − X = ∏_{d|m} I_{q,d}(X). Applications of the Möbius inversion formula then yield the following formulas:

Equation 2.4

ν_{q,m} = (1/m) Σ_{d|m} μ(d) q^{m/d},    I_{q,m}(X) = ∏_{d|m} (X^{q^d} − X)^{μ(m/d)}
If p1, . . . , pr are all the prime divisors of m (not necessarily all distinct), Equation (2.4) together with the observation that μ(n) ≥ −1 for all n ∈ ℕ implies that m ν_{q,m} ≥ q^m − (2^r − 1) q^{m/2}, since there are at most 2^r divisors d of m with μ(d) ≠ 0, and each term with d > 1 is at least −q^{m/2}. But each pi ≥ 2, so that m ≥ 2^r, and hence ν_{q,m} ≥ (1/m)(q^m − (m − 1) q^{m/2}) > 0. We, therefore, have an independent proof of the second statement in Corollary 2.17. Moreover, for practical values of q and m, we have the good approximation:

Equation 2.5

ν_{q,m} ≈ q^m / m
Since the total number of monic polynomials of degree m in Fq[X] is q^m, a randomly chosen monic polynomial in Fq[X] of degree m is irreducible with probability approximately 1/m, that is, one expects to get an irreducible polynomial of degree m after O(m) random monic polynomials are picked up from Fq[X]. These observations have an important bearing on devising efficient algorithms for finding irreducible polynomials over finite fields. (See Chapter 3.)
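Equation (2.4) can be verified by brute force for small parameters. The sketch below (all names ours) counts the monic irreducible polynomials over Fp directly, by marking every product of two monic polynomials of lower degree as reducible, and compares the count with the formula:

```python
# Checking Equation (2.4) against a brute-force count over F_p.
from itertools import product

def mobius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def nu(q, m):
    """Equation (2.4): the number of monic irreducible polynomials of degree m."""
    return sum(mobius(d) * q ** (m // d) for d in range(1, m + 1) if m % d == 0) // m

def polmul(a, b, p):
    r = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            r[i + j] = (r[i + j] + x * y) % p
    return tuple(r)

def monic(p, k):
    """All monic polynomials of degree k over F_p (coefficient tuples, lowest first)."""
    for lows in product(range(p), repeat=k):
        yield lows + (1,)

def count_irreducible(p, m):
    """Count monic irreducibles of degree m by excluding all nontrivial products."""
    reducible = set()
    for i in range(1, m // 2 + 1):
        for a in monic(p, i):
            for b in monic(p, m - i):
                reducible.add(polmul(a, b, p))
    return p ** m - len(reducible)

for p, m in ((2, 1), (2, 4), (3, 3), (5, 2)):
    assert count_irreducible(p, m) == nu(p, m)
```

For instance, nu(2, 4) = (2⁴ − 2²)/4 = 3, matching the direct count.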

The conjugates of α ∈ F_{q^d} over Fq are α^{q^i}, i = 0, 1, . . . , d − 1. It is interesting to look at the sum and the product of the conjugates of α. By Corollary 2.18, f_α(X) = (X − α)(X − α^q) · · · (X − α^{q^{d−1}}) ∈ Fq[X] for some d ∈ ℕ. Expanding this product shows that the coefficient of X^{d−1} is the negative of the sum α + α^q + · · · + α^{q^{d−1}}, and the constant term is (−1)^d times the product α · α^q · · · α^{q^{d−1}}. Since f_α(X) ∈ Fq[X], the elements α + α^q + · · · + α^{q^{d−1}} and α · α^q · · · α^{q^{d−1}} belong to Fq. Since α^{q^d} = α, for any (positive) integral multiple δ of d, the sum α + α^q + · · · + α^{q^{δ−1}} and the product α · α^q · · · α^{q^{δ−1}} are elements of Fq too.

Definition 2.59.

Let Fq ⊆ F_{q^m}, q = p^n, be a finite extension of finite fields and let α ∈ F_{q^m}. The trace of α over Fq is defined as the sum

Tr_{F_{q^m}|Fq}(α) := α + α^q + α^{q²} + · · · + α^{q^{m−1}},

and the norm of α over Fq is defined as

N_{F_{q^m}|Fq}(α) := α · α^q · α^{q²} · · · α^{q^{m−1}} = α^{(q^m − 1)/(q − 1)}.

In view of the preceding discussion, the trace and the norm of α are elements of Fq. For q = p, the trace and the norm of α are also called the absolute trace and the absolute norm of α and are often denoted simply as Tr(α) and N(α). We often drop the suffixes in these notations, when no ambiguities are likely.

The trace and norm functions play an important role in the theory of finite fields. See Exercise 2.86 for some elementary properties of these functions.
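For instance, over F_2 the absolute trace of a ∈ F_8 is Tr(a) = a + a² + a⁴. A sketch (with F_8 realized as F_2[x]/(x³ + x + 1), our choice of defining polynomial) confirms the basic properties: the trace lands in F_2, is F_2-linear, and takes each value equally often:

```python
# Absolute trace Tr(a) = a + a^2 + a^4 on F_8 = F_2[x]/(x^3 + x + 1).
# Field elements are 3-bit integers; bit i is the coefficient of x^i.

def gf8_mul(a, b):
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1011                # reduce: x^3 = x + 1
    return r

def trace(a):
    a2 = gf8_mul(a, a)
    a4 = gf8_mul(a2, a2)
    return a ^ a2 ^ a4                 # addition in characteristic 2 is XOR

traces = [trace(a) for a in range(8)]
assert all(t in (0, 1) for t in traces)          # Tr(a) lies in F_2
assert traces.count(0) == 4 == traces.count(1)   # Tr is onto, with kernel of size 4
assert all(trace(a ^ b) == trace(a) ^ trace(b)   # Tr is F_2-linear
           for a in range(8) for b in range(8))
```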

2.9.3. Representation of Finite Fields

F_{q^m} is a vector space of dimension m over Fq. Let β0, . . . , β_{m−1} be an Fq-basis of F_{q^m}. Each element a ∈ F_{q^m} has a unique representation a = a0β0 + · · · + a_{m−1}β_{m−1} with each ai ∈ Fq. Therefore, if we have a representation of the elements of Fq, we can also represent the elements of F_{q^m}. Thus elements of any finite field can be represented, if we have representations of elements of prime fields. But the set {0, 1, . . . , p − 1} under the modulo p arithmetic represents Fp.

So our problem reduces to selecting suitable bases β0, . . . , β_{m−1} of F_{q^m} over Fq. In order to illustrate how we can do that, let us choose a priori a fixed monic irreducible polynomial f ∈ Fq[X] with deg f = m. We then represent F_{q^m} = Fq[X]/⟨f(X)⟩ = Fq(α), where α (the residue class of X) is a root of f in F_{q^m}. The elements 1, α, . . . , α^{m−1} are linearly independent over Fq, since otherwise we would have a non-zero polynomial of degree less than m, of which α is a root. The Fq-basis 1, α, . . . , α^{m−1} of F_{q^m} is called a polynomial basis (with respect to the defining polynomial f). The elements of F_{q^m} are then polynomials in α of degrees < m. The arithmetic in F_{q^m} is carried out as the polynomial arithmetic modulo the irreducible polynomial f.

Example 2.19.
  1. The elements of F2 are 0 and 1 with 0 + 0 = 0, 0 + 1 = 1, 1 + 0 = 1, 1 + 1 = 0, 0 · 0 = 1 · 0 = 0 · 1 = 0 and 1 · 1 = 1. In order to represent F8, we choose the irreducible polynomial f(X) := X³ + X² + 1 ∈ F2[X]. Elements of F8 = F2(α) are a2α² + a1α + a0, where a0, a1, a2 ∈ F2. In order to demonstrate the arithmetic in F8, we take a := α² + α and b := α² + 1. Their sum in F8 is a + b = α + 1. On the other hand, ab = α⁴ + α³ + α² + α = α(α³ + α² + 1) + α² = α · 0 + α² = α². The complete multiplication table for this representation is given in Table 2.2.

    Table 2.2. Multiplication table for F8

    ·          | 0 | 1          | α          | α + 1      | α²         | α² + 1     | α² + α     | α² + α + 1
    0          | 0 | 0          | 0          | 0          | 0          | 0          | 0          | 0
    1          | 0 | 1          | α          | α + 1      | α²         | α² + 1     | α² + α     | α² + α + 1
    α          | 0 | α          | α²         | α² + α     | α² + 1     | α² + α + 1 | 1          | α + 1
    α + 1      | 0 | α + 1      | α² + α     | α² + 1     | 1          | α          | α² + α + 1 | α²
    α²         | 0 | α²         | α² + 1     | 1          | α² + α + 1 | α + 1      | α          | α² + α
    α² + 1     | 0 | α² + 1     | α² + α + 1 | α          | α + 1      | α² + α     | α²         | 1
    α² + α     | 0 | α² + α     | 1          | α² + α + 1 | α          | α²         | α + 1      | α² + 1
    α² + α + 1 | 0 | α² + α + 1 | α + 1      | α²         | α² + α     | 1          | α² + 1     | α

  2. F3 is represented by the set {0, 1, 2} with arithmetic operations modulo 3. Since −1 is a quadratic non-residue modulo 3, the polynomial X² + 1 is irreducible over F3. Therefore, the quotient field F3[X]/⟨X² + 1⟩ = F3(β) can be used to represent F9, β being a root of this polynomial (so that β² = −1 = 2). The multiplication table of F9 under this representation is then as shown in Table 2.3.

    Table 2.3. Multiplication table for F9

    ·      | 0 | 1      | 2      | β      | β + 1  | β + 2  | 2β     | 2β + 1 | 2β + 2
    0      | 0 | 0      | 0      | 0      | 0      | 0      | 0      | 0      | 0
    1      | 0 | 1      | 2      | β      | β + 1  | β + 2  | 2β     | 2β + 1 | 2β + 2
    2      | 0 | 2      | 1      | 2β     | 2β + 2 | 2β + 1 | β      | β + 2  | β + 1
    β      | 0 | β      | 2β     | 2      | β + 2  | 2β + 2 | 1      | β + 1  | 2β + 1
    β + 1  | 0 | β + 1  | 2β + 2 | β + 2  | 2β     | 1      | 2β + 1 | 2      | β
    β + 2  | 0 | β + 2  | 2β + 1 | 2β + 2 | 1      | β      | β + 1  | 2β     | 2
    2β     | 0 | 2β     | β      | 1      | 2β + 1 | β + 1  | 2      | 2β + 2 | β + 2
    2β + 1 | 0 | 2β + 1 | β + 2  | β + 1  | 2      | 2β     | 2β + 2 | β      | 1
    2β + 2 | 0 | 2β + 2 | β + 1  | 2β + 1 | β      | 2      | β + 2  | 1      | 2β
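The entries of Tables 2.2 and 2.3 can be reproduced mechanically. A sketch using the same representations, F_8 = F_2[x]/(x³ + x² + 1) and F_9 = F_3(β) with β² = 2 (the bit and pair encodings are our choices):

```python
def gf8_mul(a, b):
    """Multiply in F_8 = F_2[x]/(x^3 + x^2 + 1); 3-bit ints, bit i = coeff of alpha^i."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1101                # reduce: alpha^3 = alpha^2 + 1
    return r

def f9_mul(x, y):
    """Multiply pairs (a, b) standing for a + b*beta in F_9 = F_3(beta), beta^2 = 2."""
    a, b = x
    c, d = y
    return ((a * c + 2 * b * d) % 3, (a * d + b * c) % 3)

# the worked product from Example 2.19: (alpha^2 + alpha)(alpha^2 + 1) = alpha^2
assert gf8_mul(0b110, 0b101) == 0b100

# spot checks against Table 2.3
assert f9_mul((0, 1), (0, 1)) == (2, 0)   # beta * beta = 2
assert f9_mul((1, 1), (1, 1)) == (0, 2)   # (beta + 1)^2 = 2*beta
assert f9_mul((1, 1), (2, 1)) == (1, 0)   # (beta + 1)(beta + 2) = 1
```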

Polynomial bases are most common in finite field implementations. Some other types of bases also deserve specific mention in this context.

Definition 2.60.

An element α ∈ F_{q^m} is called a normal element over Fq, if the conjugates α, α^q, . . . , α^{q^{m−1}} are (distinct and) linearly independent over Fq. For a normal element α of F_{q^m} over Fq, the Fq-basis α, α^q, . . . , α^{q^{m−1}} is called a normal basis (of F_{q^m} over Fq). If, in addition, α is a primitive element (that is, a generator) of F_{q^m}*, then α and the corresponding normal basis are called a primitive normal element and a primitive normal basis respectively.

It can be shown that normal bases exist for all finite extensions Fq ⊆ F_{q^m}. It can even be shown that primitive normal bases exist for all such extensions.

Example 2.20.

Consider the representation of F8 in Example 2.19. The elements α, α² and α⁴ = α² + α + 1 satisfy

α  = 0 · 1 + 1 · α + 0 · α²,
α² = 0 · 1 + 0 · α + 1 · α²,
α⁴ = 1 · 1 + 1 · α + 1 · α²,

with the 3×3 transformation matrix

( 0 1 0 )
( 0 0 1 )
( 1 1 1 )

having determinant 1 modulo 2. Thus α is a normal element of F8 and (α, α², α⁴) is a normal basis of F8 over F2. Since #F8* = 7 is prime, α is a generator of F8*, that is, α is also a primitive normal element of F8.

On the other hand, α + 1 is not a normal element of F8 over F2. Table 2.2 gives

(α + 1)  = 1 · 1 + 1 · α + 0 · α²,
(α + 1)² = 1 · 1 + 0 · α + 1 · α²,
(α + 1)⁴ = 0 · 1 + 1 · α + 1 · α²,

with the transformation matrix

( 1 1 0 )
( 1 0 1 )
( 0 1 1 )

having determinant zero modulo 2.

Computations over finite fields often call for exponentiations of elements a = a0β0 + · · · + a_{m−1}β_{m−1}. If the βi = α^{q^i}, i = 0, . . . , m − 1, constitute a normal basis, then a^q = a_{m−1}β0 + a0β1 + · · · + a_{m−2}β_{m−1}, since α^{q^m} = α and a_i^q = a_i for each i. Thus the coefficients of a^q (in the representation under the given normal basis) are obtained simply by cyclically shifting the coefficients a0, . . . , a_{m−1} in the representation of a. This leads to a considerable saving of time. In particular, this trick becomes most meaningful for q = 2 (a case of high importance in cryptography).
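This cyclic-shift behaviour is easy to confirm for the normal basis (α, α², α⁴) of F_8 from Example 2.20, with the field again realized as F_2[x]/(x³ + x² + 1) as in Example 2.19 (the encodings below are ours): squaring any element permutes its normal-basis coordinates cyclically.

```python
from itertools import product

def gf8_mul(a, b):
    """Multiply in F_8 = F_2[x]/(x^3 + x^2 + 1); 3-bit ints, bit i = coeff of alpha^i."""
    r = 0
    for _ in range(3):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= 0b1101                # reduce: alpha^3 = alpha^2 + 1
    return r

basis = [0b010, 0b100, 0b111]          # alpha, alpha^2, alpha^4 = alpha^2 + alpha + 1

elem = {}                              # normal-basis coordinates -> field element
for coords in product((0, 1), repeat=3):
    x = 0
    for c, b in zip(coords, basis):
        if c:
            x ^= b
    elem[coords] = x
coords_of = {v: k for k, v in elem.items()}
assert len(coords_of) == 8             # the three conjugates are linearly independent

for (c0, c1, c2), x in elem.items():
    square = gf8_mul(x, x)
    assert coords_of[square] == (c2, c0, c1)   # squaring = cyclic shift
```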

Now that exponentiations become cheaper with normal bases, one should not let the common operations (addition and multiplication) turn significantly slower. The sum of a = a0β0 + · · · + a_{m−1}β_{m−1} and b = b0β0 + · · · + b_{m−1}β_{m−1} continues to remain as easy as in the case of a polynomial basis, namely, a + b = (a0 + b0)β0 + · · · + (a_{m−1} + b_{m−1})β_{m−1}, where each ai + bi is calculated in Fq. However, computing the product ab introduces difficulty. In particular, it requires the representations of the products βiβj, 0 ≤ i, j ≤ m − 1, in the basis β0, . . . , β_{m−1}. For i ≤ j, we have βiβj = (β0β_{j−i})^{q^i}. It is thus sufficient to look only at the coefficients t_{jk} defined by β0βj = Σ_k t_{jk}βk, 0 ≤ j, k ≤ m − 1. We denote by Cα the number of non-zero t_{jk}. From practical considerations (for example, for hardware implementations), Cα should be as small as possible. For q = 2, one can show that 2m − 1 ≤ Cα ≤ m². If, for this special case, Cα = 2m − 1, the normal basis α, α^q, . . . , α^{q^{m−1}} is called an optimal normal basis. Unlike normal (or primitive normal) bases, optimal normal bases do not exist for all values of m.

We finally mention another representation of the elements of a finite field Fq, one that does not depend on the vector space representation discussed so far, but is based on the fact that the group Fq* is cyclic. If we are given a primitive element (that is, a generator) γ of Fq*, then the elements of Fq are 0, 1 = γ⁰, γ, γ², . . . , γ^{q−2}. Multiplication and exponentiation become easy with this representation, since 0 · a = 0 for all a ∈ Fq, whereas γ^i · γ^j = γ^k with k ≡ i + j (mod q − 1). Unfortunately, this representation provides no clue on how to compute γ^i + γ^j. One possibility is to store a table consisting of the values z_k satisfying 1 + γ^k = γ^{z_k} for all k = 0, . . . , q − 2 (with γ^k ≠ −1), so that for i ≤ j one can compute γ^i + γ^j = γ^i(1 + γ^{j−i}) = γ^i γ^{z_{j−i}} = γ^l, where l ≡ i + z_{j−i} (mod q − 1). Such a table is called Zech’s logarithm table; it can be maintained for small values of q and may facilitate computations in extensions of Fq. But if q is large (or, more correctly, if p is large, where q = p^n), this representation of the elements of Fq is neither practical nor often even feasible. Another difficulty with this representation is that it calls for a primitive element γ. If q is large and the integer factorization of q − 1 is not provided, there are no efficient methods known for finding such an element or even for checking if a given element is primitive.

Example 2.21.

Consider the representation of F9 in Example 2.19. By Table 2.3, γ := β + 1 is a generator of F9*. Table 2.4 lists the powers of γ and the Zech logarithms z_k.

Table 2.4. Zech’s logarithm table for F9 with respect to γ = β + 1

k | γ^k    | 1 + γ^k | z_k
0 | 1      | 2       | 4
1 | β + 1  | β + 2   | 7
2 | 2β     | 2β + 1  | 3
3 | 2β + 1 | 2β + 2  | 5
4 | 2      | 0       |
5 | 2β + 2 | 2β      | 2
6 | β      | β + 1   | 1
7 | β + 2  | β       | 6
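Table 2.4 can be regenerated in a few lines, with F_9 = F_3(β), β² = 2, as before (the pair encoding and names are ours):

```python
def f9_mul(x, y):
    """Multiply pairs (a, b) standing for a + b*beta in F_9 = F_3(beta), beta^2 = 2."""
    a, b = x
    c, d = y
    return ((a * c + 2 * b * d) % 3, (a * d + b * c) % 3)

g = (1, 1)                             # gamma = beta + 1
powers = [(1, 0)]                      # gamma^0 = 1
for _ in range(7):
    powers.append(f9_mul(powers[-1], g))
log = {v: k for k, v in enumerate(powers)}
assert len(log) == 8                   # gamma really generates F_9*

zech = {}
for k, gk in enumerate(powers):
    s = ((1 + gk[0]) % 3, gk[1])       # 1 + gamma^k
    if s != (0, 0):                    # skip gamma^k = -1
        zech[k] = log[s]
assert zech == {0: 4, 1: 7, 2: 3, 3: 5, 5: 2, 6: 1, 7: 6}

# addition via Zech logarithms: gamma^1 + gamma^3 = gamma^(1 + z_2) = gamma^4
i, j = 1, 3
total = ((powers[i][0] + powers[j][0]) % 3, (powers[i][1] + powers[j][1]) % 3)
assert log[total] == (i + zech[j - i]) % 8
```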

Exercise Set 2.9

2.80 Let F be a field (not necessarily finite) of characteristic p > 0 and let a, b ∈ F. Prove that (a + b)^p = a^p + b^p, or, more generally, (a + b)^{p^n} = a^{p^n} + b^{p^n} for all n ∈ ℕ. [H]
2.81 Let p be a prime, n ∈ ℕ and q := p^n. Prove that:
  1. If f ∈ Fp[X], then f(X^p) = f(X)^p.

  2. If f ∈ Fq[X], then f(X^p) = g(X)^p for some g ∈ Fq[X].

2.82 Let p be a prime, n, m ∈ ℕ and q := p^n. Let F ⊆ K be an extension of finite fields with #F = q and #K = q^m. Show that K is the splitting field of X^{q^m} − X over F. [H]
2.83 Write the addition and multiplication tables of (some representations of) the fields and . Use these tables to find a primitive element in each of these fields and a normal element in (over ).
2.84 Let K be a field (not necessarily finite or of positive characteristic).
  1. Let f ∈ K[X] be of degree 2 or 3. Prove that f is reducible in K[X] if and only if f has a root in K. Deduce that X² + X + 1 and X³ + X + 1 are irreducible in F2[X].

  2. Let f ∈ K[X] be of degree d ≥ 0 with f(0) ≠ 0. The opposite of f is the polynomial f^op(X) := X^d f(1/X). Show that f(X) is irreducible in K[X] if and only if f^op(X) is irreducible in K[X]. Deduce that X³ + X² + 1 is irreducible in F2[X].

2.85 In this exercise, one studies the arithmetic in the finite field F125.
  1. Show that is irreducible.

  2. Let us represent F125 as F5[X]/⟨f(X)⟩. Call α the residue class of X and consider the elements a := 3α² + 2α + 1 and b := 2α² + 3 in F125. Compute ab⁻¹ in this representation of F125. You should compute the canonical representative of ab⁻¹ in F125, that is, a polynomial in α of degree < 3 with coefficients reduced modulo 5.

2.86 Let F ⊆ K ⊆ L be finite extensions of finite fields with [L : K] = s. Let α, β ∈ K and γ ∈ L. Prove the following assertions:
  1. TrK|F(α + β) = TrK|F(α) + TrK|F (β) and NK|F (αβ) = NK|F (α) NK|F (β).

  2. TrL|F (α) = s TrK|F (α) and NL|F (α) = NK|F (α)^s.

  3. Transitivity of trace and norm

    TrL|F (γ) = TrK|F (TrL|K(γ)) and NL|F (γ) = NK|F (NL|K (γ)).

2.87 Let K ⊆ L be a finite extension of finite fields. In this exercise, we treat both K and L as vector spaces over K. Show that:
  1. TrL|K is a surjective linear transformation LK.

  2. All the linear transformations L → K are given by Tα : L → K, β ↦ TrL|K(αβ), where α ∈ L. (In this notation, TrL|K = T1.) Moreover, for distinct elements α, α′ ∈ L, the linear transformations Tα and Tα′ are distinct.

2.88 Let K and L be as in Exercise 2.87 with #K = q, and let β ∈ L. Show that TrL|K(β) = 0 if and only if β = γ^q − γ for some γ ∈ L.
2.89 Let K and L be as in Exercise 2.87 with [L : K] = m. Two K-bases (β0, . . . , β_{m−1}) and (γ0, . . . , γ_{m−1}) of L are called dual or complementary, if TrL|K(βiγj) = δij.[10] Show that every K-basis of L has a unique dual basis.

[10] The Kronecker delta δ on an index set I (finite or infinite) is defined for i, j ∈ I as: δij = 1 if i = j, and δij = 0 if i ≠ j.

2.90 Prove that every finite extension of finite fields is Galois. [H]
2.91 For the extension Fq ⊆ F_{q^m}, consider the map φ : F_{q^m} → F_{q^m}, α ↦ α^q.
  1. Show that φ is an Fq-automorphism of F_{q^m}. φ is called the Frobenius automorphism of F_{q^m} over Fq.

  2. Show that Gal(F_{q^m}|Fq) is cyclic of order m and with φ as a generator. [H]

2.92 Let f ∈ Fq[X] be irreducible with deg f = d. Consider the extension Fq ⊆ F_{q^m} and let r := gcd(d, m).
  1. Show that f is irreducible in F_{q^m}[X] if and only if r = 1. [H]

  2. More generally, show that f factors in F_{q^m}[X] into a product of r irreducible polynomials each of degree d/r.

2.93 Consider the representation of F8 in Example 2.19. Construct the minimal polynomials over F2 of the elements of F8. [H]
2.94 Show that the number of (ordered) Fq-bases of F_{q^m} is

(q^m − 1)(q^m − q)(q^m − q²) · · · (q^m − q^{m−1}).

*2.10. Affine and Projective Curves

In this section, we introduce some elementary concepts from algebraic geometry, which facilitate the treatment of elliptic and hyperelliptic curves in the next two sections. We concentrate only on plane curves, because these are the only curves we need in this book. Throughout this section, K denotes a field (finite or infinite) and K̄ the algebraic closure of K.

2.10.1. Plane Curves

The set of solutions of a polynomial equation f(X, Y) = 0 is one of the central objects of study in algebraic geometry. For example, we know that in ℝ² the equation X² + Y² − 1 = 0 represents a circle with centre at (0, 0) and with radius 1. When we pass to an arbitrary field, it is often not possible to visualize such plots, but it still makes sense to talk about the set of solutions of such an equation. For example, the solutions of the above circle equation over F3 are the four discrete points (0, 1), (0, 2), (1, 0) and (2, 0). (This solution set does not really look round.)
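Over a finite field, such a solution set can always be found by exhaustive search; a two-line sketch:

```python
# All F_3-rational points on the circle X^2 + Y^2 - 1 = 0
p = 3
points = [(x, y) for x in range(p) for y in range(p)
          if (x * x + y * y - 1) % p == 0]
assert points == [(0, 1), (0, 2), (1, 0), (2, 0)]
```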

One can generalize this study by considering polynomials in n indeterminates and by investigating the simultaneous solutions of m polynomials. We, however, do not intend to be so general here and concentrate only on curves defined by a single polynomial equation in two indeterminates.

Definition 2.61.

For n ∈ ℕ, the n-dimensional affine space A^n(K) over K is defined to be the set K^n consisting of all n-tuples (x1, . . . , xn) with each xi ∈ K. For n = 2, the affine space A²(K) is also called the affine plane over K. For a point P = (x1, . . . , xn) ∈ A^n(K), the elements x1, . . . , xn are called the affine coordinates of P. The affine space A^n(K̄) over the closure K̄ is often abbreviated as A^n, when the field K is understood from the context.

A^n(K) is an n-dimensional vector space over K. For example, the affine plane A²(ℝ) can be identified with the conventional X-Y plane.

Definition 2.62.

An affine plane (algebraic) curve C over K is defined by a polynomial f ∈ K[X, Y] and is written as C : f(X, Y) = 0. The set C(K) of K-rational points on an affine plane curve C : f(X, Y) = 0 is the set of all points (x, y) ∈ A²(K) satisfying f(x, y) = 0.

K-rational points on a plane curve are precisely the solutions of the defining polynomial equation. Standard examples of affine plane curves include the straight lines given by aX + bY + c = 0, a, b, c ∈ K, a and b not both 0, and the conic sections (circles, ellipses, parabolas and hyperbolas) given by aX² + bXY + cY² + dX + eY + f = 0, a, b, c, d, e, f ∈ K, with at least one of a, b, c non-zero. For K = ℝ, the set of K-rational points can be drawn as a graph of the polynomial equation, whereas for an arbitrary field K (in particular, for finite fields) such drawings make little or no sense. However, it is often helpful to visualize curves as curves over ℝ (also called real curves) and then generalize the situation to an arbitrary field K.

The number ∞ is not treated as a real number (or integer or natural number). But it is often helpful to extend the definition of ℝ by including two points that are infinitely far away from the origin, one in each direction. This gives us the so-called extended real line ℝ ∪ {−∞, +∞}. An immediate advantage of such a completion of ℝ is that every monotonic sequence converges in the extended real line. But for studying the roots of polynomial equations it is helpful to add only a single point ∞ at infinity to ℝ in order to get what is called the projective line ℙ¹(ℝ) over ℝ. Similarly, if we start with the affine plane ℝ² and add a point at infinity for each slope of straight lines Y = aX + b and one more for the vertical lines X = c, we get the so-called projective plane ℙ²(ℝ) over ℝ. We also call the line passing through all the points at infinity in ℙ²(ℝ) the line at infinity. An immediate benefit of passing from ℝ² to ℙ²(ℝ) is that in ℙ²(ℝ) any two distinct lines (parallel or not in ℝ²) meet at exactly one point, and through any two distinct points of ℙ²(ℝ) passes a unique line.

Now it is time to replace by an arbitrary field K and rephrase our definitions in such a way that it continues to make sense to talk about points and line at infinity, even when K itself contains only finitely many points.

Definition 2.63.

Let n ∈ ℕ. Define the relation ~ on the ‘punctured’ (n + 1)-dimensional affine space A^{n+1}(K) \ {(0, . . . , 0)} over K by (x0, . . . , xn) ~ (y0, . . . , yn) if and only if there exists a λ ∈ K \ {0} such that yi = λxi for all i = 0, . . . , n. It is easy to see that ~ is an equivalence relation on the punctured affine space. The set ℙ^n(K) of all equivalence classes of ~ is called the n-dimensional projective space over K. In particular, ℙ²(K) is called the projective plane over K. A point P = [x0, . . . , xn] ∈ ℙ^n(K) is the equivalence class of a point (x0, . . . , xn) ∈ A^{n+1}(K) \ {(0, . . . , 0)}. The elements x0, . . . , xn constitute a set of homogeneous coordinates for P.

It is evident that ℙ^n(K) can be identified with the set of all 1-dimensional vector subspaces (that is, lines through the origin) of the affine space A^{n+1}(K). To argue that this formal definition tallies with the intuitive notion for n = 2 and K = ℝ, consider the affine 3-space A³(ℝ) referred to the coordinates X, Y, Z. Look at the family of planes ε_λ : Z = λ, parallel to the X-Y plane. (ε0 is the X-Y plane itself.) First take a non-zero value of λ, say λ = 1. Every line in A³(ℝ) passing through the origin and not parallel to the X-Y plane meets ε1 at exactly one point. Conversely, a unique such line passes through each point on ε1 and the origin. In this way, we associate points of ℙ²(ℝ) with points on ε1. These are all the finite points of ℙ²(ℝ). On the other hand, the lines passing through the origin and lying in the X-Y plane (ε0 : Z = 0) do not meet ε1 and correspond to the points at infinity of ℙ²(ℝ).

In the last paragraph, we obtained the canonical embedding of the affine plane ℝ² in ℙ²(ℝ) by setting Z = 1. By definition, ℙ²(ℝ) is symmetric in X, Y and Z. This means that we can as well set X = 1 or Y = 1 and see that there are other embeddings of ℝ² in ℙ²(ℝ). This observation often proves to be useful (for example, see Definition 2.66).

Now that we have passed from the affine plane to the projective plane, we should be able to carry (affine) plane curves to the projective plane. For this, we need some definitions.

Definition 2.64.

Let R denote the polynomial ring K[X0, X1, . . . , Xn] over a field K. A monomial of R is an element of R of the form X0^α0 X1^α1 · · · Xn^αn, αi ≥ 0. A term in R is a monomial multiplied by an element c ∈ K*. Any polynomial f ∈ R is a sum of finitely many non-zero terms. The degree of a monomial (or term) X0^α0 X1^α1 · · · Xn^αn is defined as α0 + α1 + · · · + αn. The degree of a non-zero polynomial f ∈ R, denoted deg f, is defined to be the maximum of the degrees of its non-zero terms. The degree of the zero polynomial is taken to be −∞. A non-zero polynomial f ∈ R is said to be homogeneous of degree d ≥ 0, if all of its non-zero terms have degree d. The zero polynomial is said to be homogeneous of any degree.

Let C : f(X, Y) = 0 be an affine plane curve over a field K defined by a non-zero polynomial f(X, Y) ∈ K[X, Y], and let d := deg f. Then f(h)(X, Y, Z) := Z^d f(X/Z, Y/Z) is a homogeneous polynomial of degree d in the polynomial ring K[X, Y, Z]. The polynomial f(h) is called the homogenization of f. Putting Z = 1 in f(h)(X, Y, Z) gives back the original polynomial f(X, Y), that is, f(h)(X, Y, 1) = f(X, Y). Therefore, f is called the dehomogenization of the homogeneous polynomial f(h). The homogenization (and dehomogenization) of the zero polynomial is taken to be the zero polynomial.
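Homogenization is mechanical enough to script. In the sketch below (the dict representation is our own assumption, not the book's notation), a polynomial is stored as a map from exponent tuples to coefficients, and each term of f is multiplied by the power of Z that brings its degree up to d:

```python
def homogenize(f):
    """Return f^(h)(X, Y, Z) for a non-zero f(X, Y).

    f is a dict mapping exponent pairs (i, j) to coefficients; the result
    maps triples (i, j, k) with i + j + k = deg f to the same coefficients.
    """
    d = max(i + j for (i, j) in f)                      # total degree of f
    return {(i, j, d - i - j): c for (i, j), c in f.items()}

def dehomogenize(fh):
    """Set Z = 1: recover f(X, Y) from its homogenization."""
    return {(i, j): c for (i, j, k), c in fh.items()}

# f = Y^2 - X^3 - X - 1 has degree 3; its homogenization is
# Y^2 Z - X^3 - X Z^2 - Z^3
f = {(0, 2): 1, (3, 0): -1, (1, 0): -1, (0, 0): -1}
fh = homogenize(f)
```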

Take P = [x, y, z] ∈ ℙ²(K) and λ ∈ K*. By definition, [x, y, z] = [λx, λy, λz]. Since f(h)(λx, λy, λz) = λ^d f(h)(x, y, z) = 0 if and only if f(h)(x, y, z) = 0, it makes sense to talk about the zeros of the homogeneous polynomial f(h) in the projective plane ℙ²(K). This motivates us to define projective plane curves:

Definition 2.65.

A projective plane curve C over K is defined by a homogeneous polynomial h(X, Y, Z) ∈ K[X, Y, Z] and is written as C : h(X, Y, Z) = 0. The set C(K) of K-rational points on a projective plane curve C : h(X, Y, Z) = 0 is the set of all points [x, y, z] ∈ ℙ²(K) such that h(x, y, z) = 0.

Let C : f(X, Y) = 0 be an affine plane curve. The projective plane curve defined by f(h)(X, Y, Z) is, by an abuse of notation, denoted also by C. The zeros of the affine curve C : f(X, Y) = 0 in K² are in one-to-one correspondence with the finite zeros of C : f(h)(X, Y, Z) = 0 in ℙ²(K) (that is, zeros with Z = 1). The projective curve contains some more point(s), namely those at infinity, that can be obtained by putting Z = 0 in f(h)(X, Y, Z). Passage from the affine plane to the projective plane is just that: a systematic inclusion of the points at infinity.

It is often customary to write an affine plane curve as C : f(X, Y) = g(X, Y) and a projective plane curve as C : f(h)(X, Y, Z) = g(h)(X, Y, Z) with f(h) and g(h) of the same degree. The former is the same as the curve C : f − g = 0, and the latter the same as C : f(h) − g(h) = 0.

A homogeneous polynomial f(X, Y, Z) ∈ K[X, Y, Z] can be viewed as the homogenization of any of the polynomials

fZ(X, Y) = f(X, Y, 1), fY (X, Z) = f(X, 1, Z) and fX(Y, Z) = f(1, Y, Z).

Consider a point P = [a, b, c] on the projective curve C : f(X, Y, Z) = 0. Since a, b and c are not all 0, P corresponds to a finite point on at least one of the affine curves fX = 0, fY = 0 and fZ = 0.

2.10.2. Polynomial and Rational Functions on Plane Curves

Throughout the rest of Section 2.10 we make the following assumption:

Assumption 2.1.

K is an algebraically closed field, that is, K = K̄.

Although many of the results we state now are valid for fields that are not algebraically closed, it is convenient to make this assumption in order to avoid unnecessary complications.

Let C : f(X, Y) = 0 be a curve defined over K. Henceforth we assume that the polynomial f(X, Y) is irreducible over K. Though we write the affine equation for the curve for notational simplicity, we usually work with the set C(K) of the K-rational points on the corresponding projective curve. We refer to the solutions of C in the affine plane as the finite points on the curve.

Definition 2.66.

Let P = [a, b, c] be a point on a curve C defined over K. We call P a smooth or regular or non-singular point of C, if P satisfies the following conditions.

  1. If P is a finite point (that is, if c ≠ 0), then P is called a smooth point on C, if the partial derivatives ∂f/∂X and ∂f/∂Y do not vanish simultaneously at (a/c, b/c).

  2. If P is a point at infinity (that is, if c = 0), then we must have a ≠ 0 or b ≠ 0. Assume a ≠ 0. (The other case can be treated similarly.) Consider the polynomial g(Y, Z) := f(h)(1, Y, Z). Since P = [a, b, 0] = [1, b/a, 0], the point P corresponds to the finite point (b/a, 0) on the curve D : g(Y, Z) = 0. P is called a smooth point on C, if (b/a, 0) is a smooth point on D, that is, if ∂g/∂Y and ∂g/∂Z do not vanish simultaneously at (b/a, 0).

A non-smooth point on C is also called non-regular or singular. C is called smooth or regular or non-singular, if all points (finite and infinite) on C are smooth.
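For finite points over a prime field, condition 1 can be checked directly with formal partial derivatives. A sketch under a dict representation of f as a map from exponent pairs to coefficients (the representation and helper names are our own assumptions):

```python
def partials(f, p):
    """Formal partial derivatives of f (dict (i, j) -> coeff) modulo p."""
    fx = {(i - 1, j): (c * i) % p for (i, j), c in f.items() if (c * i) % p}
    fy = {(i, j - 1): (c * j) % p for (i, j), c in f.items() if (c * j) % p}
    return fx, fy

def evaluate(f, x, y, p):
    return sum(c * pow(x, i, p) * pow(y, j, p) for (i, j), c in f.items()) % p

def is_smooth_at(f, x, y, p):
    """A finite point (x, y) on C : f = 0 is smooth iff df/dX and df/dY
    do not vanish simultaneously there."""
    fx, fy = partials(f, p)
    return evaluate(fx, x, y, p) != 0 or evaluate(fy, x, y, p) != 0

# the cuspidal cubic Y^2 - X^3 is singular at the origin but smooth elsewhere
cusp = {(0, 2): 1, (3, 0): -1}
```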

Now we define polynomial functions on C. For a moment, we concentrate on the affine curve, that is, only on the finite points of C. Let g, h ∈ K[X, Y] with g − h ∈ 〈f〉 (that is, f | (g − h)). Since for any point P on C we have f(P) = 0, it follows that g(P) = h(P). This motivates us to define the following.

Definition 2.67.

The ring K[X, Y]/〈f〉 is called the affine coordinate ring of C and is denoted by K[C]. Elements of K[C] are called polynomial functions on C. If we denote by x and y the residue classes of X and Y respectively in K[C], then a polynomial function on C is given by a polynomial g(x, y) ∈ K[x, y] = K[C].[11] By our assumption, f is an irreducible polynomial; so 〈f〉 is a prime ideal of K[X, Y], that is, the coordinate ring K[C] is an integral domain.

[11] Recall from Section 2.7 that K[x, y] is the K-algebra generated by x and y. It is not a polynomial algebra (in general).

The quotient field (Exercise 2.34) of K[C] is called the function field of C and is denoted by K(C). An element of K(C) is of the form g(x, y)/h(x, y) with g(x, y), h(x, y) ∈ K[C], h(x, y) ≠ 0 (that is, h(X, Y) ∉ 〈f〉), and is called a rational function on C.

By definition, two rational functions g1(x, y)/h1(x, y) and g2(x, y)/h2(x, y) are equal if and only if g1(x, y)h2(x, y) − g2(x, y)h1(x, y) = 0 in K[C] or, equivalently, if and only if g1(X, Y)h2(X, Y) − g2(X, Y)h1(X, Y) ∈ 〈f〉. We define addition and multiplication of rational functions by the usual rules (Exercise 2.34).

Definition 2.68.

Let P = (a, b) be a finite point on the curve C. Given a polynomial function g(x, y) ∈ K[C], the value of g at P is defined to be g(a, b). If r ∈ K(C) is a rational function, then r is said to be defined at P, if r has a representation r = g/h with g, h ∈ K[C] and h(P) ≠ 0. In that case, we define the value of r at P to be r(P) := g(P)/h(P). If r is not defined at P, it is customary to write r(P) = ∞.

By definition, K[C] and K(C) are collections of equivalence classes. However, the value of a polynomial or a rational function on C is independent of the representatives of the equivalence classes and is, therefore, a well-defined concept.
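This independence of the chosen representative can be seen numerically. On C : Y² − X³ − X = 0, the polynomials Y² and X³ + X differ exactly by the defining polynomial, so they induce the same polynomial function on C. The check below runs over 𝔽7 purely as an illustration (the chapter's standing assumption is an algebraically closed field; a finite field only makes the check exhaustive):

```python
p = 7
# all F_7-rational finite points of C : Y^2 - X^3 - X = 0
curve_points = [(x, y) for x in range(p) for y in range(p)
                if (y * y - x**3 - x) % p == 0]

g = lambda x, y: (y * y) % p          # the class of Y^2 in K[C]
h = lambda x, y: (x**3 + x) % p       # the class of X^3 + X in K[C]

# g - h is a multiple of the defining polynomial, so both representatives
# take the same value at every point of the curve
values_agree = all(g(x, y) == h(x, y) for (x, y) in curve_points)
```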

The above definitions can be extended to the corresponding projective curve C : f(h)(X, Y, Z) = 0. By Exercise 2.96(e), the polynomial f(h) is irreducible, since we assumed f to be so.

Definition 2.69.

The function field (denoted again by K(C)) of the projective curve C is the set of quotients (called rational functions) of the form g(X, Y, Z)/h(X, Y, Z), where g, h ∈ K[X, Y, Z] are homogeneous of the same degree and h ∉ 〈f(h)〉. Two rational functions g1/h1 and g2/h2 are equal if and only if g1h2 − g2h1 ∈ 〈f(h)〉.

A rational function r ∈ K(C) is said to be defined at a point P = [a, b, c] on C, if r has a representation g/h with h(a, b, c) ≠ 0. In that case, we define r(P) := g(a, b, c)/h(a, b, c). Since g and h are homogeneous and of the same degree, the value r(P) is independent of the choice of the projective coordinates of P (Exercise 2.95). If r is not defined at P, we write r(P) = ∞.

One can define polynomial functions on a projective curve (as we did for affine curves), but it makes no sense to talk about the value of such a polynomial function at a point P on the curve, because this value depends on the choice of the homogeneous coordinates of P (Exercise 2.95). This problem is eliminated for a rational function g/h by assuming g and h to be of the same degree.

Definition 2.70.

Let C be a projective plane curve, r ∈ K(C) a non-zero rational function, and P a point on C. P is called a zero of r if r(P) = 0, and a pole of r if r(P) = ∞.

Now we define the multiplicities of zeros and poles of a rational function or, more generally, the order of any point on a projective plane curve. This is based on the following result, the proof of which is long and difficult, and is omitted.

Theorem 2.41.

Let C be a projective plane curve defined by an irreducible polynomial over K and P a smooth point on C. Then there exists a rational function uP ∈ K(C) (depending on P) with the following properties:

  1. uP (P) = 0.

  2. For any non-zero rational function r ∈ K(C), there exist an integer d and a rational function s ∈ K(C) having neither a zero nor a pole at P such that r = uP^d s. The integer d does not depend on the choice of uP.

Definition 2.71.

The function uP of the last theorem is called a uniformizing variable or a uniformizing parameter or simply a uniformizer of C at P. For any non-zero rational function r ∈ K(C), the integer d with r = uP^d s (where s has neither a zero nor a pole at P) is called the order of r at P and is denoted by ordP (r).

The connection of poles and zeros with orders is established by the following theorem, which we again state without proof.

Theorem 2.42.

P is neither a pole nor a zero of r if and only if ordP(r) = 0. P is a zero of r if and only if ordP(r) > 0. P is a pole of r if and only if ordP(r) < 0.

If P is a zero (resp. a pole) of r, the integer ordP(r) (resp. – ordP(r)) is called the multiplicity of the zero (resp. pole) P.

Theorem 2.43.

Let r be a non-zero rational function on the projective plane curve C defined over K. Then r has finitely many poles and zeros. Furthermore, Σ_{P∈C(K)} ordP(r) = 0.

This is one of the theorems that demand K to be algebraically closed. More explicitly, if K is not algebraically closed, any rational function continues to have only finitely many zeros and poles, but the sum of the orders of r at these points is not necessarily equal to 0. Also note that this sum, if taken over only the finite points of C, need not be 0, even when K is algebraically closed.

2.10.3. Maps Between Plane Curves

Now that we know how to define and evaluate rational functions on a curve, we are in a position to define rational maps between two curves. Let C1 : f1(X, Y, Z) = 0 and C2 : f2(X, Y, Z) = 0 be two projective plane curves defined over K by irreducible homogeneous polynomials f1, f2 ∈ K[X, Y, Z].

Definition 2.72.

A rational map φ : C1 → C2 (defined over K) is given by rational functions r1, r2, r3 ∈ K(C1) such that for each P ∈ C1(K) at which all of r1, r2 and r3 are defined, the point [r1(P), r2(P), r3(P)] ∈ C2(K). One often uses the notation φ = [r1, r2, r3].

This, however, is not the complete story. A more precise characterization of a rational map is as follows:

A rational map φ = [r1, r2, r3] : C1 → C2 is said to be defined at P ∈ C1(K), if there exists a rational function s ∈ K(C1) (depending on P) such that sr1, sr2 and sr3 are all defined at P, the values (sr1)(P), (sr2)(P) and (sr3)(P) are not all zero, and φ(P) = [(sr1)(P), (sr2)(P), (sr3)(P)]. A rational map which is defined at every point of C1(K) is called a morphism.

The curves C1 and C2 are said to be isomorphic (denoted C1 ≅ C2), if there exist morphisms φ : C1 → C2 and ψ : C2 → C1 such that ψ ∘ φ and φ ∘ ψ are the identity maps on C1(K) and C2(K) respectively.

Isomorphism is an equivalence relation on the set of all projective plane curves defined over K. Since two isomorphic curves share many common algebraic and geometric properties, it is of interest in algebraic geometry to study the equivalence classes (rather than the individual curves). If C1 ≅ C2 and C2 has a simpler representation than C1, then studying the properties of C2 makes our job simpler and at the same time reveals all the common properties of C1. (See Section 2.11 for an example.)

**2.10.4. Divisors on Plane Curves

Let a be a symbol and n a positive integer. We represent by na the formal sum a + · · · + a (n times). We also define 0a := 0 and −na := n(−a), where the symbol −a satisfies a + (−a) = (−a) + a = 0. For n1, n2 ∈ ℤ, we define n1a + n2a := (n1 + n2)a. The set {na | n ∈ ℤ} under these definitions becomes an Abelian group. If we are given two symbols a, b, we can analogously define formal sums na + mb, n, m ∈ ℤ, and the sum of formal sums as (n1a + m1b) + (n2a + m2b) := (n1 + n2)a + (m1 + m2)b. With these definitions the set {na + mb | n, m ∈ ℤ} becomes an Abelian group. These constructions can be generalized as follows:

Definition 2.73.

Given a set (not necessarily finite) of symbols ai, i ∈ I, the set of formal sums of the form Σ_{i∈I} ni ai with ni ∈ ℤ, where ni = 0 except for finitely many i ∈ I, is an Abelian group with the addition formula Σ ni ai + Σ mi ai := Σ (ni + mi)ai. This group is called the free Abelian group generated by ai, i ∈ I.

Now let ai be the K-rational points on a projective plane curve C defined over K. For notational convenience, we represent by [P] the symbol corresponding to the point P on C. This removes confusion in connection with elliptic curves C (see Section 2.11), for which we intend to make a distinction between P + Q and [P] + [Q] for two points P, Q ∈ C(K). The former sum is again a point on C, whereas the latter is never (the symbol corresponding to) a point on C.

Definition 2.74.

A formal sum Σ_{P∈C(K)} nP [P] with nP ∈ ℤ, where nP = 0 except for finitely many P ∈ C(K), is called a divisor on C. The free Abelian group generated by the symbols [P] for all the points P ∈ C(K) is called the group of divisors of C and is denoted by DivK(C) or simply by Div(C), when K is implicit in the context.

Let D = Σ_{P∈C(K)} nP [P] be a divisor. The support of D is defined to be the set {P ∈ C(K) | nP ≠ 0} and is denoted by Supp D.

The degree of D is defined as the integer Σ_{P∈C(K)} nP and is denoted by deg D. The subset {D ∈ Div(C) | deg D = 0} of Div(C) is clearly a subgroup of Div(C). We denote this subgroup by Div0(C).
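Divisors with their coefficient-wise addition are easy to model. A small sketch (the representation is our own: a Counter mapping point labels to the integer coefficients nP):

```python
from collections import Counter

def div_add(D1, D2):
    """Coefficient-wise sum of two divisors, dropping zero coefficients."""
    D = Counter(D1)
    D.update(D2)                      # Counter.update adds the counts
    return Counter({P: n for P, n in D.items() if n})

def deg(D):
    """Degree of a divisor: the sum of its coefficients n_P."""
    return sum(D.values())

D1 = Counter({"P": 2, "Q": -1})       # the divisor 2[P] - [Q]
D2 = Counter({"Q": 1, "R": 3})        # the divisor [Q] + 3[R]
```

Their sum is 2[P] + 3[R]: the [Q] terms cancel, and the degree of the sum is deg D1 + deg D2 = 1 + 4 = 5.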

Now we define divisors of rational functions on C. Henceforth we assume that C is smooth (that is, smooth at all K-rational points on C).

Definition 2.75.

The divisor of a non-zero rational function r ∈ K(C) is defined to be the formal sum Div(r) := Σ_{P∈C(K)} ordP(r) [P], where ordP(r) is the order of r at P (Definition 2.71). By Theorem 2.43, Div(r) ∈ Div0(C).

A divisor D ∈ Div(C) is called principal, if D = Div(r) for some rational function r ∈ K(C). We have Div(rr′) = Div(r) + Div(r′) for any rational functions r, r′ ∈ K(C). It follows that the set of all principal divisors on C is a subgroup of Div(C) (and of Div0(C) as well). We denote this subgroup by PrinK(C) or simply by Prin(C). The quotient group Div(C)/Prin(C) is called the divisor class group or the Picard group of C and is denoted by PicK(C) or in short by Pic(C). On the other hand, the quotient Div0(C)/Prin(C) is denoted by Pic0K(C) or Pic0(C) and is called the Jacobian of C. Instead of Pic0(C) we also use the notation JK(C) or J(C).

Though the Jacobian is defined for an arbitrary smooth curve C (defined by an irreducible polynomial), it is for a special class of curves called hyperelliptic curves that it is particularly easy to represent and do arithmetic in the Jacobian. This gives us yet another family of groups on which cryptographic protocols can be built.

If K is not algebraically closed, we need not have Div(r) ∈ Div0(C) for a rational function r ∈ K(C). This means that in that case the Jacobian of C over K cannot be defined in the above manner. However, since C is also a curve defined over K̄, we can define the Jacobian of C over K̄ as above and call a particular subgroup of it the Jacobian of C over K. We defer this discussion until Section 2.12.

Exercise Set 2.10

In this exercise set, we do not assume (unless otherwise stated) that K is necessarily algebraically closed.

2.95
  1. For homogeneous polynomials f1, f2 ∈ K[X1, . . . , Xn] of respective degrees d1 and d2, prove the following assertions:

    1. If d1 = d2, then f1 ± f2 are homogeneous polynomials of degree d1.

    2. The polynomial f1f2 is homogeneous of degree d1 + d2. Conversely, if f1f2 is homogeneous, then f1 and f2 are also homogeneous.

  2. A polynomial f ∈ K[X1, . . . , Xn] is homogeneous of degree d if and only if it satisfies f(λX1, . . ., λXn) = λ^d f(X1, . . ., Xn) for every λ ∈ K̄.

2.96In this exercise, we generalize the notion of homogenization and dehomogenization of polynomials. Let K[X1, . . . , Xn] denote the polynomial ring in n indeterminates. Introducing another indeterminate X0, we define the homogenization of a non-zero polynomial f ∈ K[X1, . . . , Xn] of degree d as

f(h)(X0, X1, . . . , Xn) := X0^d f(X1/X0, . . . , Xn/X0).

Prove the following assertions.

  1. f(h) is an element of K[X0, X1, . . . , Xn] and is homogeneous of degree d.

  2. f(h)(1, X1, . . . , Xn) = f(X1, . . . , Xn).

  3. If deg f = d ≥ 0 and fd is the sum of all non-zero terms of degree d in f, then we have f(h)(0, X1, . . . , Xn) = fd(X1, . . . , Xn).

  4. For f, g ∈ K[X1, . . . , Xn], (fg)(h) = f(h)g(h). Moreover, if g|f, then g(h)|f(h) and (f/g)(h) = f(h)/g(h). Under what condition(s) is (f + g)(h) = f(h) + g(h)?

  5. f is irreducible if and only if f(h) is irreducible.

2.97Let C : f(X, Y) = 0 be an affine plane curve defined by a non-zero polynomial f(X, Y) ∈ K[X, Y] and C : f(h)(X, Y, Z) = 0 the corresponding projective plane curve. Let d := deg f = deg f(h) and fd the sum of the non-zero terms of f of degree d. Show that:
  1. f(h)(X, Y, 1) = f(X, Y) and f(h)(X, Y, 0) = fd(X, Y).

  2. (x, y) ∈ K² is a K-rational point of the affine curve if and only if [x, y, 1] is a K-rational point of the projective curve. More generally, let λ ∈ K*. The point (x/λ, y/λ) is a K-rational solution of f if and only if [x, y, λ] is a K-rational solution of f(h).

  3. The solutions of f at infinity are obtained by solving f(h)(X, Y, 0) = fd(X, Y) = 0. Conclude that the curve C can have at most d points at infinity.

  4. For a, b ∈ K, each of the curves Y − aX = b and X − aY = b (straight lines), and Y − X² = 0 and X − Y² = 0 (parabolas), contains only one point at infinity. The hyperbola XY − 1 = 0 contains two points at infinity. How many points at infinity does the hyperbola X² − Y² − 1 = 0 contain? The circle X² + Y² − 1 = 0?

  5. For a1, a2, a3, a4, a6 ∈ K, the elliptic curve Y² + a1XY + a3Y = X³ + a2X² + a4X + a6 contains only one point at infinity.

  6. Let g ∈ ℕ and u(X), v(X) ∈ K[X] with deg u ≤ g, deg v = 2g + 1 and v monic. Show that the hyperelliptic curve Y² + u(X)Y = v(X) has only one point at infinity.

2.98Show that the defining polynomial of the elliptic curve in Exercise 2.97(e) is irreducible. Prove the same for the hyperelliptic curve of Exercise 2.97(f). [H]
2.99Show that for an ideal 𝔞 of K[X1, . . . , Xn] the following two conditions are equivalent:
  1. 𝔞 is generated by a set of homogeneous polynomials.

  2. If f = f0 + f1 + · · · + fd ∈ 𝔞, where fi is the sum of the non-zero terms of degree i in f, then fi ∈ 𝔞 for all i = 0, . . . , d. (The polynomials fi are called the homogeneous components of f.)

An ideal satisfying the above equivalent conditions is called a homogeneous ideal. Construct an example to demonstrate that not all ideals of K[X1, . . . , Xn] are homogeneous.

*2.11. Elliptic Curves

The mathematics of elliptic curves is vast and complicated. A reasonably complete understanding of elliptic curves would require a book of a size comparable to this one. So we plan to be rather informal while talking about elliptic curves and about their generalizations called hyperelliptic curves. Interested readers can go through the books suggested at the end of this chapter to learn more about these curves. In this section, K stands for a field (finite or infinite) and K̄ for the algebraic closure of K.

2.11.1. The Weierstrass Equation

An elliptic curve E over K is a plane curve defined by the polynomial equation

Equation 2.6

E : Y² + a1XY + a3Y = X³ + a2X² + a4X + a6  (a1, a2, a3, a4, a6 ∈ K)
or by the corresponding homogeneous equation

E : Y²Z + a1XYZ + a3YZ² = X³ + a2X²Z + a4XZ² + a6Z³.

These equations are called the Weierstrass equations for E. In order that E qualifies as an elliptic curve, we additionally require that it is smooth at all K̄-rational points (Definition 2.66).[12] Two elliptic curves defined over the field ℝ are shown in Figure 2.1.

[12] Ellipses are not elliptic curves.

Figure 2.1. Elliptic curves over ℝ

(a) Y² = X³ − X + 1
(b) Y² = X³ − X


E contains a single point at infinity, namely O = [0, 1, 0] (Exercise 2.97(e)). The set E(K) of K-rational points on E in the projective plane ℙ²(K) is the central object of study in the theory of elliptic curves. We shortly endow E(K) with a group structure, and this group is used extensively in cryptography.

Let us first see how we can simplify the equation for E. The simplification depends on the characteristic of K. Because fields of characteristic 3 are only rarely used in cryptography, we will not deal with such fields. Simplification of the Weierstrass equation is effected by suitable changes of coordinates. A special kind of transformation is allowed in order to preserve the geometric and algebraic properties of an elliptic curve.

Theorem 2.44.

Two elliptic curves

E1 : Y² + a1XY + a3Y = X³ + a2X² + a4X + a6
E2 : Y² + b1XY + b3Y = X³ + b2X² + b4X + b6

defined over K are isomorphic (Definition 2.72) if and only if there exist u ∈ K* and r, s, t ∈ K such that the substitution of u²X + r for X and u³Y + u²sX + t for Y transforms the equation of E1 to the equation of E2. For this transformation, the coefficients bi are related to the coefficients ai as follows:

Equation 2.7

ub1 = a1 + 2s
u²b2 = a2 − sa1 + 3r − s²
u³b3 = a3 + ra1 + 2t
u⁴b4 = a4 − sa3 + 2ra2 − (t + rs)a1 + 3r² − 2st
u⁶b6 = a6 + ra4 + r²a2 + r³ − ta3 − t² − rta1
The theorem is not proved here. Formulas (2.7) can be checked by tedious calculations. A change of variables as in Theorem 2.44 is referred to as an admissible change of variables. We denote this by

(X, Y) ← (u2X + r, u3Y + u2sX + t).

The inverse transformation is also admissible and is given by

(X, Y) ← (u⁻²(X − r), u⁻³(Y − s(X − r) − t)).

Isomorphism is an equivalence relation on the set of all elliptic curves over K.

Consider the elliptic curve E over K given by Equation (2.6). If char K ≠ 2, the admissible change (X, Y) ← (X, Y − (a1X + a3)/2) transforms E to the form

E1 : Y² = X³ + b2X² + b4X + b6.

If, in addition, char K ≠ 3, the admissible change (X, Y) ← (X − b2/3, Y) transforms E1 to E2 : Y² = X³ + aX + b. We henceforth assume that an elliptic curve over a field of characteristic ≠ 2, 3 is defined by

Equation 2.8

E : Y² = X³ + aX + b  (a, b ∈ K)
(instead of by the original Weierstrass Equation (2.6)).

If char K = 2, the Weierstrass equation cannot be simplified as in Equation (2.8). In this case, we consider two cases separately, namely a1 ≠ 0 or otherwise. In the former case, the admissible change (X, Y) ← (a1²X + a3/a1, a1³Y + (a1²a4 + a3²)/a1³) allows us to write Equation (2.6) in the simplified form

Equation 2.9

E : Y² + XY = X³ + aX² + b  (a, b ∈ K)
On the other hand, if a1 = 0, then the admissible change (X, Y) ← (X + a2, Y) shows that E can be written in the form

Equation 2.10

E : Y² + aY = X³ + bX + c  (a, b, c ∈ K)
A curve defined by Equation (2.9) is called non-supersingular, whereas one defined by Equation (2.10) is called supersingular.

Now we associate two quantities with an elliptic curve. The importance of these quantities follows from the subsequent theorem. We start with the generic Weierstrass equation and later specialize to the simplified formulas.

Definition 2.76.

For the curve E given by Equation (2.6), we define the following quantities:

Equation 2.11

d2 = a1² + 4a2
d4 = 2a4 + a1a3
d6 = a3² + 4a6
d8 = a1²a6 + 4a2a6 − a1a3a4 + a2a3² − a4²
Δ(E) = −d2²d8 − 8d4³ − 27d6² + 9d2d4d6
j(E) = (d2² − 24d4)³/Δ(E)  (defined when Δ(E) ≠ 0)
Δ(E) is called the discriminant of the curve E, and j(E) the j-invariant of E.

For the special cases given by the simplified equations above, these quantities have more compact formulas as given in Table 2.5.

Theorem 2.45.

For the curve E defined by Equation (2.6), the following properties hold:

  1. An admissible change of variables does not alter Δ(E) and j(E).

    Table 2.5. Discriminant and j-invariant for elliptic curves

    Special case                                  Δ(E)              j(E)
    char K ≠ 2, 3 (Equation 2.8)                  −16(4a³ + 27b²)   1728(4a)³/Δ(E)
    char K = 2, non-supersingular (Equation 2.9)  b                 1/b
    char K = 2, supersingular (Equation 2.10)     a⁴                0

  2. E is an elliptic curve, that is, E is smooth, if and only if Δ(E) ≠ 0. In particular, the j-invariant is defined for all elliptic curves.

  3. Let E1 and E2 be two elliptic curves defined over the field K. If E1 and E2 are isomorphic over K, then j(E1) = j(E2). Conversely, if j(E1) = j(E2), then E1 and E2 are isomorphic over K̄.

Proof

  1. Tedious calculations using Formulas (2.7) establish this claim.

  2. The polynomial f(X, Y, Z) = Y²Z + a1XYZ + a3YZ² − X³ − a2X²Z − a4XZ² − a6Z³ defines the curve E. Since (∂f/∂Z)(0, 1, 0) = 1 ≠ 0, E is smooth at the point at infinity O = [0, 1, 0]. Suppose that E is not smooth at the finite point (x0, y0). The admissible change (X, Y) ← (X + x0, Y + y0) does not alter the value of Δ(E) by (1). So we can assume, without loss of generality, that (x0, y0) = (0, 0). But then we have f(0, 0) = −a6 = 0, ∂f/∂X(0, 0) = −a4 = 0 and ∂f/∂Y(0, 0) = a3 = 0. Now it is easy to check from Equation (2.11) that Δ(E) = 0.

    Conversely, let Δ(E) = 0. For simplicity, we assume that char K ≠ 2, 3 and that E is given by Equation (2.8). By Exercise 2.62, 4a³ + 27b² = 0, that is, the polynomial X³ + aX + b has a multiple root, say x0 ∈ K̄. But then E is not smooth at (x0, 0).

  3. By Part (1) and Theorem 2.44, two isomorphic elliptic curves have the same j-invariant. For proving the converse, we once again assume that char K ≠ 2, 3 and that E1 : Y² = X³ + a1X + b1 and E2 : Y² = X³ + a2X + b2 have the same j-invariant. Then we have a1³b2² = a2³b1². Now we provide an admissible change of variables of the form (X, Y) ← (u²X, u³Y), u ∈ K̄*, that transforms E1 to E2. Since Δ(E1) ≠ 0 and Δ(E2) ≠ 0, we take u = (b1/b2)^(1/6) if a1 = 0, u = (a1/a2)^(1/4) if b1 = 0, and u = (a1/a2)^(1/4) = (b1/b2)^(1/6) if a1b1 ≠ 0. Note that since K̄ is algebraically closed, u is defined in all the above cases.
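For char K ≠ 2, 3 the Table 2.5 formulas are easy to evaluate over a prime field. A sketch (the helper name is ours, and the computation follows the book's formulas, not the usual normalization found elsewhere):

```python
def disc_and_j(a, b, p):
    """Delta(E) and j(E) for E : Y^2 = X^3 + aX + b over F_p, p > 3,
    using the Table 2.5 formulas."""
    delta = (-16 * (4 * a**3 + 27 * b**2)) % p
    if delta == 0:
        return delta, None               # curve is singular; j is undefined
    j = 1728 * (4 * a) ** 3 * pow(delta, -1, p) % p
    return delta, j
```

For the curve Y² = X³ + X + 3 over 𝔽7 used later in Example 2.22, this gives Δ ≡ 3 and j ≡ 2 (mod 7).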

2.11.2. The Elliptic Curve Group

Consider an elliptic curve E over a field K. We now define an operation (which is conventionally denoted by +) on the set E(K) of K-rational points on E in the projective plane . This operation provides a group structure on E(K). It is important to point out that this group is not the same as the group DivK(E) of divisors on E(K) (Definition 2.74), since the sum of points we are going to define is not formal. However, there is a connection between these two groups (See Exercise 2.125).

Definition 2.77.

Let E be the elliptic curve defined by Equation (2.6) and O the point at infinity on E. A binary operation + on E(K) is defined as follows:

  1. For any P ∈ E(K), we define P + O := O + P := P, that is, O serves as the additive identity.

  2. The opposite (additive inverse) of a point P ∈ E(K) is now defined: if P = O, then −P := P, and if P = (h, k) is a finite point, then −P := (h, −k − a1h − a3).

  3. For P, Q ∈ E(K), the sum P + Q is defined by the chord-and-tangent rule, which goes as follows.

    1. If Q = −P, then P + Q := O.

    2. If Q ≠ −P, we consider the line passing through P and Q (we take the tangent line if P = Q). Since the degree of the defining equation for E is three, this line meets the curve at exactly one other point R. We define P + Q := −R. Figure 2.1 illustrates this case for curves over ℝ.

Theorem 2.46.

The set E(K) under the operation + is an Abelian group.

No simple proof of this theorem is known. Indeed the only group axiom that is difficult to check is associativity, that is, to check that (P + Q) + R = P + (Q + R) for all P, Q, R ∈ E(K). An elementary strategy would be to write explicit formulas for (P + Q) + R and P + (Q + R) (using the formulas for P + Q given below) and show that they are equal, but this process involves a lot of tedious calculations and consideration of many cases.

There are other proofs that are more elegant, but not as elementary. One possibility is to use the theory of divisors and is outlined now. It turns out that the Jacobian of E has a bijective correspondence with the set E(K) via the map which takes P ∈ E(K) to [P] − [O] (more correctly, to the equivalence class of the divisor [P] − [O] in the Jacobian). Furthermore, this map takes P + Q to ([P] − [O]) + ([Q] − [O]), where the addition on the left is the addition on E(K) as defined above and the addition on the right is that in the Jacobian. By definition, the Jacobian is naturally an additive Abelian group. It immediately follows that E(K) is an additive Abelian group too. (See Exercise 2.125.)

We now give the formulas for the coordinates of the points −P and P + Q on E(K). The derivation of these formulas for the general case is left to the reader (Exercise 2.102). We concentrate on the important special cases. We assume that P = (h1, k1) and Q = (h2, k2) are finite points on E(K) with Q ≠ −P, so that P + Q = (h3, k3) is again a finite point.

If char K ≠ 2, 3 and E is defined by Equation (2.8), we have −P = (h1, −k1) and

    λ = (k2 − k1)/(h2 − h1) if P ≠ Q,  λ = (3h1² + a)/(2k1) if P = Q,
    h3 = λ² − h1 − h2,
    k3 = λ(h1 − h3) − k1.
Next, we consider char K = 2 and non-supersingular curves (Equation 2.9). Here −P = (h1, h1 + k1), and the formulas in this case are:

    for P ≠ Q:  λ = (k1 + k2)/(h1 + h2),  h3 = λ² + λ + h1 + h2 + a,  k3 = λ(h1 + h3) + h3 + k1;
    for P = Q:  λ = h1 + k1/h1,  h3 = λ² + λ + a,  k3 = h1² + (λ + 1)h3.
Finally, for supersingular curves (Equation 2.10) with char K = 2, we have −P = (h1, k1 + a) and

    λ = (k1 + k2)/(h1 + h2) if P ≠ Q,  λ = (h1² + b)/a if P = Q,
    h3 = λ² + h1 + h2,
    k3 = λ(h1 + h3) + k1 + a.
We denote by mP the sum P + · · · + P (m times) for a point P ∈ E(K) and m ∈ ℕ. We also define 0P := O and (−m)P := −(mP) (for m ∈ ℕ).
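For char K ≠ 2, 3 the rules above are short enough to implement directly. A sketch over a prime field 𝔽p (helper names are ours; the point at infinity O is represented by None):

```python
def ec_add(P, Q, a, p):
    """Chord-and-tangent addition on E : Y^2 = X^3 + aX + b over F_p."""
    if P is None:
        return Q                          # O + Q = Q
    if Q is None:
        return P                          # P + O = P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                       # Q = -P, so P + Q = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p       # reflect the third intersection
    return (x3, y3)

def ec_mul(m, P, a, p):
    """mP by double-and-add, m >= 0."""
    R = None
    while m:
        if m & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        m >>= 1
    return R
```

On the curve Y² = X³ + X + 3 over 𝔽7 of Example 2.22 below, these routines reproduce Table 2.6: for instance 2(4, 1) = (6, 6), 3(4, 1) = (5, 0) and 6(4, 1) = O.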

Example 2.22.
  1. Consider the elliptic curve

    E1 : Y² = X³ + X + 3

    over 𝔽7. We have Δ(E1) ≡ −16(4 × 1³ + 27 × 3²) ≡ 3 (mod 7) and j(E1) ≡ 1728 × 4³ × 3⁻¹ ≡ 2 (mod 7). It is easy to check that E1(𝔽7) contains the six points O, P1 = (4, 1), P2 = (4, 6), P3 = (5, 0), P4 = (6, 1) and P5 = (6, 6). The multiples of these points are summarized in Table 2.6. It follows that the group E1(𝔽7) is cyclic with P1 as a generator.

    Table 2.6. Multiples of points on the elliptic curve Y² = X³ + X + 3 over 𝔽7

    P            2P      3P      4P      5P      6P   ord P
    O                                                 1
    P1 = (4, 1)  (6, 6)  (5, 0)  (6, 1)  (4, 6)  O    6
    P2 = (4, 6)  (6, 1)  (5, 0)  (6, 6)  (4, 1)  O    6
    P3 = (5, 0)  O                                    2
    P4 = (6, 1)  (6, 6)  O                            3
    P5 = (6, 6)  (6, 1)  O                            3

  2. Now, consider the non-supersingular elliptic curve

    E2 : Y² + XY = X³ + X² + ξ

    defined over 𝔽8 = 𝔽2[T]/〈T³ + T + 1〉, where ξ := T + 〈T³ + T + 1〉 (so that ξ³ = ξ + 1). We have Δ(E2) = ξ and j(E2) = ξ⁻¹ = ξ² + 1. The finite points on E2 are:

    P1 = (0, ξ² + ξ),
    P2 = (1, ξ²),
    P3 = (1, ξ² + 1),
    P4 = (ξ, ξ²),
    P5 = (ξ, ξ² + ξ),
    P6 = (ξ + 1, ξ² + 1),
    P7 = (ξ + 1, ξ² + ξ),
    P8 = (ξ² + ξ, 1),
    P9 = (ξ² + ξ, ξ² + ξ + 1).

    So E2(𝔽8) contains 10 points (including O). The multiples of the points are listed in Table 2.7, which implies that E2(𝔽8) is again cyclic.[13] The φ(10) = 4 generators of this group are P4, P5, P8 and P9.

    [13] Both 6 and 10 are square-free integers, and so the groups E1(𝔽7) and E2(𝔽8) must be cyclic (Exercise 2.115(a)).

    Table 2.7. Multiples of points on the elliptic curve Y² + XY = X³ + X² + ξ over 𝔽8

    P       2P  3P  4P  5P  6P  7P  8P  9P  10P  ord P
    P0 = O                                        1
    P1      O                                     2
    P2      P7  P6  P3  O                         5
    P3      P6  P7  P2  O                         5
    P4      P3  P9  P6  P1  P7  P8  P2  P5  O    10
    P5      P2  P8  P7  P1  P6  P9  P3  P4  O    10
    P6      P2  P3  P7  O                         5
    P7      P3  P2  P6  O                         5
    P8      P6  P4  P2  P1  P3  P5  P7  P9  O    10
    P9      P7  P5  P3  P1  P2  P4  P6  P8  O    10

  3. Let us continue to represent 𝔽8 as in (2). The supersingular curve

    E3 : Y² + Y = X³ + ξX + ξ²

    has Δ(E3) = 1 and j(E3) = 0. E3(𝔽8) is a cyclic group with 9 points, as Table 2.8 illustrates.

Table 2.8. Multiples of points on the elliptic curve Y² + Y = X³ + ξX + ξ² over 𝔽8

P                          2P  3P  4P  5P  6P  7P  8P  9P  ord P
P0 = O                                                      1
P1 = (0, ξ² + ξ)           P5  P4  P7  P8  P3  P6  P2  O    9
P2 = (0, ξ² + ξ + 1)       P6  P3  P8  P7  P4  P5  P1  O    9
P3 = (ξ + 1, ξ)            P4  O                            3
P4 = (ξ + 1, ξ + 1)        P3  O                            3
P5 = (ξ², ξ²)              P7  P3  P2  P1  P4  P8  P6  O    9
P6 = (ξ², ξ² + 1)          P8  P4  P1  P2  P3  P7  P5  O    9
P7 = (ξ² + ξ, ξ² + ξ)      P2  P4  P6  P5  P3  P1  P8  O    9
P8 = (ξ² + ξ, ξ² + ξ + 1)  P1  P3  P5  P6  P4  P2  P7  O    9
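The point counts in Example 2.22 can be reproduced by brute force, at least for small prime fields and curves in the short Weierstrass form (the enumeration helper is our own, not a method from the book):

```python
def ec_points(a, b, p):
    """All F_p-rational points of E : Y^2 = X^3 + aX + b, with O as None."""
    pts = [None]                          # the point at infinity O
    for x in range(p):
        for y in range(p):
            if (y * y - (x**3 + a * x + b)) % p == 0:
                pts.append((x, y))
    return pts

# the curve E1 : Y^2 = X^3 + X + 3 over F_7 from Example 2.22(1)
pts = ec_points(1, 3, 7)
```

The enumeration confirms that E1(𝔽7) has exactly the six points listed in part 1 of the example.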

Definition 2.78.

Let m ∈ ℕ. The set of points P ∈ E(K) such that mP = O is evidently a subgroup of E(K) and is denoted by EK[m] or by E[m], if K is understood from the context. The elements of EK[m], called the m-torsion points of E, are those points of E(K) the (additive) orders of which are finite and divide m.

Multiples mP of a point P ∈ E(K) can be expressed using nice formulas.

Definition 2.79.

For an elliptic curve defined over K by the equation E : f(X, Y) = 0 and for m ∈ ℕ, there exist polynomials θm, ωm, ψm ∈ K[X, Y] such that for any point P = (h, k) ∈ E(K) with mP ≠ O we have

mP = (θm(h, k)/ψm(h, k)², ωm(h, k)/ψm(h, k)³).

The polynomial ψm is called the m-th division polynomial of E.

Using the addition formula one can verify the following recursive description for ψm and the expressions for θm and ωm in terms of ψm.

Lemma 2.8.

For an elliptic curve E defined by the general Weierstrass Equation (2.6) over a field K, the division polynomials ψm, m ∈ ℕ, are recursively described as:

ψ1 = 1,
ψ2 = 2Y + a1X + a3,
ψ3 = 3X⁴ + d2X³ + 3d4X² + 3d6X + d8,
ψ4 = ψ2 (2X⁶ + d2X⁵ + 5d4X⁴ + 10d6X³ + 10d8X² + (d2d8 − d4d6)X + (d4d8 − d6²)),
ψ_{2m+1} = ψ_{m+2} ψm³ − ψ_{m−1} ψ_{m+1}³  (m ≥ 2),
ψ2 ψ_{2m} = ψm (ψ_{m+2} ψ_{m−1}² − ψ_{m−2} ψ_{m+1}²)  (m ≥ 3),

where di are as in Definition 2.76. The polynomials θm satisfy

θm = X ψm² − ψ_{m+1} ψ_{m−1}

for all m ∈ ℕ, and for char K ≠ 2, one has

ωm = (ψ_{2m}/ψm − (a1θm + a3ψm²) ψm)/2.
It follows by induction on m that these formulas really give polynomial expressions for ψm, θm and ωm for all m ∈ ℕ. For even m, the polynomial ψm is divisible by ψ2. Furthermore, the polynomials defined as

f̄m := ψm for odd m  and  f̄m := ψm/ψ2 for even m

can be expressed as polynomials in x only (modulo the equation of the curve). These univariate polynomials are easier to handle than the bivariate ones ψm and, by an abuse of notation, are also called division polynomials. The degrees of f̄m satisfy the inequality deg f̄m ≤ (m² − 1)/2.

Points of E[m] can be characterized in terms of the division polynomials:

Theorem 2.47.

Let m ∈ ℕ and let P = (h, k) be a finite point on E(K̄). Then mP = O if and only if ψm(h, k) = 0. Furthermore, if m > 2 and 2P ≠ O, then mP = O if and only if fm(h) = 0.

We finally define the polynomials fm as follows. If char K ≠ 2, then fm is taken to be the univariate division polynomial described above, for all m ∈ ℕ. On the other hand, for char K = 2 and for non-supersingular curves over K, ψm is already a polynomial in x alone (Exercise 2.107), and it is customary to define fm(x) := ψm(x, y) for all m ∈ ℕ. By a further abuse of notation, we also call fm the m-th division polynomial of E.

2.11.3. Elliptic Curves over Finite Fields

In this section, we take K = Fq, a finite field of cardinality q and characteristic p. We do not deal with the case p = 3. Let E be an elliptic curve defined over Fq. If p > 3, we assume that E is defined by Equation (2.8), whereas for p = 2, we assume that E is defined by Equation (2.10) or Equation (2.9) depending on whether E is supersingular or not.

Since E(Fq) is a subset of the projective plane over Fq, the cardinality #E(Fq) is finite. The next theorem shows that #E(Fq) is quite close to q.

Theorem 2.48. Hasse’s theorem

#E(Fq) = q + 1 – t, where |t| ≤ 2√q. (The integer t is called the trace of Frobenius at q.)

The implication of this theorem is that the possible cardinalities of E(Fq) lie in a rather narrow interval [q + 1 – 2√q, q + 1 + 2√q]. If q = p is a prime, then for every n with q + 1 – 2√q ≤ n ≤ q + 1 + 2√q, there is at least one curve E with #E(Fp) = n. Moreover, the values of #E(Fp) are distributed almost uniformly in the interval [q + 1 – 2√q, q + 1 + 2√q]. However, if q is not a prime, these nice results do not continue to hold.
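
A quick numerical sanity check of Hasse's bound — a sketch assuming the short Weierstrass form over the small field F7 (both the family of curves and the field are illustrative choices, not taken from the text):

```python
# Sketch: verify Hasse's bound #E(F_q) = q + 1 - t, |t| <= 2*sqrt(q), by counting
# points on every smooth curve y^2 = x^3 + a*x + b over F_7.
q = 7

def order(a, b):
    """#E(F_q), including the point at infinity."""
    return 1 + sum(1 for x in range(q) for y in range(q)
                   if (y * y - x**3 - a * x - b) % q == 0)

orders = []
for a in range(q):
    for b in range(q):
        if (4 * a**3 + 27 * b**2) % q == 0:
            continue                        # singular curve: skip
        n = order(a, b)
        t = q + 1 - n                       # trace of Frobenius
        assert t * t <= 4 * q               # Hasse: |t| <= 2*sqrt(q)
        orders.append(n)
print(min(orders), max(orders))             # 3 13: the whole Hasse interval for q = 7
```

For q = 7 the interval [q + 1 – 2√q, q + 1 + 2√q] contains exactly the integers 3 through 13, and both endpoints are attained.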

Definition 2.80.

If t = 1 (that is, if #E(Fq) = q), the curve E is called anomalous. If p|t, the curve E is called supersingular, and if p ∤ t, then E is called non-supersingular.

Anomalous and supersingular curves are cryptographically weak, because certain algorithms are known with running time better than exponential to solve the so-called elliptic curve discrete logarithm problem over these curves. Determination of the order #E(Fq) gives t, from which one can easily check whether E is anomalous or supersingular. If p = 2, we have an easier check for supersingularity.

Proposition 2.35.

An elliptic curve E over a finite field of characteristic 2 is supersingular if and only if j(E) = 0 or, equivalently, if and only if a1 = 0 in Equation (2.6).

For arbitrary characteristic p, we have the following characterization.

Proposition 2.36.

An elliptic curve E over Fq is supersingular if and only if t2 = 0, q, 2q, 3q or 4q. In particular, if p ≠ 2, 3, then E is supersingular if and only if t = 0.
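
A small illustration with a hypothetical example curve: y2 = x3 – x over F7 (a prime field with p ≡ 3 (mod 4)) has trace t = 0, hence it is supersingular by the proposition:

```python
# Sketch: check the supersingularity criterion on y^2 = x^3 - x over F_7.
# For characteristic > 3 the criterion reduces to t = 0.
q = 7
n = 1 + sum(1 for x in range(q) for y in range(q)
            if (y * y - (x**3 - x)) % q == 0)    # point count incl. infinity
t = q + 1 - n                                    # trace of Frobenius
print(n, t)                                      # 8 0
```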

By Theorem 2.38, the multiplicative group Fq* is always cyclic. However, the group E(Fq) is not always cyclic, but is of a special kind. We need a few definitions to explain the structure of E(Fq). The notion of internal direct product for multiplicative groups (Exercise 2.19) can be readily applied to additive groups as follows.

Definition 2.81.

Let G be an additive group and let H1, . . . , Hr be subgroups of G. If every element of G can be written uniquely as h1 + · · · + hr with hi ∈ Hi, i = 1, . . . , r, we say that G is the (internal) direct sum of the subgroups H1, . . . , Hr and denote this as G = H1 ⊕ · · · ⊕ Hr.

Theorem 2.49. Structure theorem for finite Abelian groups

Let G be a finite additive Abelian group of cardinality #G = n. Then there exist r ∈ ℕ and integers ni ≥ 2 for 1 ≤ i ≤ r, such that G is the direct sum of (subgroups isomorphic to the) cyclic groups ℤn1, . . . , ℤnr, that is, G ≅ ℤn1 ⊕ · · · ⊕ ℤnr, where ni+1|ni for all i = 1, . . . , r – 1. Furthermore, such a decomposition is unique in the sense that if G ≅ ℤm1 ⊕ · · · ⊕ ℤms with integers mi ≥ 2 and mi+1|mi for i = 1, . . . , s – 1, then r = s and ni = mi for all i = 1, . . . , r. In this case, we say that G has rank r and is of type (n1, . . . , nr). By Lagrange’s theorem, each ni|n. Moreover, n = n1n2 · · · nr. G is cyclic if and only if the rank of G is 1.
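
The decomposition asserted by the theorem can be computed mechanically when G is given as a direct sum of cyclic groups of known orders. The following sketch (an illustration of the theorem, not an algorithm from the text) merges the prime-power components into invariant factors n1, . . . , nr with ni+1 | ni:

```python
# Sketch: compute the type (n1, ..., nr) of a finite Abelian group given as
# Z_d1 (+) ... (+) Z_dk, by regrouping prime-power components.
from collections import defaultdict

def factor(n):
    f, d = {}, 2
    while d * d <= n:
        while n % d == 0:
            f[d] = f.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        f[n] = f.get(n, 0) + 1
    return f

def invariant_factors(orders):
    # Collect, for each prime p, the p-power components of all the d_i.
    powers = defaultdict(list)
    for d in orders:
        for p, e in factor(d).items():
            powers[p].append(p ** e)
    for p in powers:
        powers[p].sort(reverse=True)
    r = max(len(v) for v in powers.values())
    # n_i is the product of the i-th largest p-power over all primes p.
    ns = []
    for i in range(r):
        n = 1
        for p in powers:
            if i < len(powers[p]):
                n *= powers[p][i]
        ns.append(n)
    return ns                      # n1 >= n2 >= ..., with n_{i+1} | n_i

print(invariant_factors([6, 4]))   # Z_6 (+) Z_4 has type (12, 2)
```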

Theorem 2.50. Structure theorem for E(Fq)

The elliptic curve group E(Fq) is of rank 1 or 2. If the rank is 1, then E(Fq) is cyclic; otherwise E(Fq) ≅ ℤn1 ⊕ ℤn2, where n1, n2 ≥ 2 and n2|n1. In the second case, we have n2|(q – 1).

Once we know the order of the group E(Fq), it is easy to compute the order of E(Fq^n) for any n ∈ ℕ, as the following theorem suggests.

Theorem 2.51.

Let α, β ∈ ℂ satisfy 1 – tX + qX2 = (1 – αX)(1 – βX). Then for any n ∈ ℕ the order #E(Fq^n) = q^n + 1 – (α^n + β^n).
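
Since α + β = t and αβ = q, the power sums tk := α^k + β^k obey the integer recurrence tk = t·tk–1 – q·tk–2 with t0 = 2 and t1 = t, so no complex arithmetic is needed in practice. A sketch using the curve of Exercise 2.111 as a test case (the recurrence reproduces the order 14 over F8 stated in Exercise 2.109):

```python
# Sketch: Theorem 2.51 via the integer recurrence t_k = t*t_{k-1} - q*t_{k-2},
# which computes alpha^k + beta^k without touching complex numbers.
q = 2
# Count points of E : y^2 + xy = x^3 + x^2 + 1 over F_2 by brute force.
n1 = 1 + sum(1 for x in range(2) for y in range(2)
             if (y * y + x * y + x**3 + x**2 + 1) % 2 == 0)
t = q + 1 - n1                    # trace of Frobenius over F_2

def order_ext(k):
    t0, t1 = 2, t
    for _ in range(k - 1):
        t0, t1 = t1, t * t1 - q * t0
    return q**k + 1 - t1

print([order_ext(k) for k in (1, 2, 3)])   # [2, 8, 14]
```

The curve has 2 points over F2 (so t = 1, i.e., it is anomalous over F2), 8 points over F4, and 14 points over F8.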

Exercise Set 2.11

2.100Show that the following curves over K are not smooth (and hence not elliptic curves):
  1. Y2 = X3, K arbitrary.

  2. Y2 = X3 + X2, K arbitrary.

  3. Y2 = X3 + aX + b, if char K = 2.

2.101
  1. Show that for an elliptic curve E over K and a finite point P = (h, k) ∈ E(K), the only points in E(K) (or E(K̄)) having X-coordinate equal to h are P and –P.

  2. Let char K ≠ 2, 3 and let E be defined by Equation (2.8). If α1, α2, α3 ∈ K̄ are the roots (distinct by Theorem 2.45) of X3 + aX + b, then (α1, 0), (α2, 0) and (α3, 0) are the only points on E(K̄) with Y-coordinate equal to 0. Show that these are the only points of order 2 in E(K̄).

2.102Let P = (h1, k1) and Q = (h2, k2) be two points (different from the point at infinity) in E(K) defined by the Weierstrass Equation (2.6). Assume that Q ≠ –P. Determine R = (h3, k3) = P + Q as follows:
  1. Show that the line passing through P and Q (the tangent, if P = Q) has the equation Y = λX + μ, where

  2. Substituting λX + μ for Y in Equation (2.6) gives a cubic equation in X of which h1 and h2 are two roots. Show that the third root (the X-coordinate of R) is

    h3 = λ2 + a1λ – a2 – h1 – h2.

    Hence deduce that the Y-coordinate of R is

    k3 = –(λ + a1)h3 – μ – a3.

2.103Let . Show that there exists an elliptic curve E over K such that . [H]
2.104Assume that char K ≠ 2, 3 and consider the elliptic curve E given by Equation (2.8). Let K[E] be the affine coordinate ring and K(E) the field of rational functions on E.
  1. Show that every element in K[E] can be uniquely represented as u(x) + yv(x) for polynomials u(x), v(x) ∈ K[x].

  2. The conjugate of f = u(x) + yv(x) ∈ K[E] is defined as f̄ := u(x) – yv(x). The norm of f is defined as N(f) := f f̄. Show that N(f) = u(x)2 – (x3 + ax + b)v(x)2 ∈ K[x].

  3. The degree of f = u(x) + yv(x) ∈ K[E] is defined as deg f := max(2 degx u, 3 + 2 degx v), where degx denotes the degree in x. Show that deg f = degx N(f).

  4. Show that for f, g ∈ K[E], one has N(fg) = N(f) N(g). Hence conclude that deg(fg) = deg f + deg g.

  5. Show that every rational function in K(E) can be represented as a(x) + yb(x), where a(x), b(x) ∈ K(x).

2.105Show that the division polynomials for the general Weierstrass equation can be recursively defined as

where F = 4x3 + d2x2 + 2d4x + d6.

2.106Write the recursive formulas for the division polynomials ψm(x, y) and fm(x) for the elliptic curve E defined by Equation (2.8) over a field K of characteristic ≠ 2, 3. Show that for m ≥ 2 and for P = (h, k) ∈ E(K̄) we have

2.107Write the recursive formulas for the division polynomials ψm(x, y) for the elliptic curve E defined by Equation (2.9) over a field K of characteristic 2. Conclude that ψm are polynomials in x only for all m ∈ ℕ. With fm := ψm for all m ∈ ℕ, show that for m ≥ 2 and for P = (h, k) ∈ E(K̄) we have

2.108Consider the elliptic curve defined over the field F7:

Ea,b : Y2 = X3 + aX + b.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b has order between 3 and 13.

  2. The curve E0,3 : Y2 = X3 + 3 has the maximum possible order 13.

  3. The curve E0,4 : Y2 = X3 + 4 has the minimum possible order 3.

  4. The curve E0,5 : Y2 = X3 + 5 is anomalous.

  5. The group is not cyclic.
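
The assertions can be checked by brute force. The base field is taken below to be F7 — an inference, since Hasse's bound gives exactly the order range 3 to 13 for q = 7:

```python
# Sketch: brute-force point counts for Exercise 2.108, assuming the base field F_7.
q = 7                                   # inferred base field; see the note above

def order(a, b):
    """#E_{a,b}(F_7) = 1 + number of affine points on y^2 = x^3 + a*x + b."""
    return 1 + sum(1 for x in range(q) for y in range(q)
                   if (y * y - x**3 - a * x - b) % q == 0)

print(order(0, 3), order(0, 4), order(0, 5))   # 13 3 7; 7 = q, so E_{0,5} is anomalous
```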

2.109Consider the representation of F8 as F2(ξ), where ξ is a root of T3 + T + 1 in F8. Identify an element a2ξ2 + a1ξ + a0 (where each ai ∈ {0, 1}) with the integer (a2a1a0)2 = a2·2^2 + a1·2 + a0. For a, b ∈ F8 (identified with the integers 0, . . . , 7 as above), b ≠ 0, define the non-supersingular elliptic curve:

Ea,b : Y2 + XY = X3 + aX2 + b.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b has order between 4 and 14.

  2. The curve E1,1 : Y2 + XY = X3 + X2 + 1 has the maximum possible order 14.

  3. The curve E2,1 : Y2 + XY = X3 + ξX2 + 1 has the minimum possible order 4.

  4. The curve E2,2 : Y2 + XY = X3 + ξX2 + ξ is anomalous.

  5. The orders of Ea,b for all choices of a, b lie in the set {4, 6, 8, 10, 12, 14}.

  6. Each is cyclic.

  7. Theorem 2.45(3) requires the phrase over K̄, that is, two curves over an algebraically non-closed field having the same j-invariant may be non-isomorphic.
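
A brute-force check of assertions 2, 3 and 5, with F8 elements encoded as 3-bit integers exactly as in the identification above (the implementation details below are mine, not the book's):

```python
# Sketch: point counts over F_8 = F_2(xi), xi^3 = xi + 1, elements as 3-bit ints.
def gf8_mul(u, v):
    """Multiply two elements of F_8, reducing modulo T^3 + T + 1 (0b1011)."""
    r = 0
    for i in range(3):
        if (v >> i) & 1:
            r ^= u << i
    for i in (4, 3):                       # clear the degree-4 and degree-3 bits
        if (r >> i) & 1:
            r ^= 0b1011 << (i - 3)
    return r

def order(a, b):
    """#E_{a,b}(F_8) for E : Y^2 + XY = X^3 + aX^2 + b (addition in F_8 is XOR)."""
    n = 1                                  # the point at infinity
    for x in range(8):
        for y in range(8):
            x2 = gf8_mul(x, x)
            lhs = gf8_mul(y, y) ^ gf8_mul(x, y)
            rhs = gf8_mul(x2, x) ^ gf8_mul(a, x2) ^ b
            if lhs == rhs:
                n += 1
    return n

orders = {order(a, b) for a in range(8) for b in range(1, 8)}
print(order(1, 1), order(2, 1), sorted(orders))
```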

2.110Consider the representation of F8 and the identification of elements of F8 with integers as in Exercise 2.109. For a, b, c ∈ F8, a ≠ 0, define the supersingular elliptic curve:

Ea,b,c : Y2 + aY = X3 + bX + c.

Verify the following assertions: (You may write a computer program.)

  1. Each Ea,b,c has order between 5 and 13.

  2. The curve E1,1,1 : Y2 + Y = X3 + X + 1 has the maximum possible order 13.

  3. The curve E1,1,2 : Y2 + Y = X3 + X + ξ has the minimum possible order 5.

  4. The orders of Ea,b,c for all choices of a, b, c lie in the set {5, 9, 13}.

  5. No Ea,b,c is anomalous.

  6. Each is cyclic.

2.111Consider the elliptic curve E : Y2 + XY = X3 + X2 + 1 defined over F2^n for all n ∈ ℕ. Show that

where r = ⌊n/2⌋. [H] Conclude that E is anomalous over F2, but not so over F2^n for n ≥ 2.

2.112Let K be a finite field of characteristic ≠ 2, 3 and E : Y2 = X3 + aX + b an elliptic curve defined over K. Prove that:
  1. #E(K) is odd if and only if X3 + aX + b is irreducible in K[X]. [H]

  2. E(K) is not cyclic if X3 + aX + b splits in K[X].

  3. The converse of Part (b) does not hold. [H]

2.113Let E : Y2 + XY = X3 + aX2 + b be a non-supersingular elliptic curve defined over F2^n. Prove that:
  1. E(F2^n) has exactly one point of order 2. [H]

  2. #E(F2^n) is even.

2.114Let E : Y2 + aY = X3 + bX + c be a supersingular elliptic curve over F2^n. Prove that:
  1. E(F2^n) has no points of order 2.

  2. #E(F2^n) is odd.

2.115
  1. Let G be a finite Abelian group of cardinality n. Show that if n is square-free, then G is cyclic. [H]

  2. Prove that if E is an anomalous elliptic curve over Fp, then E(Fp) is cyclic. [H]

  3. If E is a supersingular elliptic curve over the field Fq of characteristic ≠ 2, 3, prove that E(Fq) is either cyclic or isomorphic to ℤ(q+1)/2 ⊕ ℤ2. [H]

2.116Let q = p^n, p ≡ 3 (mod 4), and a ∈ Fq, a ≠ 0. Consider the elliptic curve E : Y2 = X3 – a2X over Fq (or over Fp). Prove that:
  1. contains at most three points of order three.

  2. The points of order three in are precisely the points of order three in .

2.117A Weierstrass equation of an elliptic curve defined over a field K is said to be in the Legendre form, if it can be written as

Equation 2.12

Y2 = X(X – 1)(X – k)

for some k ∈ K, k ≠ 0, 1. Show that if char K ≠ 2, then every Weierstrass equation over K can be written in the Legendre form. Show that the j-invariant of the curve E defined by Equation (2.12) is 2^8(k2 – k + 1)3/(k2(k – 1)2).

**2.12. Hyperelliptic Curves

Hyperelliptic curves are generalizations of elliptic curves. We cannot define a group structure on a general hyperelliptic curve in the same way as we did for elliptic curves. We instead work in the Jacobian of a hyperelliptic curve. For an elliptic curve E over an algebraically closed field K, the Jacobian is canonically isomorphic to the group E(K). Thus one can as well use the techniques for hyperelliptic curves for describing and working in elliptic curve groups. However, the exposition of the previous section turns out to be more intuitive and computationally oriented.

2.12.1. The Defining Equations

A hyperelliptic curve C of genus g ≥ 1 over a field K is defined by a polynomial equation of the form

Equation 2.13


In order that C qualifies as a hyperelliptic curve, we additionally require that C (as a projective curve) be smooth over K̄. The set of K-rational points on C is denoted as usual by C(K). For g = 1, Equation (2.13) is the same as the Weierstrass Equation (2.6) on p 98, that is, elliptic curves are hyperelliptic curves of genus one. A hyperelliptic curve of genus 2 over ℝ is shown in Figure 2.2.

Figure 2.2. A hyperelliptic curve of genus 2 over ℝ: Y2 = X(X2 – 1)(X2 – 2)


A hyperelliptic curve has only one point at infinity (Exercise 2.97(f)) and is smooth at ∞. If char K ≠ 2, substituting Y ↦ Y – u(X)/2 simplifies Equation (2.13) to Y2 = v(X) + u(X)2/4. Since the right side is a monic polynomial in K[X] of degree 2g + 1, we may assume that if char K ≠ 2, the equation for C is of the form:

Equation 2.14

C : Y2 = v(X)

Proposition 2.37.

If char K ≠ 2, then the hyperelliptic curve C defined by Equation (2.14) is smooth if and only if v has no multiple roots (in K̄). If char K = 2, then the curve defined by Equation (2.14) is never smooth.

Proof

First, consider char K ≠ 2. If v has a multiple root, say α ∈ K̄, then v′(α) = 0 and, therefore, C is not smooth at the finite point (α, 0). Conversely, if (h, k) is a singular point on C, then we have 2k = 0 and v′(h) = 0. Since (h, k) = (h, 0) is a point on C, we have v(h) = 0, that is, h is a multiple root of v.

For char K = 2 and a point (h, k) on the curve, we have (∂(Y2 – v(X))/∂X)(h, k) = v′(h) and (∂(Y2 – v(X))/∂Y)(h, k) = 0. Now, v′(X) is a monic polynomial of degree 2g > 0 and, therefore, has at least one root, say α ∈ K̄. But then C is not smooth at the point of C with X-coordinate α.

Definition 2.82.

Let P = (h, k) be a finite point on the hyperelliptic curve C defined by Equation (2.13). The point P̃ := (h, –k – u(h)) is called the opposite of P.[14] P and P̃ are the only points on C with X-coordinate equal to h. If P = P̃, then P is called a special point on C, otherwise it is called an ordinary point on C. The set of all finite (resp. ordinary, resp. special) points on C is denoted by Cfin(K) (resp. Cord(K), resp. Cspl(K)). These notations are also abbreviated as Cfin, Cord and Cspl, if the field K is understood from the context.

[14] It is customary to define the opposite of ∞ to be ∞ itself.

2.12.2. Polynomial and Rational Functions

All the general theory we described in Section 2.10 continues to be valid for hyperelliptic curves. However, since we are now given an explicit equation describing the curves, we can give more explicit expressions for polynomial and rational functions on hyperelliptic curves. For simplicity, we consider the affine equation and extend our definitions separately for the point at infinity.

Consider the hyperelliptic curve C defined by Equation (2.13). By Exercise 2.98, the defining polynomial f(X, Y) := Y2 + u(X)Y – v(X) (or its homogenization) is irreducible over K̄, so that the affine (or projective) coordinate ring of C is an integral domain and the corresponding function field is simply the field of fractions of the coordinate ring.

Let G(x, y) ∈ K[C]. Since y2 + u(x)y – v(x) = 0 in K[C], we can repeatedly substitute y2 by –u(x)y + v(x) in G(x, y) until the y-degree of G(x, y) becomes less than 2. This proves part of the following:

Proposition 2.38.

Every polynomial function G(x, y) ∈ K[C] can be written uniquely as G(x, y) = a(x) + yb(x) for some a(X), b(X) ∈ K[X].

Proof

In order to establish the uniqueness, note that if G(x, y) = a1(x) + yb1(x) = a2(x) + yb2(x), then f(X, Y) divides [a1(X) + Yb1(X)] – [a2(X) + Yb2(X)] in K[X, Y]. Since the Y-degree of f is 2, this implies [a1(X) + Y b1(X)] – [a2(X) + Y b2(X)] = 0, that is, [a1(X) – a2(X)] + [b1(X) – b2(X)]Y = 0, that is, a1(X) = a2(X) and b1(X) = b2(X).

Definition 2.83.

Let G = a(x) + yb(x) ∈ K[C]. The conjugate of G is defined to be the polynomial function G̅ := a(x) – b(x)(u(x) + y). The norm of G is defined as N(G) := GG̅.

Some useful properties of the norm function are listed in the following lemma, the proof of which is left to the reader as an easy exercise.

Lemma 2.9.

For G, H ∈ K[C], we have:

  1. G̅ is again a polynomial function in K[C].

  2. If G(x, y) = a(x) + yb(x), then N(G) = a(x)2 – a(x)b(x)u(x) – v(x)b(x)2. In particular, N(G) ∈ K[x].

  3. N(G̅) = N(G).

  4. N(GH) = N(G) N(H).

We also have an easy description of the rational functions on C.

Proposition 2.39.

Every rational function r ∈ K(C) can be written in the form s(x) + yt(x) for some s(x), t(x) ∈ K(x).

Proof

We can write r(x, y) = G(x, y)/H(x, y) for G, H ∈ K[C], H ≠ 0. Multiplying both the numerator and the denominator by H̅ and using Lemma 2.9(2) and Proposition 2.38 completes the proof.

The value of a rational function on C at a finite point on C can be defined as in the case of general curves (See Definition 2.68). In order to define the value of a rational function at the point ∞, we need some other concepts.

For a moment, let us assume that K = ℝ. From the equation of C, we see that k^2 ≈ h^(2g+1) (neglecting lower-degree terms) for sufficiently large coordinates h, k of a point (h, k) on C. This means that, on the logarithmic scale, k tends to infinity (2g + 1)/2 times as fast as h does. So it is customary to give Y a weight (2g + 1)/2 times a weight we give to X. The smallest integral weights of X and Y to satisfy this are 2 and 2g + 1 respectively. This motivates us to provide Definition 2.84 (generalized for any K).

Definition 2.84.

Let G = a(x) + yb(x) ∈ K[C]. The degree of G is defined to be deg G := max(2 degx a, 2g + 1 + 2 degx b), where degx denotes the usual x-degree of a polynomial in K[x]. Since a and b are uniquely determined by G, deg G is well-defined. If G = 0, we set deg G := –∞.

If 0 ≠ G = a(x)+yb(x), d1 = degx a and d2 = degx b, then the leading coefficient of G is taken to be the coefficient of xd1 in a(x) if deg G = 2d1, or to be the coefficient of xd2 in b(x) if deg G = 2g + 1 + 2d2. (We cannot have 2d1 = 2g + 1 + 2d2, since the left side is even and the right side is odd.)

Some basic properties of the degree function follow.

Lemma 2.10.

For G, H ∈ K[C], we have:

  1. deg G = degx(N(G)).

  2. deg(GH) = deg G + deg H.

  3. deg G̅ = deg G.

Proof

Easy exercise.
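
The claims of Lemmas 2.9 and 2.10 can be checked symbolically on a small example. The sketch below uses a hypothetical genus-2 curve y2 + xy = x5 + 1 (so u = x, v = x5 + 1; the identities do not depend on smoothness) with integer coefficients, representing polynomials as Python coefficient lists:

```python
# Sketch: verify N(GH) = N(G)N(H) and deg G = deg_x N(G) on a toy genus-2 curve.
# Polynomials are coefficient lists, lowest degree first.
def padd(p, q):
    r = [0] * max(len(p), len(q))
    for i, c in enumerate(p): r[i] += c
    for i, c in enumerate(q): r[i] += c
    return r

def pneg(p): return [-c for c in p]

def pmul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, c in enumerate(p):
        for j, d in enumerate(q):
            r[i + j] += c * d
    return r

def pdeg(p):
    d = -1
    for i, c in enumerate(p):
        if c: d = i
    return d

def ptrim(p):
    p = p[:]
    while p and p[-1] == 0: p.pop()
    return p

u, v = [0, 1], [1, 0, 0, 0, 0, 1]          # u(x) = x, v(x) = x^5 + 1 (genus g = 2)

def norm(G):                                # N(G) = a^2 - a*b*u - v*b^2 (Lemma 2.9(2))
    a, b = G
    return padd(pmul(a, a), pneg(padd(pmul(pmul(a, b), u), pmul(v, pmul(b, b)))))

def deg(G):                                 # deg G = max(2 deg a, 2g+1 + 2 deg b)
    a, b = G
    return max(2 * pdeg(a), 5 + 2 * pdeg(b))

def cmul(G, H):                             # product in K[C], using y^2 = v - u*y
    (a1, b1), (a2, b2) = G, H
    bb = pmul(b1, b2)
    return (padd(pmul(a1, a2), pmul(v, bb)),
            padd(padd(pmul(a1, b2), pmul(a2, b1)), pneg(pmul(u, bb))))

G = ([1, 0, 1], [0, 1])                     # G = (x^2 + 1) + y*x
H = ([0, 2], [3])                           # H = 2x + 3y
print(pdeg(norm(G)) == deg(G))              # Lemma 2.10(1)
print(ptrim(norm(cmul(G, H))) == ptrim(pmul(norm(G), norm(H))))   # Lemma 2.9(4)
```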

Now we are in a position to give an explicit definition of the value of a rational function at ∞.

Definition 2.85.

For r = G/H ∈ K(C) with G, H ∈ K[C], we define the value r(∞) as:

If deg(G) < deg(H), then r(∞) := 0.

If deg(G) > deg(H), then r(∞) := ∞ (that is, r is not defined at ∞).

If deg(G) = deg(H), then r(∞) is defined as the ratio of the leading coefficients of G and H.

Now that we have a complete description of the value of a rational function at any point on C, poles and zeros of rational functions on C can be defined as in Definition 2.70. In order to define the order of a polynomial or rational function at a point P on C, we should find a uniformizing parameter uP at P. Tedious calculations help one deduce the following explicit expressions for uP.

Proposition 2.40.

Let P = (h, k) ∈ C be a finite point. Then we can take

uP := x – h, if P is an ordinary point, and uP := y – k, if P is a special point,

as a uniformizing parameter at P. Finally, u∞ := x^g/y is a uniformizing parameter at the point at infinity (where g is the genus of C).

We give an alternative definition of the order (independent of uP), which is computationally useful and which is equivalent to Definition 2.71 for a hyperelliptic curve.

Definition 2.86.

Let G = a(x) + yb(x) ∈ K[C], G ≠ 0, and let P be a point on C. The order of G at P is defined as follows. First, let P = (h, k) be a finite point on C. Let e be the largest exponent such that (x – h)e divides both a(x) and b(x). We write G = (x – h)eG1(x, y). If G1(h, k) ≠ 0 we set l := 0, otherwise we set l to be the highest exponent such that (x – h)l divides N(G1). We then define

ordP(G) := e + l, if P is an ordinary point, and ordP(G) := 2e + l, if P is a special point.

Finally, we define ord∞(G) := –deg G.

Now, let r(x, y) = G(x, y)/H(x, y) be a rational function on C and let P be a point on C. We define the order of r at P as ordP(r) := ordP(G) – ordP(H). The value ordP(r) can be shown to be independent of the choice of G and H.

Example 2.23.

Let P = (h, k) be a finite point on C. Consider the rational function r := (x – h)m, m ∈ ℕ. The only points on C with X-coordinate equal to h are P and its opposite P̃. Therefore, if P is an ordinary point, ordP(r) = ordP̃(r) = m, whereas if P is a special point, ordP(r) = 2m. Moreover, ord∞(r) = –deg r = –2m. For any point Q on C other than P, P̃ and ∞, we have ordQ(r) = 0.

Now consider r = (x – h)m for some m < 0. Write r = G/H with G = 1 and H = (x – h)^(–m). Since ordQ(r) = ordQ(G) – ordQ(H), we continue to have

If m ≥ 0, then r is a polynomial function and has zeros P and P̃ and no finite poles. In this case, the sum of the orders of its zeros is 2m = 2 degx r = deg r. Theorem 2.52 generalizes this observation.

Theorem 2.52.

A non-constant polynomial function G ∈ K[C] has only finitely many zeros and a single pole at ∞. Furthermore, if K is algebraically closed, then the sum of the orders of the zeros of G equals deg G = –ord∞(G).

2.12.3. The Jacobian

We continue to work with the hyperelliptic curve C of Equation (2.13). We first impose the restriction that K is algebraically closed and use the theory of Section 2.10 to define the set Div(C) of divisors on C, the degree zero part Div0(C) of Div(C), the divisor Div(r) of a rational function r ∈ K(C)*, the set Prin(C) of principal divisors on C, the Picard group Pic(C) = Div(C)/Prin(C) and the Jacobian, namely the quotient group Div0(C)/Prin(C).

Example 2.24.

For the rational function r := (xh)m of Example 2.23, we have:

The Jacobian is the set of all cosets of Prin(C) in Div0(C). It is not a good idea to work with cosets (which are equivalence classes). Recall that in the case of ℤn, we represented a coset a + nℤ by the remainder of Euclidean division of a by n. In the case of the representation of a finite field as Fp[X]/〈f(X)〉, we took polynomials of smallest degrees as canonical representatives of the cosets of 〈f(X)〉. In the case of the Jacobian too, we intend to find such good representatives, one from each coset. We now introduce the concept of reduced divisors for that purpose.

Definition 2.87.

Two divisors D1, D2 ∈ Div0(C) (resp. in Div(C)) are said to be equivalent, denoted D1 ~ D2, if D1 – D2 ∈ Prin(C), or equivalently if D1 = D2 + Div(r) for some r ∈ K(C)*.

Our goal is to associate to every divisor D ∈ Div0(C) some unique reduced divisor Dred with D ~ Dred, that is, Dred plays the role of the canonical representative of the coset of D. We start with the following definition.

Definition 2.88.

A divisor D = ΣP mP(P) – (ΣP mP)(∞) ∈ Div0(C) (the sums ranging over the finite points P of C) is called semi-reduced, if each mP ≥ 0 and if for mP > 0 we have: mP̃ = 0 if P is an ordinary point, and mP = 1 if P is a special point.

Proposition 2.41.

Every divisor D ∈ Div0(C) is equivalent to some semi-reduced divisor D1.

Proof

Let , with and with Cord being the disjoint union of C1 and C2, where an ordinary point if and only if its opposite and . Now we can write D = D1 + D2, where

and

with m1 and m2 so chosen that D1, . By definition, D1 is semi-reduced, whereas by Example 2.24 , where

Now, we explain how we can represent a semi-reduced divisor by a pair of polynomials a(x), b(x) ∈ K[x]. For that, we need a definition.

Definition 2.89.

Let D1 = ΣP mP(P) – m(∞) and D2 = ΣP nP(P) – n(∞) be two divisors on C (not necessarily in Div0(C)). The greatest common divisor (gcd) of D1 and D2 is defined as the divisor

gcd(D1, D2) := ΣP min(mP, nP)(P) – (ΣP min(mP, nP))(∞).

Theorem 2.53.

Let D = ΣP mP(P) – (ΣP mP)(∞) be a semi-reduced divisor on C. Let Pi = (hi, ki), i = 1, . . . , n, be the only finite points P on C such that mP > 0. Let mi := mPi, m := m1 + · · · + mn, and a(x) := (x – h1)^m1 · · · (x – hn)^mn (so that degx(a) = m). Then there exists a unique polynomial b(x) ∈ K[x] with the following properties:

  1. degx b < m,

  2. b(hi) = ki for i = 1, . . . , n,

  3. a(x) divides b(x)2 + b(x)u(x) – v(x), and

  4. D = gcd(Div(a(x)), Div(b(x) – y)).

Conversely, if a(x), b(x) ∈ K[x] with degx b < degx a and with a dividing b2 + bu – v, then the divisor gcd(Div(a), Div(b – y)) is semi-reduced.

We denote the divisor gcd(Div(a), Div(b – y)) by Div(a, b). The zero divisor has the representation Div(1, 0).
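
The converse criterion is easy to test in code. Below, on the hypothetical genus-2 curve C : y2 = x5 + 1 over F7 (so u = 0), we take the two points P1 = (0, 1) and P2 = (1, 3), set a(x) = x(x – 1), let b(x) interpolate the y-coordinates, and verify that a divides b2 + bu – v, so that Div(a, b) = (P1) + (P2) – 2(∞) is semi-reduced:

```python
# Sketch: check that a | b^2 + b*u - v for a concrete pair (a, b) over F_7.
p = 7
v = [1, 0, 0, 0, 0, 1]                    # v(x) = x^5 + 1, coefficients low-to-high

def pmulmod(f, g):
    r = [0] * (len(f) + len(g) - 1)
    for i, c in enumerate(f):
        for j, d in enumerate(g):
            r[i + j] = (r[i + j] + c * d) % p
    return r

def prem(f, g):
    """Remainder of f modulo g over F_p (leading coefficient of g invertible)."""
    f = f[:]
    dg = len(g) - 1
    inv = pow(g[-1], -1, p)
    for i in range(len(f) - 1, dg - 1, -1):
        c = f[i] * inv % p
        for j in range(dg + 1):
            f[i - dg + j] = (f[i - dg + j] - c * g[j]) % p
    return f[:dg]

# P1 = (0, 1) and P2 = (1, 3) lie on C: 1 = 0 + 1 and 9 = 2 = 1 + 1 (mod 7).
a = [0, 6, 1]                             # a(x) = x(x - 1) = x^2 - x  (mod 7)
b = [1, 2]                                # b(x) = 2x + 1: b(0) = 1, b(1) = 3
b2 = pmulmod(b, b)
b2 += [0] * (len(v) - len(b2))
diff = [(c - d) % p for c, d in zip(b2, v)]     # b^2 - v  (here u = 0)
print(all(c == 0 for c in prem(diff, a)))       # True: a | b^2 + b*u - v
```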

A representation of the elements of the Jacobian by semi-reduced divisors (that is, by pairs of polynomials in K[x]) suffers from two disadvantages. First, the representation is not unique, and second, the degrees of the representing polynomials may be quite large. These difficulties are removed if we consider semi-reduced divisors of a special kind.

Definition 2.90.

A semi-reduced divisor D = ΣP mP(P) – (ΣP mP)(∞) is called a reduced divisor, if ΣP mP ≤ g, where g is the genus of C.

The following theorem establishes the desirable properties of a reduced divisor.

Theorem 2.54.

For every D ∈ Div0(C), there exists a unique reduced divisor D1 equivalent to D.

Proof

We only prove the existence of reduced divisors. For the proof of the uniqueness, one may, for example, see Koblitz [154]. The norm of a semi-reduced divisor D = ΣP mP(P) – (ΣP mP)(∞) is defined as the integer |D| := ΣP mP.

Let D ∈ Div0(C). By Proposition 2.41 there exists a semi-reduced divisor D′ ~ D. One can easily verify that |D′| ≤ |D|. If we already have |D′| ≤ g, then D′ is a desired reduced divisor. So assume otherwise, that is, |D′| ≥ g + 1. We can then choose finite points P1, . . . , Pg+1 on C (not necessarily all distinct) such that (P1) + · · · + (Pg+1) – (g + 1)(∞) is a subsum of the formal sum D′. Let the semi-reduced divisor (P1) + · · · + (Pg+1) – (g + 1)(∞) be represented as Div(a, b) with degx a = g + 1 and degx b ≤ g. But then deg(b(x) – y) = 2g + 1 and b(x) – y has zeros at P1, . . . , Pg+1 by Theorem 2.53. So by Theorem 2.52 we can write the remaining zeros of b(x) – y as Q1, . . . , Qg for some finite points Q1, . . . , Qg on C. Now the divisor D″ := D′ – Div(b(x) – y) satisfies D″ ~ D′ and |D″| < |D′|. We apply Proposition 2.41 again to get a semi-reduced divisor D‴ ~ D″ with |D‴| ≤ |D″|. Thus starting from the semi-reduced divisor D′ we produce another semi-reduced divisor D‴ such that D‴ ~ D′ ~ D and |D‴| < |D′|. We continue the process a finite number of times, until we get an equivalent semi-reduced divisor D1 of norm ≤ g. This is a desired reduced divisor.

From the viewpoint of cryptography, the field K should be a finite field, which is never algebraically closed. So we must remove the restriction that K be algebraically closed. Since C is naturally defined over K̄ as well, we start with the Jacobian of C over K̄ and define a particular subgroup of it to be the Jacobian of C over K.

Definition 2.91.

Let σ be a K-automorphism of K̄. For a point P = (h, k) ∈ C(K̄), the point Pσ := (σ(h), σ(k)) is also in C(K̄). For a divisor D = ΣP mP(P), we define Dσ := ΣP mP(Pσ). D is said to be defined over K, if Dσ = D for every K-automorphism σ of K̄. The subset of the Jacobian of C over K̄ consisting of divisor classes that have representative divisors defined over K is a subgroup of it and is called the Jacobian of C over K.

Every element of the Jacobian of C over K can be represented uniquely as a reduced divisor Div(a, b) for polynomials a(x), b(x) ∈ K[x] with degx a ≤ g and degx b < degx a. For a finite field K, the Jacobian of C over K is, therefore, a finite Abelian group. For suitably chosen hyperelliptic curves, these groups can be used to build cryptographic protocols.

Exercise Set 2.12

In this exercise set, we let C denote a hyperelliptic curve of genus g defined by Equation (2.13) over a field K (not necessarily algebraically closed).

2.118
  1. Show that the curve

    C1 : Y2 = X5 + X + 1

    defined over F7 is not smooth and so not a hyperelliptic curve. Find a point where C1 is not smooth.

  2. Show that the curve

    C2 : Y2 = X5 + X + 2

    defined over F7 is smooth, that is, a hyperelliptic curve of genus 2. Find out all the F7-rational points on C2. (There are ten of them.)
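
A numerical check of both parts. The base field is taken to be F7, an inference consistent with the claimed point count; for part 2, full smoothness additionally needs v and v′ to have no common root in the algebraic closure, which a short calculation rules out here:

```python
# Sketch: singular points of C1 and point count of C2, assuming the base field F_7.
p = 7

def ev(cs, x):
    return sum(c * pow(x, i, p) for i, c in enumerate(cs)) % p

def multiple_roots(v):
    """x in F_p with v(x) = v'(x) = 0, i.e. multiple roots of v found in F_p."""
    dv = [(i * c) % p for i, c in enumerate(v)][1:]
    return [x for x in range(p) if ev(v, x) == 0 and ev(dv, x) == 0]

v1 = [1, 1, 0, 0, 0, 1]                  # X^5 + X + 1
v2 = [2, 1, 0, 0, 0, 1]                  # X^5 + X + 2
print(multiple_roots(v1))                # [4]: C1 is singular at the point (4, 0)
print(multiple_roots(v2))                # []

n = 1 + sum(1 for x in range(p) for y in range(p)
            if (y * y - x**5 - x - 2) % p == 0)   # + the single point at infinity
print(n)                                 # 10 rational points on C2
```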

2.119Represent F8 as F2(ξ), where ξ is a root of the irreducible polynomial T3 + T + 1 ∈ F2[T].
  1. Show that the curve

    C3 : Y2 + XY = X5 + X + 1

    defined over F8 is not smooth and so not a hyperelliptic curve. Find a point where C3 is not smooth.

  2. Show that the curve

    C4 : Y2 + XY = X5 + X + ξ

    defined over F8 is smooth, that is, a hyperelliptic curve of genus 2. Find out all the F8-rational points on C4. (There are eight of them.)

2.120Let P = (h, k) be a finite point on C. Prove the following assertions:
  1. The only points on C with X-coordinate equal to h are P and its opposite P̃.

  2. .

  3. P is a special point if and only if u(h)2 + 4v(h) = 0.

  4. If char K ≠ 2, then C has at most 2g + 1 special points, whereas if char K = 2, then C has at most g special points.

2.121Prove Lemmas 2.9 and 2.10.
2.122Let G = a(x) + yb(x) ∈ K[C], G ≠ 0, and let P = (h, k) be a finite point on C.
  1. Show that G(P) = 0 if and only if .

  2. Let . Show that either P is a special point of C or h is a common root of u and v.

  3. Show that and that .

2.123Prove Theorem 2.52. [H]
2.124A line on C is a polynomial function of the form l = ax + by + c with a, b, c ∈ K, a and b not both 0.
  1. Let D = Div(l) be the divisor of a line l. Show that the norm |D| is either 2 or 2g + 1.

  2. Let h ∈ K. Determine Div(x – h).

  3. Determine Div(y).

2.125Let E be an elliptic curve (that is, a hyperelliptic curve of genus 1) defined over K.
  1. Show that any divisor D ∈ Div0(E) can be written as D = (P) – (∞) + Div(r) for some unique point P ∈ E(K̄) and some rational function r ∈ K̄(E)*. This rational function r is unique up to multiplication by elements of K̄*.

  2. Show that the map σ that maps the residue class of D ∈ Div0(E) to the point P satisfying D = (P) – (∞) + Div(r) for some r ∈ K̄(E)*, is a bijection.

  3. Let P, Q ∈ E(K̄), not both equal to ∞. Show that there is a line l with Div(l) = (P) + (Q) + (R) – 3(∞), where R = –(P + Q).

  4. Let , where σ is defined in Part (b). Show that for P, one has . (This, in particular, proves Theorem 2.46 and that σ is a group isomorphism.)

  5. Let D = ΣP mP(P) ∈ Div(E). Show that D is a principal divisor if and only if ΣP mP = 0 (integer sum) and ΣP mP P = ∞ (sum in E(K̄)).

**2.13. Number Fields

In this section, we develop the theory of number fields and number rings. Our aim is to make accessible to the reader the working of the cryptanalytic algorithms based on the number field sieve.

2.13.1. Some Commutative Algebra

Commutative algebra is the study of commutative rings with identity (rings by our definition). Modern number theory and geometry are based on results from this area of mathematics. Here we give a brief sketch of some commutative algebra tools that we need for developing the theory of number fields.

Ideal arithmetic

We start with some basic operations on ideals (cf. Example 2.7, Definition 2.23).

Definition 2.92.

Let A be a ring and let ai, i ∈ I, be a family (not necessarily finite) of ideals in A.

The set-theoretic intersection ∩i∈I ai is evidently an ideal in A.

The sum of the family is the ideal Σi∈I ai generated by the union of the ai; it consists of all finite sums x1 + · · · + xn with each xj lying in some aij.

Two ideals a and b of A are said to be relatively prime or coprime, if a + b = A, or equivalently if there exist a ∈ a and b ∈ b with a + b = 1.

If I = {1, 2, . . . , n} is finite, the product a1a2 . . . an is the ideal generated by all elements of the form x1x2 . . . xn with xi ∈ ai for all i = 1, . . . , n. We have a1a2 . . . an ⊆ a1 ∩ a2 ∩ · · · ∩ an.

If a1 = a2 = · · · = an = a, the product is denoted as a^n. The empty product of ideals is conventionally taken to be the unit ideal A. If a is the principal ideal 〈a〉, then a^n = 〈a^n〉.

One can readily check that the operations intersection, sum and product on ideals in a ring are associative and commutative.

Commutative algebra extensively uses the theory of prime and maximal ideals (Definition 2.19, Proposition 2.9, Corollary 2.2 and Exercise 2.23). The set of all prime ideals in A is called the (prime) spectrum of A and is denoted by Spec A. The set of all maximal ideals of A is called the maximal spectrum of A and denoted by Spm A. We have Spm A ⊆ Spec A. These two sets play an extremely useful role for the study of the ring A. If A is non-zero, both these sets are non-empty.

Localization

The concept of formation of fractions of integers to give the rationals can be applied in a more general setting. Instead of having any non-zero element in the denominator of a fraction we may allow only elements from a specific subset. All we require to make the collection of fractions a ring is that the allowed denominators should be closed under multiplication.

Definition 2.93.

Let A be a ring. A non-empty subset S of A is called multiplicatively closed or simply multiplicative, if 1 ∈ S and for any s, t ∈ S we have st ∈ S.

Example 2.25.
  1. For a non-zero ring A, the subset A \ {0} is multiplicatively closed, if and only if A is an integral domain. For a general non-zero ring A, the set of all elements a ∈ A such that a is not a zero-divisor is a multiplicative subset of A.

  2. Let A be a ring and a a proper ideal of A. The set S := A \ a is multiplicatively closed, if and only if a is a prime ideal of A.

  3. For a ring A and an element f ∈ A, the set {1, f, f2, f3, . . .} ⊆ A is multiplicatively closed.

Let A be a ring and S a multiplicative subset of A. We define a relation ~ on A × S as: (a, s) ~ (b, t) if and only if u(at – bs) = 0 for some u ∈ S. (If A is an integral domain, one may take u = 1 in the definition of ~.) It is easy to check that ~ is an equivalence relation on A × S. The set of equivalence classes of A × S under ~ is denoted by S–1A, whereas the equivalence class of (a, s) is denoted as a/s. For a/s, b/t ∈ S–1A, define (a/s) + (b/t) := (at + bs)/(st) and (a/s)(b/t) := (ab)/(st). It is easy to check that these operations are well-defined and make S–1A a ring with identity 1/1, in which each s/1, s ∈ S, is invertible. There is a canonical ring homomorphism A → S–1A taking a ↦ a/1. In general, this homomorphism is not injective. However, if A is an integral domain and 0 ∉ S, then the injectivity can be proved easily and we say that the ring A is canonically embedded in the ring S–1A.

Definition 2.94.

Let A be a ring and S a multiplicative subset of A. The ring S–1A constructed as above is called the localization of A away from S or the ring of fractions of A with respect to S.

Example 2.26.
  1. Let A be an integral domain and let S = A \ {0}. Then S–1A is called the quotient field or the field of fractions of A and is denoted as Q(A). If A is already a field, then Q(A) ≅ A. Other examples include Q(ℤ) = ℚ and Q(K[X]) = K(X), K a field, where K(X) denotes the field of rational functions over K in one indeterminate X.

    More generally, if A is any ring and S is the set of all non-zero-divisors of A, then S–1A is called the total quotient ring of A and is again denoted by Q(A). It is, in general, not a field. If A is an integral domain, then S = A \ {0} and the usage of Q(A) remains consistent.

  2. Let A be a ring, p a prime ideal of A and S := A \ p. Then S–1A is called the localization of A at p and is usually denoted by Ap.

  3. Let A be a ring, f ∈ A, and S = {1, f, f2, f3, . . . }. In this case, S–1A is conventionally denoted by Af.
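
A concrete instance of Example 2.26(2): the localization of ℤ at the prime ideal 〈5〉 consists of the rationals whose lowest-terms denominator is not divisible by 5. Python's Fraction keeps fractions in lowest terms, which makes the membership test a one-liner (an illustration, not from the text):

```python
# Sketch: the localization Z_<5> of Z at the prime ideal <5>, inside Q.
from fractions import Fraction

def in_loc(x):
    """Is x (automatically in lowest terms) an element of Z_<5>?"""
    return x.denominator % 5 != 0

a, b = Fraction(3, 4), Fraction(7, 2)
print(a + b, a * b)                      # 17/4 21/8: both still in Z_<5>
print(in_loc(a + b), in_loc(a * b))      # True True: ring operations stay inside
print(in_loc(Fraction(1, 5)))            # False: 1/5 is not in the localization
print(in_loc(1 / Fraction(4, 3)))        # True: 4/3 is a unit in Z_<5>
```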

Integral dependence

The concept of integral dependence generalizes the notion of integers. Recall that for a field extension K ⊆ L, an element α ∈ L is called algebraic over K, if α is a root of a non-zero polynomial f(X) ∈ K[X]. Since K is a field, the polynomial f can be divided by its leading coefficient, giving a monic polynomial in K[X] of which α is a root. However, if K is not a field, division by the leading coefficient is not always permissible. So we require the defining polynomial to be monic in order to define a special class of objects.

Definition 2.95.

Let A ⊆ B be an extension of rings. An element α ∈ B is said to be integral over A, if α satisfies[15] (that is, is a root of) a monic (and hence non-zero) polynomial f(X) ∈ A[X]. An equation of the form f(α) = 0, f ∈ A[X] monic, is called an equation of integral dependence of α over A.

[15] Strictly speaking, α being a root of f(X) is equivalent to α satisfying the polynomial equation f(α) = 0. Often the term equation is dropped in this context—a harmless colloquial contraction.

Example 2.27.
  1. If both A and B are fields, the concepts of integral and algebraic elements are the same. (See the argument preceding Definition 2.95.)

  2. Take A = ℤ and B = ℚ, and let a/b ∈ ℚ, gcd(a, b) = 1, be integral over ℤ. Let (a/b)^n + α_(n–1)(a/b)^(n–1) + · · · + α1(a/b) + α0 = 0, αi ∈ ℤ, be an equation of integral dependence of a/b over ℤ. Multiplication by b^n gives a^n = –b(α_(n–1)a^(n–1) + · · · + α1ab^(n–2) + α0b^(n–1)), that is, b|a^n. Since gcd(a, b) = 1, this forces b = ±1, that is, a/b ∈ ℤ. This is, in general, true for any UFD A and its field of fractions B = Q(A) (See Exercise 2.131).

  3. Every element α ∈ A is integral over A, since it satisfies the monic polynomial X – α ∈ A[X].
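Part 2 of this example can be checked mechanically. The sketch below (Python; the helpers eval_poly, divisors and rational_roots are illustrative, not from any library) enumerates the rational roots of an integer polynomial via the rational root theorem and confirms that for a monic polynomial every rational root is a rational integer:

```python
from fractions import Fraction

def eval_poly(coeffs, x):
    """coeffs[i] is the coefficient of X^i; evaluation by Horner's rule."""
    acc = Fraction(0)
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc

def divisors(n):
    n = abs(n)
    return {d for d in range(1, n + 1) if n % d == 0}

def rational_roots(coeffs):
    """Rational roots of an integer polynomial with non-zero constant term:
    a root a/b in lowest terms must have a | coeffs[0] and b | coeffs[-1]."""
    roots = set()
    for a in divisors(coeffs[0]):
        for b in divisors(coeffs[-1]):
            for r in (Fraction(a, b), Fraction(-a, b)):
                if eval_poly(coeffs, r) == 0:
                    roots.add(r)
    return roots

# X^2 - 3X + 2 is monic: its rational roots 1 and 2 are rational integers.
assert rational_roots([2, -3, 1]) == {1, 2}
assert all(r.denominator == 1 for r in rational_roots([2, -3, 1]))

# 2X - 1 is not monic: it has the non-integral rational root 1/2.
assert rational_roots([-1, 2]) == {Fraction(1, 2)}
```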

Now let A ⊆ B be an extension of rings and let C consist of all the elements of B that are integral over A. Clearly, A ⊆ C ⊆ B. It turns out that C is again a ring. This result is not at all immediate from the definition of integral elements. We prove this by using the following lemma which generalizes Theorem 2.33.

Lemma 2.11.

For a ring extension A ⊆ B and for α ∈ B, the following conditions are equivalent:

  1. α is integral over A.

  2. A[α] is a finitely generated A-module.

  3. A[α] ⊆ C for some subring C of B with C being a finitely generated A-module.

Proof

[(a)⇒(b)] Let αn + an–1αn–1 + · · · + a1α + a0 = 0, ai ∈ A, be an equation of integral dependence of α over A. A[α] is generated as an A-module by 1, α, α2, . . . . In order to show that only the elements 1, α, . . . , αn–1 generate A[α] as an A-module, it is sufficient to show that each αk, k ≥ 0, is an A-linear combination of 1, α, . . . , αn–1. We proceed by induction on k. The assertion certainly holds for k = 0, . . . , n – 1, whereas for k ≥ n we write αk = –(an–1αk–1 + · · · + a1αk–n+1 + a0αk–n), whence induction completes the proof.

[(b)⇒(c)] Take C := A[α].

[(c)⇒(a)] Let x1, . . . , xn generate C as an A-module. Since A[α] ⊆ C and, in particular, αxi ∈ C, for all i = 1, . . . , n we can write αxi = ai1x1 + · · · + ainxn for some aij ∈ A. Let M denote the matrix (αδij – aij)1≤i,j≤n, where δij is the Kronecker delta. Then M(x1, . . . , xn)t = 0. Multiplication (on the left) by the adjoint of M shows that (det M)xi = 0 for all i = 1, . . . , n. Since 1 ∈ C, we have 1 = c1x1 + · · · + cnxn for some ci ∈ A, so that (det M) · 1 = 0, that is, det M = 0. But det M is a monic polynomial in α of degree n and with coefficients from A.
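The rewriting step in [(a)⇒(b)] is easy to mechanize. Assuming α satisfies the monic equation αn + an–1αn–1 + · · · + a0 = 0, the illustrative routine below (our own sketch, not library code) expresses any power αk in the A-module basis 1, α, . . . , αn–1:

```python
def reduce_power(k, dep):
    """Coordinates of α^k in the basis 1, α, ..., α^(n-1), where α satisfies
    the monic equation α^n + dep[n-1]*α^(n-1) + ... + dep[0] = 0 (dep[i] = a_i)."""
    n = len(dep)
    vec = [0] * n
    vec[0] = 1                       # start with α^0 = 1
    for _ in range(k):
        top = vec[-1]                # coefficient that would overflow to α^n
        vec = [0] + vec[:-1]         # multiply the remaining terms by α
        # substitute α^n = -(a_{n-1}α^{n-1} + ... + a_0), as in the proof
        vec = [v - top * a for v, a in zip(vec, dep)]
    return vec

# α = √2 satisfies α^2 - 2 = 0, i.e. dep = [-2, 0].
assert reduce_power(4, [-2, 0]) == [4, 0]    # α^4 = 4
assert reduce_power(5, [-2, 0]) == [0, 4]    # α^5 = 4α
```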

Proposition 2.42.

For an extension A ⊆ B of rings, the set

C := {α ∈ B | α is integral over A}

is a subring of B containing A.

Proof

Clearly, A ⊆ C ⊆ B as sets. To show that C is a ring, let α, β ∈ C. By Condition (b) of Lemma 2.11, A[α] is a finitely generated A-module. Now β, being integral over A, is also integral over A[α]; so again by Lemma 2.11(b), A[α][β] is a finitely generated A[α]-module. It is then easy to check that A[α, β] = A[α][β] is a finitely generated A-module. Since α ± β and αβ are in A[α, β], by Lemma 2.11(c), these elements are integral over A, that is, belong to C. Thus C is a ring.

Definition 2.96.

The ring C of Proposition 2.42 is called the integral closure of A in B. A is called integrally closed in B, if C = A. On the other hand, if C = B, we say that B is an integral extension of A or that B is integral over A.

An integral domain A is called integrally closed (without specific mention of the ring in which it is so), if A is integrally closed in its quotient field Q(A). An integrally closed integral domain is called a normal domain (ND).

Example 2.28.
  1. ℤ (or more generally any UFD) is a normal domain.

  2. ℤ is not integrally closed in ℝ or ℂ, since, for example, √2 is integral over ℤ but √2 ∉ ℤ. The integral closure of ℤ in ℂ is denoted by 𝒪. Elements of 𝒪 are called algebraic integers (See Exercise 2.60).

Noetherian rings

Recall that a PID is a ring (integral domain) in which every ideal is principal, that is, generated by a single element. We now want to be a bit more general and demand every ideal to be finitely generated. If a ring meets our demand, we call it a Noetherian ring. These rings are named after Emmy Noether (1882–1935), one of the most celebrated mathematicians of all time, whose work on such rings was fundamental and deep in the branch of algebra. Emmy’s father Max Noether (1844–1921) was also an eminent mathematician.

Definition 2.97.

Let A be a ring and let 𝔞1 ⊆ 𝔞2 ⊆ 𝔞3 ⊆ · · · be an ascending chain of ideals of A. This chain is called stationary, if there is an m ∈ ℕ such that 𝔞m = 𝔞m+1 = 𝔞m+2 = · · ·. The ring A is said to satisfy the ascending chain condition or the ACC, if every ascending chain of ideals in A is stationary, or in other words, if there does not exist any infinite strictly ascending chain of ideals in A.

Proposition 2.43.

For a ring A, the following conditions are equivalent:

  1. Every ideal of A is finitely generated.

  2. A satisfies the ascending chain condition.

  3. Every non-empty set of ideals of A contains a maximal element.

Proof

[(a)⇒(b)] Let 𝔞1 ⊆ 𝔞2 ⊆ 𝔞3 ⊆ · · · be an ascending chain of ideals of A. Consider the ideal 𝔞 := ∪n∈ℕ 𝔞n, which is finitely generated by hypothesis. Let a1, . . . , ar be a set of generators of 𝔞. Each ai ∈ 𝔞, that is, there exists an mi ∈ ℕ such that ai ∈ 𝔞mi and hence ai ∈ 𝔞n for every n ≥ mi. Take m := max(m1, . . . , mr). For every n ≥ m, we have 𝔞 ⊆ 𝔞n ⊆ 𝔞, that is, 𝔞n = 𝔞.

[(b)⇒(c)] Let S be a non-empty set of ideals of A. Order S by inclusion. The ACC implies that every chain in S has an upper bound in S. By Zorn’s lemma, S has a maximal element.

[(c)⇒(a)] Let 𝔞 be an ideal of A. Consider the set S of all finitely generated ideals of A contained in 𝔞. S is non-empty, since it contains the zero ideal. By condition (c), S has a maximal element, say, 𝔟. If 𝔟 ≠ 𝔞, take a ∈ 𝔞 \ 𝔟. Then 𝔟 + Aa is finitely generated (since 𝔟 is so), properly contains 𝔟 and is contained in 𝔞. This contradicts the maximality of 𝔟 in S. Thus we must have 𝔟 = 𝔞, that is, 𝔞 is finitely generated.

Definition 2.98.

A ring A is called Noetherian, if A satisfies (one and hence all of) the equivalent conditions of Proposition 2.43.

Example 2.29.
  1. All PIDs are Noetherian, since principal ideals are obviously finitely generated. In particular, ℤ and K[X] (K a field) are Noetherian.

  2. If A is Noetherian and 𝔞 an ideal of A, then A/𝔞 is Noetherian, since the ideals of A/𝔞 are in one-to-one inclusion-preserving correspondence with the ideals of A containing 𝔞 and hence satisfy the ACC.

  3. Let A be a Noetherian ring and S a multiplicative subset of A. Then the localization B := S–1A is also Noetherian. To prove this fact, let 𝔟 be an ideal in B. One can show that 𝔟 = S–1𝔞 for some ideal 𝔞 of A. Since A is Noetherian, 𝔞 is finitely generated, say, 𝔞 = 〈a1, . . . , ar〉. It is now (almost) obvious that 𝔟 is generated by a1/1, . . . , ar/1. A particular case: If A is Noetherian and 𝔭 a prime ideal of A, then the localization A𝔭 is also Noetherian.

  4. The ring A of polynomials with infinitely many indeterminates X1, X2, X3, . . . (over, say, a field K) is not Noetherian. This is because the ideal

    〈X1, X2, X3, . . .〉 = AX1 + AX2 + AX3 + · · ·

    is not finitely generated, or alternatively because we have the infinite strictly ascending chain of ideals: 〈X1〉 ⊊ 〈X1, X2〉 ⊊ 〈X1, X2, X3〉 ⊊ · · ·, or because the set S := {〈X1〉, 〈X1, X2〉, 〈X1, X2, X3〉, . . .} of ideals in A does not contain a maximal element.

We have seen that if A is a PID, the polynomial ring A[X] need not be a PID. However, the property of being Noetherian is preserved during the passage from A to A[X] (Theorem 2.8).

Dedekind domains

A class of rings proves to be vital in the study of number fields:

Definition 2.99.

An integral domain A is called a Dedekind domain, if it satisfies all of the following three conditions:

  1. A is Noetherian.

  2. Every non-zero prime ideal of A is maximal.

  3. A is integrally closed (in its quotient field K := Q(A)).

2.13.2. Number Fields and Rings

After much ado we are finally in a position to define the basic objects of study in this section.

Definition 2.100.

A number field K is defined to be a finite (and hence algebraic) extension of the field ℚ of rational numbers. Clearly, ℚ ⊆ K. The extension degree [K : ℚ] is called the degree of the number field K and is finite by definition.

Note that there is considerable controversy among mathematicians about this definition of number fields. Some insist that any field K satisfying ℚ ⊆ K ⊆ ℂ should be called a number field. Some others restrict the definition by demanding that K must be algebraic over ℚ; however, fields K of infinite extension degree over ℚ are allowed. We restrict the definition further by imposing the condition that [K : ℚ] has to be finite. Our restricted definition is seemingly the most widely accepted one. In this book, we study only the number fields of Definition 2.100, and accepting this definition at the minimum saves us from writing huge expressions like “(algebraic) number fields of finite extension degree over ℚ” to denote number fields.

For number fields, the notion of integral closure leads to the following definition.

Definition 2.101.

A number field K contains ℚ and hence ℤ. The integral closure of ℤ in K is called the ring of integers of K and is denoted by 𝒪K. (𝒪 is the Gothic O.) Clearly, ℤ ⊆ 𝒪K ⊆ K and 𝒪K is an integral domain. We also have 𝒪K = K ∩ 𝒪, where 𝒪 is the subset of ℂ comprising all algebraic integers. A number ring is a ring which is (isomorphic to) the ring of integers of a number field.

By Example 2.27(2), the ring of integers of the number field ℚ is ℤ, that is, 𝒪ℚ = ℤ. It is, therefore, customary to call the elements of ℤ rational integers. Since ℚ is naturally embedded in K for any number field K, it is important to notice the distinction between the integers of K (that is, the elements of 𝒪K) and the rational integers of K (that is, the images of the elements of ℤ under the canonical inclusion ℚ ↪ K).

Some simple properties of number rings are listed below.

Proposition 2.44.

For a number field K, we have:

  1. 𝒪K ∩ ℚ = ℤ.

  2. For every α ∈ K, there exists a non-zero rational integer r such that rα ∈ 𝒪K. In particular, the quotient field of 𝒪K is K.

  3. 𝒪K is integrally closed in K, that is, 𝒪K is a normal domain.

Proof

(1) follows immediately from Example 2.27(2), (2) follows from Exercise 2.60, and (3) follows from Exercise 2.126(b).

Let K be a number field of degree d. By Corollary 2.13, K is a simple extension of ℚ, that is, there exists an element α ∈ K with a minimal polynomial f(X) over ℚ such that deg f = d and K = ℚ(α). The field K is a ℚ-vector space of dimension d with basis 1, α, . . . , αd–1. There exists a non-zero integer a such that aα is an algebraic integer, and we continue to have K = ℚ(aα). Thus, without loss of generality, we may take α to be an algebraic integer. In this case, the ℚ-basis 1, α, . . . , αd–1 of K consists only of algebraic integers.

Conversely, let f(X) ∈ ℚ[X] be an irreducible polynomial of degree d ≥ 1. The field K := ℚ[X]/〈f(X)〉 is a number field of degree d, and the elements of K can be represented by polynomials with rational coefficients and of degrees < d. Arithmetic in K is carried out as the polynomial arithmetic of ℚ[X] followed by reduction modulo the defining irreducible polynomial f(X). This gives us an algebraic representation of K independent of any embedding of K in ℂ. Now, K can also be viewed as a subfield of ℂ and the elements of K can be represented as complex numbers.[16] A representation of K by a subfield K′ of ℂ together with a field isomorphism σ : K → K′ is called a complex embedding of K in ℂ.[17] Such a representation is not unique, as Proposition 2.45 demonstrates.

[16] A complex number has a representation by a pair (a, b) of real numbers. Here, the imaginary unit i plays the role of X + 〈X2 + 1〉 in ℝ[X]/〈X2 + 1〉. Finally, every real number has a decimal (or binary or hexadecimal or . . .) representation.

[17] The field ℚ is canonically embedded in K. It is evident that the embedding σ : K → K′ fixes ℚ element-wise.

Proposition 2.45.

A number field K of degree d ≥ 1 has exactly d distinct complex embeddings.

Proof

As above, we take K = ℚ[X]/〈f(X)〉 for some irreducible polynomial f(X) ∈ ℚ[X] of degree d. Since ℚ is a perfect field (See Exercise 2.76), the d roots α1, . . . , αd ∈ ℂ of f(X) are all distinct. For each i = 1, . . . , d, the map sending X + 〈f(X)〉 ↦ αi clearly extends to a field isomorphism σi : K → ℚ(αi). Thus we get d distinct complex embeddings of K in ℂ. Now let K′ be a subfield of ℂ, such that σ : K → K′ is a ℚ-isomorphism. Let α := σ(X + 〈f(X)〉). Then 0 = σ(0) = σ(f(X + 〈f(X)〉)) = f(σ(X + 〈f(X)〉)) = f(α). Thus α is a root of f, that is, α = αi for some i ∈ {1, . . . , d}. Since K′ is a field containing ℚ and αi and satisfying [K′ : ℚ] = d, it follows that K′ = ℚ(αi) and σ = σi.

This proposition says that the conjugates α1, . . . , αd are algebraically indistinguishable. For example, X2 + 1 has two roots ±i, where i = √–1. But it makes little sense to talk about the positive and the negative square roots of –1. They are algebraically indistinguishable, and if one calls one of these i, the other one becomes –i.[18] However, if a representation of ℂ is given, we can distinguish between √–5 and –√–5 by associating these quantities with the elements ir and –ir respectively, where r is the positive real square root of 5 and where i is the imaginary unit available from the given representation of ℂ.

[18] In a number theory seminar in 1996, Hendrik W. Lenstra, Jr. commented:

Suppose the Martians defined the complex numbers by adjoining a root of –1 they called j. And when the Earth and Martians start talking, they have to translate i to be either j or –j. So we take i to j, because I think that’s what the scientists will decide. ··· But it was later discovered that most Martians are left handed, so the philosophers decide it’s better to send i to –j instead.

It is also quite customary to start with K = ℚ(α) for some algebraic α ∈ ℂ and seek the complex embeddings of K in ℂ. One then considers the minimal polynomial f(X) of α (over ℚ) and proceeds as in the proof of Proposition 2.45, but now defining the map σi : K → ℚ(αi) as the unique field isomorphism that fixes ℚ and takes α ↦ αi. If we take α = α1, then σ1 is the identity map, whereas σ2, . . . , σd are non-identity field isomorphisms.

The moral of this story is that whether one wants to view the number field K as ℚ[X]/〈f(X)〉 or as ℚ(α) for any root α ∈ ℂ of f(X) is one’s personal choice. In any case, one will be dealing with the same mathematical object, and as long as representation issues are not brought into the scene, all these definitions of a number field are absolutely equivalent.

The embeddings need not be all distinct as sets. For example, the images of the two embeddings of ℚ[X]/〈X2 + 1〉 are both equal to ℚ(i) = ℚ(–i), so these embeddings are identical as sets. But the maps x ↦ i and x ↦ –i are distinct (where x := X + 〈X2 + 1〉). Thus while specifying a complex embedding of a number field K, it is necessary to mention not only the subfield K′ of ℂ isomorphic to K, but also the explicit field isomorphism K → K′.

Definition 2.102.

Let K be a number field of degree d defined by an irreducible polynomial f(X) ∈ ℚ[X] or by any root of f(X). Let r1 be the number of real roots and 2r2 the number of non-real roots of f. (Note that the non-real roots of a polynomial with rational coefficients occur in (complex) conjugate pairs.) By the fundamental theorem of algebra, we have d = r1 + 2r2. For any real root α of f, the image ℚ(α) of the corresponding complex embedding of K is completely contained in ℝ, and hence this embedding is often called a real embedding of K. On the other hand, for a non-real root β of f, the complex embedding of K with image ℚ(β) is called a non-real or a properly complex embedding of K. The pair (r1, r2) is called the signature of the number field K. K has r1 real embeddings and 2r2 properly complex embeddings. If r2 = 0, that is, if all embeddings of K are real, one calls K a totally real number field. On the other hand, if r1 = 0, that is, if all embeddings of K are properly complex, then K is called a totally complex number field.

Example 2.30.
  1. The number field ℚ(√2) is totally real and has the signature (2, 0). (The roots of X2 – 2 are ±√2.)

  2. The number field ℚ(√–2) is totally complex and has the signature (0, 1). (The roots of X2 + 2 are ±√–2 = ±i√2.)

  3. The number field K = ℚ(∛2) is neither totally real nor totally complex. The roots of X3 – 2 are ∛2, ω∛2 and ω²∛2, where ω is a primitive cube root of unity; the first root is real and the other two are not. The signature of K is (1, 1), that is, K has one real embedding and two properly complex embeddings.
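The signatures in this example can be verified numerically. The sketch below (Python; the helper signature is illustrative) counts real and properly complex roots among the stated roots of each defining polynomial:

```python
import cmath

def signature(roots, tol=1e-9):
    """(r1, r2): r1 real roots, 2*r2 properly complex roots (d = r1 + 2*r2)."""
    r1 = sum(1 for a in roots if abs(a.imag) < tol)
    return r1, (len(roots) - r1) // 2

sqrt2 = 2 ** 0.5
cbrt2 = 2 ** (1 / 3)
omega = cmath.exp(2j * cmath.pi / 3)     # primitive cube root of unity

# Roots of X^2 - 2, X^2 + 2 and X^3 - 2, as given in the example.
assert signature([complex(sqrt2), complex(-sqrt2)]) == (2, 0)
assert signature([1j * sqrt2, -1j * sqrt2]) == (0, 1)
roots3 = [complex(cbrt2), cbrt2 * omega, cbrt2 * omega ** 2]
assert all(abs(a ** 3 - 2) < 1e-9 for a in roots3)   # they do solve X^3 = 2
assert signature(roots3) == (1, 1)
```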

The simplest examples of number fields are the quadratic number fields, that is, number fields of degree 2. Some special properties of quadratic number fields are covered in the exercises. It follows from Exercise 2.136 that every quadratic number field is of the form ℚ(√D) for some non-zero square-free integer D ≠ 1.

Now we investigate the ℤ-module structure of 𝒪K for a number field K of degree d. Let σ1, . . . , σd be the complex embeddings of K.

Definition 2.103.

For an element α ∈ K, we define the trace of α (over ℚ) as

Equation 2.15

Tr(α) := σ1(α) + σ2(α) + · · · + σd(α)

and the norm of α (over ℚ) as

N(α) := σ1(α)σ2(α) · · · σd(α).
If g(X) is the minimal polynomial of α over ℚ and r := deg g, then r|d. Moreover, σ1(α), . . . , σd(α) are precisely the roots of g, each repeated d/r times. So Tr(α) and N(α) belong to ℚ. If α is an algebraic integer, then g(X) ∈ ℤ[X], that is, Tr(α), N(α) ∈ ℤ.

The following properties of the norm and trace functions can be readily verified. Here α, β ∈ K and c ∈ ℚ.

Tr(α + β) = Tr(α) + Tr(β),
N(αβ) = N(α)N(β),
Tr(cα) = c Tr(α),
N(cα) = c^d N(α),
Tr(c) = dc,
N(c) = c^d.
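These identities are easy to verify computationally in a quadratic field K = ℚ(√D), where the two embeddings send a + b√D to a ± b√D, so that Tr(a + b√D) = 2a and N(a + b√D) = a2 – Db2. A minimal sketch (the class Quad is ours, purely illustrative):

```python
from fractions import Fraction as F

class Quad:
    """An element a + b*sqrt(D) of Q(sqrt(D)), D a fixed non-square integer."""
    def __init__(self, a, b, D):
        self.a, self.b, self.D = F(a), F(b), D

    def __add__(self, other):
        return Quad(self.a + other.a, self.b + other.b, self.D)

    def __mul__(self, other):
        # (a + b√D)(a' + b'√D) = (aa' + bb'D) + (ab' + a'b)√D
        return Quad(self.a * other.a + self.b * other.b * self.D,
                    self.a * other.b + self.b * other.a, self.D)

    def trace(self):
        # sigma1(α) + sigma2(α) = (a + b√D) + (a - b√D) = 2a
        return 2 * self.a

    def norm(self):
        # sigma1(α) * sigma2(α) = a^2 - D b^2
        return self.a ** 2 - self.D * self.b ** 2

x = Quad(1, 2, 5)          # 1 + 2√5
y = Quad(3, -1, 5)         # 3 - √5

assert (x + y).trace() == x.trace() + y.trace()   # Tr is additive
assert (x * y).norm() == x.norm() * y.norm()      # N is multiplicative
assert Quad(7, 0, 5).trace() == 2 * 7             # Tr(c) = dc with d = 2
assert Quad(7, 0, 5).norm() == 7 ** 2             # N(c) = c^d
```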

Definition 2.104.

Let β1, . . . , βd ∈ K. We call the determinant of the matrix (Tr(βiβj))1≤i,j≤d, whose ij-th entry is equal to Tr(βiβj), the discriminant Δ(β1, . . . , βd) of β1, . . . , βd. Since each Tr(βiβj) ∈ ℚ, it follows that Δ(β1, . . . , βd) ∈ ℚ. Moreover, if β1, . . . , βd are all algebraic integers, then Δ(β1, . . . , βd) ∈ ℤ.

Proposition 2.46.

Δ(β1, . . . , βd) = (det(σj(βi)))2.

Proof

Consider the matrices D := (Tr(βiβj)) and E := (σj(βi)). By definition, we have Δ(β1, . . . , βd) = det D. We show that D = EEt, which implies that det D = (det E)2. The ij-th entry of EEt is

σ1(βi)σ1(βj) + · · · + σd(βi)σd(βj) = σ1(βiβj) + · · · + σd(βiβj) = Tr(βiβj),

where the first equality holds because each σk is a ring homomorphism and the last equality follows from Equation (2.15).

Let K = ℚ(α) for some α ∈ ℂ and let f(X) be the minimal polynomial of α over ℚ. We define the discriminant of f as

Δ(f) := Δ(1, α, α2, ..., αd–1).

We have to show that the quantity Δ(f) is well-defined, that is, independent of the choice of the root α of f(X). Let α = α1, α2, . . . , αd be all the roots of f(X) and let the complex embedding σj of K map α to αj. By Proposition 2.46, we have Δ(f) = (det E)2, where E is the d × d Vandermonde matrix whose ij-th entry is σj(α^(i–1)) = αj^(i–1). Computing the determinant of E gives det E = ∏1≤i<j≤d (αj – αi), which implies that Δ(f) = ∏1≤i<j≤d (αj – αi)2 is independent of the permutations of the conjugates α1, . . . , αd of α. Notice that since α1, . . . , αd are all distinct, Δ(f) ≠ 0.

Let us deduce a useful formula for Δ(f). Write f(X) = (X – α1)(X – α2) · · · (X – αd) and take the formal derivative to get f′(X) = ∑i=1,...,d ∏j≠i (X – αj), that is, f′(αi) = ∏j≠i (αi – αj). Therefore, N(f′(α)) = ∏i f′(αi) = ∏i≠j (αi – αj) = (–1)^(d(d–1)/2) ∏i<j (αi – αj)2 = (–1)^(d(d–1)/2) Δ(f), that is,

Equation 2.16

Δ(f) = (–1)^(d(d–1)/2) N(f′(α)).


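Equation (2.16), Δ(f) = (–1)^(d(d–1)/2) N(f′(α)), can be checked numerically against the Vandermonde form Δ(f) = ∏i<j (αi – αj)2. A Python sketch (the helper names are ours) for f(X) = X3 – 2, whose discriminant is –108:

```python
import cmath

def prod(xs):
    out = 1
    for x in xs:
        out *= x
    return out

def disc_vandermonde(roots):
    """Δ(f) = Π_{i<j} (α_i - α_j)^2  (square of the Vandermonde determinant)."""
    d = len(roots)
    return prod((roots[i] - roots[j]) ** 2
                for i in range(d) for j in range(i + 1, d))

def disc_via_norm(roots):
    """Δ(f) = (-1)^(d(d-1)/2) N(f'(α)), with N(f'(α)) = Π_i Π_{j≠i} (α_i - α_j)."""
    d = len(roots)
    nfp = prod(prod(roots[i] - roots[j] for j in range(d) if j != i)
               for i in range(d))
    return (-1) ** (d * (d - 1) // 2) * nfp

cbrt2 = 2 ** (1 / 3)
omega = cmath.exp(2j * cmath.pi / 3)
roots = [complex(cbrt2), cbrt2 * omega, cbrt2 * omega ** 2]   # roots of X^3 - 2

assert abs(disc_vandermonde(roots) - disc_via_norm(roots)) < 1e-6
assert abs(disc_vandermonde(roots) - (-108)) < 1e-6           # Δ(X^3 - 2) = -108
```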
For arbitrary elements β1, . . . , βd ∈ K, the discriminant Δ(β1, . . . , βd) discriminates between the cases that β1, . . . , βd form a ℚ-basis of K and that they do not.

Lemma 2.12.

Let β1, . . . , βd, γ1, . . . , γd ∈ K satisfy γi = ti1β1 + · · · + tidβd for i = 1, . . . , d and for some tij ∈ ℚ. Then Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd), where T = (tij)1≤i,j≤d.

Proof

Let E1 := (σj(βi)) and E2 := (σj(γi)). Now

σj(γi) = σj(ti1β1 + · · · + tidβd) = ti1σj(β1) + · · · + tidσj(βd)

is the ij-th entry of the matrix T E1, that is, E2 = T E1. Hence

Δ(γ1, . . . , γd) = (det E2)2 = (det T)2(det E1)2 = (det T)2Δ(β1, . . . , βd).

Corollary 2.19.

Let (β1, . . . , βd) and (γ1, . . . , γd) be two ℚ-bases of K. Let Δ := Δ(β1, . . . , βd) and Δ′ := Δ(γ1, . . . , γd). Then Δ′ = (det T)2Δ, where T is the change-of-basis matrix from (β1, . . . , βd) to (γ1, . . . , γd).

Corollary 2.20.

Elements β1, . . . , βd ∈ K form a ℚ-basis of K, if and only if Δ(β1, . . . , βd) ≠ 0.

Proof

Let K = ℚ(α) and write γi := αi–1 for i = 1, . . . , d. Since (γ1, . . . , γd) is a ℚ-basis of K, each βi can be written (uniquely) as βi = ti1γ1 + · · · + tidγd with tij ∈ ℚ. By Lemma 2.12, Δ(β1, . . . , βd) = (det T)2Δ(γ1, . . . , γd), where T := (tij). We have seen that Δ(γ1, . . . , γd) = Δ(f) ≠ 0. Therefore, Δ(β1, . . . , βd) ≠ 0 if and only if det T ≠ 0, that is, if and only if (β1, . . . , βd) is a ℚ-basis of K.

Finally comes the desired characterization of 𝒪K.

Theorem 2.55.

For a number field K of degree d, the ring 𝒪K is a free ℤ-module of rank d.

Proof

Let β1, . . . , βd form a ℚ-basis of K. We know that for some non-zero r1, . . . , rd ∈ ℤ the elements r1β1, . . . , rdβd are in 𝒪K and continue to constitute a ℚ-basis of K. So we may assume that the elements β1, . . . , βd are already in 𝒪K. Consider the set S of all ℚ-bases (β1, . . . , βd) of K consisting of elements from 𝒪K only. By Definition 2.104 and Corollary 2.20, Δ(β1, . . . , βd) is a non-zero integer for every (β1, . . . , βd) ∈ S. Choose (β1, . . . , βd) ∈ S such that |Δ(β1, . . . , βd)| is minimal in S.

Claim: β1, . . . , βd are linearly independent over ℤ.

(β1, . . . , βd) is a ℚ-basis of K, that is, β1, . . . , βd are linearly independent over ℚ and so trivially over ℤ too.

Claim: β1, . . . , βd generate 𝒪K as a ℤ-module.

Assume not, that is, there exists α ∈ 𝒪K such that α = a1β1 + · · · + adβd with a1, . . . , ad ∈ ℚ, not all in ℤ. Without loss of generality, we may assume that a1 ∉ ℤ and write a1 = a + r with a ∈ ℤ and 0 < r < 1. Define γ1 := α – aβ1 = rβ1 + a2β2 + · · · + adβd, γ2 := β2, . . . , γd := βd. Clearly, γ1, . . . , γd ∈ 𝒪K. Furthermore, the matrix T expressing (γ1, . . . , γd) in terms of (β1, . . . , βd) has the first row (r, a2, . . . , ad) and agrees with the identity matrix in the remaining rows, so that det T = r. By Lemma 2.12, we have

Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd) = r2Δ(β1, . . . , βd).

Since r ≠ 0, we have Δ(γ1, . . . , γd) ≠ 0, that is, (γ1, . . . , γd) is again a ℚ-basis of K (Corollary 2.20), that is, (γ1, . . . , γd) ∈ S. Finally, since r < 1, we have |Δ(γ1, . . . , γd)| < |Δ(β1, . . . , βd)|, a contradiction to the choice of (β1, . . . , βd). Thus every α ∈ 𝒪K has to be a ℤ-linear combination of β1, . . . , βd. This completes the proof of the second claim and also of the theorem.

Definition 2.105.

Any ℤ-basis of 𝒪K is called an integral basis of K (or of 𝒪K).

Corollary 2.21.

Every integral basis of K has the same discriminant (for a given K).

Proof

Let (β1, . . . , βd) and (γ1, . . . , γd) be two integral bases of K. Let T be the (β1, . . . , βd)-to-(γ1, . . . , γd) change-of-basis matrix. (β1, . . . , βd) being an integral basis of K and each γi lying in 𝒪K, all the entries of T are integers. Also from Corollary 2.19 we have Δ(γ1, . . . , γd) = (det T)2Δ(β1, . . . , βd), and hence Δ(β1, . . . , βd) divides and has the same sign as Δ(γ1, . . . , γd). One can analogously show that Δ(γ1, . . . , γd) divides and has the same sign as Δ(β1, . . . , βd). Therefore, Δ(β1, . . . , βd) = Δ(γ1, . . . , γd).

Definition 2.106.

Let (β1, . . . , βd) be an integral basis of a number field K. The discriminant of K is defined to be the integer ΔK := Δ(β1, . . . , βd). By Corollary 2.21, ΔK is well-defined, that is, independent of the choice of the integral basis of K.

Recall that K, as a vector space over ℚ, always possesses a ℚ-basis of the form 1, α, . . . , αd–1. The ring 𝒪K, as a ℤ-module, is free of rank d, but a number field K need not possess an integral basis of the form 1, α, . . . , αd–1. Whenever it does, 𝒪K is called monogenic, and an integral basis 1, α, . . . , αd–1 of K is called a power integral basis. Clearly, if K has a power integral basis 1, α, . . . , αd–1, then 𝒪K = ℤ[α]. But the converse is not true, that is, for α ∈ 𝒪K with K = ℚ(α), the elements 1, α, . . . , αd–1 need not form an integral basis of K, even when 𝒪K is monogenic.

Example 2.31.

Consider the quadratic number field K = ℚ(√D) for some square-free integer D ≠ 0, 1. We consider the two cases (See Exercise 2.136):

Case 1: D ≡ 2, 3 (mod 4)

Here 𝒪K = ℤ[√D], that is, 1, √D is a power integral basis of K. The minimal polynomial of √D is X2 – D and the conjugates of √D are ±√D. Therefore, by Equation (2.16), we have

ΔK = –N(2√D) = –(2√D)(–2√D) = 4D.

Case 2: D ≡ 1 (mod 4)

In this case, 𝒪K = ℤ[(1 + √D)/2], that is, 1, (1 + √D)/2 is a power integral basis of K. The minimal polynomial of (1 + √D)/2 is X2 – X + (1 – D)/4 and the conjugates of (1 + √D)/2 are (1 ± √D)/2. Therefore, Equation (2.16) gives

ΔK = –N(2 · (1 + √D)/2 – 1) = –N(√D) = –(√D)(–√D) = D.
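Example 2.31 can be packaged into a small routine: ΔK = D for D ≡ 1 (mod 4) and ΔK = 4D for D ≡ 2, 3 (mod 4). The sketch below (Python; the function name and the basis labels are ours, purely illustrative):

```python
def squarefree(n):
    n = abs(n)
    return all(n % (d * d) for d in range(2, int(n ** 0.5) + 1))

def quadratic_field_data(D):
    """Integral basis and discriminant of K = Q(sqrt(D)) for a
    square-free integer D != 0, 1 (Example 2.31)."""
    assert D not in (0, 1) and squarefree(D)
    if D % 4 == 1:                           # D ≡ 1 (mod 4)
        return ("1, (1+sqrt(D))/2", D)       # O_K = Z[(1+√D)/2], Δ_K = D
    else:                                    # D ≡ 2, 3 (mod 4)
        return ("1, sqrt(D)", 4 * D)         # O_K = Z[√D],       Δ_K = 4D

assert quadratic_field_data(5) == ("1, (1+sqrt(D))/2", 5)     # Q(√5):  Δ_K = 5
assert quadratic_field_data(-1) == ("1, sqrt(D)", -4)         # Q(i):   Δ_K = -4
assert quadratic_field_data(2) == ("1, sqrt(D)", 8)           # Q(√2):  Δ_K = 8
```

Note that Python's `%` reduces negative D to a representative in {0, 1, 2, 3}, which matches the mathematical congruence (for example, –3 ≡ 1 (mod 4)).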

2.13.3. Unique Factorization of Ideals

Ideals in a number ring possess very rich structures. We prove that number rings are Dedekind domains (Definition 2.99). A Dedekind domain (henceforth abbreviated as DD) need not be a UFD (or a PID). However, it is a ring in which ideals admit unique factorizations into products of prime ideals.

Let K be a number field of degree d and 𝒪K its ring of integers. If φ : A → B is a homomorphism of rings and if 𝔮 is a prime ideal of B, then the contraction φ–1(𝔮) is a prime ideal of A. We say that 𝔮 lies above or over φ–1(𝔮). If A ⊆ B and φ is the inclusion homomorphism, then φ–1(𝔮) = 𝔮 ∩ A. For a number field K, we consider the natural inclusion ℤ ⊆ 𝒪K.

Lemma 2.13.

Let 𝔭 be a non-zero prime ideal of 𝒪K. Then 𝔭 lies above a unique non-zero prime ideal 𝔭 ∩ ℤ of ℤ. In particular, 𝔭 contains a (unique) rational prime.

Proof

Let 𝔭0 := 𝔭 ∩ ℤ. If 𝔭0 = 0, then both 𝔭 and 0 are prime ideals of 𝒪K that lie over the zero ideal of ℤ. Since 𝔭 ≠ 0, we have 𝔭 ⊋ 0, a contradiction to Exercise 2.128(c). Therefore, 𝔭0 = pℤ for a (unique) rational prime p.

Proposition 2.47.

The ring 𝒪K is Noetherian.

Proof

Let α1, . . . , αd constitute an integral basis of K, that is, 𝒪K = ℤα1 + · · · + ℤαd = ℤ[α1, . . . , αd], that is, the ring homomorphism ℤ[X1, . . . , Xd] → 𝒪K mapping f(X1, . . . , Xd) ↦ f(α1, . . . , αd) is surjective. By Hilbert’s basis theorem (Theorem 2.8), the polynomial ring ℤ[X1, . . . , Xd] is Noetherian and so 𝒪K, being a quotient of a Noetherian ring (by the isomorphism theorem), is Noetherian too (Example 2.29).

Theorem 2.56.

The ring 𝒪K of integers of a number field K is a Dedekind domain.

Proof

We have proved that 𝒪K is Noetherian (Proposition 2.47) and integrally closed (Proposition 2.44). It then suffices to show that each non-zero prime ideal 𝔭 of 𝒪K is maximal. By Lemma 2.13, 𝔭 lies over a non-zero prime ideal pℤ of ℤ. But pℤ is maximal in ℤ. Exercise 2.128(b) now completes the proof.

Now we derive the unique factorization theorem for ideals in a DD. It is going to be a long story. We refer the reader to Definition 2.92 to recall how the product of two ideals is defined.

Lemma 2.14.

Let A be a ring, 𝔞1, . . . , 𝔞r ideals of A, and 𝔭 a prime ideal of A such that 𝔭 ⊇ 𝔞1 · · · 𝔞r. Then 𝔭 ⊇ 𝔞i for some i ∈ {1, . . . , r}. In particular, if A is a DD and 𝔭, 𝔭1, . . . , 𝔭r are non-zero prime ideals with 𝔭 ⊇ 𝔭1 · · · 𝔭r, then 𝔭 = 𝔭i for some i ∈ {1, . . . , r}.

Proof

The proof is obvious for r = 1. So assume that r > 1. If 𝔭 ⊉ 𝔞i for all i = 1, . . . , r, then for each i we can choose an ai ∈ 𝔞i \ 𝔭 and see that a1 · · · ar ∈ 𝔞1 · · · 𝔞r ⊆ 𝔭, although no factor ai belongs to 𝔭, a contradiction to the fact that 𝔭 is prime. The last statement of the lemma follows from the fact that in a DD every non-zero prime ideal is maximal.

We now generalize the concept of ideals.

Definition 2.107.

Let A be an integral domain and K := Q(A). An A-submodule 𝔞 of K is called a fractional ideal of A, if b𝔞 ⊆ A for some non-zero b ∈ A.

Every ideal of A is evidently a fractional ideal of A and hence is often called an integral ideal of A. Conversely, every fractional ideal of A contained in A is an integral ideal of A. The principal fractional ideal Ax is the A-submodule of K generated by an element x ∈ K. If A is a Noetherian domain, we have the following equivalent characterization of fractional ideals.

Lemma 2.15.

Let A be a Noetherian integral domain, K := Q(A) and 𝔞 an A-submodule of K. Then 𝔞 is a fractional ideal of A, if and only if 𝔞 is a finitely generated A-submodule of K.

Proof

[if] Let 𝔞 = Ax1 + · · · + Axn, where xi = ai/bi, ai, bi ∈ A, bi ≠ 0. Then b𝔞 ⊆ A for b := b1b2 · · · bn ≠ 0.

[only if] Let b ∈ A, b ≠ 0, be such that b𝔞 ⊆ A. Now b𝔞 is an (integral) ideal of A (easy check) and is finitely generated, since A is Noetherian. Let b𝔞 = 〈a1, . . . , an〉, ai ∈ A. Then 𝔞 = Ax1 + · · · + Axn, where xi := ai/b.

We define the product of two fractional ideals 𝔞, 𝔟 of an integral domain A as we did for integral ideals:

𝔞𝔟 := {a1b1 + · · · + arbr | r ∈ ℕ, a1, . . . , ar ∈ 𝔞, b1, . . . , br ∈ 𝔟}.

It is easy to check that 𝔞𝔟 is again a fractional ideal of A. The product of fractional ideals defines a commutative and associative binary operation on the set of non-zero fractional ideals of A, and the ideal A acts as a (multiplicative) identity. A fractional ideal 𝔞 of A is called invertible, if 𝔞𝔟 = A for some fractional ideal 𝔟 of A. We deduce shortly that if A is a DD, then every non-zero fractional ideal of A is invertible and, therefore, the set of non-zero fractional ideals of A is a group under multiplication of fractional ideals.

Lemma 2.16.

Let A be a Noetherian domain and 𝔞 an (integral) ideal of A. For some r ∈ ℕ, there exist prime ideals 𝔭1, . . . , 𝔭r of A, each containing 𝔞, such that 𝔭1 · · · 𝔭r ⊆ 𝔞.

Proof

Let S be the set of ideals of A for which the lemma does not hold. Assume that S ≠ ∅. Since A is Noetherian, S contains a maximal element, say 𝔞. Clearly, 𝔞 is a proper non-prime ideal of A, that is, for some a, b ∈ A \ 𝔞, we have ab ∈ 𝔞. The ideals 𝔞 + Aa and 𝔞 + Ab strictly contain 𝔞 and, therefore, by the maximality of 𝔞, are not in S, that is, there exist prime ideals 𝔭1, . . . , 𝔭r, each containing 𝔞 + Aa (and hence 𝔞), such that 𝔭1 · · · 𝔭r ⊆ 𝔞 + Aa, and prime ideals 𝔮1, . . . , 𝔮s, each containing 𝔞 + Ab (and hence 𝔞), such that 𝔮1 · · · 𝔮s ⊆ 𝔞 + Ab. Moreover, (𝔞 + Aa)(𝔞 + Ab) ⊆ 𝔞, since ab ∈ 𝔞, so that 𝔭1 · · · 𝔭r𝔮1 · · · 𝔮s ⊆ 𝔞, a contradiction. Thus S must be empty.

Note that the condition “each containing 𝔞” was necessary in Lemma 2.16 in order to rule out the trivial possibility that 𝔭i = 0 for some i (recall that in an integral domain the zero ideal is prime).

Lemma 2.17.

Let A be a DD, K := Q(A) and 𝔭 a non-zero prime ideal of A. Define the set

𝔭–1 := {x ∈ K | x𝔭 ⊆ A}.

Then we have:

  1. 𝔭–1 is a fractional ideal of A.

  2. A ⊊ 𝔭–1.

  3. 𝔭𝔭–1 = A. In particular, every non-zero prime ideal in a DD is invertible.

Proof

  1. Clearly, 𝔭–1 is an A-submodule of K, and for any non-zero b ∈ 𝔭, we have b𝔭–1 ⊆ A.

  2. Since 𝔭 ⊆ A, we have A ⊆ 𝔭–1. In order to prove the strict inclusion, we take any non-zero a ∈ 𝔭 and consider the ideal Aa. By Lemma 2.16, there exist prime ideals 𝔭1, . . . , 𝔭r, each containing Aa (and hence non-zero), such that 𝔭1 · · · 𝔭r ⊆ Aa. We choose r to be minimal, so that Aa does not contain the product of any r – 1 of 𝔭1, . . . , 𝔭r. Now 𝔭 ⊇ Aa ⊇ 𝔭1 · · · 𝔭r, and hence by Lemma 2.14, 𝔭 = 𝔭i for some i, say, i = r. Choose any b ∈ 𝔭1 · · · 𝔭r–1 \ Aa. Since 𝔭1 · · · 𝔭r–1𝔭r ⊆ Aa, we have b𝔭 ⊆ Aa, that is, (b/a)𝔭 ⊆ A, that is, b/a ∈ 𝔭–1. On the other hand, b ∉ Aa, so that b/a ∉ A. Therefore, A ⊊ 𝔭–1.

  3. By the definition of 𝔭–1, it follows that 𝔭𝔭–1 is contained in A and hence is an integral ideal of A. Since A ⊆ 𝔭–1, it follows that 𝔭 ⊆ 𝔭𝔭–1. Since 𝔭 is a maximal ideal, we then have 𝔭𝔭–1 = 𝔭 or 𝔭𝔭–1 = A. Assume that 𝔭𝔭–1 = 𝔭. We claim that this assumption implies that 𝔭–1 = A, a contradiction to Part (2). So we must have 𝔭𝔭–1 = A. For proving the claim, let b ∈ 𝔭–1 and choose a non-zero a ∈ 𝔭. Then we have ab ∈ 𝔭𝔭–1 = 𝔭 and, therefore, ab2 = (ab)b ∈ 𝔭𝔭–1 = 𝔭, and so on. For each n ∈ ℕ, define the ideal 𝔞n := Aa + Aab + Aab2 + · · · + Aabn. Then 𝔞1 ⊆ 𝔞2 ⊆ 𝔞3 ⊆ · · · is an ascending chain of ideals in A. Since A is Noetherian, the chain must be stationary, that is, for some n we have 𝔞n+1 = 𝔞n, that is, abn+1 ∈ 𝔞n, that is, abn+1 = c0a + c1ab + · · · + cnabn with ci ∈ A. Since A is an integral domain and a ≠ 0, we can cancel a to get bn+1 = c0 + c1b + · · · + cnbn, that is, b is integral over A. Since A is integrally closed, b ∈ A. Therefore, 𝔭–1 ⊆ A, that is, 𝔭–1 = A, as claimed.

Theorem 2.57.

Every non-zero ideal 𝔞 in a DD A can be represented as a product 𝔞 = 𝔭1 · · · 𝔭r of prime ideals of A. Moreover, such a factorization of 𝔞 is unique up to permutations of the factors.

Proof

If 𝔞 = A, there is nothing to prove. So let 𝔞 be a proper ideal of A. We first show that if 𝔞 contains a product of non-zero prime ideals, then 𝔞 is a product of prime ideals. By Lemma 2.16, we have prime ideals 𝔭1, . . . , 𝔭r of A, each containing 𝔞, such that 𝔭1 · · · 𝔭r ⊆ 𝔞. Let us choose r to be minimal and proceed by induction on r. If r = 1, then 𝔭1 ⊆ 𝔞 ⊆ 𝔭1, that is, 𝔞 = 𝔭1 is already prime. So take r > 1 and assume that if an ideal 𝔟 of A contains a product of r – 1 or fewer non-zero prime ideals of A, then 𝔟 is a product of prime ideals. Let 𝔭 be a maximal ideal containing 𝔞. We then have 𝔭 ⊇ 𝔞 ⊇ 𝔭1 · · · 𝔭r, and by Lemma 2.14, 𝔭 = 𝔭i for some i, say, i = r. Now, consider the fractional ideal 𝔞𝔭–1. Then 𝔞𝔭–1 ⊆ 𝔭𝔭–1 = A and so 𝔞𝔭–1 is an integral ideal of A. Furthermore, 𝔞𝔭–1 ⊇ 𝔭1 · · · 𝔭r𝔭–1 = 𝔭1 · · · 𝔭r–1, that is, 𝔞𝔭–1 contains a product of r – 1 non-zero prime ideals. By the induction hypothesis, 𝔞𝔭–1 is a product of prime ideals, say, 𝔞𝔭–1 = 𝔮1 · · · 𝔮s. But then 𝔞 = 𝔮1 · · · 𝔮s𝔭 is also a product of prime ideals.

In order to prove the uniqueness of this product, let 𝔭1 · · · 𝔭r = 𝔮1 · · · 𝔮s with prime ideals 𝔭1, . . . , 𝔭r and 𝔮1, . . . , 𝔮s. Now 𝔭1 ⊇ 𝔮1 · · · 𝔮s, and by Lemma 2.14, 𝔭1 = 𝔮j for some j, say, j = 1. Multiplying both sides by 𝔭1–1 gives 𝔭2 · · · 𝔭r = 𝔮2 · · · 𝔮s. Proceeding in this way shows the desired uniqueness.

In the factorization of a non-zero ideal of a DD, we do not rule out the possibility of repeated occurrences of factors. Taking this into account shows that every non-zero ideal 𝔞 in a DD A admits a unique factorization

𝔞 = 𝔭1^e1 𝔭2^e2 · · · 𝔭r^er

with distinct non-zero prime ideals 𝔭i and with positive integral exponents ei. Here uniqueness is up to permutations of the indexes 1, . . . , r. This factorization can be extended to fractional ideals, but this time we have to allow non-positive exponents. First note that for integers e1, . . . , er and non-zero prime ideals 𝔭1, . . . , 𝔭r of A, the product 𝔭1^e1 · · · 𝔭r^er is well-defined and is a fractional ideal of A. The converse is proved in the following corollary.

Corollary 2.22.

Every non-zero fractional ideal 𝔞 of a DD A admits a unique factorization of the form 𝔞 = 𝔭1^e1 · · · 𝔭r^er with distinct non-zero prime ideals 𝔭i of A and with exponents ei ∈ ℤ. Moreover, for such a fractional ideal we have 𝔞–1 = 𝔭1^–e1 · · · 𝔭r^–er.

Proof

By definition, there exists a non-zero b ∈ A such that b𝔞 ⊆ A. But then (Ab)𝔞 is an integral ideal of A. We write (Ab)𝔞 = 𝔭1^f1 · · · 𝔭r^fr and Ab = 𝔭1^g1 · · · 𝔭r^gr with fi, gi ≥ 0. Since each non-zero prime ideal is invertible (Lemma 2.17(3)), it follows that 𝔞 = (Ab)–1((Ab)𝔞) = 𝔭1^(f1–g1) · · · 𝔭r^(fr–gr). This proves the existence of a factorization of 𝔞. The proof for the uniqueness is left to the reader as an easy exercise. The last assertion follows from a repeated use of Lemma 2.17(3).

The fractional ideal 𝔭1^–e1 · · · 𝔭r^–er in Corollary 2.22 is denoted by 𝔞–1. We have 𝔞𝔞–1 = A. One can easily verify that 𝔞–1 defined as above is equal to the set

𝔞–1 = {x ∈ K | x𝔞 ⊆ A}.

In fact, one can use the last equality as the definition for 𝔞–1.

To sum up, every non-zero fractional ideal of a DD A is invertible, and the set of all non-zero fractional ideals of A is a group under multiplication. The unit ideal A acts as the identity in this group.

As in every group, we have the cancellation law in the group of non-zero fractional ideals of A.

Corollary 2.23.

Let A be a DD and 𝔞, 𝔟, 𝔠 non-zero fractional ideals of A. If 𝔞𝔠 = 𝔟𝔠, then 𝔞 = 𝔟.

In view of unique factorization of ideals in A, we can speak of the divisibility of integral ideals in A. Let 𝔞 and 𝔟 be two integral ideals of A. We say that 𝔞 divides 𝔟 and write 𝔞 | 𝔟, if 𝔟 = 𝔞𝔠 for some integral ideal 𝔠 of A. We now show that the condition 𝔞 | 𝔟 is equivalent to the condition 𝔞 ⊇ 𝔟. Thus for ideals in a DD the term divides is synonymous with contains.

Corollary 2.24.

Let 𝔞 and 𝔟 be integral ideals of a DD A. Then 𝔞 | 𝔟 if and only if 𝔞 ⊇ 𝔟.

Proof

[if] If 𝔞 ⊇ 𝔟, we have 𝔠 := 𝔞–1𝔟 ⊆ 𝔞–1𝔞 = A, that is, 𝔠 is an integral ideal of A.

Also 𝔞𝔠 = 𝔞𝔞–1𝔟 = 𝔟, that is, 𝔞 | 𝔟.

[only if] If 𝔟 = 𝔞𝔠 for some integral ideal 𝔠, we have 𝔟 = 𝔞𝔠 ⊆ 𝔞A = 𝔞.

Corollary 2.25.

Let 𝔞 = 𝔭1^e1 · · · 𝔭r^er and 𝔟 = 𝔭1^f1 · · · 𝔭r^fr with ei, fi ≥ 0 be the prime decompositions of two non-zero integral ideals of a DD A. Then 𝔞 | 𝔟 if and only if ei ≤ fi for all i = 1, . . . , r.

Proof

[if] We have 𝔟 = 𝔞𝔠, where 𝔠 := 𝔭1^(f1–e1) · · · 𝔭r^(fr–er) is an integral ideal of A.

[only if] Let 𝔟 = 𝔞𝔠 for some integral ideal 𝔠 of A. Clearly, 𝔠 ≠ 0, and we can write the prime decomposition 𝔠 = 𝔭1^l1 · · · 𝔭r^lr 𝔮1^m1 · · · 𝔮s^ms with li, mj ≥ 0, where the primes 𝔮j are different from the 𝔭i. We have 𝔭1^f1 · · · 𝔭r^fr = 𝔭1^(e1+l1) · · · 𝔭r^(er+lr) 𝔮1^m1 · · · 𝔮s^ms. By unique factorization, we have f1 = e1 + l1, . . . , fr = er + lr and m1 = · · · = ms = 0, that is, ei ≤ fi for all i.
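Corollary 2.25 reduces ideal divisibility to a componentwise comparison of exponent vectors. In the DD ℤ, where non-zero ideals 〈n〉 correspond to positive integers n and the containment 〈m〉 ⊇ 〈n〉 means m | n, this can be tested exhaustively (Python sketch; the helper names factor and ideal_divides are ours):

```python
from collections import Counter

def factor(n):
    """Prime factorization of a positive integer as a Counter {p: e}."""
    f, d = Counter(), 2
    while d * d <= n:
        while n % d == 0:
            f[d] += 1
            n //= d
        d += 1
    if n > 1:
        f[n] += 1
    return f

def ideal_divides(m, n):
    """<m> | <n> in Z iff e_p(m) <= e_p(n) for every prime p (Cor. 2.25)."""
    fm, fn = factor(m), factor(n)
    return all(fm[p] <= fn[p] for p in fm)

# "divides" is the same as "contains" (Cor. 2.24): <m> ⊇ <n> iff m | n.
for m in range(1, 30):
    for n in range(1, 30):
        assert ideal_divides(m, n) == (n % m == 0)
```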

As we pass from ℤ to 𝒪K, the notion of unique factorization passes from the element level to the ideal level. If a DD is already a PID, these two concepts are equivalent. (Non-zero prime ideals in a PID are generated by prime elements.) Although a UFD need not be a PID, we have the following result for a DD.

Proposition 2.48.

A Dedekind domain A is a UFD, if and only if A is a PID.

Proof

[if] Every PID is a UFD (Theorem 2.11).

[only if] Let A be a UFD. In order to show that A is a PID, it suffices (in view of Theorem 2.57) to show that every non-zero prime ideal 𝔭 of A is a principal ideal. Choose any non-zero a ∈ 𝔭. Then 〈a〉 ⊆ 𝔭. Now a is a non-unit in A (since otherwise we would have 𝔭 = A) and A is assumed to be a UFD. Thus we can write a = uq1 · · · qr for a unit u and for prime elements qi in A. Clearly, each 〈qi〉 is a non-zero prime ideal of A and 〈a〉 = 〈q1〉 · · · 〈qr〉. Therefore, 𝔭 ⊇ 〈q1〉 · · · 〈qr〉, and hence by Lemma 2.14, 𝔭 = 〈qi〉 for some i ∈ {1, . . . , r}.

In the rest of this section, we abbreviate 𝒪_K as 𝒪, if K is implicit in the context.

2.13.4. Norms of Ideals

We have seen that the ring 𝒪 is a free ℤ-module of rank d. The same result holds for every non-zero ideal 𝔞 of 𝒪. Let β_1, . . . , β_d constitute an integral basis of K.

One can choose rational integers a_ij with each a_ii positive such that

Equation 2.17

    γ_i := a_{i1}β_1 + a_{i2}β_2 + · · · + a_{ii}β_i,   1 ≤ i ≤ d,

constitute a ℤ-basis of 𝔞. Moreover, the discriminant Δ(γ_1, . . . , γ_d) is independent of the choice of an integral basis γ_1, . . . , γ_d of 𝔞 and is called the discriminant of 𝔞, denoted Δ(𝔞). It follows that 𝔞 can be generated as an ideal (that is, as an 𝒪-module) by at most d elements. We omit the proof of the following tighter result.

Proposition 2.49.

Every (integral) ideal in a DD A is generated by (at most) two elements. More precisely, for a proper non-zero ideal 𝔞 of A and for any non-zero a ∈ 𝔞 there exists b ∈ 𝔞 with 𝔞 = 〈a, b〉.

Definition 2.108.

The norm of a non-zero ideal 𝔞 of 𝒪 is defined as the cardinality of the quotient ring 𝒪/𝔞 and is denoted by N(𝔞). It is customary to define the norm of the zero ideal as zero.

Using the integers a_ij of Equation (2.17), we can write

Equation 2.18

    N(𝔞) = a_{11}a_{22} · · · a_{dd}.
Corollary 2.26.

For every non-zero ideal 𝔞 of 𝒪, the quotient ring 𝒪/𝔞 is a finite ring. In particular, if 𝔭 is a non-zero prime (hence maximal) ideal of 𝒪, then 𝒪/𝔭 is a finite field.

It is tempting to define the norm of an element α ∈ 𝒪 to be the norm of the principal ideal 〈α〉. It turns out that this new definition is (almost) the same as the old definition of N(α). More precisely:

Proposition 2.50.

For any element α ∈ 𝒪, we have N(〈α〉) = |N(α)|.

Proof

The result is obvious for α = 0. So assume that α ≠ 0 and call 𝔞 := 〈α〉. Let β_1, . . . , β_d be an integral basis of 𝒪. It is an easy check that αβ_1, . . . , αβ_d is an integral basis of 𝔞. Let σ_1, . . . , σ_d be the complex embeddings of K. Then Δ(αβ_1, . . . , αβ_d) is the square of the determinant of the matrix (σ_i(αβ_j)) = (σ_i(α)σ_i(β_j)).

It follows that Δ(αβ_1, . . . , αβ_d) = N(α)² Δ(β_1, . . . , β_d). Equation (2.18) now completes the proof.

Corollary 2.27.

For any m ∈ ℤ, we have N(〈m〉) = |m|^d.

Like the norm of elements, the norm of ideals is also multiplicative. We omit the (not-so-difficult) proof here.

Proposition 2.51.

Let 𝔞 and 𝔟 be ideals in 𝒪. Then, N(𝔞𝔟) = N(𝔞) N(𝔟).

The following immediate corollary often comes in handy.

Corollary 2.28.

Let 𝔞 and 𝔟 be non-zero ideals of 𝒪. If 𝔞 = 𝔭_1^{e_1} · · · 𝔭_r^{e_r} is the factorization of 𝔞, then N(𝔞) = N(𝔭_1)^{e_1} · · · N(𝔭_r)^{e_r}. In particular, if 𝔟 | 𝔞, then N(𝔟) | N(𝔞) (in ℤ).

2.13.5. Rational Primes in Number Rings

The behaviour of rational primes in number rings is an interesting topic of study in algebraic number theory. Let K be a number field of degree d and 𝒪 = 𝒪_K. Consider a rational prime p. We use the symbol 〈p〉 to denote the ideal of 𝒪 generated by p. Further let

Equation 2.19

    〈p〉 = 𝔭_1^{e_1} · · · 𝔭_r^{e_r}

be the prime factorization of 〈p〉 with e_i ≥ 1, with pairwise distinct non-zero prime ideals 𝔭_i of 𝒪 and with r ≥ 1. For each i, we have 𝔭_i ⊇ 〈p〉, that is, p ∈ 𝔭_i, that is, 𝔭_i ∩ ℤ = pℤ (Lemma 2.13), that is, 𝔭_i lies over p. Conversely if 𝔭 is an ideal of 𝒪 lying over p, then 𝔭 ⊇ 〈p〉, that is, 𝔭 | 𝔭_1^{e_1} · · · 𝔭_r^{e_r}, that is, 𝔭 ⊇ 𝔭_i for some i, that is, 𝔭 = 𝔭_i for some i. Thus, 𝔭_1, . . . , 𝔭_r are precisely all the prime ideals of 𝒪 that lie over p.

By Corollary 2.27, N(〈p〉) = p^d. By Corollary 2.28, each N(𝔭_i) divides p^d and is therefore a power p^{d_i} of p.

Definition 2.109.

We define the ramification index of 𝔭_i over p (or 〈p〉) as e_i. This is the largest e such that 𝔭_i^e divides (that is, contains) 〈p〉. The integer d_i (where N(𝔭_i) = p^{d_i}) is called the inertial degree of 𝔭_i over p.

By the multiplicative property of norms, we have p^d = N(〈p〉) = N(𝔭_1)^{e_1} · · · N(𝔭_r)^{e_r} = p^{e_1d_1 + · · · + e_rd_r}, that is, e_1d_1 + · · · + e_rd_r = d.

Definition 2.110.

If r = d, so that each e_i = d_i = 1, we say that the prime p (or 〈p〉) splits completely in 𝒪. On the other extreme, if r = 1, e_1 = 1, d_1 = d, then 〈p〉 is prime in 𝒪 and we say that p is inert in 𝒪. Finally, if e_i > 1 for some i, we say that the prime p ramifies in 𝒪. If r = 1 and e_1 = d (so that d_1 = 1), then the prime p is said to be totally ramified in 𝒪.

The following important result is due to Dedekind. Its proof is long and complicated and is omitted here.

Theorem 2.58.

A rational prime p ramifies in 𝒪, if and only if p divides the discriminant Δ_K. In particular, there are only finitely many rational primes that ramify in 𝒪.

Though this is not the case in general, let us assume that the ring 𝒪 is monogenic (that is, 𝒪 = ℤ[α] for some α ∈ 𝒪) and try to compute the explicit factorization (Equation (2.19)) of 〈p〉 in 𝒪. Let f(X) ∈ ℤ[X] be the minimal polynomial of α. We then have 𝒪 = ℤ[α] ≅ ℤ[X]/〈f(X)〉.

Let us agree to write the canonical image of any polynomial g(X) ∈ ℤ[X] in 𝔽_p[X] as ḡ(X). We write the factorization of f̄(X) as

    f̄ = f̄_1^{e_1} · · · f̄_r^{e_r}

with e_i ≥ 1 and with pairwise distinct irreducible polynomials f̄_i ∈ 𝔽_p[X]. If d_i := deg f̄_i, then e_1d_1 + · · · + e_rd_r = d. For each i = 1, . . . , r choose f_i(X) ∈ ℤ[X] whose reduction modulo p is f̄_i. Define the ideals

    𝔭_i := 〈p, f_i(α)〉

of 𝒪. Since 𝒪 ≅ ℤ[X]/〈f(X)〉, we have

    𝒪/𝔭_i ≅ ℤ[X]/〈p, f(X), f_i(X)〉 ≅ 𝔽_p[X]/〈f̄_i(X)〉

and

    N(𝔭_i) = |𝒪/𝔭_i| = p^{d_i}.

Therefore, 𝔭_1, . . . , 𝔭_r are non-zero prime ideals of 𝒪 with N(𝔭_i) = p^{d_i}. Thus N(𝔭_1^{e_1} · · · 𝔭_r^{e_r}) = p^{e_1d_1 + · · · + e_rd_r} = p^d = N(〈p〉). On the other hand, 𝔭_1^{e_1} · · · 𝔭_r^{e_r} ⊆ 〈p〉, since f(α) = 0 and f̄ = f̄_1^{e_1} · · · f̄_r^{e_r}. Thus we must have 〈p〉 = 𝔭_1^{e_1} · · · 𝔭_r^{e_r}, that is, we have obtained the desired factorization of 〈p〉.

Let us now concentrate on an example of this explicit factorization.

Example 2.32.

Let D ≠ 0, 1 be a square-free integer congruent to 2 or 3 modulo 4. If K = ℚ(√D), then 𝒪 = ℤ[√D] is monogenic. We take an odd rational prime p and compute the factorization of 〈p〉 in 𝒪. We have to factorize modulo p the minimal polynomial f(X) := X² − D. We consider three cases separately based on the value of the Legendre symbol (D/p).

Case 1: (D/p) = 0

In this case, p | D, that is, f̄(X) = X². Then 〈p〉 = 𝔭², where 𝔭 = 〈p, √D〉. Thus p (totally) ramifies in 𝒪.

Case 2: (D/p) = 1

Since p is assumed to be an odd prime, the two square roots of D modulo p are distinct. Let δ be an integer with δ² ≡ D (mod p). Then f̄(X) = (X − δ̄)(X + δ̄). In this case, 〈p〉 = 𝔭_1𝔭_2, where 𝔭_1 = 〈p, √D − δ〉 and 𝔭_2 = 〈p, √D + δ〉. Thus p splits (completely) in 𝒪.

Case 3: (D/p) = −1

The polynomial f̄(X) = X² − D̄ is irreducible in 𝔽_p[X] and hence 〈p〉 remains prime in 𝒪, that is, p is inert in 𝒪.

Thus the quadratic residuosity of D modulo p dictates the behaviour of p in 𝒪.
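
The three cases can also be checked computationally. The following Python sketch (illustrative only, not from the text; function names are ours) classifies the behaviour of an odd prime p in ℤ[√D] by computing the Legendre symbol (D/p) via Euler's criterion:

```python
def legendre(D, p):
    """Legendre symbol (D/p) for an odd prime p, via Euler's criterion."""
    s = pow(D % p, (p - 1) // 2, p)
    return -1 if s == p - 1 else s

def behaviour(D, p):
    """Splitting of an odd prime p in Z[sqrt(D)] (D square-free, D = 2, 3 mod 4)."""
    s = legendre(D, p)
    if s == 0:
        return "ramified"   # <p> = P^2 with P = <p, sqrt(D)>
    if s == 1:
        return "split"      # <p> = P1 P2 with Pi = <p, sqrt(D) -+ delta>
    return "inert"          # <p> stays prime
```

For instance, with D = −1 (Gaussian integers) this recovers the classical fact that primes ≡ 1 (mod 4) split while primes ≡ 3 (mod 4) stay inert.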

Let us finally look at the fate of the even prime 2 in 𝒪. If D is even, then f̄(X) = X² modulo 2, and if D is odd, then D ≡ 3 (mod 4), so that f̄(X) = X² + 1 = (X + 1)² modulo 2. In each case, 2 ramifies in 𝒪.

Recall from Example 2.31 that Δ_K = 4D. Thus we have a confirmation of the fact that a rational prime p ramifies in 𝒪 if and only if p | Δ_K.

One can similarly study the behaviour of rational primes in

    𝒪 = ℤ[(1 + √D)/2],

where D ≡ 1 (mod 4) is a square-free integer ≠ 1.

2.13.6. Units in a Number Ring

There are just two units in ℤ, namely ±1. In a general number ring, there may be many more units. For example, all the units in the ring ℤ[i] of Gaussian integers are ±1, ±i. There may even be an infinite number of units in a number ring. It can be shown that ±(1 + √2)^n, n ∈ ℤ, are all the units of ℤ[√2]. (Note that for all n ≠ 0 the absolute values of (1 + √2)^n are different from 1.) ℤ[√2] is a PID. So we can think of factorizations in ℤ[√2] as element-wise factorizations. To start with, we fix a set of pairwise non-associate prime elements of ℤ[√2]. Every non-zero element of ℤ[√2] admits a factorization u p_1^{e_1} · · · p_r^{e_r} for prime “representatives” p_i and for a unit u of the form ±(1 + √2)^n. Thus, in order to complete the picture of factorization, we need machinery to handle the units in a number ring.
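
That the powers of 1 + √2 are all units can be verified with exact integer arithmetic: an element a + b√2 is a unit precisely when its norm a² − 2b² equals ±1 (compare Exercise 2.141). A small illustrative sketch, representing elements as pairs (a, b) (our own convention, not the book's):

```python
def mul(x, y):
    """Product of a + b*sqrt(2) and c + d*sqrt(2), as pairs (a, b), (c, d)."""
    (a, b), (c, d) = x, y
    return (a * c + 2 * b * d, a * d + b * c)

def norm(x):
    """Field norm N(a + b*sqrt(2)) = a^2 - 2*b^2; units have norm +-1."""
    a, b = x
    return a * a - 2 * b * b

u, power = (1, 1), (1, 0)   # the unit 1 + sqrt(2) and the identity 1
for n in range(1, 8):
    power = mul(power, u)
    assert norm(power) in (1, -1)   # every power of 1 + sqrt(2) is a unit
```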

Let K be a number field of degree d and signature (r_1, r_2). We have d = r_1 + 2r_2. The set of units in 𝒪 is denoted by 𝒪*. We know that 𝒪* is an (Abelian) group under (complex) multiplication. Our basic aim now is to reveal the structure of the group 𝒪*.

Every Abelian group is a ℤ-module and, if finitely generated and not free, contains torsion elements, that is, elements of finite order > 1.[19] 𝒪* always contains the element −1 of order 2. The torsion subgroup of 𝒪* is denoted by ℜ. We have 𝒪* ≅ ℜ × 𝔗, where 𝔗 is a torsion-free group. It turns out that ℜ is a finite group (and hence cyclic) and that 𝔗 is finitely generated and hence free, that is, 𝔗 ≅ ℤ^ρ for some ρ ≥ 0. From Dirichlet’s unit theorem (which we do not prove), it follows that ρ = r_1 + r_2 − 1. Thus, 𝔗 has a ℤ-basis consisting of ρ elements, say ξ_1, . . . , ξ_ρ, and every unit of 𝒪 can be uniquely expressed as ωξ_1^{n_1} · · · ξ_ρ^{n_ρ}, where ω is a root of unity and n_i ∈ ℤ. A set ξ_1, . . . , ξ_ρ of generators of 𝔗 is called a set of fundamental units.

[19] Every finitely generated torsion-free module over a PID is free.

Example 2.33.

Let D ≠ 0, 1 be a square-free integer, K = ℚ(√D) and 𝒪 = 𝒪_K. If D < 0, the signature of K is (0, 1) and the value of ρ for 𝒪* is 0 + 1 − 1 = 0, that is, 𝒪* = ℜ, that is, 𝒪* is finite in this case.

Now, suppose D > 0. K is a real field in this case, so that ℜ = {±1}. Also the signature of K is (2, 0), that is, ρ = 2 + 0 − 1 = 1. This means that 𝒪* contains an infinite number of units. Let ξ be a fundamental unit of 𝒪. Then, every unit of 𝒪 is of the form ±ξ^n, n ∈ ℤ.

Exercise Set 2.13

2.126
  1. If AB and BC are integral extensions of rings, show that AC is also an integral extension.

  2. Let AB be an extension of rings. Show that the integral closure of A in B is integrally closed in B.

  3. Let AB be an integral extension of rings, an ideal of B and . (Note that is an ideal of A. If is prime in B, then is prime in A. See Proposition 2.10.) Show that is integral over .

2.127Let AB be an extension of integral domains, a finitely generated non-zero ideal of A and . If , show that γ is integral over A. [H]
2.128
  1. Let AB be an integral extension of integral domains. Show that A is a field if and only if B is a field.

  2. Let AB be an integral extension of rings, a prime ideal of B and . Show that is maximal if and only if is maximal. [H]

  3. Let A, B, and be as in (b). Further let be another prime ideal of B with . Show that if , then . [H]

2.129Let A be a ring and S a multiplicatively closed subset of A. Show that:
  1. If , then S–1A is the zero ring.

  2. If S′ := S \ {1} is non-empty and closed under multiplication, then S–1AS–1A.

  3. If A is Noetherian, then S–1A is also Noetherian.

2.130Let AB be a ring extension and C the integral closure of A in B. Show that for any multiplicative subset S of A (and hence of B and C) the integral closure of S–1A in S–1B is S–1C. In particular, if A is integrally closed in B, then so is S–1A in S–1B.
2.131Recall that an integrally closed integral domain is called a normal domain (ND).
  1. Show that every UFD is a normal domain.

  2. Let D be a square-free integer ≠ 0, 1. Show that , is normal if and only if D ≡ 2, 3 (mod 4).

(Remark: The reader should note the following important implications:

That is, a Euclidean domain is a PID, a PID is a UFD and a UFD is a normal domain. None of the reverse implications is true. For example, the ring of integers of ℚ(√−19) is known to be a PID but not a Euclidean domain. The ring K[X_1, . . . , X_n], n ≥ 2, of multivariate polynomials over a field K is a UFD, but not a PID, since the ideal 〈X_1, . . . , X_n〉 is not principal. Finally, ℤ[√−5] is a normal domain (by Exercise 2.136 below), but not a UFD, since 2 · 3 = (1 + √−5)(1 − √−5) are two different factorizations of 6 into irreducible elements.)

2.132A (non-zero) ring A with a unique maximal ideal m is called a local ring. In that case, the field A/m is called the residue field of A.

Let A be a ring and 𝔭 a prime ideal of A. Show that the localization A_𝔭 is a local ring with the unique maximal ideal 𝔭A_𝔭 generated by the elements a/1, a ∈ 𝔭, and the residue field of A_𝔭 is canonically isomorphic to the quotient field of the integral domain A/𝔭 under the map a/b ↦ (a + 𝔭)/(b + 𝔭).

2.133A ring A is called a discrete valuation ring (DVR) or a discrete valuation domain (DVD), if A is a local principal ideal domain. Let A be a DVR with maximal ideal m = 〈p〉. Prove the following assertions:
  1. A is a UFD.

  2. The only primes in A are the associates of p. [H]

  3. Every non-zero element of A can be written as upα, where u is a unit of A and .

  4. Every non-zero ideal of A is of the form 〈pα〉 for some .

  5. A has only one non-zero prime ideal (namely, m).

(Remark: The prime p of A is called a uniformizing parameter or a uniformizer for A and is unique up to multiplication by units.

The map ν taking up^α ↦ α is called a discrete valuation of A and can be naturally extended to a group homomorphism ν : K* → ℤ by defining ν(a/b) := ν(a) − ν(b), where a, b ∈ A, b ≠ 0 and K = Q(A) is the quotient field of A. It is often convenient to define ν(0) := +∞. It follows that ν(xy) = ν(x) + ν(y) and ν(x + y) ≥ min(ν(x), ν(y)).)

2.134
  1. Let A be a local Noetherian integral domain which is not a field. Assume further that the maximal ideal m ≠ 0 of A is the only non-zero prime ideal of A. Show that A is a DVR (that is, a PID) if and only if A is integrally closed.

  2. Let A be a Noetherian integral domain which is not a field. Prove that A is a Dedekind domain if and only if is a DVR for every non-zero prime ideal of A.

2.135
  1. Show that the only units of are ±1 and ±i.

  2. Show that the primes of are associates to the following:

    1. a prime integer ≡ 3 (mod 4),

    2. a + ib, a, , with a2 + b2 equal to 2 or a prime integer ≡ 1 (mod 4).

2.136
  1. Show that every quadratic number field K can be represented as for a square-free integer D ≠ 0, 1.

  2. Let for some square-free integer D ≠ 0, 1. Show that:

(In particular, the ring of integers of is the ring of Gaussian integers.)

2.137Let A be a Dedekind domain.
  1. Let q1 and q2 be two distinct non-zero prime ideals of A. Show that for any e1, we have . [H]

  2. Let be the prime factorization of a non-zero ideal of A with pairwise distinct primes qi and . Show that . [H]

2.138Let A be a Dedekind domain and a non-zero (integral) ideal of A. Show that:
  1. There exists a non-zero (integral) ideal of A such that is a principal ideal. [H]

  2. The number of ideals of A containing is finite.

  3. Every ideal of is principal.

2.139Let and , ei, , be the prime decompositions of two non-zero ideals , of a DD A. Define the gcd and lcm of and as

Show that and lcm. Conclude that . (Note that if A is a general ring, we only have .)

2.140Let K be a number field and .
  1. Let be an ideal of . Show that . In particular, every non-zero ideal of contains a non-zero integer. [H]

  2. Let be a non-zero prime ideal of . Prove that for some , where p is the unique rational prime contained in (Lemma 2.13).

2.141Let K be a number field, , , and . Show that:
  1. , if and only if N(α) = ±1.

  2. , if and only if f(0) = ±1, where is the minimal polynomial of α over .

  3. , if and only if |σ(α)| = 1 for every complex embedding σ of K.

2.142Let K be a number field. We say that K is norm-Euclidean, if for every α, β ∈ 𝒪, β ≠ 0, there exist q, r ∈ 𝒪 such that α = qβ + r and |N(r)| < |N(β)|.
  1. Conclude that if K is norm-Euclidean, then is a Euclidean domain with the Euclidean degree function ν(α) := | N(α)|. (The converse of this is not true. For example, it is known that is not norm-Euclidean, but is a Euclidean domain.)

  2. Prove the following equivalent characterization of a norm-Euclidean number field: K is norm-Euclidean if and only if for every there exists such that | N(α – β)| < 1.

  3. Show that the following number fields are norm-Euclidean:

    , , , and .

  4. Show that is not norm-Euclidean. [H]

2.143In this exercise, one derives that the only (rational) integer solutions of Bachet’s equation

Equation 2.20

    y² + 2 = x³
are x = 3, y = ±5.

  1. Show that Equation (2.20) has no solutions with x or y even. [H]

    Let (x, y) be a solution of Equation (2.20) with both x and y odd. Then x³ admits a factorization in ℤ[√−2] as x³ = y² + 2 = (y + √−2)(y − √−2).

  2. Let A := ℤ[√−2]. Show that A is the ring of integers of ℚ(√−2) and that A is a UFD. Also the only units of A are ±1.

  3. Show that gcd. [H]

  4. Because of unique factorization one can write y + √−2 = (c + d√−2)³ for c, d ∈ ℤ. Expand the cube and equate the real and imaginary parts to conclude that we must have y = ±5, so that x = 3.

**2.14. p-adic Numbers

Let us now study a different area of algebraic number theory, introduced by Kurt Hensel in an attempt to apply the methods of power series expansions to numbers. While trying to explain the properties of (rational) integers, mathematicians embedded ℤ in bigger and bigger structures, richer and richer in properties. ℚ came in a natural attempt to form quotients, and for some time people believed that the rationals describe all of reality. Pythagoras was seemingly the first to locate and prove the irrationality of a number, namely √2. It took humankind centuries to complete the picture of the real line. One possibility is to look at ℝ as the completion of ℚ. A sequence a_n, n ≥ 1, of rational numbers is called a Cauchy sequence if for every real ε > 0, there exists N ∈ ℕ such that |a_m − a_n| ≤ ε for all m, n ≥ N. Every Cauchy sequence should converge to a limit, and it is ℝ (and not ℚ) where this happens. Even with the convergence of Cauchy sequences, people were not wholeheartedly happy, because the real polynomial X² + 1 did not have (and continues not to have) roots in ℝ. So the next question that arose was that of algebraic closure. ℂ was invented and turned out to be a nice field which is both algebraically closed and complete.

Throughout the above business, we were led by the conventional notion of distance between points (that is, between numbers), the so-called Archimedean distance or the absolute value. For every rational prime p, there exists a p-adic distance which leads to a ring strictly bigger than and containing ℤ. This is the ring Ẑ_p of p-adic integers. The quotient field Q̂_p of Ẑ_p is the field of p-adic numbers. Q̂_p is complete in the sense of convergence of Cauchy sequences (under the p-adic distance), but is not algebraically closed. We know anyway that a (unique) algebraic closure of Q̂_p exists. We have ℂ = ℝ(i), that is, it was necessary and sufficient to add the imaginary quantity i to ℝ to get an algebraically closed field. Unfortunately in the case of the p-adic distance the closure is of infinite extension degree over Q̂_p. In addition, this closure is not complete. An attempt to make it complete gives an even bigger field Ω_p and the story stops here, Ω_p being both algebraically closed and complete. But Ω_p is already a pretty huge field and very little is known about it.

In the rest of this section, we, without specific mention, denote by p an arbitrary rational prime.

2.14.1. The Arithmetic of p-adic Numbers

There are various ways in which p-adic integers can be defined. A simple way is to use infinite sequences.

Definition 2.111.

A p-adic integer is defined as an infinite sequence (a_n) of elements a_n ∈ ℤ_{p^n} with the property that a_{n+1} ≡ a_n (mod p^n) for every n ∈ ℕ. Each a_n, being an element of ℤ_{p^n}, can be represented as a (rational) integer unique modulo p^n. Thus, if b_n, n ∈ ℕ, define another sequence of integers with b_n ≡ a_n (mod p^n) for every n, the p-adic integers (a_n) and (b_n) are treated the same. In particular, if 0 ≤ b_n < p^n for every n, then (b_n) is called the canonical representation of (a_n). The set of all p-adic integers is denoted by Ẑ_p.[20] A sequence (a_n) of integers with a_{n+1} ≡ a_n (mod p^n) for every n is called a p-coherent sequence.

[20] Well! We are now in a mess of notations. We have ℤ_n := ℤ/nℤ for every n ∈ ℕ. In particular, for n = p we have ℤ_p = ℤ/pℤ, which is a field that we planned to denote also by ℤ_p. It is superfluous to have two notations for the same thing. Many authors, therefore, prefer to avoid the hat and call our Ẑ_p as ℤ_p. For them, our ℤ_p is 𝔽_p and/or written explicitly as ℤ/pℤ. Let us stick to our old conventions and use hats to remove ambiguities.

See Exercise 2.144 for another way of defining p-adic integers. We now show that Ẑ_p is a ring. Before doing that, we mention that the ring ℤ is canonically embedded in Ẑ_p by the injective map ℤ → Ẑ_p, a ↦ (a, a, a, . . .).

Definition 2.112.

Let (an) and (bn) be two p-adic integers. Define:

(an) + (bn):=(an + bn).
(an) · (bn):=(an · bn).

One can easily check that these operations are well-defined, that is, independent of the choice of the representatives a_n and b_n. It also follows easily that these operations make Ẑ_p a ring with additive identity (0, 0, . . .) and with multiplicative identity (1, 1, . . .). The additive inverse of (a_n) is −(a_n) = (−a_n). Moreover, a ↦ (a, a, . . .) is an injective ring homomorphism ℤ → Ẑ_p. In view of this, one often identifies the rational integer a with the p-adic integer (a, a, . . .). We will also do so, provided that we do not expect to face a danger of confusion. Also note that for l ∈ ℕ the l-fold sum l(a_n) is the same as (l)(a_n) = (la_n). Thus in this context the two interpretations of l remain perfectly consistent.
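
A truncated p-coherent sequence is easy to model on a computer. The following illustrative Python sketch (the names and the fixed precision LEN are our own choices, not from the text) represents a p-adic integer by its first few components a_n mod p^n and implements the componentwise ring operations of Definition 2.112:

```python
p, LEN = 5, 6  # precision: we keep a_n modulo p^n for n = 1, ..., LEN

def coherent(k):
    """Canonical p-coherent sequence of the rational integer k."""
    return [k % p ** n for n in range(1, LEN + 1)]

def add(x, y):
    return [(a + b) % p ** (n + 1) for n, (a, b) in enumerate(zip(x, y))]

def mul(x, y):
    return [(a * b) % p ** (n + 1) for n, (a, b) in enumerate(zip(x, y))]

def is_coherent(x):
    """Check the defining condition a_{n+1} = a_n (mod p^n)."""
    return all(x[n + 1] % p ** (n + 1) == x[n] for n in range(LEN - 1))
```

Since reduction modulo p^n is a ring homomorphism, the sum and product of coherent sequences are again coherent, which is exactly the well-definedness check mentioned above.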

It turns out that Ẑ_p is an integral domain. In order to see why, let us focus our attention on the units of Ẑ_p. Let us plan to denote Ẑ_p* (the multiplicative group of units of Ẑ_p) by U_p. The next result characterizes elements of U_p.

Proposition 2.52.

For (a_n) ∈ Ẑ_p, the following conditions are equivalent:

  (a) (a_n) ∈ U_p.

  (b) p ∤ a_n for all n ∈ ℕ.

  (c) p ∤ a_1.

Proof

[(a)⇒(b)] Let (a_n)(b_n) = (a_nb_n) = 1 = (1) for some (b_n) ∈ Ẑ_p. Then for every n ∈ ℕ we have a_nb_n ≡ 1 (mod p^n), that is, a_n is invertible modulo p^n and hence modulo p as well, that is, p ∤ a_n.

[(b)⇒(c)] Obvious.

[(c)⇒(a)] Let us construct a p-coherent sequence b_n, n ∈ ℕ, of (rational) integers with a_nb_n ≡ 1 (mod p^n). This (b_n) would be the desired inverse of (a_n) in Ẑ_p. Since p ∤ a_1 and a_n ≡ a_1 (mod p), it follows that p ∤ a_n as well and, therefore, the congruence a_nx ≡ 1 (mod p^n) has a unique solution modulo p^n, namely b_n :≡ a_n^{-1} (mod p^n).

We also have a_{n+1}b_{n+1} ≡ 1 (mod p^n), that is, a_nb_{n+1} ≡ 1 (mod p^n), that is, b_{n+1} ≡ b_n (mod p^n), that is, the sequence (b_n) is p-coherent.
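
The construction in this proof is effective: the inverse of a unit is obtained by inverting each component modulo the corresponding power of p. A minimal sketch, assuming a p-adic unit is given as a truncated p-coherent sequence (our own representation, not the book's):

```python
p, LEN = 7, 6  # keep the components a_n modulo p^n for n = 1, ..., LEN

def unit_inverse(a):
    """Invert a p-adic unit given by components a[n-1] = a_n mod p^n,
    following the proof of Proposition 2.52: b_n := a_n^{-1} (mod p^n)."""
    assert a[0] % p != 0, "not a unit: p divides a_1"
    return [pow(a[n], -1, p ** (n + 1)) for n in range(LEN)]
```

The coherence of the resulting sequence (b_n) is automatic, exactly as argued above.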

Proposition 2.53.

Every non-zero x ∈ Ẑ_p can be written uniquely as x = p^r y for some integer r ≥ 0 and for some y ∈ U_p.

Proof

Write x = (a_n). If p ∤ a_1, take r := 0 and y := x. So assume that p | a_1. Choose r ≥ 1 such that [a_n]_{p^n} = [0]_{p^n} for 1 ≤ n ≤ r, whereas [a_{r+1}]_{p^{r+1}} ≠ [0]_{p^{r+1}}. Such an r exists, since x ≠ 0 by hypothesis. For n ∈ ℕ, we have a_{r+n} ≡ a_r ≡ 0 (mod p^r), that is, p^r | a_{r+n}, whereas a_{r+n} ≡ a_{r+1} ≢ 0 (mod p^{r+1}), that is, p^{r+1} ∤ a_{r+n}, that is, v_p(a_{r+n}) = r. Define b_n := a_{r+n}/p^r. Since a_{r+n+1} ≡ a_{r+n} (mod p^{r+n}), division by p^r gives b_{n+1} ≡ b_n (mod p^n), that is, y := (b_n) ∈ Ẑ_p. Moreover, p^r b_n = a_{r+n} ≡ a_n (mod p^n), that is, x = p^r y. Finally, since p ∤ b_1, we have y ∈ U_p. This establishes the existence of a factorization x = p^r y. The uniqueness of this factorization is left to the reader as an easy exercise.

Proposition 2.54.

is an integral domain.

Proof

Let x_1 and x_2 be non-zero elements of Ẑ_p. By Proposition 2.53, we can write x_1 = p^{r_1} y_1 and x_2 = p^{r_2} y_2 with r_1, r_2 ≥ 0 and y_1, y_2 ∈ U_p. Then (a_n) := x_1x_2 = p^{r_1+r_2} y_1y_2. Now y_1y_2 =: (b_n) ∈ U_p and hence no b_n is divisible by p. Therefore, a_{r_1+r_2+1} ≡ p^{r_1+r_2} b_{r_1+r_2+1} ≢ 0 (mod p^{r_1+r_2+1}), that is, (a_n) = x_1x_2 ≠ 0.

Definition 2.113.

The quotient field Q̂_p of Ẑ_p is called the field of p-adic numbers.

Proposition 2.55.

Every non-zero x ∈ Q̂_p can be expressed uniquely as x = p^r y with r ∈ ℤ and y ∈ U_p.

Proof

One can write x = a/b for some a, b ∈ Ẑ_p, b ≠ 0. Then a = p^s c and b = p^t d for some s, t ≥ 0 and c, d ∈ U_p, and so x = p^{s-t}(c/d) with c/d ∈ U_p. The proof for the uniqueness is left to the reader.

The canonical inclusion ℤ ↪ Ẑ_p naturally extends to the canonical inclusion ℚ ↪ Q̂_p. We can identify the image of a/b with the rational a/b and say that ℚ is contained in Q̂_p. Being a field of characteristic 0, Q̂_p contains an isomorphic copy of ℚ. The map a/b ↦ (a, a, . . .)/(b, b, . . .) gives this isomorphism explicitly. Note that the ring Ẑ_p is strictly bigger than ℤ and the field Q̂_p is strictly bigger than the field ℚ (Exercise 2.147).

2.14.2. The p-adic Valuation

Proposition 2.55 leads to the notion of p-adic distance between pairs of points in Q̂_p. Let us start with some formal definitions.

Definition 2.114.

A metric on a set S is a map d : S × S → ℝ such that for every x, y, z ∈ S, we have:

  1. Non-negative d(x, y) ≥ 0.

  2. Non-degeneracy d(x, y) = 0 if and only if x = y.

  3. Symmetry d(x, y) = d(y, x).

  4. Triangle inequality d(x, z) ≤ d(x, y) + d(y, z).

A set S together with a metric d is called a metric space (with metric d).

Definition 2.115.

A norm on a field K is a map ‖ ‖ : K → ℝ such that for all x, y ∈ K, we have:

  1. Non-negativity ‖x‖ ≥ 0.

  2. Non-degeneracy ‖x‖ = 0 if and only if x = 0.

  3. Multiplicativity ‖xy‖ = ‖x‖ ‖y‖.

  4. Triangle inequality ‖x + y‖ ≤ ‖x‖ + ‖y‖.

It is an easy check that for a norm ‖ ‖ on K the function d : K × K → ℝ, d(x, y) := ‖x − y‖, defines a metric on K.

A norm ‖ ‖ on a field K is called non-Archimedean (or a finite valuation), if ‖x + y‖ ≤ max(‖x‖, ‖y‖) for all x, y ∈ K (a condition stronger than the triangle inequality). A norm which is not non-Archimedean is called Archimedean (or an infinite valuation).

Example 2.34.
  1. Setting ‖0‖ := 0 and ‖x‖ := 1 for all x ≠ 0 defines a norm on any field K. This norm is called the trivial norm on K.

  2. The absolute value | | is an Archimedean norm on ℚ (or ℝ). It is customary to denote this norm as | |_∞. This norm induces the usual metric topology on ℚ (or ℝ) which is at the heart of real analysis. In p-adic analysis, one investigates ℚ under the p-adic norms that we define now.

Definition 2.116.

The p-adic norm on Q̂_p is defined as |0|_p := 0 and |x|_p := p^{-r} for non-zero x = p^r y with r ∈ ℤ and y ∈ U_p (Proposition 2.55).

Theorem 2.59.

The p-adic norm | |_p is a non-Archimedean norm on Q̂_p.

Proof

Non-negativity, non-degeneracy and multiplicativity of | |_p are immediate. For proving the triangle inequality, it is sufficient to prove the non-Archimedean condition. Take x, y ∈ Q̂_p. If x = 0 or y = 0 or x + y = 0, we clearly have |x + y|_p ≤ max(|x|_p, |y|_p). So assume that each of x, y and x + y is non-zero. Write x = p^r u and y = p^s v with r, s ∈ ℤ and u, v ∈ U_p. Without loss of generality, we may assume that r ≥ s. Then, x + y = p^s z, where z = p^{r-s}u + v ∈ Ẑ_p. Since x + y ≠ 0, we have z ≠ 0; so we can write z = p^t w for some t ≥ 0 and w ∈ U_p. But then |x + y|_p = p^{-(s+t)} ≤ p^{-s} = max(p^{-r}, p^{-s}) = max(|x|_p, |y|_p).
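
Restricted to rationals, the p-adic norm is directly computable: writing a non-zero x = p^r u/v with p ∤ uv gives |x|_p = p^{-r}. A hedged Python sketch (function names are ours) that also lets one spot-check the non-Archimedean inequality:

```python
from fractions import Fraction

def vp(x, p):
    """p-adic valuation v_p(x) of a non-zero rational x."""
    x = Fraction(x)
    num, den, v = x.numerator, x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

def norm_p(x, p):
    """p-adic norm: |0|_p = 0 and |x|_p = p^(-v_p(x)) otherwise."""
    x = Fraction(x)
    return Fraction(0) if x == 0 else Fraction(p) ** (-vp(x, p))
```

Note the inversion of intuition: highly p-divisible numbers are p-adically small.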

Definition 2.117.

Two metrics d1 and d2 on a metric space S are called equivalent if a sequence (xn) from S is Cauchy with respect to d1 if and only if it is Cauchy with respect to d2. Two norms on a field are called equivalent if they induce equivalent metrics.

For every rational prime p, the field ℚ is canonically embedded in Q̂_p and thus we have a notion of a p-adic distance on ℚ. We also have the usual Archimedean distance | | on ℚ. We now state an interesting result without a proof, which asserts that any distance on ℚ must be essentially the same as either the usual Archimedean distance or one of the p-adic distances.

Theorem 2.60. Ostrowski’s theorem

Every non-trivial norm on ℚ is equivalent to | |_p for some rational prime p or to the Archimedean norm | |_∞.

The notions of sequences and series and their convergences can be readily extended to Q̂_p under the norm | |_p. Since the p-adic distance assumes only the discrete values p^r, r ∈ ℤ, it is often customary to restrict ourselves only to these values while talking about the convergence criteria of sequences and series, that is, instead of an infinitesimally small real ε > 0 one can talk about an arbitrarily large M ∈ ℕ with p^{-M} ≤ ε.

Definition 2.118.

Let x_1, x_2, . . . be a sequence of elements of Q̂_p. We say that this sequence converges to a limit x ∈ Q̂_p, if given M ∈ ℕ there exists N ∈ ℕ such that |x_n − x|_p ≤ p^{-M} for all n ≥ N. We write this as x = lim x_n or as x_n → x.

Consider the partial sums s_n := x_1 + x_2 + · · · + x_n for each n ∈ ℕ. If there exists s ∈ Q̂_p with s_n → s, we say that the sum Σ_n x_n converges to s and write Σ_n x_n = s.

A sequence x_1, x_2, . . . of elements of Q̂_p is said to be a Cauchy sequence if for every M ∈ ℕ, there exists an N ∈ ℕ such that |x_m − x_n|_p ≤ p^{-M} for all m, n ≥ N.

Definition 2.119.

A field K is called complete under a norm ‖ ‖ if every sequence of elements of K, which is Cauchy under ‖ ‖, converges to an element in K.

For example, ℝ is complete under | |. We shortly demonstrate that Q̂_p is complete under | |_p.

Consider a field K not (necessarily) complete under a norm ‖ ‖. Let C denote the set of all Cauchy sequences from K. Define addition and multiplication in C as (a_n) + (b_n) := (a_n + b_n) and (a_n)(b_n) := (a_nb_n). Under these operations C becomes a commutative ring with identity having a maximal ideal 𝔪 consisting of the sequences that converge to 0. The field L := C/𝔪 is called the completion of K with respect to the norm ‖ ‖. K is canonically embedded in L via the map a ↦ (a, a, a, . . .) + 𝔪. The norm ‖ ‖ on K extends to elements (a_n) + 𝔪 of L as ‖(a_n) + 𝔪‖ := lim_{n→∞} ‖a_n‖. L is a complete field under this extended norm. In fact, it is the smallest field containing K and complete under ‖ ‖.

ℝ is the completion of ℚ with respect to the Archimedean norm | |. On the other hand, Q̂_p turns out to be the completion of ℚ with respect to the p-adic norm | |_p. Before proving this let us first prove that Q̂_p itself is a complete field under the p-adic norm. Let us start with a lemma.

Lemma 2.18.

A sequence (an) of p-adic numbers is a Cauchy sequence if and only if the sequence (an+1an) converges to 0.

Proof

[if] Take any M ∈ ℕ. Since a_{n+1} − a_n → 0 by hypothesis, there exists N ∈ ℕ such that |a_{n+1} − a_n|_p ≤ p^{-M} for all n ≥ N. But then for all m, n ≥ N with m = n + k, k ≥ 1, we have |a_m − a_n|_p = |(a_{n+k} − a_{n+k-1}) + · · · + (a_{n+1} − a_n)|_p ≤ max(|a_{n+k} − a_{n+k-1}|_p, . . . , |a_{n+1} − a_n|_p) ≤ p^{-M}.

Thus (a_n) is a Cauchy sequence.

[only if] Take any M ∈ ℕ. Since (a_n) is a Cauchy sequence by hypothesis, there exists N ∈ ℕ such that |a_m − a_n|_p ≤ p^{-M} for all m, n ≥ N. In particular, |a_{n+1} − a_n|_p ≤ p^{-M} for all n ≥ N, that is, a_{n+1} − a_n → 0.

Theorem 2.61.

The field Q̂_p is complete with respect to | |_p.

Proof

Let (a_n) be a Cauchy sequence in Q̂_p. By Lemma 2.18, a_{n+1} − a_n → 0. Therefore, there exists N ∈ ℕ such that |a_{n+1} − a_n|_p ≤ 1 for all n ≥ N. For n = N + k, k ≥ 1, we have

|a_n|_p = |a_{N+k}|_p
        = |(a_{N+k} − a_{N+k-1}) + · · · + (a_{N+1} − a_N) + a_N|_p
        ≤ max(|a_{N+k} − a_{N+k-1}|_p, . . . , |a_{N+1} − a_N|_p, |a_N|_p)
        ≤ max(1, |a_N|_p).

It then follows that |a_n|_p ≤ p^m for all n ∈ ℕ, where m ≥ 0 satisfies p^m = max(1, |a_1|_p, . . . , |a_N|_p). If m = 0, then each a_n ∈ Ẑ_p (Exercise 2.148). Otherwise consider the sequence (p^m a_n) which is clearly Cauchy and in which each p^m a_n ∈ Ẑ_p, since |p^m a_n|_p ≤ p^{-m} p^m = 1. Thus, without loss of generality, we may assume that the given sequence (a_n) itself is one of p-adic integers.

Let a_n = a_{n,0} + a_{n,1}p + a_{n,2}p² + · · · be the p-adic expansion of a_n (Exercise 2.145). Since (a_n) is Cauchy, for every M ∈ ℕ there exists N_M ∈ ℕ such that |a_m − a_n|_p ≤ p^{-(M+1)} for all m, n ≥ N_M, that is, a_{n,i} = a_{m,i} for 0 ≤ i ≤ M and m, n ≥ N_M. Define x_M := a_{n,M} for any n ≥ N_M and x := x_0 + x_1p + x_2p² + · · · ∈ Ẑ_p. It then follows that a_n → x.

Theorem 2.62.

Q̂_p is the completion of ℚ with respect to the norm | |_p.

Proof

Let C denote the ring of Cauchy sequences from ℚ (under the p-adic norm), 𝔪 the maximal ideal of C consisting of sequences that converge to 0, and L := C/𝔪. We now show that L ≅ Q̂_p.

If a ∈ Q̂_p has the p-adic expansion a = a_{-r}p^{-r} + · · · + a_{-1}p^{-1} + a_0 + a_1p + a_2p² + · · · (Exercise 2.145), then α_n := a_{-r}p^{-r} + · · · + a_{-1}p^{-1} + a_0 + a_1p + · · · + a_np^n, n ∈ ℕ, define a sequence of elements of ℚ. We have |α_n − a|_p ≤ p^{-(n+1)}, that is, α_n → a. Moreover, the sequence (α_n) of rational numbers is Cauchy with respect to | |_p, since for every M ∈ ℕ we have |α_m − α_n|_p ≤ p^{-(M+1)} for all m, n ≥ M. Thus Q̂_p → L, a ↦ (α_n) + 𝔪, is a well-defined field homomorphism. Being a field homomorphism, it is injective.

What remains is to show that the map is surjective. Take any (β_n) + 𝔪 ∈ L. Since (β_n) is a Cauchy sequence, by Theorem 2.61 it converges to a point a ∈ Q̂_p. We construct the sequence (α_n) corresponding to a as described in the last paragraph. Then α_n → a as well and hence using the triangle inequality (or the non-Archimedean condition) we have α_n − β_n = (α_n − a) − (β_n − a) → 0, that is, (α_n) − (β_n) ∈ 𝔪, that is, (β_n) + 𝔪 = (α_n) + 𝔪 is the image of a.

Corollary 2.29.

The p-adic series Σ_n a_n (with a_n ∈ Q̂_p) converges if and only if |a_n|_p → 0.

Proof

The only if part is obvious. For the if part, take a sequence (a_n) of p-adic numbers with |a_n|_p → 0. Define s_n := a_1 + · · · + a_n. Since a_{n+1} = s_{n+1} − s_n → 0 by hypothesis, Lemma 2.18 guarantees that (s_n) is a Cauchy sequence, that is, (s_n) converges in Q̂_p.

This is quite unlike the Archimedean norm | |. For example, with respect to this norm 1/n → 0, whereas the series Σ_n 1/n diverges.
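
The convergence criterion can be seen in action for the geometric series 1 + p + p² + · · ·, which converges p-adically to 1/(1 − p): since (1 − p)(1 + p + · · · + p^{K-1}) = 1 − p^K, the partial sum agrees with the inverse of 1 − p modulo p^K. A quick illustrative check in Python (the chosen p and precision K are arbitrary):

```python
p, K = 5, 8
# partial sum s = 1 + p + p^2 + ... + p^(K-1)
s = sum(p ** n for n in range(K))
# the p-adic integer 1/(1 - p), known modulo p^K
inv = pow(1 - p, -1, p ** K)
assert s % p ** K == inv   # the series converges to 1/(1 - p)
```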

2.14.3. Hensel’s Lemma

Let us conclude our short study of p-adic methods by proving an important theorem due to Hensel. This theorem talks about the solvability of polynomial equations f(X) = 0 for f(X) ∈ Ẑ_p[X]. Before proceeding further, let us introduce a notation. Recall that every a ∈ Ẑ_p has a unique p-adic expansion of the form a = a_0 + a_1p + a_2p² + · · · with 0 ≤ a_n < p (Exercises 2.144 and 2.145). If a_0 = a_1 = · · · = a_{n-1} = 0, then a = a_np^n + a_{n+1}p^{n+1} + a_{n+2}p^{n+2} + · · · = p^nb, where b ∈ Ẑ_p. Thus p^n | a in Ẑ_p. We denote this by saying that a ≡ 0 (mod p^n). Notice that a ≡ 0 (mod p^n) if and only if |a|_p ≤ p^{-n}. We write a ≡ b (mod p^n) for a, b ∈ Ẑ_p, if a − b ≡ 0 (mod p^n). Since p^n can be viewed as the element p^n(1, 1, . . .) of Ẑ_p, this congruence notation conforms to that for a general PID. (Ẑ_p is a PID by Exercise 2.148.)

Since by our assumption any ring A comes with identity (that we denote by 1 = 1_A), it makes sense to talk for every n ∈ ℕ about an element n = n_A in A, which is the n-fold sum of 1.

Given any f(X) = a_0 + a_1X + · · · + a_dX^d ∈ A[X], one can define the formal derivative of f as f′(X) := a_1 + 2a_2X + · · · + da_dX^{d-1}. Properties of formal derivatives of polynomials are covered in Exercise 2.61.

Theorem 2.63. Hensel’s lemma

Let f(X) ∈ Ẑ_p[X]. Suppose that there exist α_0 ∈ Ẑ_p and an integer M ≥ 0 satisfying:

  1. |f(α_0)|_p ≤ p^{-(2M+1)} (that is, α_0 is a solution of f(x) ≡ 0 (mod p^{2M+1})), and

  2. |f′(α_0)|_p = p^{-M} (that is, p^M | f′(α_0) but f′(α_0) ≢ 0 (mod p^{M+1})).

Then there exists a unique α ∈ Ẑ_p such that f(α) = 0 and |α − α_0|_p ≤ p^{-(M+1)} (that is, α ≡ α_0 (mod p^{M+1})).

Proof

Let us inductively construct a sequence α_0, α_1, α_2, . . . of p-adic integers with the properties that |f(α_n)|_p ≤ p^{-(2M+n+1)} and |f′(α_n)|_p = p^{-M} for every n ≥ 0. The given α_0 provides the starting point (induction basis). For the inductive step, assume that n ≥ 1 and that α_0, α_1, . . . , α_{n-1} have been constructed with the desired properties. We now explain how to construct α_n from α_{n-1}. Put

    α_n := α_{n-1} + k_np^{M+n}   for some k_n ∈ Ẑ_p.

We want to find a suitable k_n for which |f(α_n)|_p ≤ p^{-(2M+n+1)}. Taylor expansion gives f(α_n) = f(α_{n-1}) + k_np^{M+n}f′(α_{n-1}) + c_np^{2(M+n)} for some c_n ∈ Ẑ_p. Since by induction hypothesis p^{2M+n} | f(α_{n-1}) and p^M | f′(α_{n-1}), we can write f(α_{n-1}) = p^{2M+n}a_n and f′(α_{n-1}) = p^M u_n for some a_n, u_n ∈ Ẑ_p.

Since p^{M+1} ∤ f′(α_{n-1}), the element u_n ∈ U_p and, therefore, there is a unique solution for k_n modulo p of the congruence

    a_n + k_nu_n ≡ 0 (mod p).

This value of kn yields

fn) = p2M + n(bnp + cnpn) ≡ 0 (mod p2M+n+1)

for some . The Taylor expansion of f′ gives f′(αn) = f′(αn–1) + dnpM+n (for some ) which implies that f′(αn) ≡ f′(αn–1) (mod pM), that is, |f′(αn)|p = pM.

Since |αn – αn–1|p ≤ p–(M+n), it follows that αn – αn–1 → 0, that is, (αn) is a Cauchy sequence (under | |p). By the completeness of ℤp, we then have an α ∈ ℤp such that αn → α. Similarly f(αn) – f(αn–1) → 0, that is, the sequence (f(αn)) is Cauchy and hence converges to f(α). Also |f(αn)|p ≤ p–(2M+n+1), that is, f(αn) → 0, that is, f(α) = 0. Finally, each αn ≡ α0 (mod pM+1), so that α ≡ α0 (mod pM+1). This establishes the existence of a desired α.

For proving the uniqueness of α, let β ∈ ℤp satisfy f(β) = 0 and |β – α0|p ≤ p–(M+1). By Taylor expansion, f(β) = f(α) + (β – α)f′(α) + (β – α)2c for some c ∈ ℤp, that is, (β – α)(f′(α) + (β – α)c) = 0. Now β – α = (β – α0) – (α – α0) and so |β – α|p ≤ max(|β – α0|p, |α – α0|p) ≤ p–(M+1), whereas f′(αn) → f′(α), so that |f′(α)|p = p–M. Therefore, f′(α) + (β – α)c ≢ 0 (mod pM+1) and, in particular, f′(α) + (β – α)c ≠ 0. Thus we must have β – α = 0.

Note that αn in the last proof satisfies the congruence

f(αn) ≡ 0 (mod p2M+n+1)

for each n ≥ 0. We are given the solution α0 corresponding to n = 0. From this, we inductively construct the solutions α1, α2, . . . corresponding to n = 1, 2, . . . respectively. The process for computing αn from αn–1 as described in the proof of Hensel’s lemma is referred to as Hensel lifting. The given conditions ensure that this lifting is possible (and uniquely doable) for every n ≥ 1, and in the limit n → ∞ we get a root of f. Since each kn is required modulo p, we can take kn ∈ {0, 1, . . . , p – 1}. So α admits a p-adic expansion of the form α = α0 + k1pM+1 + k2pM+2 + k3pM+3 + · · ·.

The special case M = 0 for Hensel’s lemma is now singled out:

Corollary 2.30.

Let f ∈ ℤp[X]. Suppose that there exists an α0 ∈ ℤp satisfying:

  1. |f(α0)|p < 1 (that is, α0 is a solution of f(X) ≡ 0 (mod p)), and

  2. |f′(α0)|p = 1 (that is, f′(α0) ≢ 0 (mod p), that is, α0 is a simple root of f modulo p).

Then there exists a unique α ∈ ℤp such that f(α) = 0 and |α – α0|p < 1 (that is, α ≡ α0 (mod p)).

For this special case, we compute solutions αn of f(X) ≡ 0 (mod pn+1) inductively for n = 1, 2, 3, . . . , given a suitable solution α0 of this congruence for n = 0. The lifting formula is now:

Equation 2.21

αn := αn–1 + knpn, where kn ≡ –(f(αn–1)/pn)f′(αn–1)–1 (mod p).
Example 2.35.

ℤ is canonically embedded in ℤp, and so ℤ[X] is in ℤp[X]. Thus it makes sense to carry out the lifting process for a polynomial f ∈ ℤ[X] and for some solution α0 ∈ ℤ of f(X) ≡ 0 (mod p). One solves Formula (2.21) in ℤ and obtains each αn ∈ ℤ. The limit α belongs to ℤp and is a solution of f(X) = 0 in ℤp.

For example, let p be an odd prime and a ∈ ℤ a quadratic residue modulo p with p ∤ a. Let α0 ∈ ℤ be a solution of X2 ≡ a (mod p). Here f(X) = X2 – a, so that f′(X) = 2X, that is, f′(α0) = 2α0 ≢ 0 (mod p). Thus the conditions of Corollary 2.30 are satisfied and we get a unique square root α of a in ℤp with α ≡ α0 (mod p). This α has a p-adic expansion of the form α = α0 + k1p + k2p2 + k3p3 + · · ·.

As a specific numerical example, take p = 7, a = 2 and α0 = 3. Using Formula (2.21), we compute k1 = 1, α1 = 10, k2 = 2, α2 = 108, k3 = 6, α3 = 2166, and so on. Thus a square root of 2 in ℤ7 is 3 + 1 × 7 + 2 × 72 + 6 × 73 + · · ·. The other square root of 2 in ℤ7 can be obtained by starting with α0 = 4.
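This computation can be replayed mechanically. The sketch below (the helper hensel_sqrt is ours, not the book's) applies Formula (2.21) to f(X) = X2 – a and reproduces the digits above:

```python
def hensel_sqrt(a: int, p: int, alpha0: int, digits: int):
    """Lift a solution alpha0 of x^2 ≡ a (mod p) to higher powers of p.
    Returns the successive lifts alpha_n and the digits k_n of Formula (2.21)
    for f(X) = X^2 - a."""
    # f'(alpha_{n-1}) ≡ f'(alpha0) (mod p), so one inverse mod p suffices.
    fprime_inv = pow(2 * alpha0 % p, -1, p)
    alpha, alphas, ks = alpha0, [alpha0], []
    for n in range(1, digits):
        f_val = alpha * alpha - a            # f(alpha_{n-1}), divisible by p^n
        k = (-(f_val // p**n) * fprime_inv) % p
        alpha += k * p**n                    # alpha_n := alpha_{n-1} + k_n p^n
        ks.append(k)
        alphas.append(alpha)
    return alphas, ks

alphas, ks = hensel_sqrt(2, 7, 3, 4)
print(alphas)  # [3, 10, 108, 2166]
print(ks)      # [1, 2, 6]
```

Each αn satisfies αn2 ≡ 2 (mod 7n+1), matching the hand computation in the text.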

Exercise Set 2.14

2.144
  1. Establish that any p-adic integer (an) can be uniquely described as a sequence of integers xn satisfying 0 ≤ xn < p for every n ≥ 0 and an ≡ x0 + x1p + · · · + xn–1pn–1 (mod pn) for every n ≥ 1. In this case, the p-adic integer (an) is written as the infinite series

    (an) = x0 + x1p + x2p2 + · · ·.

    One calls the above series the p-adic expansion of (an). Note that the sum in the above series is not to be treated as one of integers. However, for a ∈ ℕ0 the expansion of a to the base p is the same as the p-adic expansion of a (more correctly, of the image of a in ℤp). In other words, if the p-adic expansion of (an) is terminating, that is, xN = xN+1 = xN+2 = · · · = 0 for some N, then (an) can be identified with the rational integer x0 + x1p + · · · + xN–1pN–1. A non-terminating p-adic series, on the other hand, diverges under the Archimedean norm, but converges under the p-adic norm and corresponds to an element of ℤp not in ℕ0. The rational integer –1, for example, has the infinite p-adic expansion (p – 1) + (p – 1)p + (p – 1)p2 + · · ·. The sum telescopes and in the limit n → ∞ converges (under the p-adic norm) to limn→∞ pn – 1 = –1.

  2. Let a ∈ ℤp. Write the p-adic expansion for –a. [H]

  3. Given p-adic integers a := x0 + x1p + x2p2 + · · · and b := y0 + y1p + y2p2 + · · · , find the p-adic integers c := z0 + z1p + z2p2 + · · · and d := w0 + w1p + w2p2 + · · · , such that c = a + b and d = ab. (Express each zn and wn explicitly in terms of xn’s and yn’s.)
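The truncations a mod pn determine the digits xn, which gives a simple way to compute initial segments of p-adic expansions, including the expansion of –1 discussed above. A small sketch (the function name is ours):

```python
def p_adic_digits(a: int, p: int, count: int):
    """First `count` digits x_0, x_1, ... of the p-adic expansion of the
    integer a: repeatedly take the least p-ary digit of the remaining tail.
    Works for negative a too, producing a non-terminating expansion."""
    digits = []
    for _ in range(count):
        d = a % p          # the digit, always in {0, ..., p-1}
        digits.append(d)
        a = (a - d) // p   # peel off the digit and shift by one p-adic place
    return digits

print(p_adic_digits(-1, 5, 4))  # [4, 4, 4, 4]: -1 = (p-1) + (p-1)p + ...
print(p_adic_digits(10, 3, 4))  # [1, 0, 1, 0]: 10 = 1 + 0*3 + 1*9
```

The partial sums agree with a modulo pn, which is exactly the defining congruence of Part 1.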

2.145In view of Exercise 2.144, every x ∈ ℤp admits a unique expansion of the form x = x0 + x1p + x2p2 + · · · , where each xn ∈ {0, 1, . . . , p – 1}. This notion of p-adic expansion can be extended to the elements of ℚp.
  1. Show that for non-zero x ∈ ℚp, there exist an integer r ≤ 0 and unique integers xr, xr+1, . . . , x–1, x0, x1, . . . , each in {0, 1, . . . , p – 1}, such that x can be written as:

    x = xrpr + xr+1pr+1 + · · · + x–1p–1 + x0 + x1p + x2p2 + · · ·.

  2. Describe how to compute the p-adic expansions of x + y and xy given those for x, y ∈ ℚp. Also of x/y provided that y ≠ 0.

  3. What is |x|p for x ∈ ℤp with x0 ≠ 0?

  4. What is |x|p for x ∈ ℚp with xr ≠ 0?

2.146Let p be an odd prime and a ∈ ℤ a quadratic residue modulo p with p ∤ a. From elementary number theory we know that the congruence x2 ≡ a (mod pn) has two solutions for every n ≥ 1. Let x1 be a solution of x2 ≡ a (mod p). We know that a solution xn of x2 ≡ a (mod pn) lifts uniquely to a solution xn+1 of x2 ≡ a (mod pn+1). Thus we can inductively compute a sequence x1, x2, x3, . . . of integers. Show that (xn) is a p-adic integer and that (xn)2 = (a).
2.147
  1. Show that the ring ℤp contains all rationals of the form a/b with a, b ∈ ℤ, p ∤ b. This implies that ℤ ⊊ ℤp.

  2. Take a := 17 for p = 2, a := 7 for p = 3 and a := p + 1 for p > 3. Show that there exists x ∈ ℤp with x2 = a in ℤp. Show also that such an x does not belong to ℚ. Thus ℤp ⊄ ℚ.

  3. Show that . Thus .

2.148Prove the following assertions:
  1. .

  2. .

  3. Every non-zero ideal of ℤp is of the form prℤp for some integer r ≥ 0.

  4. The ideals of Part (c) satisfy the infinite strictly descending chain ℤp ⊋ pℤp ⊋ p2ℤp ⊋ · · ·.

  5. ℤp is a local domain with the maximal ideal pℤp.

  6. The ideal prℤp of Part (c) is the principal ideal of ℤp generated by pr. In particular, ℤp is a local PID, that is, a discrete valuation domain (Exercise 2.133), with the residue field ℤp/pℤp.

2.149Compute the p-adic expansion of 1/3 in and of –2/5 in .
2.150Show that ℤ is dense in ℤp under the p-adic norm | |p, that is, show that given any x ∈ ℤp and real ε > 0, there exists a ∈ ℤ with |x – a|p < ε. Show also that ℚ is dense in ℚp.
2.151Prove the following assertions that establish that ℤp is the closure of ℤ in ℚp under | |p.
  1. Every sequence (an) of rational integers, Cauchy under | |p, converges in ℤp.

  2. If a sequence (an) of rational numbers, Cauchy under | |p, converges to x ∈ ℤp, then there exists a sequence (bn) of rational integers, Cauchy under | |p, that converges to x.

2.152Show that:
  1. The series converges in .

  2. The series converges in .

  3. in . [H]

  4. The series does not converge in .

  5. If a ∈ ℚp and |a|p < 1, then 1 + a + a2 + · · · = 1/(1 – a).

2.153Prove that for any non-zero . [H]
2.154Prove that for any a ∈ ℤp the sequence (apn) converges in ℤp. [H]
2.155Let p, q be distinct primes. Show that the fields ℚp and ℚq are not isomorphic.
2.156Let a be an integer congruent to 1 modulo 8. Show that there exists an α ∈ ℤ2 such that α2 = a.
2.157Compute α ∈ ℤ3 with α2 + α + 223 = 0 and α ≡ 4 (mod 243).
2.158Let p be an odd prime and a ∈ ℚp non-zero. Show that the polynomial X2 – a has either zero or exactly two roots in ℚp.
2.159Show that the polynomial X2 – p is irreducible in ℚp[X].
2.160

Teichmüller representative Let a ∈ ℤp. Show that there exists a unique α ∈ ℤp such that αp = α and α ≡ a (mod p).

2.161Show that the algebraic closure of ℚp is of infinite extension degree over ℚp. [H]

2.15. Statistical Methods

Many attacks on cryptosystems involve statistical analysis of ciphertexts and of data collected from the victim’s machine during one or more private-key operations. For a proper understanding of these analysis techniques, one requires some knowledge of statistics and random variables. In this section, we provide a quick overview of some statistical tools. We assume that the reader is already familiar with the elementary notion of probability. We denote the probability of an event E by Pr(E).

2.15.1. Random Variables and Their Probability Distributions

An experiment whose outcome is random is referred to as a random experiment. The set of all possible outcomes of a random experiment is called the sample space of the experiment. For example, the outcomes of tossing a coin can be mapped to the set {H, T} with H and T standing respectively for head and tail. It is convenient to assign numerical values to the outcomes of a random experiment. Identifying head with 0 and tail with 1, one can view coin tossing as a random experiment with sample space {0, 1}. Some other random experiments include throwing a die (with sample space {1, 2, 3, 4, 5, 6}), the life of an electric bulb (with sample space ℝ≥0, the set of all non-negative real numbers), and so on. Unless otherwise specified, we henceforth assume that sample spaces are subsets of ℝ.

A random variable is a variable which can assume (all and only) the values from a (given) sample space.

A discrete random variable can assume only countably many values, that is, the sample space SX of a discrete random variable X either is finite or has a bijection with ℕ, that is, we can enumerate the elements of SX as x1, x2, x3, . . ..

The probability distribution function or the probability mass function

fX : SX → [0, 1]

of a discrete random variable X assigns to each x in the sample space SX of X the probability fX(x) = Pr(X = x) of the occurrence of the value x in a random experiment.[21] We have fX(x) ≥ 0 and Σx∈SX fX(x) = 1.

[21] [a, b] is the closed interval consisting of all real numbers u satisfying aub. Similarly, the open interval (a, b) is the set of all real values u satisfying a < u < b. In order to make a distinction between the open interval (a, b) and the ordered pair (a, b), many—mostly Europeans—use the notation ]a, b[ for denoting open intervals.

A continuous random variable assumes an uncountable number of values, that is, the sample space SX of a continuous random variable X cannot be in bijective correspondence with a subset of ℕ. Typically SX is an interval [a, b] or (a, b) with –∞ ≤ a < b ≤ +∞.

One does not assign individual probabilities Pr(X = x) to a value assumed by a continuous random variable X.[22] The probabilistic behaviour of X is in this case described by the probability density function fX : SX → ℝ≥0

[22] More correctly, Pr(X = x) = 0 for each x ∈ SX.

with the implication that the probability that X occurs in the interval [c, d] (or (c, d)) is given by the integral ∫cd fX(x)dx,

that is, by the area between the x-axis, the curve fX(x) and the vertical lines x = c and x = d. We have ∫SX fX(x)dx = 1.

It is sometimes useful to set fX(x) := 0 for x ∉ SX, so that fX is defined on the entire real line ℝ.

The cumulative probability distribution of a random variable X (discrete or continuous) is the function FX(x) := Pr(X ≤ x) for all x ∈ ℝ. If X is continuous, we have

FX(x) = ∫–∞x fX(u)du,

which implies that fX(x) = F′X(x) wherever the derivative exists.

2.15.2. Operations on Random Variables

Let X and Y be discrete random variables. The joint probability distribution of X, Y refers to a random variable Z with SZ = SX × SY. For z = (x, y), the probability of Z = z is denoted by fZ(z) = Pr(Z = z) = Pr(X = x, Y = y). The probability Pr(X = x, Y = y) stands for the probability that X = x and Y = y. The random variables X and Y are called independent, if

Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y)

for all x, y.

Example 2.36.

Suppose that we have an urn containing three identical balls with labels 1, 2, 3. We draw two balls randomly from the urn. Let us denote the outcome of the first drawing by X and that of the second drawing by Y. We consider the joint distribution X, Y of the two outcomes in the two following cases:

  1. The balls are drawn with replacement, that is, after the first ball is drawn, it is returned to the urn (and the urn is shaken well), before the next ball is drawn. The joint probability distribution is now as follows:

    x  y  Pr(X = x, Y = y)
    1  1  1/9
    1  2  1/9
    1  3  1/9
    2  1  1/9
    2  2  1/9
    2  3  1/9
    3  1  1/9
    3  2  1/9
    3  3  1/9

    In this case, the outcome of the second drawing is not influenced by the outcome of the first drawing; that is, X and Y are independent, and we have Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y) = 1/9 for all x, y, as expected.

  2. The balls are drawn without replacement, that is, the ball obtained by the first drawing is not returned to the urn, before the second ball is drawn. In this case, the outcome of the second drawing is influenced by that of the first drawing in the sense that the same ball cannot be drawn on both occasions. Thus, X and Y are now dependent. This is revealed by the following joint probability distribution:

    x  y  Pr(X = x, Y = y)
    1  1  0
    1  2  1/6
    1  3  1/6
    2  1  1/6
    2  2  0
    2  3  1/6
    3  1  1/6
    3  2  1/6
    3  3  0
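The two cases can be checked programmatically. The sketch below (helper names are ours) builds both joint distribution tables and tests the independence criterion Pr(X = x, Y = y) = Pr(X = x) Pr(Y = y):

```python
from fractions import Fraction
from itertools import product

balls = [1, 2, 3]

# Joint distributions Pr(X = x, Y = y) for the two drawing schemes.
with_repl = {(x, y): Fraction(1, 9) for x, y in product(balls, balls)}
without_repl = {(x, y): (Fraction(0) if x == y else Fraction(1, 6))
                for x, y in product(balls, balls)}

def marginals(joint):
    """Individual distributions of X and Y obtained by summing the joint one."""
    fx = {x: sum(joint[(x, y)] for y in balls) for x in balls}
    fy = {y: sum(joint[(x, y)] for x in balls) for y in balls}
    return fx, fy

def independent(joint):
    fx, fy = marginals(joint)
    return all(joint[(x, y)] == fx[x] * fy[y] for x, y in joint)

print(independent(with_repl))     # True
print(independent(without_repl))  # False
```

Exact fractions make the independence test an equality check rather than a floating-point comparison.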

For continuous random variables X and Y, the joint distribution is defined by the probability density function fX,Y(x, y), and the cumulative distribution is obtained by the double integral FX,Y(c, d) = ∫–∞c ∫–∞d fX,Y(x, y) dy dx.

X and Y are independent, if fX,Y (x, y) = fX(x)fY (y) for all x, y. In this case, we also have FX,Y (c, d) = FX(c)FY (d) for all c, d.

Now, we define arithmetic operations on random variables. First, let X and Y be discrete random variables. The sum X + Y is defined to be a random variable U which assumes the values u = x + y for x ∈ SX and y ∈ SY with probability fU(u) = Pr(U = u) = Σx+y=u Pr(X = x, Y = y).

The product XY of X and Y is defined to be a random variable V which assumes the values v = xy for x ∈ SX and y ∈ SY with probability fV(v) = Pr(V = v) = Σxy=v Pr(X = x, Y = y).

For α ∈ ℝ, the random variable W = αX assumes the values w = αx for x ∈ SX with probability

fW(w) = Pr(W = αx) = Pr(X = x) = fX(x).

Example 2.37.

Let us consider the random variables X and Y of Example 2.36. For the sake of brevity, we denote Pr(X = x, Y = y) by Pxy. The distributions of U = X + Y in the two cases are as follows:

  1. Drawing with replacement:

    Pr(U = 2) = P11 = 1/9
    Pr(U = 3) = P12 + P21 = 2/9
    Pr(U = 4) = P13 + P22 + P31 = 1/3
    Pr(U = 5) = P23 + P32 = 2/9
    Pr(U = 6) = P33 = 1/9

  2. Drawing without replacement:

    Pr(U = 3) = P12 + P21 = 1/3
    Pr(U = 4) = P13 + P31 = 1/3
    Pr(U = 5) = P23 + P32 = 1/3

Now, let us consider continuous random variables X and Y. In this case, it is easier to define first the cumulative distribution functions of U = X + Y, V = XY and W = αX (for example, FU(u) = Pr(X + Y ≤ u) = ∫∫x+y≤u fX,Y(x, y)dx dy) and then the probability density functions by taking derivatives:

One can easily generalize sums and products to an arbitrary finite number of random variables. More generally, if X1, . . . , Xn are random variables and g : ℝn → ℝ, one can talk about the probability distribution or density function of the random variable g(X1, . . . , Xn). (See Exercise 2.163.)

Now, we introduce the important concept of conditional probability. Let X and Y be two random variables. To start with, suppose that they are discrete. We denote by f(x, y) = Pr(X = x, Y = y) the joint probability distribution function of X, Y. For y ∈ SY with Pr(Y = y) > 0, we define the conditional probability of X = x given Y = y as fX|y(x) := f(x, y)/fY(y).

For a fixed y ∈ SY, the probabilities fX|y(x), x ∈ SX, constitute the probability distribution function of the random variable X|y (X given Y = y). If X and Y are independent, f(x, y) = fX(x)fY(y) and so fX|y(x) = fX(x) for all x ∈ SX, that is, the random variables X and X|y have the same probability distribution. This is expected, because in this case the probability of X = x does not depend on whatever value y the variable Y takes.

If X and Y are continuous random variables with joint density f(x, y) and fY(y) > 0, the conditional probability density function of X|y (X given Y = y) is defined by fX|y(x) := f(x, y)/fY(y).

Again if X and Y are independent, we have fX|y(x) = fX(x) for all x, y.

For a fixed x ∈ SX with fX(x) > 0, one can likewise define the conditional probabilities fY|x(y) := f(x, y)/fX(x) for all y ∈ SY.
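For the urn example, the conditional distribution of X given Y = 2 is easy to tabulate. A sketch (the function name is ours):

```python
from fractions import Fraction
from itertools import product

balls = [1, 2, 3]
joint = {(x, y): (Fraction(0) if x == y else Fraction(1, 6))
         for x, y in product(balls, balls)}

def conditional_X_given_y(joint, y):
    """f_{X|y}(x) = f(x, y) / f_Y(y), defined when f_Y(y) > 0."""
    fy = sum(joint[(x, y)] for x in balls)
    return {x: joint[(x, y)] / fy for x in balls}

print(conditional_X_given_y(joint, 2))  # x = 1 and x = 3 each get 1/2; x = 2 gets 0
```

Given that the second ball drawn is ball 2, the first ball is equally likely to have been ball 1 or ball 3, and cannot have been ball 2, exactly as drawing without replacement dictates.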

Let X and Y be discrete random variables with joint distribution f(x, y). Also let Γ ⊆ SX and Δ ⊆ SY. One defines the probability fX(Γ) as:

fX(Γ) := Σx∈Γ fX(x).

The joint probability f(Γ, Δ) is defined as:

f(Γ, Δ) := Σx∈Γ Σy∈Δ f(x, y).

If Γ = {x} is a singleton, we prefer to write f(x, Δ) instead of f({x}, Δ). Similarly, f(Γ, y) stands for f(Γ, {y}). We also define the conditional distributions:

fX|Δ(Γ) := f(Γ, Δ)/fY(Δ) and fY|Γ(Δ) := f(Γ, Δ)/fX(Γ).

We abbreviate fX|Δ(Γ) as Pr(Γ|Δ) and fY|Γ(Δ) as Pr(Δ|Γ).

Theorem 2.64. Bayes rule

Let X, Y be discrete random variables and Δ ⊆ SY with fY(Δ) > 0. Also let Γ1, . . . , Γn form a partition of SX with fX(Γi) > 0 for all i = 1, . . . , n. Then we have:

fX|Δ(Γi) = fY|Γi(Δ)fX(Γi) / (Σj=1n fY|Γj(Δ)fX(Γj)),

that is, in terms of probability:

Pr(Γi|Δ) = Pr(Δ|Γi) Pr(Γi) / (Σj=1n Pr(Δ|Γj) Pr(Γj)).
Proof

Pr(Γi, Δ) = Pr(Δ|Γi) Pr(Γi) = Pr(Γi|Δ) Pr(Δ). So it is sufficient to show that Pr(Δ) equals the sum in the denominator. The event Δ is the union of the pairwise disjoint events (Γj, Δ), j = 1, . . . , n, and so Pr(Δ) = Σj=1n Pr(Γj, Δ) = Σj=1n Pr(Δ|Γj) Pr(Γj).

The Bayes rule relates the a priori probabilities Pr(Γj) and Pr(Δ|Γj) to the a posteriori probabilities Pr(Γi|Δ). The following example demonstrates this terminology.

Example 2.38.

Consider the random experiment of Example 2.36(2). Take Γj := {j} for j = 1, 2, 3 and Δ := {2, 3}. We have the following a priori probabilities:

Pr(Γj)=Probability of getting ball j in the first draw = 1/3,
Pr(Δ|Γ1)=Probability of getting the second or the third ball in the second draw, given that the first ball is obtained in the first draw = 1,
Pr(Δ|Γ2)=Probability of getting the second or the third ball in the second draw, given that the second ball is obtained in the first draw = 1/2,
Pr(Δ|Γ3)=Probability of getting the second or the third ball in the second draw, given that the third ball is obtained in the first draw = 1/2.

The a posteriori probability Pr(Γ1|Δ) that the first ball was obtained in the first draw given that the ball obtained in the second draw is the second or the third one is calculated using the Bayes rule as:

Pr(Γ1|Δ) = (1 × (1/3)) / (1 × (1/3) + (1/2) × (1/3) + (1/2) × (1/3)) = (1/3)/(2/3) = 1/2.

One can similarly calculate Pr(Γ2|Δ) = Pr(Γ3|Δ) = 1/4. This is expected, since the only events (x, y) consistent with Δ are the four equiprobable possibilities (1, 2), (1, 3), (2, 3) and (3, 2), and Γ1 accounts for exactly two of them.
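The calculation can be verified directly from the a priori data. A sketch (variable names are ours):

```python
from fractions import Fraction

# A priori data from Example 2.38 (drawing without replacement).
prior = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}    # Pr(Gamma_j)
likelihood = {1: Fraction(1), 2: Fraction(1, 2), 3: Fraction(1, 2)}  # Pr(Delta | Gamma_j)

def posterior(i):
    """Bayes rule: Pr(Gamma_i | Delta)."""
    denom = sum(likelihood[j] * prior[j] for j in prior)
    return likelihood[i] * prior[i] / denom

print(posterior(1))  # 1/2
print(posterior(2))  # 1/4
```

Since the Γi partition the sample space, the three posteriors necessarily sum to 1.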

2.15.3. Expectation, Variance and Correlation

Let X be a random variable. The expectation E(X) of X is defined as follows:

E(X) := Σx∈SX x fX(x) if X is discrete, and E(X) := ∫SX x fX(x)dx if X is continuous.

E(X) is also called the (arithmetic) mean or average of X. One uses the alternative symbol μX to denote E(X). More generally, let X1, . . . , Xn be n random variables with joint probability distribution/density function f(x1, . . . , xn). Also let g : ℝn → ℝ. We define the following expectations:

X1, . . . , Xn discrete: E(g(X1, . . . , Xn)) := Σ g(x1, . . . , xn)f(x1, . . . , xn), the sum being over all tuples (x1, . . . , xn).

X1, . . . , Xn continuous: E(g(X1, . . . , Xn)) := ∫ · · · ∫ g(x1, . . . , xn)f(x1, . . . , xn)dx1 · · · dxn.

Let g(X) and h(Y) be real polynomial functions of the random variables X and Y and let α ∈ ℝ. Then

E(g(X) + h(Y)) = E(g(X)) + E(h(Y)),
E(g(X)h(Y)) = E(g(X)) E(h(Y)) if X and Y are independent,
E(αg(X)) = αE(g(X)).

Let us derive the sum and product formulas for discrete variables X and Y. With f(x, y) = Pr(X = x, Y = y), we have

E(X + Y) = Σx,y (x + y)f(x, y) = Σx x fX(x) + Σy y fY(y) = E(X) + E(Y).

If X and Y are independent, then

E(XY) = Σx,y xy fX(x)fY(y) = (Σx x fX(x))(Σy y fY(y)) = E(X) E(Y).
The variance Var(X) of a random variable X is defined as

Var (X) := E[(X – E(X))2].

From the observation that E[(X – E(X))2] = E[X2 – 2 E(X)X + [E(X)]2] = E(X2) – 2 E(X) E(X) + [E(X)]2, we derive the computational formula:

Var (X) = E[X2] – [E(X)]2.

Var(X) is a measure of how the values of X are dispersed about the mean E(X) and is always a non-negative quantity. The (non-negative) square root of Var(X) is called the standard deviation σX of X:

σX := √Var(X).
The following formulas can be easily verified:

Var(X + α) = Var(X),
Var(αX) = α2 Var(X),
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y),

where α ∈ ℝ and where the covariance Cov(X, Y) of X and Y is defined as:

Cov(X, Y) := E[(X – E(X))(Y – E(Y))] = E(XY) – E(X) E(Y).

Normalized covariance is a measure of correlation between the two random variables X and Y. More precisely, the correlation coefficient ρX,Y is defined as:

ρX,Y := Cov(X, Y)/(σXσY).
If X and Y are independent, E(XY) = E(X) E(Y) so that Cov(X, Y) = 0 and so ρX,Y = 0. The converse of this is, however, not true, that is, ρX,Y = 0 does not necessarily imply that X and Y are independent. ρX,Y is a real value in the interval [–1, 1] and is a measure of linear relationship between X and Y. If larger (resp. smaller) values of X are (in general) associated with larger (resp. smaller) values of Y, then ρX,Y is positive. On the other hand, if larger (resp. smaller) values of X are (in general) associated with smaller (resp. larger) values of Y, then ρX,Y is negative.

Example 2.39.

Once again consider the drawing of two balls from an urn containing three balls labelled {1, 2, 3} (Examples 2.36, 2.37 and 2.38). Look at the second case (drawing without replacement). We use the shorthand notation Pxy for Pr(X = x, Y = y). The individual probability distributions of X and Y can be obtained from the joint distribution as follows:

Pr(X = 1) = P11 + P12 + P13 = 0 + (1/6) + (1/6) = 1/3
Pr(X = 2) = P21 + P22 + P23 = (1/6) + 0 + (1/6) = 1/3
Pr(X = 3) = P31 + P32 + P33 = (1/6) + (1/6) + 0 = 1/3

Pr(Y = 1) = P11 + P21 + P31 = 0 + (1/6) + (1/6) = 1/3
Pr(Y = 2) = P12 + P22 + P32 = (1/6) + 0 + (1/6) = 1/3
Pr(Y = 3) = P13 + P23 + P33 = (1/6) + (1/6) + 0 = 1/3

Thus E(X) = 1 × (1/3) + 2 × (1/3) + 3 × (1/3) = 2. Similarly, E(Y) = 2. Therefore, E(X + Y) = E(X) + E(Y) = 4. This can also be verified by direct calculations: E(X + Y) = 3 × (1/3) + 4 × (1/3) + 5 × (1/3) = 4.

E(X2) = E(Y2) = 12 × (1/3) + 22 × (1/3) + 32 × (1/3) = 14/3 and Var(X) = Var(Y) = (14/3) – 22 = 2/3. The probability distribution for XY is

Pr(XY = 2) = P12 + P21 = 1/3
Pr(XY = 3) = P13 + P31 = 1/3
Pr(XY = 6) = P23 + P32 = 1/3,

so that E(XY) = 2 × (1/3) + 3 × (1/3) + 6 × (1/3) = 11/3. Therefore, Cov(X, Y) = E(XY) – E(X) E(Y) = (11/3) – 2 × 2 = –1/3, that is,

ρX,Y = Cov(X, Y)/(σXσY) = (–1/3)/(2/3) = –1/2.
The negative correlation between X and Y is expected. If X = 1 (small), Y takes bigger values (2, 3). On the other hand, if X = 3 (large), Y assumes smaller values (1, 2). Of course, the correlation is not perfect, since for X = 2 the values of Y can be smaller (1) or larger (3). So, we should feel happy to see a not-so-negative correlation of –1/2 between X and Y.
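All the quantities of this example can be recomputed exactly with rational arithmetic. A sketch (helper names are ours):

```python
from fractions import Fraction
from itertools import product

balls = [1, 2, 3]
joint = {(x, y): (Fraction(0) if x == y else Fraction(1, 6))
         for x, y in product(balls, balls)}

def expect(g):
    """E[g(X, Y)] with respect to the joint distribution."""
    return sum(pr * g(x, y) for (x, y), pr in joint.items())

EX   = expect(lambda x, y: x)                # 2
EY   = expect(lambda x, y: y)                # 2
VarX = expect(lambda x, y: x * x) - EX * EX  # 14/3 - 4 = 2/3
VarY = expect(lambda x, y: y * y) - EY * EY
cov  = expect(lambda x, y: x * y) - EX * EY  # 11/3 - 4 = -1/3

sigma_prod = VarX    # here Var(X) = Var(Y), so sigma_X * sigma_Y = Var(X) exactly
rho = cov / sigma_prod
print(EX, VarX, cov, rho)  # 2, 2/3, -1/3, -1/2
```

Because Var(X) = Var(Y), the product σXσY equals Var(X), so no square roots (and no floating point) are needed to obtain ρX,Y = –1/2.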

2.15.4. Some Famous Probability Distributions

Some probability distributions that occur frequently in statistical theory and in practice are described now. Some other useful probability distributions are considered in the Exercises 2.169, 2.170 and 2.171.

Uniform distribution

A discrete uniform random variable U has sample space SU := {x1, . . . , xn} and probability distribution

fU(xi) = 1/n for i = 1, . . . , n.

A continuous uniform random variable U has sample space SU and probability density function

fU(x) = 1/A for x ∈ SU,

where A > 0 is the size[23] of SU. For example, if SU is the real interval [a, b] for a < b, we have fU(x) = 1/(b – a) for a ≤ x ≤ b.

[23] If SU ⊆ ℝ, “size” means length. If SU ⊆ ℝ2 or SU ⊆ ℝ3, “size” refers to area or volume respectively. We assume that the size of SU is “measurable”.

In this case, we have

E(U) = (a + b)/2 and Var(U) = (b – a)2/12.

Uniform random variables often occur naturally. For example, if we throw an unbiased die, the six possible outcomes (1 through 6) are equally likely, that is, each possible outcome has the probability 1/6. Similarly, if a real number is chosen randomly in the interval [0, 1], we have a continuous uniform random variable. The built-in C library call rand() (pretends to) return an integer between 0 and 231 – 1, each with equal probability (namely, 2–31).
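For the unbiased die, the mean and variance follow directly from the definitions. A quick check in exact arithmetic:

```python
from fractions import Fraction

faces = range(1, 7)          # an unbiased die: each face has probability 1/6
p = Fraction(1, 6)

E   = sum(p * x for x in faces)           # E(U)   = 7/2
E2  = sum(p * x * x for x in faces)       # E(U^2) = 91/6
Var = E2 - E * E                          # Var(U) = 35/12
print(E, Var)
```

The computational formula Var(U) = E(U2) – [E(U)]2 from the previous subsection is used here verbatim.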

Bernoulli distribution

The Bernoulli random variable B = B(n, p) is a discrete random variable characterized by two parameters n ∈ ℕ and p ∈ [0, 1], where p stands for the probability of a certain event E and n represents the number of (independent) trials. It is assumed that the probability of E remains constant (namely, p) in each of the n trials. The sample space SB = {0, 1, . . . , n} comprises the (exact) numbers of occurrences of E in the n trials. B has the probability distribution

fB(x) = C(n, x)px(1 – p)n–x for x = 0, 1, . . . , n, where C(n, x) = n!/(x!(n – x)!),

as follows from simple combinatorial arguments. The mean and variance of B are:

E(B) = np and Var(B) = np(1 – p).

The Bernoulli distribution is also called the binomial distribution.
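The pmf and the stated mean and variance can be checked in exact arithmetic. A sketch (the helper binom_pmf is ours):

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p):
    """Binomial pmf: f_B(x) = C(n, x) p^x (1-p)^(n-x) for x = 0, ..., n."""
    return {x: comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)}

n, p = 5, Fraction(1, 3)
f = binom_pmf(n, p)
mean = sum(x * f[x] for x in f)
var  = sum(x * x * f[x] for x in f) - mean * mean
print(mean, var)   # 5/3 and 10/9, i.e. np and np(1 - p)
```

The pmf values also sum to 1, as any probability distribution must.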

Normal distribution

The normal random variable or the Gaussian random variable N = N(μ, σ2) is a continuous random variable characterized by two real parameters μ and σ with σ > 0. The density function of N is

fN(x) = (1/(σ√(2π))) exp(–(x – μ)2/(2σ2)).

The cumulative distribution for N can be expressed in terms of the error function erf():

FN(x) = (1/2)(1 + erf((x – μ)/(σ√2))).
The error function does not have a known closed-form expression. Figure 2.3 shows the curves for fN (x) and FN (x) for the parameter values μ = 0 and σ = 1 (in this case, N is called the standard normal variable).

Figure 2.3. Standard normal distribution


Some statistical properties of N are:

E(N) = μandVar(N) = σ2.

The curve fN (x) is symmetric about x = μ. Most of the area under the curve is concentrated in the region μ – 3σ ≤ x ≤ μ + 3σ. More precisely:

Pr(μ – σ ≤ X ≤ μ + σ) ≈ 0.68,
Pr(μ – 2σ ≤ X ≤ μ + 2σ) ≈ 0.95,
Pr(μ – 3σ ≤ X ≤ μ + 3σ) ≈ 0.997.

Many distributions occurring in practice (and in nature) approximately follow normal distributions. For example, the height of (adult) people in a given community is roughly normally distributed. Of course, the height of a person cannot be negative, whereas a normal random variable may assume negative values. But, in practice, the probability that such an approximating normal variable assumes a negative value is typically negligibly low.
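These three probabilities follow from the erf() expression for FN. A sketch using the standard library error function:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F_N(x) expressed through the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Probability mass within k standard deviations of the mean:
for k in (1, 2, 3):
    pr = normal_cdf(k) - normal_cdf(-k)
    print(k, round(pr, 4))   # approximately 0.6827, 0.9545, 0.9973
```

By symmetry of fN about μ the same figures hold for any μ and σ, not just the standard normal.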

2.15.5. Sample Mean, Variation and Correlation

In practice, we often do not know a priori the probability distribution or density function of a random variable X. In some cases, we do not have the complete data, whereas in some other cases we need an infinite amount of data to obtain the actual probability distribution of a random variable. For example, let X represent the life of an electric bulb manufactured by a given company in the last ten years. Even though there are only finitely many such bulbs and even if we assume that it is possible to trace the working of every such bulb, we have to wait until all these bulbs burn out, before we know the actual distribution of X. That is certainly impractical. On the contrary, if we have data on the life-times of some sample bulbs, we can approximate the properties of X by those of the samples.

Suppose that S := (x1, x2, . . . , xn) is a sample of size n. We assume that all xi are real numbers. We define the following quantities for S:

mean(S) := x̄ := (1/n)(x1 + x2 + · · · + xn),
Var(S) := (1/n)((x1 – x̄)2 + · · · + (xn – x̄)2).

Here x̄ is the mean of the collection S.

If T := (y1, y2, . . . , ym) is another sample (of real numbers), the (linear) relationship between S and T is measured by the sample covariance and the sample correlation coefficient, defined analogously to their counterparts for random variables. Here one uses the mean of the collection ST := (xiyj | i = 1, . . . , n, j = 1, . . . , m).

An important property of the normal distribution is the following:

Theorem 2.65. Central limit theorem

Let X be any random variable with mean μ and variance σ2 and let n ∈ ℕ. The mean of a random sample S of size n chosen according to the distribution of X approximately follows the normal distribution N(μ, σ2/n). The larger the sample size n is, the better this approximation is.
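The theorem can be illustrated by simulation (our own toy experiment, with a die-rolling setup and a fixed seed chosen for reproducibility): sample means of n die rolls should have mean close to μ = 3.5 and variance close to σ2/n = (35/12)/n.

```python
import random

random.seed(1)
mu, var = 3.5, 35 / 12          # mean and variance of a single die roll
n, trials = 30, 20000

# Draw `trials` independent samples of size n and record each sample mean.
means = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]
m = sum(means) / trials
v = sum((x - m) ** 2 for x in means) / trials

print(round(m, 2))              # close to mu = 3.5
print(round(v, 4))              # close to var / n = (35/12)/30 ≈ 0.0972
```

A histogram of the recorded means would show the characteristic bell shape of N(μ, σ2/n), even though a single die roll is uniformly, not normally, distributed.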

Exercise Set 2.15

2.162An urn contains n1 red balls and n2 black balls. We draw k balls sequentially and randomly from the urn, where 1 ≤ kn1 + n2.
  1. If the balls are drawn with replacement, what is the probability that the k-th ball drawn from the urn is red?

  2. If the balls are drawn without replacement, what is the probability that the k-th ball drawn from the urn is red?

2.163Let X and Y be the random variables of Example 2.36. For each of the two cases, calculate the probability distribution functions, expectations and variances of the following random variables:
  1. XY

  2. 2X + 3Y

  3. X2

  4. X2 + 2XY + Y2

  5. (X + Y)2

2.164Let X and Y be continuous random variables, g(X) and h(Y) non-constant real polynomials and α, β, γ ∈ ℝ. Prove that:
E(g(X) + h(Y)) = E(g(X)) + E(h(Y)).
E(g(X)h(Y)) = E(g(X)) E(h(Y)), if X and Y are independent.
E(αg(X)) = αE(g(X)).
Var(αX + βY + γ) = α2 Var(X) + β2 Var(Y), if X and Y are independent.

2.165Let X be a random variable with Var(X) > 0 and Y := αX + β for some α, β ∈ ℝ with α ≠ 0. What is ρX,Y?
2.166
  1. Let X and Y be discrete random variables with joint probability distribution function f(x, y). Show that the probability distributions of X and Y can be obtained as

    fX(x) = Σy∈SY f(x, y) and fY(y) = Σx∈SX f(x, y).

  2. If X and Y are continuous random variables with joint density function f(x, y), show that the density functions of X and Y are given by

    fX(x) = ∫ f(x, y)dy and fY(y) = ∫ f(x, y)dx.

    The functions fX and fY are called the marginal probability distribution (or density function) of X and Y respectively.

2.167Let X and Y be continuous random variables whose joint distribution is the uniform distribution on the triangle 0 ≤ X ≤ Y ≤ 1.
  1. Compute the marginal distributions fX and fY.

  2. Compute E(X), E(Y), Var(X), Var(Y), Cov(X, Y) and ρX,Y.

2.168Let X, Y, Z be random variables. Show that:
Cov(X, Y) = Cov(Y, X).
ρX,Y = ρY,X.
Cov(X, X) = Var(X).
Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z).
Cov(X, X + Y) = Var(X) + Cov(X, Y).
Cov(X, X + Y) = Var(X) if X and Y are independent.

2.169

Geometric distribution Assume that in each trial of an experiment, an event E has a constant probability p of occurrence. Let G = G(p) denote the random variable with SG = {1, 2, 3, . . .} and with fG(x) equal to the probability that E occurs the first time during the x-th trial (that is, after exactly x – 1 failures). Show that:

fG(x) = p(1 – p)x–1, E(G) = 1/p and Var(G) = (1 – p)/p2.

What if p = 0?
2.170

Poisson distribution Let P = P(λ) be the discrete random variable with SP = {0, 1, 2, . . .} and with fP(x) = e–λλx/x!, where λ is a positive real constant. Show that E(P) = Var(P) = λ.

2.171Exponential distribution
  1. Let X = X(λ) be the continuous random variable with density

    fX(x) = λe–λx for x ≥ 0 (and fX(x) = 0 for x < 0),

    where λ is a positive real constant. Show that E(X) = 1/λ and Var(X) = 1/λ2.

  2. A random variable Y assuming non-negative real values is said to be memoryless, if

    Pr(Y > s + t | Y > s) = Pr(Y > t) for all s, t ≥ 0.

Show that the exponential variable X of Part (a) is memoryless.

2.172

The birthday paradox Let S be a finite set of cardinality n.

  1. Show that the probability that k < n elements, drawn at random from S (with replacement), are (pairwise) distinct is

    p = (1 – 1/n)(1 – 2/n) · · · (1 – (k – 1)/n).

  2. Use the inequality 1 – x ≤ e–x for any real number x to show that p ≤ e–k(k–1)/(2n).

  3. Deduce that p ≤ 1/2, if k(k – 1) ≥ (2 ln 2)n, and that p ≤ 0.136 for k ≥ 2√n.

    (The birthday paradox states that if only 23 people are chosen at random, there is a chance as high as 50 per cent that at least two of them have the same birthday.)
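The figures in the parenthetical remark can be reproduced directly from the product in Part 1. A sketch (the helper name is ours):

```python
from math import prod, exp

def distinct_prob(n, k):
    """Probability that k draws (with replacement) from an n-set are pairwise distinct."""
    return prod(1 - i / n for i in range(1, k))

p = distinct_prob(365, 23)
print(round(p, 4))   # about 0.4927: a better than 50% chance of a shared birthday
print(p <= exp(-23 * 22 / (2 * 365)))   # the e^(-k(k-1)/(2n)) bound of Part 2 holds
```

With only 23 people the collision probability 1 – p already exceeds 1/2, which is the surprising content of the paradox.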

Chapter Summary

This chapter provides the foundations of public-key cryptology. The long compilation of mathematical concepts presented in the chapter is indispensable for understanding the topics that follow in the subsequent chapters.

This chapter begins with the basic concepts of sets, functions and relations. We also present the fundamental axioms of mathematics. Although the curricula of plus-two courses of many examination boards include these topics, we discuss them here in order to make our treatment self-contained.

Next comes a study of groups which are sets with binary operations satisfying some nice properties (associativity, identity, inverse and optionally commutativity). Groups are extremely important for cryptology. In particular, all discrete-log-based cryptosystems use suitable groups. Subgroups, cosets and formation of quotient groups constitute a prototypical feature that illustrates the basic paradigm of modern algebra. Secure cryptographic algorithms on groups rely on the availability of elements of large orders: for example, generators of big cyclic groups. We study these topics at length. Finally, we present Sylow’s theorem. For us, this theorem has only theoretical significance; it is used for proving some other theorems.

A set with a single operation (like a group) is often too restrictive. Many mathematical structures we are familiar with (like integers, polynomials) are endowed with two basic operations addition and multiplication. A set with two such (compatible) operations is called a ring. A study of rings, fields, ideals and quotient rings is essential in algebra (and so in cryptography too). Three important types of rings, namely unique factorization domains, principal ideal domains and Euclidean domains, are also discussed. Euclidean division is an important property of integers and polynomials, and is useful from a computational perspective.

Then, as a specific example, we study the properties of ℤ, the ring of integers. We concentrate mostly on elementary properties of integers like divisibility, congruence, the Chinese remainder theorem, Fermat’s and Euler’s theorems, quadratic residues and the law of quadratic reciprocity. We finally discuss some assorted topics from analytic number theory. In cryptography, we require many big randomly generated primes. The prime number theorem guarantees that there is essentially an abundant source of primes. Smooth integers (that is, integers having only small prime divisors) are useful for modern algorithms that compute factorization and discrete logarithms. We present an estimate on the density of smooth integers. The last topic we study is the Riemann hypothesis and its generalizations. This as-yet-unproven hypothesis has a bearing on the running times of many number-theoretic algorithms relevant to cryptology.

The next example is the ring of polynomials over a ring. Polynomials over a field admit Euclidean division and consequently unique factorization. Irreducible polynomials are useful for constructing field extensions. Extension fields of characteristic 2 are quite frequently used in cryptographic systems.

We subsequently study the theory of vector spaces. Linear transformations are appropriate maps between vector spaces and necessitate the theory of matrices. Matrix algebra is widely useful in cryptology as it is in any other branch of algorithmic computer science. Algorithms to solve linear systems over rings and fields constitute a basic computational tool. A study of modules and algebras at the end of this section is mostly theoretical and can be avoided if the reader is willing to accept some theorems without proofs.

In the next section, we discuss the theory of field extensions. As mentioned earlier, cryptography relies heavily on extension fields of characteristic 2. Some related topics include splitting fields and algebraic closure of fields. At the end of this section, we have a short theoretical treatment of Galois theory.

Many popular cryptosystems are based on the multiplicative groups of finite fields. We study these fields as the next topic. Polynomials over finite fields are extremely useful for the construction and representation of finite fields. At the end of this section, we discuss several ways in which (elements of) finite fields can be represented in a computer’s memory. This study expedites the design, analysis and efficient implementation of finite-field arithmetic.

Since elliptic- and hyperelliptic-curve cryptography has gained popularity in recent years, one needs to study the theory of plane algebraic curves. This is what we do in the next three sections. To start with, we define affine and projective spaces and curves. Going from the affine space to the projective space is necessitated by a systematic (algebraic) inclusion of points at infinity on a plane curve. We also discuss the theory of divisors and the Jacobian on plane curves. For elliptic curves, the Jacobian can be replaced by the equivalent group described in terms of the chord and tangent rule. For hyperelliptic curves, on the other hand, we have little option other than understanding the Jacobian itself.

Two kinds of elliptic curves that must be avoided in cryptography are supersingular curves and anomalous curves. The elliptic curve group (over a finite field) is the basic set used in elliptic curve cryptosystems. Bounds on the orders (cardinalities) of these groups are given by Hasse’s theorem. The structure theorem establishes that an elliptic curve group (over a finite field) is not necessarily cyclic, but has a rank of at most two.

We then study Jacobians of hyperelliptic curves over finite fields. This study supplements the theory of divisors on general curves. Reduced and semi-reduced divisors are expedient for the representation of the elements in the Jacobian of a hyperelliptic curve.

Many popular cryptosystems (including RSA) derive their security (presumably) from the intractability of the integer factorization problem. The best algorithm known to date for factoring integers is the number-field sieve method. An understanding of this algorithm requires knowledge of number fields and number rings. We devote a section to the study of these mathematical objects. We start with some necessary commutative algebra including localization, integral dependence and Noetherian rings. Next, we deal with Dedekind domains. All number rings are Dedekind domains in which ideals admit unique factorization. We also discuss the factorization in number rings of ideals generated by rational primes and the structure of units in number rings (Dirichlet’s unit theorem).

The next section is a gentle introduction to the theory of p-adic numbers. These numbers are useful, for example, for designing attacks against elliptic curve cryptosystems.

In the last section, we summarize some statistical tools. Under the assumption that the reader is already familiar with the elementary notion of probability, we discuss properties of random variables and of some common probability distributions (including uniform and normal distributions). The birthday paradox described in an exercise is often useful in cryptographic context (for example, for collision attacks on hash functions).

That is the end of this chapter. The compilation may initially look long and boring, perhaps intimidating too. The unfortunate reality is that public-key cryptology is mathematical, and it is arguably better to treat it in the formal way. If the reader is not comfortable with mathematics (in general), cryptology is perhaps not her cup of tea. An elementary approach to cryptology is what many other books have adopted. This book aims at being different in that respect. It is up to the reader to decide at what level of detail she is willing to study cryptography.

Suggestions for Further Reading

Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it.

—Samuel Johnson

In this chapter, we have summarized the basic mathematical facts that cryptologists are expected to know in order to have a decent understanding of the present-day public-key technology. Our discussion has often been more intuitive than mathematically complete. A reader willing to gain further insight into these areas should look at materials written specifically to deal with the specialized topics. Here are our (biased) suggestions.

There are numerous textbooks on introductory algebra. The books by Herstein [125], Fraleigh [96], Dummit and Foote [81], Hungerford [133] and Adkins and Weintraub [1] are some of our favourites. The algebra of commutative rings with identity (rings by our definition) is called commutative algebra and is the basis for learning advanced areas of mathematics like algebraic geometry and algebraic number theory. A serious study of these disciplines demands more in-depth knowledge of commutative algebra than we have presented in Section 2.13.1. Atiyah and MacDonald’s book [14] is a de facto standard on commutative algebra. Hoffman and Kunze’s book [127] is a good reference for linear algebra and matrix algebra.

Elementary number theory deals with the theory of (natural) numbers without using sophisticated techniques from complex analysis and algebra. Zuckerman et al. [316] can be consulted for a lucid introduction to this subject. The books by Burton [42] and Mollin [207] are good alternatives.

A thorough mathematical treatment of finite fields can be found in the books by Lidl and Niederreiter [179, 180], of which the second also deals with computational issues. Other books of a computational flavour include those by Menezes [191] and by Shparlinski [274]. Also see the paper [273] by Shparlinski.

The use of elliptic curves in cryptography was proposed by Koblitz [150] and Miller [205], and that of hyperelliptic curves by Koblitz [151]. A fair mathematical understanding of elliptic curves banks on knowledge of commutative algebra (see above) and algebraic geometry. Hartshorne’s book [124] is a detailed introduction to algebraic geometry. Fulton’s book [99] on algebraic curves is another good reference. Rigorous mathematical treatments of elliptic curves can be found in Silverman’s books [275, 276]. The book by Koblitz [152] is elementary, but has a somewhat different focus than needed in cryptology. By far, the best short-cut is the recent textbook by Washington [298]. Some other books by Koblitz [150, 153, 154], Blake et al. [24], Menezes [192] and Hankerson et al. [123] are written for non-experts in algebraic geometry (and hence lack mathematical details), but are good from a computational viewpoint. The expository reports [46, 47] by Charlap et al. provide a nice elementary introduction to elliptic curves. For hyperelliptic curves, on the other hand, no such books are available. Koblitz’s book [154] includes a chapter on hyperelliptic curves. In addition, an appendix in the same book, written by Menezes et al. much in the style of Charlap et al. [46, 47], provides an introductory and elementary coverage.

In an oversimplified sense, algebraic number theory deals with the study of number fields. The books by Janusz [140], Lang [160], Mollin [208] and Ribenboim [251] go well beyond what we cover in Section 2.13. Also see [89]. For a more modern and sophisticated treatment, look at Neukirch’s book [216]. A book dedicated to p-adic numbers is due to Koblitz [149]. Course notes from one of the authors of this book can also be useful in this regard. The notes are freely downloadable from:

http://www.facweb.iitkgp.ernet.in/~adas/IITK/course/MTH617/SS02/

Analytic number theory deals with the application of complex analytic techniques to solve problems in number theory. Although we do not explicitly need this branch of mathematics (apart from a few theorems that we mention without proofs), it is rather important for the study of numbers. Consult the books by Apostol [12] and by Ireland and Rosen [136] for this. Also see [249]. For complex analysis, we recommend the book by Ahlfors [6].

Feller’s celebrated book [92] is a classical reference on probability theory. Grinstead and Snell’s book [121] is available on the Internet.

3. Algebraic and Number-theoretic Computations

3.1 Introduction
3.2 Complexity Issues
3.3 Multiple-precision Integer Arithmetic
3.4 Elementary Number-theoretic Computations
3.5 Arithmetic in Finite Fields
3.6 Arithmetic on Elliptic Curves
3.7 Arithmetic on Hyperelliptic Curves
3.8 Random Numbers
 Chapter Summary
 Suggestions for Further Reading

From the start there has been a curious affinity between mathematics, mind and computing . . . It is perhaps no accident that Pascal and Leibniz in the seventeenth century, Babbage and George Boole in the nineteenth, and Alan Turing and John von Neumann in the twentieth – seminal figures in the history of computing – were all, among their other accomplishments, mathematicians, possessing a natural affinity for symbol, representation, abstraction and logic.

—Doron Swade [295]

. . . the laws of physics and of logic . . . the number system . . . the principle of algebraic substitution. These are ghosts. We just believe in them so thoroughly they seem real.

—Robert M. Pirsig [233]

The world is continuous, but the mind is discrete.

—David Mumford

3.1. Introduction

Now that we have studied the properties of important mathematical objects that play vital roles in public-key cryptology, it is time to concentrate on the algorithmic and implementation issues for working with these objects. We need well-defined schemes (data structures) to represent these objects and well-defined procedures (algorithms) to manipulate them. While a theoretical analysis of the performance of our data structures and algorithms is of great concern, it still leaves us in the abstract domain. In the long run, one has to translate the abstract statements in the algorithms to machine code that the computer understands, and this is where the implementation tidbits come into the picture. It is our personal experience that a naive implementation of an algorithm may run a hundred times slower than a carefully optimized implementation of the same algorithm. In certain specific applications (like those based on smart cards), where memory is a scarce resource, one should also pay attention to the storage requirements of the data structures and code segments. This chapter is an introduction to all these specialized topics.

Before we proceed further, certain comments are in order. In this book, we describe algorithms using a pseudocode that closely resembles the syntax of the programming language C. The biggest difference between C and our pseudocode is that we have given preference to mathematical notations in place of C syntax. For example, = means equality in our codes, whereas assignment is denoted by :=. Similarly, our while and for loops look more human-readable, for example, for i = 0, 1, . . . , m – 1 instead of C’s for (i=0; i<m; i++). In order to understand our pseudocode, a knowledge of C (or a similar programming language) is helpful, but not essential, on the part of the reader.

For certain implementations, we assume that the target machine carries out 32-bit 2’s-complement arithmetic. This is indeed true for most modern PCs and personal workstations. By the term word, we mean a 32-bit unit in the computer memory. We will also assume that the compiler provides facilities for storing and doing arithmetic with unsigned 64-bit integers. Though this is not an ANSI C feature, most popular compilers used today do support this built-in data type (examples: unsigned __int64 for the Microsoft Visual C++ compiler and unsigned long long for the GNU C Compiler). Though it is apparently desirable to be more generic and to avoid these specific assumptions about the machine and the compiler, our exposition highlights the power of fine-tuning based on the knowledge of the underlying system.
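As a small illustration of this assumption, the following C fragment (the function name mul32 is ours, not the book’s) computes the full 64-bit product of two 32-bit words using such an unsigned 64-bit type:

```c
#include <stdint.h>

/* Full 32x32 -> 64-bit product, split back into two 32-bit words.
   uint64_t plays the role of the unsigned 64-bit type mentioned above
   (unsigned long long for the GNU C Compiler). */
void mul32(uint32_t a, uint32_t b, uint32_t *hi, uint32_t *lo)
{
    uint64_t p = (uint64_t)a * b;   /* exact: the product fits in 64 bits */
    *lo = (uint32_t)p;              /* low word  */
    *hi = (uint32_t)(p >> 32);      /* high word */
}
```

This double-word product is the basic building block of the multiple-precision multiplication routines discussed later in this chapter.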

3.2. Complexity Issues

Given an algorithm (or an implementation of the same), the time and space required for the execution of the algorithm on a machine depend very much on the machine’s architecture and on the compiler. But this does not mean that we cannot make some general theoretical estimates. The so-called asymptotic estimates that we are going to introduce now tend to approach the real situation as the input size tends to infinity. For finite input sizes (which is always the case in practice), these theoretical predictions turn out to provide valuable guidelines.

3.2.1. Order Notations

We start with the following important definitions.

Definition 3.1.

Let f and g be positive real-valued functions of natural numbers.

  1. f is said to be bounded above by g or of the order of g, denoted f = O(g), if there exist an integer n0 and a positive real constant c such that f(n) ≤ cg(n) for all n ≥ n0. In this case, we also say that g is bounded below by f and denote this by g = Ω(f).

  2. If f = O(g) and g = O(f), we say that f and g are of the same order and denote this by f = Θ(g) (or by g = Θ(f)). Equivalently, f = Θ(g) if and only if f = O(g) and f = Ω(g); that is, if and only if there exist an integer n0 and real positive constants c1, c2 such that c1g(n) ≤ f(n) ≤ c2g(n) for all n ≥ n0.

  3. f is said to be of strictly lower order than g, denoted f = o(g), if f(n)/g(n) tends to 0 as n tends to infinity. In other words, f = o(g) if and only if for every real positive constant c (however small it may be) there exists an integer nc such that f(n) < cg(n) for all n ≥ nc. If f = o(g), we also say that g is of strictly higher order than f and denote this by g = ω(f). Thus g = ω(f) if and only if for every real positive constant c (however large it may be) there exists an integer nc such that g(n) > cf(n) for all n ≥ nc.

Example 3.1.
  1. Let f(n) := a_d n^d + · · · + a_1 n + a_0 with d ≥ 0 and a_d > 0. Then f = Θ(n^d). This heuristically means that as n becomes sufficiently large, the leading term a_d n^d dominates over the other terms, and apart from the constant of proportionality a_d the function f(n) grows with n as n^d does. If f = Θ(n^d) for some integer d > 0, we say that f is of polynomial order in n.[1] A Θ(1) function is often called a constant function.

    [1] This is not the complete truth. Functions like n^2.3 or n^3(log n)^2 would be better included in the polynomial family. Thus, we may define f to be of polynomial order (in n), if f = O(n^d) and f = Ω(n^d′) for some positive real constants d, d′. Similar comments hold for poly-logarithmic and exponential orders.

  2. If f = Θ((log n)^a) for some real a > 0, we say that f is of poly-logarithmic order in n. By Exercise 3.2(b), any function of poly-logarithmic order grows asymptotically slower than any function of polynomial order.

  3. If f = Θ(a^n) for some real a > 1, f is said to be of exponential order in n. By Exercise 3.2(a), any function of exponential order grows asymptotically faster than any function of polynomial order.

  4. Now, consider a function of the form

    Equation 3.1

    f(n) = exp(c n^α (ln n)^(1−α))

    for real c > 0 and for 0 ≤ α ≤ 1. For α = 0, we have f = Θ(n^c); that is, f is of polynomial order. On the other extreme, if α = 1, then f = Θ(a^n), where a := exp(c); that is, f is of exponential order. If 0 < α < 1, we say that f is of subexponential order in n, since the order of f is somewhere in between polynomial and exponential. We will come across functions of subexponential orders quite frequently in the rest of the book. Note that as α increases from 0 to 1, the order of f also increases monotonically from polynomial to exponential.

  5. A function f = O(n^a(log n)^b) with a > 0 and b ≥ 0 is often denoted by the soft O-notation: f = O~(n^a). This implies that up to multiplication by a polynomial in log n the function f is of the order of n^a. Similarly, if f = O(a^n g(n)) for a > 1 and for some g(n) of polynomial order, we say that f = O~(a^n). Intuitively speaking, the O-notation hides constant multipliers, whereas the soft O-notation also suppresses multipliers of negligibly small (poly-logarithmic or polynomial) order compared with the main term.

  6. The notion of order can be readily extended to functions with two or more input variables. For example, for positive real-valued functions f, g of two positive integer variables m, n one says f = O(g), if for some m0, n0 and for some positive real constant c one has f(m, n) ≤ cg(m, n) for all m ≥ m0 and n ≥ n0. The function f(m, n) = m^3 2^n is of polynomial order in m, but of exponential order in n.

The order notation is used to analyse algorithms in the following way. For an algorithm, the input size is defined as the total number of bits needed to represent the input of the algorithm. We find asymptotic estimates of the running time and the memory requirement of the algorithm in terms of its input size. Let f(n) denote the running time[2] of an algorithm A for an input of size n. If f(n) = Θ(n^a) (or, more generally, if f = O(n^a)) for some a > 0, A is called a polynomial-time algorithm. If a = 1 (resp. 2, 3, . . .), then A is specifically called a linear-time (resp. quadratic-time, cubic-time, . . .) algorithm. A Θ(1) algorithm is often called a constant-time algorithm. If f = Θ(b^n) for some b > 1, A is called an exponential-time algorithm. Similarly, if f satisfies Equation (3.1) with 0 < α < 1, A is called a subexponential-time algorithm.

[2] The practical running time of an algorithm may vary widely depending on its implementation and also on the processor, the compiler and even on run-time conditions. Since we are talking about the order of growth of running times in relation to the input size, we neglect the constants of proportionality and so these variations are usually not a problem. If one plans to be more concrete, one may measure the running time by the number of bit operations needed by the algorithm.

One has similar classifications of an algorithm in terms of its space requirements, namely, polynomial-space, linear-space, exponential-space, and so on. We can afford to be lazy and drop -time from the adjectives introduced in the previous paragraph. Thus, an exponential algorithm is an exponential-time algorithm, not an exponential-space algorithm.

It is expedient to note here that the running time of an algorithm may depend on the particular instance of the input, even when the input size is kept fixed. For an example, see Exercise 3.3. We should, therefore, be prepared to distinguish, for a given algorithm and for a given input size n, between the best (that is, shortest) running time fb(n), the worst (that is, longest) running time fw(n), the average running time fa(n) on all possible inputs (of size n) and the expected running time fe(n) for a randomly chosen input (of size n). In typical situations, fw(n), fa(n) and fe(n) are of the same order, in which case we simply denote, by running time, one of these functions. If this is not the case, an unqualified use of the phrase running time would denote the worst running time fw(n).

The order notation, though apparently attractive and useful, has certain drawbacks. First, it depicts the behaviour of functions (like running times) as the input size tends to infinity. In practice, one always has finite input sizes. One can check that if f(n) = n^100 and g(n) = (1.01)^n are the running times of two algorithms A and B respectively (for solving the same problem), then f(n) ≤ g(n) if and only if n = 1 or n ≥ 117,309. But then if the input size is only 1,000, one would prefer the exponential-time algorithm B over the polynomial-time algorithm A. Thus asymptotic estimates need not guarantee correct suggestions at practical ranges of interest. On the other hand, an algorithm which is a product of human intellect does not tend to have such extreme values for the parameters; that is, in a polynomial-time algorithm, the degree is usually ≤ 10 and the base for an exponential-time algorithm is usually not as close to 1 as 1.01 is. If we have f(n) = n^5 and g(n) = 2^n as the respective running times of the algorithms A and B, then A outperforms B (in terms of speed) for all n ≥ 23.

The second drawback of the order notation is that it suppresses the constant of proportionality; that is, an algorithm whose running time is 100n2 has the same order as one whose running time is n2. This is, however, a situation that we cannot neglect in practice. In particular, when we compare two different implementations of the same algorithm, the one with a smaller constant of proportionality is more desirable than the one with a larger constant. This is where implementation tricks prove to be important and even indispensable for large-scale applications.

3.2.2. Randomized Algorithms

A deterministic algorithm is one that always follows the same sequence of computations (and thereby produces the same output) for a given input. The deterministic running time of a computational problem P is the fastest of the running times (in the order notation) of the known deterministic algorithms that solve P.

If an algorithm makes some random choices during execution, we call the algorithm randomized or probabilistic. The exact sequence of computations followed by the algorithm depends on these random choices and as a result different executions of the same algorithm may produce different outputs for a given input. At first glance, randomized algorithms look useless, because getting different outputs for a given input is apparently not what one would really want. But there are situations where this is desirable. For example, in an implementation of the RSA protocol, one generates random primes p and q of given bit lengths. Here we require our prime generation procedure to produce different primes during different executions (that is, for different entities on the net).

More importantly, randomized algorithms often provide practical computational solutions for many problems for which no practical deterministic algorithms are known. We will shortly encounter many such situations where randomized algorithms are the simplest and/or fastest known algorithms. However, this sudden enhancement in performance by random choices does not come for free. To explain the so-called darker sides of randomization, we describe two different types of randomized algorithms.

A Monte Carlo algorithm is a randomized algorithm that may produce incorrect outputs. However, for such an algorithm to be useful, we require that the running time be always small and the probability of an error sufficiently low. A good example of a Monte Carlo algorithm is the Miller–Rabin algorithm (Algorithm 3.13) for testing the primality of an integer. For an integer of bit size n, the Miller–Rabin test with t iterations runs in time O(tn^3). Whenever the algorithm outputs false, it is always correct. But an answer of true is incorrect with an error probability ≤ 2^(−2t); that is, it certifies a composite integer as a prime with probability ≤ 2^(−2t). For t = 20, an error is expected to occur less than once in every 10^12 executions. With this little sacrifice we achieve a running time of O(n^3) (for a fixed t), whereas the best deterministic primality-testing algorithm (known to the authors at the time of writing this book) takes time O(n^7.5) and hence is not practical.
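The structure of the test is easy to sketch. The C fragment below is our simplified single-word version (for odd n < 2^32), not the book’s Algorithm 3.13; it performs t random-base rounds, where false (0) is always correct and true (1) may err with probability at most 2^(−2t), as quoted above:

```c
#include <stdint.h>
#include <stdlib.h>

/* b^e mod m for m < 2^32 (so the products never overflow 64 bits). */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    for (b %= m; e; e >>= 1) {
        if (e & 1) r = r * b % m;
        b = b * b % m;
    }
    return r;
}

/* Miller-Rabin test with t random bases, for n < 2^32.
   Returns 0 ("false"): n is certainly composite.
   Returns 1 ("true"):  n is prime with error probability <= 2^(-2t). */
int miller_rabin(uint32_t n, int t)
{
    if (n < 4) return n == 2 || n == 3;
    if (n % 2 == 0) return 0;
    uint32_t d = n - 1;
    int s = 0;
    while (d % 2 == 0) { d /= 2; s++; }               /* n - 1 = 2^s d, d odd */
    for (int i = 0; i < t; i++) {
        uint64_t a = 2 + (uint64_t)rand() % (n - 3);  /* base in [2, n-2] */
        uint64_t x = powmod(a, d, n);
        if (x == 1 || x == n - 1) continue;           /* base is no witness */
        int witness = 1;
        for (int r = 1; r < s; r++) {
            x = x * x % n;
            if (x == n - 1) { witness = 0; break; }
        }
        if (witness) return 0;                        /* certainly composite */
    }
    return 1;                                         /* probably prime */
}
```

Note that the composite 561 (a Carmichael number) fools the simpler Fermat test for every base coprime to it, yet is reliably rejected here.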

A Las Vegas algorithm is a randomized algorithm which always produces the correct output. However, the running time of such an algorithm depends on the random choices made. For such an algorithm to be useful, we expect that for most random choices the running time is small. As an example, consider the problem of finding a random (monic) irreducible polynomial of degree n over a finite field 𝔽q. Algorithm 3.22 tests the irreducibility of a polynomial in 𝔽q[x] in deterministic polynomial time. We generate random polynomials of degree n and check the irreducibility of these polynomials by Algorithm 3.22. From Section 2.9.2, we know that a randomly chosen monic polynomial of degree n over a finite field is irreducible with an approximate probability of 1/n. This implies that after O(n) random polynomials are tried, one expects to find an irreducible polynomial. The resulting Las Vegas algorithm (Algorithm 3.23) runs in expected polynomial time. It may, however, happen that for certain random choices we keep on generating reducible polynomials an exponential number of times, but the likelihood of such an accident is very, very low (Exercise 3.5).
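For concreteness, here is a toy C version of this Las Vegas strategy specialized to 𝔽2, with polynomials stored as bitmasks (bit i holding the coefficient of x^i). The irreducibility check is naive trial division standing in for the book’s Algorithm 3.22, and all names are ours:

```c
#include <stdint.h>
#include <stdlib.h>

/* Degree of a GF(2) polynomial stored as a bitmask (-1 for the zero poly). */
static int deg(uint64_t p)
{
    int d = -1;
    while (p) { p >>= 1; d++; }
    return d;
}

/* Remainder of a modulo b in GF(2)[x] (XOR-based long division). */
static uint64_t polymod(uint64_t a, uint64_t b)
{
    int db = deg(b);
    for (int d = deg(a); d >= db; d--)
        if (a >> d & 1) a ^= b << (d - db);
    return a;
}

/* Naive irreducibility test over GF(2): trial division by every
   polynomial of degree 1 .. n/2.  Feasible for small degrees only. */
int irreducible(uint64_t f)
{
    int n = deg(f);
    if (n <= 0) return 0;
    for (uint64_t g = 2; deg(g) <= n / 2; g++)
        if (polymod(f, g) == 0) return 0;
    return 1;
}

/* Las Vegas search: keep drawing random monic degree-n polynomials
   until an irreducible one appears (about n trials on average). */
uint64_t random_irreducible(int n)
{
    for (;;) {
        uint64_t f = (1ULL << n) | ((uint64_t)rand() & ((1ULL << n) - 1));
        if (irreducible(f)) return f;
    }
}
```

The output is always correct (an irreducible polynomial is returned, never a reducible one); only the number of draws before termination is random.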

An algorithm is said to be a probabilistic or randomized polynomial-time algorithm, if it is either a Monte Carlo algorithm with polynomial worst running time or a Las Vegas algorithm with polynomial expected running time. Both the above examples of randomized algorithms are probabilistic polynomial-time algorithms. A combination of these two types of algorithms can also be conceived; namely, algorithms that produce correct outputs with high probability and have polynomial expected running time. Some computational problems are so challenging that even such probably correct and probably fast algorithms are quite welcome.

We finally note that there are certain computational problems for which the deterministic running time is exponential and for which randomization also does not help much. In some cases, we have subexponential randomized algorithms which are still too slow to be of reasonable practical use. Some of these so-called intractable problems are at the heart of the security of many public-key cryptographic protocols.

3.2.3. Reduction Between Computational Problems

In the last two sections, we have introduced theoretical measures (the order notations) for estimating the (known) difficulty of solving computational problems. In this section, we introduce another concept by which we can compare the relative difficulty of two computational problems.

Let P1 and P2 be two computational problems. We say that P1 is polynomial-time reducible to P2 and denote this as P1 ≤P P2, if there is a polynomial-time algorithm which, given a solution of P2, provides a solution for P1. This means that if P1 ≤P P2, then the problem P1 is no more difficult than P2, apart from the extra polynomial-time reduction effort. In that case, if we know an algorithm to solve P2 in polynomial time, then we have a polynomial-time algorithm for P1 too. If P1 ≤P P2 and P2 ≤P P1, we say that the problems P1 and P2 are polynomial-time equivalent and write P1 ≅ P2.

In order to give an example of these concepts, we let G be a finite cyclic multiplicative group of order n and g a generator of G. The discrete logarithm problem (DLP) is the problem of computing, for a given a ∈ G, an integer x such that a = g^x. The Diffie–Hellman problem (DHP), on the other hand, is the problem of computing g^(xy) from the given values of g^x and g^y. If one can compute y from g^y, one can also compute g^(xy) = (g^x)^y by performing an exponentiation in the group G. Therefore, DHP ≤P DLP, if exponentiations in G can be computed in polynomial time. In other words, if a solution for DLP is known, a solution for DHP is also available: that is, DHP is no more difficult than DLP except for the additional exponentiation effort. However, the reverse implication (that is, whether DLP ≤P DHP) is not known for many groups.
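The reduction is mechanical enough to demonstrate on a toy group. In the C sketch below (our own, with a brute-force DLP solver that is feasible only for tiny groups), solving one DLP instance plus one extra exponentiation answers a DHP instance in the multiplicative group modulo a small prime p:

```c
#include <stdint.h>

/* b^e mod p by square-and-multiply (p < 2^32, so no 64-bit overflow). */
static uint64_t powm(uint64_t b, uint64_t e, uint64_t p)
{
    uint64_t r = 1;
    for (b %= p; e; e >>= 1) {
        if (e & 1) r = r * b % p;
        b = b * b % p;
    }
    return r;
}

/* Brute-force DLP solver: smallest x with g^x = a (mod p), or -1. */
static long dlog(uint64_t g, uint64_t a, uint64_t p)
{
    uint64_t t = 1;
    for (long x = 0; x < (long)p; x++) {
        if (t == a) return x;
        t = t * g % p;
    }
    return -1;
}

/* DHP <= DLP: recover g^(xy) from g^x and g^y by solving one DLP
   instance (for y) and performing one additional exponentiation. */
uint64_t solve_dhp(uint64_t g, uint64_t gx, uint64_t gy, uint64_t p)
{
    long y = dlog(g, gy, p);
    return powm(gx, (uint64_t)y, p);
}
```

The point of the sketch is the shape of the reduction, not its speed: replacing dlog by any polynomial-time DLP solver would immediately yield a polynomial-time DHP solver.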

So far we have assumed that our reduction algorithms are deterministic. If we allow randomized (that is, probabilistic) polynomial-time reduction algorithms, we can similarly introduce the concepts of randomized polynomial-time reducibility and of randomized polynomial-time equivalence. We urge the reader to formulate the formal definitions for these concepts.

Exercise Set 3.2

3.1
  1. Sort the following functions in the increasing sequence of order. (Don’t mind if some of these functions are not defined for a few values of n.)

    10^12, 2^n, 2^(2n), 2^(n^2), 100n^2, 10^(−3)n^3, 1/n, n!, n^n,

    log n, (log n)/n, n/log n, n^2 log n, n(log n)^2, (0.1)^(log n), (log n)^n,

    1/log n, 10^6(log n)^100, log log n, 2^(log log n), n^(log log n),

    exp(n^(1/3)(ln n)^(2/3)), exp((ln n)^(1/3)(ln ln n)^(2/3)).

  2. Evaluate the functions of Part (a) at n = 10^i for i = 1, 2, . . . , 10 and conclude that as n gets larger, the asymptotic ordering tallies with the actual ordering more and more closely.

3.2
  1. Show that for any real a > 1 and b > 0 one has n^b = o(a^n).

  2. For any positive real c, d, show that (log n)^c = o(n^d).

  3. Show that if f = O(g) and g = O(h), then f = O(h).

  4. Give an example to show that f = O(g) does not necessarily imply f = Θ(g).

  5. Give an example of a function f with f = O(n^(1+ε)) for every ε > 0, but f is not O(n).

3.3 Suppose that an algorithm A takes as input a bit string and runs in time g(t), where t is the number of one-bits in the input string. Let fb(n), fw(n), fa(n) and fe(n) respectively denote the best, worst, average and expected running times of A for inputs of size n. Derive the following table under the assumption that each of the 2^n bit strings of length n is equally likely.

    Running times

    g(t)    fb(n)    fw(n)    fa(n)        fe(n)
    t       0        n        n/2          n/2
    t^2     0        n^2      n(n+1)/4     n^2/4
    2^t     1        2^n      (3/2)^n      2^(n/2)

3.4
  1. Show that an exponential-space (resp. subexponential-space) algorithm must be (at least) exponential-time (resp. subexponential-time) too. You may assume that, at any point of time, a computing device can access (read/write) at most a constant number of memory locations.

  2. Give an example of an algorithm that is exponential-time but polynomial-space.

3.5 Consider the Las Vegas algorithm discussed in Section 3.2.2 for generating a random irreducible polynomial of degree n over 𝔽q. Assume that a randomly chosen polynomial in 𝔽q[x] of degree n has (an exact) probability of 1/n of being irreducible. Find out the probability pr that r polynomials chosen randomly (with repetition) from 𝔽q[x] are all reducible. For n = 1000, calculate the numerical values of pr for r = 10^i, i = 1, . . . , 6, and find the smallest integers r for which pr ≤ 1/2 and pr ≤ 10^(−12). Find the expected number of polynomials tested for irreducibility before the algorithm terminates.
3.6 Let n = pq be the product of two distinct primes p and q. Show that factoring n is polynomial-time equivalent to computing φ(n) = (p − 1)(q − 1), where φ is Euler’s totient function. (Assume that an arithmetic operation (including computation of integer square roots) on integers of bit size t can be performed in polynomial time (in t).)
3.7 Let G be a finite cyclic multiplicative group and let H be the subgroup of G generated by an element h ∈ G whose order is known. The generalized discrete logarithm problem (GDLP) is the following: Given a ∈ G, find out if a ∈ H and, if so, find an integer x for which a = h^x. Show that GDLP ≅ DLP, if exponentiations in G can be carried out in polynomial time and if DLP in H is polynomial-time equivalent to DLP in G. [H]

3.3. Multiple-precision Integer Arithmetic

Cryptographic protocols based on the rings ℤn and the fields 𝔽p demand n and p to be sufficiently large (of bit length ≥ 512) in order to achieve the desired level of security. However, standard compilers do not support data types that hold integers of this size with full precision. For example, C compilers support integers of size ≤ 64 bits. So one must employ custom-designed data types for representing and working with such big integers. Many libraries are already available that can handle integers of arbitrary length. FREELIP, GMP, LiDIA, NTL and ZEN are some such libraries that are even freely available.

Alternatively, one may design one’s own functions for multiple-precision integers. Such a programming exercise is not very difficult, but making the functions run efficiently is a huge challenge. Several tricks and optimization techniques can turn a naive implementation into much faster and more memory-efficient code, and it takes years of experimental experience to master the subtleties. Theoretical asymptotic estimates may serve as a guideline, but only experimentation can settle the relative merits and demerits of the available algorithms for input sizes of practical interest. For example, the theoretically fastest algorithm known for multiplying two multiple-precision integers is based on the so-called fast Fourier transform (FFT) techniques. But our experience shows that this algorithm starts to outperform the common but asymptotically slower algorithms only when the input size is at least several thousand bits. Since such very large integers are rarely needed by cryptographic protocols, FFT-based multiplication is not useful in this context.

3.3.1. Representation of Large Integers

In order to represent a large integer, we break it up into small parts and store each part in a memory word[3] accessible by built-in data types. The simplest way to break up a (positive) integer a is to predetermine a radix ℜ and compute the ℜ-ary representation (as–1, . . . , a0) of a (see Exercise 3.8). One should have ℜ ≤ 2^32 so that each ℜ-ary digit ai can be stored in a memory word. For the sake of efficiency, it is advisable to take ℜ to be a power of 2. It is also expedient to take ℜ as large as possible, because smaller values of ℜ lead to (possibly) longer size s and thereby add to the storage requirement and also to the running time of arithmetic functions. The best choice is ℜ = 2^32. We denote by ulong a built-in unsigned integer data type provided by the compiler (like the ANSI C standard unsigned long). We use an array of ulong for storing the digits. The array can be static or dynamic. Though dynamic arrays are more storage-efficient (because they can be allocated only as much memory as needed), they have memory allocation and deallocation overheads and are somewhat more complicated to programme than static arrays. Moreover, for cryptographic protocols one typically needs integers no longer than 4096 bits. Since the product of two integers of bit size t has bit size ≤ 2t, a static array of 8192/32 = 256 ulong suffices for storing cryptographic integers. It is also necessary to keep track of the actual size of an integer, since filling up with leading 0 digits is not an efficient strategy. Finally, it is often useful to have a signed representation of integers. A sign bit is also necessary for this case. We state three possible declarations in Exercise 3.11.

[3] We assume that a word in the memory is 32 bits long.

3.3.2. Basic Arithmetic Operations

We now describe the implementations of addition, subtraction, multiplication and Euclidean division of multiple-precision integers. Every other complex operation (like modular arithmetic, gcd) is based on these primitives. It is, therefore, of utmost importance to write efficient codes for these basic operations.

For integers of cryptographic sizes, the most efficient algorithms are the standard ones we use for doing arithmetic on decimal numbers, that is, for two positive integers a = as–1 . . . a0 and b = bt–1 . . . b0 we compute the sum c = a + b = cr–1 . . . c0 as follows. We first compute a0 + b0. If this sum is ≥ ℜ, then c0 = a0 + b0 – ℜ and the carry is 1, otherwise c0 = a0 + b0 and the carry is 0. We then compute a1 + b1 plus the carry available from the previous digit, and compute c1 and the next carry as before.

For computing the product d = ab = dl–1 . . . d0, we do the usual quadratic procedure; namely, we initialize all the digits of d to 0 and for each i = 0, . . . , s – 1 and j = 0, . . . , t – 1 we compute aibj and add it to the (i + j)-th digit of d. If this sum (call it σ) at the (i + j)-th location exceeds ℜ – 1, we find out q, r with σ = qℜ + r, r < ℜ. Then di+j is assigned r, and q is added to the (i + j + 1)-st location. If that addition results in a carry, we propagate the carry to higher locations until it gets fully absorbed in some word of d.

All this sounds simple, but complications arise when we consider the fact that the sum of two 32-bit words (and a possible carry from the previous location) may be 33 bits long. For multiplication, the situation is even worse, because the product aibj can be 64 bits long. Since our machine word can hold only 32 bits, it becomes problematic to hold all these intermediate sums and products to full precision. We assume that the least significant 32 bits are correctly returned and assigned to the output variable (ulong), whereas the leading 32 bits are lost.[4] The most efficient way to keep track of these overflows is to use assembly instructions, and this is what many number theory packages (like PARI and UBASIC) do. But this means that for every target architecture we have to write different assembly code. Here we describe certain tricks that make it possible to grab the overflow information using only high-level languages, without significantly degrading the performance compared to assembly instructions.

[4] This is the typical behaviour of a CPU that supports 2’s complement arithmetic.

Addition and subtraction

First consider the sum ai + bi. We compute the least significant 32 bits by assigning ci := ai + bi. It is easy to see that an overflow occurs during this sum if and only if ci < ai. We set the output carry accordingly. Now, let us consider the situation when we have an input carry: that is, when we compute the sum ci := ai + bi + 1. Here an overflow occurs if and only if ci ≤ ai. Algorithm 3.1 performs this addition of words.

Algorithm 3.1. Addition of words

Input: Words ai and bi and the input carry γi ∈ {0, 1}.

Output: Word ci and the output carry δi ∈ {0, 1} with ai + bi + γi = ci + δiℜ.

Steps:

ci := ai + bi.

if (γi) { ci++, δi := ( (ci ≤ ai) ? 1 : 0 ). } else { δi := ( (ci < ai) ? 1 : 0 ). }

Algorithm 3.1 assumes that ci and ai are stored in different memory words. If this is not the case, we should store ai + bi in a temporary variable and, after the second line, ci should be assigned the value of this temporary variable. Note also that many processors provide an increment primitive which is faster than the general addition primitive. In that case, the statement ci++ is preferable to ci := ci+1.
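As a concrete illustration, Algorithm 3.1 may be coded in C along the following lines (a sketch; `uint32_t` from `<stdint.h>` plays the role of ulong, and the output carry is passed back through a pointer — these conventions are ours, not fixed by the text):

```c
#include <stdint.h>

/* Algorithm 3.1 in C: returns c = a + b + gamma (mod 2^32) and stores the
   output carry delta (0 or 1) through the pointer. The local variable c
   plays the role of the temporary variable mentioned in the text. */
static uint32_t add_word(uint32_t a, uint32_t b, uint32_t gamma,
                         uint32_t *delta)
{
    uint32_t c = a + b;               /* least significant 32 bits */
    if (gamma) {
        c++;
        *delta = (c <= a) ? 1 : 0;    /* overflow iff c <= a (carry came in) */
    } else {
        *delta = (c < a) ? 1 : 0;     /* overflow iff c < a */
    }
    return c;
}
```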

For subtraction, we proceed analogously from right to left and keep track of the borrow. Here the check for overflow can be done before the subtraction of words is carried out (and, therefore, no temporary variable is needed, if we assume that the output carry is not stored in the location of the operands).

Algorithm 3.2. Subtraction of words

Input: Words ai and bi and the input borrow γi ∈ {0, 1}.

Output: Word ci and the output borrow δi ∈ {0, 1} with ai – bi – γi = ci – δiℜ.

Steps:

if (γi) { δi := ( (ai ≤ bi) ? 1 : 0 ), ci := ai – bi, ci––. }

else { δi := ( (ai < bi) ? 1 : 0 ), ci := ai – bi. }

We urge the reader to develop the complete addition and subtraction procedures for multiple-precision integers, based on the above primitives for words.
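For instance, the complete addition of two multiple-precision integers, built on the word primitive of Algorithm 3.1, could look as follows (a sketch, assuming little-endian digit arrays with s ≥ t and room for s + 1 digits in c; the function name and conventions are ours):

```c
#include <stdint.h>

/* c := a + b, where a has s digits and b has t digits, s >= t, digits stored
   least significant first. c must have room for s + 1 digits; the function
   returns the digit count of the sum. */
static int mp_add(const uint32_t *a, int s, const uint32_t *b, int t,
                  uint32_t *c)
{
    uint32_t carry = 0;
    for (int i = 0; i < s; i++) {
        uint32_t bi = (i < t) ? b[i] : 0;     /* pad b with leading zeros */
        uint32_t ci = a[i] + bi;
        if (carry) { ci++; carry = (ci <= a[i]); }
        else       { carry = (ci <  a[i]); }
        c[i] = ci;
    }
    c[s] = carry;                              /* possible extra digit */
    return carry ? s + 1 : s;
}
```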

Multiplication

The product of two 32-bit words can be as long as 64 bits, and we plan to (compute and) store this product in two words. Assuming the availability of a built-in 64-bit unsigned integer data type (which we will henceforth denote as ullong), this can be performed as in Algorithm 3.3.

Algorithm 3.3. Multiplication of words

Input: Words a and b.

Output: Words c and d with ab = cℜ + d.

Steps:

/* We use a temporary variable t of data type ullong */

t := (ullong)(a) * (ullong)(b), c := (ulong)(t ≫ 32), d := (ulong)t.

We use a temporary 64-bit integer variable t to store the product ab. The lower 32 bits of t are stored in d by simple typecasting, whereas the higher 32 bits of t are obtained by right-shifting t (the operator ≫) by 32 bits. This is a reasonable strategy given that we do not explore assembly-level instructions. Algorithm 3.4 describes a multiplication algorithm for two multiple-precision integer operands, that does not directly use the word-multiplying primitive of Algorithm 3.3.
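In C, Algorithm 3.3 amounts to the following (a sketch; `uint64_t` plays the role of ullong):

```c
#include <stdint.h>

/* Algorithm 3.3: split the 64-bit product ab into a high word c and a low
   word d, so that ab = c * 2^32 + d. */
static void mul_word(uint32_t a, uint32_t b, uint32_t *c, uint32_t *d)
{
    uint64_t t = (uint64_t)a * (uint64_t)b;  /* full 64-bit product */
    *c = (uint32_t)(t >> 32);                /* high 32 bits */
    *d = (uint32_t)t;                        /* low 32 bits (truncating cast) */
}
```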

The reader can easily verify that this code properly computes the product. We now highlight how this makes the computation efficient. The intermediate results are stored in the array t of 64-bit ullong. This means that after the 64-bit product aibj of words ai and bj is computed (in the temporary variable T), we directly add T to the location ti+j. If the sum exceeds ℜ^2 – 1 = 2^64 – 1, that is, if an overflow occurs, we should add ℜ to ti+j+1 or, equivalently, 1 to ti+j+2. This last addition is one of ullong integers and can be made more efficient if it is replaced by ulong increments, and this is what we do using the temporary array u. Since the quadratic loop is the bottleneck of the multiplication procedure, it is absolutely necessary to make this loop as efficient as possible.

Algorithm 3.4. Multiplication of multiple-precision integers

Input: Integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0)

Output: The product c = (cr+s–1 . . . c0) = ab.

Steps:

/* Let T be a variable and t0, . . . , tr+s–1 an array of ullong variables */

/* Let v be a variable and u0, . . . , ur+s–1 an array of ulong variables */

Initialize the array locations ci, ti and ui to 0 for all i = 0, . . . , r + s – 1.

/* The quadratic loop */
for (i = 0, . . . , r – 1) and (j = 0, . . . , s – 1) {
   T := (ullong)(ai) * (ullong)(bj).
   if ((ti+j += T) < T) ui+j+2++.
}

/* Deferred normalization */
for (i = 0, . . . , r + s – 1) {
    if ((ci += ui) < ui) ui+1++.
    v := (ulong)(ti), if ((ci += v) < v) ui+1++.
    v := (ulong)(ti ≫ 32), if ((ci+1 += v) < v) ui+2++.
}

After the quadratic loop, we do deferred normalization from the array of 64-bit double-words ti to the array of 32-bit words ci. This is done using the typecasting and right-shift strategy mentioned in Algorithm 3.3. We should also take care of the intermediate carries stored in the array u. The normalization loop takes a total time of O(r + s), whereas the quadratic loop takes time O(rs). If we had done normalization inside the quadratic loop itself, that would incur an additional O(rs) cost (which is significantly more than that of deferred normalization).
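As a concrete (C99) rendering of Algorithm 3.4 with deferred normalization, the following sketch uses variable-length arrays for the temporaries; the names and the guard digit are our conventions:

```c
#include <stdint.h>
#include <string.h>

/* Schoolbook multiplication with deferred normalization (Algorithm 3.4):
   a has r digits, b has s digits, c receives the r + s digits of ab. */
static void mp_mul(const uint32_t *a, int r, const uint32_t *b, int s,
                   uint32_t *c)
{
    uint64_t t[r + s];            /* 64-bit accumulators (C99 VLAs) */
    uint32_t u[r + s + 2];        /* deferred carries */
    uint32_t cc[r + s + 1];       /* result with one guard digit */
    memset(t, 0, sizeof t); memset(u, 0, sizeof u); memset(cc, 0, sizeof cc);

    /* quadratic loop: accumulate a_i * b_j into t[i+j]; a 64-bit overflow
       amounts to a carry of R^2, recorded as an increment of u[i+j+2] */
    for (int i = 0; i < r; i++)
        for (int j = 0; j < s; j++) {
            uint64_t T = (uint64_t)a[i] * b[j];
            if ((t[i + j] += T) < T) u[i + j + 2]++;
        }

    /* deferred normalization: fold the 64-bit accumulators and the carry
       array into the 32-bit result */
    for (int i = 0; i < r + s; i++) {
        uint32_t v;
        if ((cc[i] += u[i]) < u[i]) u[i + 1]++;
        v = (uint32_t)t[i];
        if ((cc[i] += v) < v) u[i + 1]++;
        v = (uint32_t)(t[i] >> 32);
        if ((cc[i + 1] += v) < v) u[i + 2]++;
    }
    memcpy(c, cc, (r + s) * sizeof(uint32_t));
}
```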

Squaring

If both the operands a and b of multiplication are the same, it is not necessary to compute aibj and ajbi separately. We should add to ti+j the product ai^2, if i = j, or the product 2aiaj, if i < j. Note that 2aiaj can be computed by left-shifting aiaj by one bit. This might result in an overflow, which can be checked before shifting by looking at the 64th bit of aiaj. Algorithm 3.5 incorporates these changes.

Fast multiplication

For the multiplication of two multiple-precision integers, there are algorithms that are asymptotically faster than the quadratic Algorithms 3.4 and 3.5. However, not all these theoretically faster algorithms are practical for the sizes of integers used in cryptology. Our practical experience shows that a strategy due to Karatsuba outperforms the quadratic algorithm, if both the operands are of roughly equal sizes and if the bit lengths of the operands are 300 or more. We describe Karatsuba’s algorithm in connection with squaring, where the two operands are the same (and hence of the same size). Suppose we want to compute a^2 for a multiple-precision integer a = (ar–1 . . . a0). We first break a into two integers of almost equal sizes, namely, α := (ar–1 . . . at) and β := (at–1 . . . a0), so that a = ℜ^t α + β. Now, a^2 = α^2ℜ^2t + 2αβℜ^t + β^2 and 2αβ = (α^2 + β^2) – (α – β)^2. We recursively invoke Karatsuba’s multiplication with operands α, β and α – β. Recursion continues as long as the operands are not too small and the depth of recursion is within a prescribed limit. One can check that Karatsuba’s algorithm runs in time O(r^(lg 3) lg r) = O(r^1.585 lg r), which is a definite improvement over the O(r^2) running time taken by the quadratic algorithm.
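One level of this recursion can be illustrated at the word level: squaring a 32-bit word from its 16-bit halves with three half-size squarings instead of four half-size products (a self-contained sketch; a real implementation would of course recurse on multiple-precision halves):

```c
#include <stdint.h>

/* One Karatsuba level for squaring a 32-bit word a = 2^16*alpha + beta:
   a^2 = alpha^2*2^32 + (2*alpha*beta)*2^16 + beta^2, where
   2*alpha*beta = (alpha^2 + beta^2) - (alpha - beta)^2, so only the three
   squarings alpha^2, beta^2 and (alpha - beta)^2 are needed. */
static uint64_t karatsuba_square32(uint32_t a)
{
    uint32_t alpha = a >> 16, beta = a & 0xFFFF;
    uint64_t hh = (uint64_t)alpha * alpha;        /* alpha^2 */
    uint64_t ll = (uint64_t)beta * beta;          /* beta^2 */
    int32_t  d  = (int32_t)alpha - (int32_t)beta;
    uint64_t dd = (uint64_t)((int64_t)d * d);     /* (alpha - beta)^2 */
    uint64_t mid = hh + ll - dd;                  /* equals 2*alpha*beta */
    return (hh << 32) + (mid << 16) + ll;
}
```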

Algorithm 3.5. The quadratic loop for squaring

for (i = 0, . . . , r – 1) and (j = i, . . . , r – 1) {
   T := (ullong)(ai) * (ullong)(aj).
   if (i ≠ j) {
      if (the 64th bit of T is 1) ui+j+2 ++.
      T ≪= 1.
   }
   if ((ti+j += T) < T) ui+j+2++.
}

The best-known algorithm for multiplication of two multiple-precision integers is based on the fast Fourier transform (FFT) techniques and has running time O~(r). However, for integers used in cryptology this algorithm is usually not practical. Therefore, we will not discuss FFT multiplication in this book.

Division

Euclidean division with remainder of multiple-precision integers is somewhat cumbersome, although conceptually as difficult (that is, as simple) as the division procedure for decimal integers taught in the early days of school. The most challenging part of the procedure is guessing the next digit of the quotient. For decimal integers, we usually do this by looking at the first few (decimal) digits of the divisor and the dividend. This need not give us the correct digit, but something close to it. In the case of ℜ-ary digits, we also guess the quotient digit from a few leading ℜ-ary digits of the divisor and the dividend, but certain precautions have to be taken to ensure that the guess is not too far from the correct one.

Suppose we are given positive integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0) with ar–1 ≠ 0 and bs–1 ≠ 0, and we want to compute the integers x = (xr–s . . . x0) and y = (ys–1 . . . y0) with a = xb + y, 0 ≤ y < b. First, we want that bs–1 ≥ ℜ/2 (you’ll see why later). If this condition is not already met, we force it by multiplying both a and b by 2^t for some suitable t, 0 < t < 32. In that case, the quotient remains the same, but the remainder gets multiplied by 2^t. The desired remainder can later be found out easily by right-shifting the computed remainder by t bits. The process of making bs–1 ≥ ℜ/2 is often called normalization (of b). Henceforth, we will assume that b is normalized. Note that normalization may increase the word-size of a by 1.

Algorithm 3.6. Euclidean division of multiple-precision integers

Input: Integers a = (ar–1 . . . a0) and b = (bs–1 . . . b0) with r ≥ 3, s ≥ 2, ar–1 ≠ 0, bs–1 ≥ ℜ/2 and a ≥ b.

Output: The quotient x = (xr–s . . . x0) = a quot b and the remainder y = (ys–1 . . . y0) = a rem b of Euclidean division of a by b.

Steps:

Initialize the quotient digits xi to 0 for i = 0, . . . , r – s.

/* The main loop */
for (i = r – 1, . . . , s) {
   /* Initial check */
   if ((ai ≥ bs–1) and (a ≥ bℜ^(i–s+1))) { xi–s+1++, a := a – bℜ^(i–s+1). }

   /* Guess the next digit of quotient */
   if (ai = bs–1) xi–s := ℜ – 1, else xi–s := ⌊(aiℜ + ai–1)/bs–1⌋.
   if (xis ≠ 0)
       while (xi–s(bs–1ℜ + bs–2) > aiℜ^2 + ai–1ℜ + ai–2) xi–s––.

   /* Modify the guess to the correct value */
   z := xi–s bℜ^(i–s).
   if (a < z) { xi–s––, z := z – bℜ^(i–s). }
   a := a – z.
}

/* Here the quotient may be one less than the actual value */
if (a ≥ b) { a := a – b, x := x + 1. }
y := a.

Algorithm 3.6 implements multiple-precision division. It is not difficult to prove the correctness of the algorithm. We refrain from doing so, but make some useful comments. The initial check inside the main loop may cause the increment of xi–s+1. This may lead to a carry which has to be adjusted in the higher digits. This carry propagation is not mentioned in the code for simplicity. Since b is assumed to be normalized, this initial check needs to be carried out only once; that is, for a non-normalized b we would have to replace the if statement by a while loop. This is the first advantage of normalization. In the first step of guessing the quotient digit xi–s, we compute ⌊(aiℜ + ai–1)/bs–1⌋ using ullong arithmetic. At this point, the guess is based only on two leading digits of a and one leading digit of b. In the while loop, we refine this guess by considering one more digit of each of a and b. Since b is normalized, this while loop is executed no more than twice (the second advantage of normalization). The guess for xi–s made in this way is either equal to or one more than the correct value, which is then computed by comparing a with xi–s bℜ^(i–s). The running time of the algorithm is O(s(r – s)). For a fixed r, this is maximum (namely, O(r^2)) when s ≈ r/2.
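The digit-guessing step can be isolated in C as follows (a sketch; the function name and conventions are ours, and the piecewise 96-bit comparison relies on the `unsigned __int128` extension of GCC and Clang rather than on multiple-precision routines):

```c
#include <stdint.h>

/* The guessing step of Algorithm 3.6 in isolation: given the three leading
   digits a2 a1 a0 of the (shifted) dividend and the two leading digits b1 b0
   of the normalized divisor (b1 >= 2^31), return the refined quotient-digit
   guess, which is either exact or one too large. */
static uint32_t guess_digit(uint32_t a2, uint32_t a1, uint32_t a0,
                            uint32_t b1, uint32_t b0)
{
    const uint64_t R = 1ULL << 32;
    uint64_t q = (a2 == b1) ? R - 1
                            : ((((uint64_t)a2 << 32) | a1) / b1);
    /* refine: decrease q while q*(b1*R + b0) > a2*R^2 + a1*R + a0; with a
       normalized divisor this loop runs at most twice */
    unsigned __int128 rhs = ((unsigned __int128)a2 << 64)
                          + (((uint64_t)a1 << 32) | a0);
    while (q > 0) {
        unsigned __int128 lhs =
            (unsigned __int128)q * (((uint64_t)b1 << 32) | b0);
        if (lhs <= rhs) break;
        q--;
    }
    return (uint32_t)q;
}
```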

Bit-wise operations

Multiplication and division by a power of 2 can be carried out more efficiently using bit operations (on words) instead of calling the general procedures just described. It is also often necessary to compute the bit length of a non-zero multiple-precision integer and the multiplicity of 2 in it. In these cases also, one should use bit operations for efficiency. For these implementations, it is advantageous to maintain precomputed tables of the constants 2^i, i = 0, . . . , 31, and of 2^i – 1, i = 0, . . . , 32, rather than computing them in situ every time they are needed. In Algorithm 3.7, we describe an implementation of multiplication by a power of 2 (that is, the left shift operation). We use the symbols OR, ≫ and ≪ to denote bit-wise or, right shift and left shift operations on 32-bit integers.

Algorithm 3.7. Left-shift of multiple-precision integers

Input: Integer a = (ar–1 . . . a0) ≠ 0, ar–1 ≠ 0, and an integer t ≥ 0.

Output: The integer c = (cs–1 . . . c0) = a · 2t, cs–1 ≠ 0.

Steps:

u := t quot 32, v := t rem 32.
if (v = 0) { /* Word-by-word copy */
    s := r + u.
    for (i = r – 1, . . . , 0) ci+u := ai.
} else { /* Use shifts of individual words */
    s := r + u + 1, cs–1 := 0.
    for (i = r – 1, . . . , 0) { ci+u+1 := ci+u+1 OR (ai ≫ (32 – v)), ci+u := (ai ≪ v). }
    if (cs–1 = 0) s––.
}
for (i = u – 1, . . . , 0) ci := 0.
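The other word-level helpers mentioned above, the bit length and the multiplicity of 2, can be sketched in the same spirit (our names; for brevity the per-word scans use loops where a real implementation might use the precomputed tables suggested in the text):

```c
#include <stdint.h>

/* Bit length of a nonzero multiple-precision integer with r digits,
   a[r-1] != 0, digits stored least significant first. */
static int bit_length(const uint32_t *a, int r)
{
    uint32_t w = a[r - 1];       /* most significant (nonzero) word */
    int n = 0;
    while (w) { n++; w >>= 1; }  /* a table lookup can replace this loop */
    return 32 * (r - 1) + n;
}

/* Multiplicity of 2 (2-adic valuation) of a nonzero multiple-precision
   integer: count whole zero words, then trailing zero bits of the next. */
static int twoadic_valuation(const uint32_t *a, int r)
{
    (void)r;                     /* a != 0, so the scans below terminate */
    int i = 0;
    while (a[i] == 0) i++;
    uint32_t w = a[i];
    int n = 0;
    while ((w & 1) == 0) { n++; w >>= 1; }
    return 32 * i + n;
}
```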

Unless otherwise mentioned, we will henceforth forget about the above structural representation of multiple-precision integers and denote arithmetic operations on them by the standard symbols (+, –, * or · or ×, quot, rem and so on).

3.3.3. GCD

Computing the greatest common divisor of two (multiple-precision) integers has important applications. In this section, we assume that we want to compute the (positive) gcd of two positive integers a and b. The Euclidean gcd loop comprising repeated division (Proposition 2.15) is usually not the most efficient way to compute integer gcds. We describe the binary gcd algorithm, which turns out to be faster for practical bit sizes of the operands a and b. If a = 2^r a′ and b = 2^s b′ with a′ and b′ odd, then gcd(a, b) = 2^min(r,s) gcd(a′, b′). Therefore, we may assume that a and b are odd. In that case, if a > b, then gcd(a, b) = gcd(a – b, b) = gcd((a – b)/2^t, b), where t := v2(a – b) is the multiplicity of 2 in a – b. Since the sum of the bit sizes of (a – b)/2^t and b is strictly smaller than that of a and b, repeating this computation terminates the algorithm after finitely many iterations.
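The plain (non-extended) binary gcd just described can be sketched for single words as follows (our illustration; the multiple-precision version replaces the word operations by the routines of Section 3.3.2):

```c
#include <stdint.h>

/* Binary gcd: extract common powers of 2, then repeat
   gcd(a, b) = gcd((b - a)/2^t, a) on odd operands until one becomes 0. */
static uint64_t binary_gcd(uint64_t a, uint64_t b)
{
    if (a == 0) return b;
    if (b == 0) return a;
    int shift = 0;
    while (((a | b) & 1) == 0) { a >>= 1; b >>= 1; shift++; }
    while ((a & 1) == 0) a >>= 1;         /* now a is odd */
    while (b) {
        while ((b & 1) == 0) b >>= 1;     /* strip 2s: gcd unchanged, a odd */
        if (a > b) { uint64_t t = a; a = b; b = t; }
        b -= a;                           /* b becomes even (or zero) */
    }
    return a << shift;                    /* restore the common powers of 2 */
}
```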

Algorithm 3.8. Extended binary gcd

Input: Two positive integers a, b with a ≥ b and b odd.

Output: Integers d, u and v with d = gcd(a, b) = ua + vb > 0. If (a, b) ≠ (1, 1), then |u| < b and |v| < a.

Steps:

/* Initial reduction */
Compute integers q and r satisfying a = bq + r with 0 ≤ r < b.
if (r = 0) { (duv) := (b, 0, 1), return. }

/* Initialize */
(xy) := (br).
v1 := 0, v2 := 1.

/* Main loop */
while (1) {
   if (x ≥ y) {
      x := x – y.   /* x is even here except perhaps in the first iteration */
      v1 := v1 – v2.
      if (x = 0) {   /* End loop and return du and v */
         u2 := (y – v2r)/b.
         (duv) := (yv2u2 – v2q).
         Return.
      } else if (x is even) {
         t := v2(x), x := x/2t.    /* x is odd here */
         for (i = 1, . . . , t) {
            if (v1 is odd) v1 := v1 + b.
            v1 := v1/2.
         }
       }
     } else { /* if (x < y) */
       y := y – x, v2 := v2 – v1.    /* y is even here */
       t := v2(y), y := y/2t.   /* y is odd here */
       for (i = 1, . . . , t) {
          if (v2 is odd) v2 := v2 + b.
          v2 := v2/2.
       }
   }
}

Multiple-precision division is much costlier than subtraction followed by division by a power of 2. This is why the binary gcd algorithm outperforms the Euclidean gcd algorithm. However, if the bit sizes of a and b differ considerably, it is preferable to use Euclidean division once and replace the pair (a, b) by (b, a rem b), before entering the binary gcd loop. Even when the original bit sizes of a and b are not much different, one may carry out this initial reduction, because in this case the Euclidean division does not take much time.

Recall from Proposition 2.16 that if d := gcd(a, b), then for some integers u and v we have d = ua + vb. Computation of d along with a pair of integers u, v is called the extended gcd computation. Both the Euclidean and the binary gcd loops can be augmented to compute these integers u and v. Since binary gcd is faster than Euclidean gcd, we describe an implementation of the extended binary gcd algorithm. We assume that 0 < ba and compute u and v in such a way that if (a, b) ≠ (1, 1), then |u| < b and |v| < a. Algorithm 3.8, which shows the details, requires b to be odd. The other operand a may also be odd, though the working of the algorithm does not require this.

In order to prove the correctness of Algorithm 3.8, we introduce the sequence of integers xk, yk, u1,k, u2,k, v1,k and v2,k for k = 0, 1, 2, . . . , initialized as:

x0 := b,   u1,0 := 1,   v1,0 := 0,
y0 := r,   u2,0 := 0,   v2,0 := 1.

During the k-th iteration of the main loop, k = 1, 2, . . . , we modify the values xk–1, yk–1, u1,k–1, u2,k–1, v1,k–1 and v2,k–1 to xk, yk, u1,k, u2,k, v1,k and v2,k in such a way that we always maintain the relations:

u1,k x0 + v1,k y0 = xk,
u2,k x0 + v2,k y0 = yk.

The main loop terminates when xk = 0, and at that point we have the desired relation yk = gcd(b, r) = u2,kb + v2,kr. For the updating during the k-th iteration, we assume that xk–1 ≥ yk–1. (The converse inequality can be handled analogously.) The x and y values are updated as xk := (xk–1 – yk–1)/2^(tk), yk := yk–1, where tk := v2(xk–1 – yk–1). Thus, we have u2,k = u2,k–1 and v2,k = v2,k–1, whereas if tk > 0, we write

u1,k = [u1,k–1 – u2,k–1 – λk r]/2^(tk),
v1,k = [v1,k–1 – v2,k–1 + λk b]/2^(tk),

for the (unique) integer λk, 0 ≤ λk < 2^(tk), that makes the bracketed numerators divisible by 2^(tk). All the expressions within square brackets in the last equation are integers, since x0 = b is odd.

Algorithm 3.8 continues to work even when a < b, but in that case the initial reduction simply interchanges a and b and we forfeit the possibility of the reduction in size of the arguments (x and y) caused by the initial Euclidean division.

Finally, we remove the restriction that b is odd. We write a = 2^r a′ and b = 2^s b′ with a′, b′ odd and call Algorithm 3.8 with a′ and b′ as parameters (swapping a′ and b′, if a′ < b′) to compute integers d′, u′, v′ with d′ = gcd(a′, b′) = u′a′ + v′b′. Without loss of generality, assume that r ≥ s. Then d := gcd(a, b) = 2^s d′ = u′(2^s a′) + v′b. If r = s, then 2^s a′ = a and we are done. So assume that r > s. If u′ is even, we can extract a power of 2 from u′ and multiply 2^s a′ by this power. So let’s say that we have a situation of the form d = ū(2^t a′) + v̄b for some integers ū and v̄, with ū odd, and for s ≤ t < r. We can rewrite this as d = (ū + b′)(2^t a′) + (v̄ – 2^(t–s) a′)b. Since ū + b′ is even, this gives us d = ū′(2^τ a′) + v̄′b, where τ > t and where ū′ is odd or τ = r. Proceeding in this way, we eventually reach a relation of the form d = u(2^r a′) + vb = ua + vb. It is easy to check that if (a′, b′) ≠ (1, 1), then the integers u and v obtained as above satisfy |u| < b and |v| < a.

3.3.4. Modular Arithmetic

So far, we have described how we can represent and work with the elements of ℤ. In cryptology, we are often more interested in the arithmetic of the rings ℤn for multiple-precision integers n. We canonically represent the elements of ℤn by the integers between 0 and n – 1.

Let a, b ∈ ℤn. In order to compute a + b in ℤn, we compute the integer sum a + b, and, if a + b ≥ n, we subtract n from it. This gives us the desired canonical representative in ℤn. Similarly, for computing a – b in ℤn, we subtract b from a as integers, and, if the difference is negative, we add n to it. For computing ab in ℤn, we multiply a and b as integers and then take the remainder of Euclidean division of this product by n.

Note that a ∈ ℤn is invertible (that is, a ∈ ℤn*) if and only if gcd(a, n) = 1. For a ∈ ℤn, a ≠ 0, we call the extended (binary) gcd algorithm with a and n as the arguments and get integers d, u, v satisfying d = gcd(a, n) = ua + vn. If d > 1, a is not invertible modulo n. Otherwise, we have ua ≡ 1 (mod n), that is, a^–1 ≡ u (mod n). The extended gcd algorithm indeed returns a value of u satisfying |u| < n. Thus if u > 0, it is the canonical representative of a^–1, whereas if u < 0, then u + n is the canonical representative of a^–1.
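For single words this inversion can be sketched with the classical extended Euclidean loop (our illustration; the text's extended *binary* gcd works equally well and is the faster choice for multiple-precision operands):

```c
#include <stdint.h>

/* Modular inverse of a modulo n via the extended Euclidean algorithm.
   Maintains the invariant u*a = x (mod n) for the "x row"; returns the
   canonical representative of a^{-1}, or 0 if gcd(a, n) > 1. */
static uint32_t mod_inverse(uint32_t a, uint32_t n)
{
    int64_t u = 1, v = 0;        /* coefficients of a in the two rows */
    int64_t x = a, y = n;
    while (y != 0) {
        int64_t q = x / y, t;
        t = x - q * y; x = y; y = t;
        t = u - q * v; u = v; v = t;
    }
    if (x != 1) return 0;        /* not invertible */
    return (uint32_t)(u >= 0 ? u : u + (int64_t)n);
}
```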

Modular exponentiation

Another frequently needed operation in ℤn is modular exponentiation, that is, the computation of a^e for some a ∈ ℤn and e ∈ ℤ. Since a^0 = 1 for all a ∈ ℤn and since a^e = (a^–1)^–e for e < 0 and a ∈ ℤn*, we may assume, without loss of generality, that e > 0. Computing the integral power a^e followed by taking the remainder of Euclidean division by n is not an efficient way to compute a^e in ℤn. Instead, after every multiplication, we reduce the product modulo n. This keeps the sizes of the intermediate products small. Furthermore, it is also a bad idea to compute a^e as (· · ·((a·a)·a)· · ·a), which involves e – 1 multiplications. It is possible to compute a^e using O(lg e) multiplications and O(lg e) squarings in ℤn, as Algorithm 3.9 suggests. This algorithm requires the bits of the binary expansion of the exponent e, which are easily obtained by bit operations on the words of e.

The for loop iteratively computes bi := a^((er–1 . . . ei)2) (mod n) starting from the initial value br := 1. Since (er–1 . . . ei)2 = 2(er–1 . . . ei+1)2 + ei, we have bi ≡ (bi+1)^2 · a^ei (mod n). This establishes the correctness of the algorithm. The squaring (b^2) and multiplication (ba) inside the for loop of the algorithm are computed in ℤn (that is, as integer multiplication followed by reduction modulo n). If we assume that er–1 = 1, then r = ⌈lg e⌉. The algorithm carries out r squarings and ρ ≤ r multiplications in ℤn, where ρ is the number of bits of e that are 1. On an average, ρ = r/2. Algorithm 3.9 runs in time O((log e)(log n)^2). Typically, e = O(n), so this running time is O((log n)^3).

Algorithm 3.9. Modular exponentiation: square-and-multiply algorithm

Input: a ∈ ℤn and e ∈ ℕ.

Output: b = a^e (mod n).

Steps:

Let the binary expansion of e be e = (er–1 . . . e1e0)2, where each ei ∈ {0, 1}.
b := 1.
for (i = r – 1, . . . , 0) {
   b := b2 (mod n).    /* Squaring */
   if (ei = 1) b := ba (mod n).    /* Multiplication */
}
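For word-sized operands, Algorithm 3.9 reads as follows in C (a sketch; multiple-precision versions replace the 64-bit arithmetic by the routines of Section 3.3.2, and the modulus is kept below 2^32 so that the products fit in 64 bits):

```c
#include <stdint.h>

/* Square-and-multiply: compute a^e (mod n) by scanning the bits of e from
   the most significant end, squaring at every bit and multiplying by a at
   every 1-bit. */
static uint32_t mod_exp(uint32_t a, uint32_t e, uint32_t n)
{
    uint64_t b = 1, base = a % n;
    for (int i = 31; i >= 0; i--) {
        b = (b * b) % n;                        /* squaring */
        if ((e >> i) & 1) b = (b * base) % n;   /* multiplication */
    }
    return (uint32_t)b;
}
```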

Now, we describe a simple variant of this square-and-multiply algorithm, in which we choose a small t and use the 2^t-ary representation of the exponent e. The case t = 1 corresponds to Algorithm 3.9. In practical situations, t = 4 is a good choice. As in Algorithm 3.9, multiplication and squaring are done in ℤn.

Algorithm 3.10. Modular exponentiation: windowed square-and-multiply algorithm

Input: a ∈ ℤn and e ∈ ℕ.

Output: b = a^e (mod n).

Steps:

Let e = (er–1 . . . e1e0) in the 2^t-ary representation, where each ei ∈ {0, 1, . . . , 2^t – 1}.
Compute and store a^l (mod n) for l = 0, 1, . . . , 2^t – 1.   /* Precomputation */
b := 1.
for (i = r – 1, . . . , 0) {
   for (j = 1, . . . , t) b := b2 (mod n).    /* Squaring */
   b := b·a^ei (mod n).     /* Multiplication: Read a^ei from the precomputed table */
}

In Algorithm 3.10, the powers a^l, l = 0, 1, . . . , 2^t – 1, are precomputed using the formulas: a^0 = 1, a^1 = a and a^l = a^(l–1) · a for l ≥ 2. The number of squarings inside the for loop remains (almost) the same as in Algorithm 3.9. However, the number of multiplications in this loop reduces at the expense of the precomputation step. For example, let n be an integer of bit length 1024 and let e ≈ n. A randomly chosen e of this size has about 512 one-bits. Therefore, the for loop of Algorithm 3.9 does about 512 multiplications, whereas with t = 4 Algorithm 3.10 does only 1024/4 = 256 multiplications, with the precomputation step requiring 14 multiplications. Thus, the total number of multiplications reduces from (about) 512 to 14 + 256 = 270.
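A word-sized C sketch of Algorithm 3.10 with t = 4 follows (our conventions, under the same size assumptions as before, n < 2^32; a 32-bit exponent has eight 4-bit digits):

```c
#include <stdint.h>

/* 2^t-ary ("windowed") square-and-multiply with t = 4: precompute
   a^0, ..., a^15 (mod n), then process e one 4-bit digit at a time,
   squaring four times per digit and multiplying once from the table. */
static uint32_t mod_exp_window(uint32_t a, uint32_t e, uint32_t n)
{
    uint64_t table[16], b = 1;
    table[0] = 1 % n;
    for (int l = 1; l < 16; l++)
        table[l] = (table[l - 1] * (a % n)) % n;   /* a^l = a^(l-1) * a */
    for (int i = 28; i >= 0; i -= 4) {
        for (int j = 0; j < 4; j++) b = (b * b) % n;   /* four squarings */
        b = (b * table[(e >> i) & 0xF]) % n;           /* one table lookup */
    }
    return (uint32_t)b;
}
```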

Montgomery exponentiation

During a modular exponentiation in ℤn, every reduction (computation of a remainder) is with respect to the fixed modulus n. Montgomery exponentiation exploits this fact and speeds up each modular reduction at the cost of some preprocessing overhead.

Assume that the storage of n requires s ℜ-ary digits, that is, n = (ns–1 . . . n0) (with ns–1 ≠ 0). Take R := ℜ^s = 2^32s, so that R > n. As is typical in most cryptographic situations, n is an odd integer (for example, a big prime or a product of two big primes). Then gcd(ℜ, n) = gcd(R, n) = 1. Use the extended gcd algorithm to precompute n′ := –n^–1 (mod ℜ).

We associate with x ∈ ℤn the element x̄ := xR (mod n), called the Montgomery representation of x. Since R is invertible modulo n, this association gives a bijection of ℤn onto itself. This bijection respects the addition in ℤn: that is, x̄ + ȳ (computed in ℤn) is the Montgomery representation of x + y. Multiplication in ℤn, on the other hand, corresponds to the operation x̄ȳR^–1 (mod n), which is the Montgomery representation of xy, and can be implemented as Algorithm 3.11 suggests.

Algorithm 3.11. Montgomery multiplication

Input: x̄ and ȳ (Montgomery representations of x, y ∈ ℤn).

Output: The Montgomery representation x̄ȳR^–1 (mod n) of xy.

Steps:

Montgomery multiplication works as follows. In the first step, it computes the integer product w := x̄ȳ = (w2s–1 . . . w0). The subsequent for loop computes wR^–1 (mod n). Since n′ ≡ –n^–1 (mod ℜ), the i-th iteration of the loop adds a suitable multiple of n to w so as to make wi = 0 (and leaves wi–1, . . . , w0 unchanged). So when the for loop terminates, we have w0 = w1 = · · · = ws–1 = 0: that is, w is a multiple of ℜ^s = R. Therefore, w/R is an integer. Furthermore, this w is obtained by adding to x̄ȳ a multiple of n: that is, w = x̄ȳ + kn for some integer k ≥ 0. Since R is coprime to n, it follows that w/R ≡ x̄ȳR^–1 (mod n). But w/R may be bigger than the canonical representative of x̄ȳR^–1. Since k is an integer with s ℜ-ary digits (so that k < R) and x̄ < n and ȳ < n < R, it follows that w/R = (x̄ȳ + kn)/R < (nR + Rn)/R = 2n. Therefore, if w/R exceeds n – 1, a single subtraction suffices.

Computation of the product w = x̄ȳ requires ≤ s^2 single-precision multiplications. One can use the optimized Algorithm 3.4 for that purpose. In the case of squaring, x̄ = ȳ, and further optimizations (say, in the form of Karatsuba’s method) can be employed.

Each iteration of the for loop carries out s + 1 single-precision multiplications. (The reduction modulo ℜ is just retaining the less significant word of the two-word product win′.) Since the for loop is executed s times, Algorithm 3.11 performs a total of ≤ s^2 + s(s + 1) = 2s^2 + s single-precision multiplications.

Integer multiplication (Algorithm 3.4) followed by classical modular reduction (Algorithm 3.6) does almost an equal number of single-precision multiplications, but also O(s) divisions of double-precision integers by single-precision ones. It turns out that the complicated for loop of Algorithm 3.6 is slower than the much simpler loop in Algorithm 3.11. But if precomputations in the Montgomery multiplication are taken into account, we do not tend to achieve a speed-up with this new technique. For modular exponentiations, however, precomputations need to be done only once: that is, outside the square-and-multiply loop, and Montgomery multiplication pays off. In Algorithm 3.12, we rewrite Algorithm 3.9 in terms of the Montgomery arithmetic. A similar rewriting applies to Algorithm 3.10.
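At the word level (s = 1), the whole of Algorithm 3.11 collapses into a few lines of C. The following sketch is ours: it computes n′ by Hensel lifting instead of the extended gcd algorithm of the text (either works), and it splits the final 64-bit sum so that (w + kn)/R is obtained without overflow:

```c
#include <stdint.h>

/* Precompute n' = -n^{-1} (mod 2^32) for odd n: the iterate inv = n is a
   correct inverse mod 8, and each step inv *= 2 - n*inv doubles the number
   of correct bits (Hensel lifting), so four steps reach 32 bits. */
static uint32_t mont_nprime(uint32_t n)
{
    uint32_t inv = n;
    for (int i = 0; i < 4; i++) inv *= 2 - n * inv;
    return (uint32_t)(0u - inv);              /* n' = -inv mod 2^32 */
}

/* Montgomery multiplication for a single odd word n (R = 2^32), inputs
   being Montgomery residues < n: returns x*y*R^{-1} (mod n). */
static uint32_t mont_mul(uint32_t x, uint32_t y, uint32_t n, uint32_t nprime)
{
    uint64_t w  = (uint64_t)x * y;            /* integer product, < n^2 */
    uint32_t k  = (uint32_t)w * nprime;       /* makes w + k*n divisible by R */
    uint64_t kn = (uint64_t)k * n;
    /* (w + kn)/R without 64-bit overflow: the low words cancel exactly,
       producing a carry of 1 whenever the low word of w is nonzero */
    uint64_t t  = (w >> 32) + (kn >> 32) + (((uint32_t)w != 0) ? 1 : 0);
    return (uint32_t)(t >= n ? t - n : t);    /* t < 2n: one subtraction */
}
```

Converting out of Montgomery representation is the special case of multiplying by 1, since z̄ · 1 · R^–1 ≡ z (mod n).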

Algorithm 3.12. Montgomery exponentiation

Input: a ∈ ℤn and e ∈ ℕ.

Output: b = a^e (mod n).

Steps:

/* Precomputations */
n′ := –n^–1 (mod ℜ), ā := aR (mod n), b̄ := R (mod n).

/* The square-and-multiply loop */

Exercise Set 3.3

3.8 Let ℜ ∈ ℕ, ℜ > 1. Show that every a ∈ ℕ can be represented uniquely as a tuple (as–1, . . . , a1, a0) for some s ∈ ℕ (depending on a) with

a = as–1s–1 + · · · + a1ℜ + a0,

0 ≤ ai < ℜ for all i and as–1 ≠ 0. In this case, we write a as (as–1 . . . a0) or simply as as–1 . . . a0, when ℜ is understood from the context. ℜ is called the radix or base of this representation, as–1, . . . , a0 the (ℜ-ary) digits of a, as–1 the most significant digit, a0 the least significant digit and s the size of a with respect to the radix ℜ.

3.9 Let . Show that every can be written uniquely as

a = asR^s + as–1R^(s–1) + · · · + a1R + a0

with each .

3.10

Negative radix Show that every integer a can be written as

a = as(–2)^s + as–1(–2)^(s–1) + · · · + a1(–2) + a0

with each ai ∈ {0, 1}. Moreover, if we force that as ≠ 0 for a ≠ 0 and that s = 0 for a = 0, argue that this representation is unique.

3.11 Investigate the relative merits and demerits of the following three representations (in C) of multiple-precision integers needed for cryptography. In each case, we have room for storing 256 ℜ-ary words, the actual size and a sign indicator. In the second and third representations, we use two extra locations (sizeIdx and signIdx) in the digit array for holding the size and sign information.
/* Representation 1 */
typedef struct {
   int size;
   boolean sign;
   ulong digits[256];
} cryptInt1;
/* Representation 2 */
typedef ulong cryptInt2[258];
#define signIdx 0
#define sizeIdx 1
/* Representation 3 */
typedef ulong cryptInt3[258];
#define signIdx 256
#define sizeIdx 257

Remark: We recommend the third representation.

3.12 Write an algorithm that prints a multiple-precision integer in decimal and an algorithm that accepts a string of decimal digits (optionally preceded by a + or – sign) and stores the corresponding integer as a multiple-precision integer. Also write algorithms for input and output of multiple-precision integers in hexadecimal, octal and binary.
3.13 Write an algorithm which, given two multiple-precision integers a and b, compares the absolute values |a| and |b|. Also write an algorithm to compare a and b as signed integers.
3.14
  1. Write an algorithm that uses the Euclidean gcd loop (Proposition 2.15) to compute the gcd d of two integers a and b. (Observe that gcd(a, b) = gcd(b, a rem b) for b ≠ 0.)

  2. Modify the Euclidean gcd algorithm of Part (a), so that for given integers a, b we obtain d, u, v with d = gcd(a, b) = ua + vb.

3.15 Describe a representation of rational numbers with exact multiple-precision numerators and denominators. Implement the arithmetic (addition, subtraction, multiplication and division) of rational numbers under this representation.
3.16

Sliding window exponentiation Suppose we want to compute the modular exponentiation a^e (mod n). Consider the following variant of the square-and-multiply algorithm: Choose a small t (say, t = 4) and precompute a^(2^(t–1)), a^(2^(t–1)+1), . . . , a^(2^t–1) modulo n. Do squaring for every bit of e, but skip the multiplication for zero bits in e. Whenever a 1 bit is found, consider the next t bits of e (including the 1 bit). Let these t bits represent the integer l, 2^(t–1) ≤ l ≤ 2^t – 1. Multiply by a^l (mod n) (after computing the usual t squares) and move right in e by t bit positions. Argue that this method works and write an algorithm based on this strategy. What are the advantages and disadvantages of this method over Algorithm 3.10?
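A Python sketch of this strategy follows. As a deviation of ours, the table holds a^l (mod n) for every l in 1, . . . , 2^t – 1 (slightly more than the stated range) so that a short final window near the least significant end is also covered:

```python
def sliding_window_pow(a, e, n, t=4):
    """Sliding-window modular exponentiation sketch (left-to-right scan)."""
    table = [1] * (1 << t)
    for l in range(1, 1 << t):           # table[l] = a^l mod n
        table[l] = (table[l - 1] * a) % n
    bits = bin(e)[2:]
    b, i = 1, 0
    while i < len(bits):
        if bits[i] == '0':               # zero bit: square only
            b = (b * b) % n
            i += 1
        else:                            # 1 bit: take a window of <= t bits
            w = min(t, len(bits) - i)
            l = int(bits[i:i + w], 2)    # window value, 2^(w-1) <= l <= 2^w - 1
            for _ in range(w):           # w squarings ...
                b = (b * b) % n
            b = (b * table[l]) % n       # ... then one table multiplication
            i += w
    return b
```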

3.17 Suppose we want to compute a^e b^f (mod n), where both e and f are positive r-bit integers. One possibility is to compute a^e and b^f modulo n individually, followed by a modular multiplication. This strategy requires the running time of two exponentiations (neglecting the time for the final multiplication). In this exercise, we investigate a trick to reduce this running time to something close to 1.25 times the time for one exponentiation. Precompute ab (mod n). Inside the square-and-multiply loop, either skip the multiplication or multiply by a, b or ab, depending upon the next bits in the two exponents e and f. Complete the details of this algorithm. Deduce that, on average, the running time of this algorithm is as declared above.
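The trick of Exercise 3.17 can be sketched in Python as follows (the function name is ours; the four cases in the loop correspond to the bit pairs (0,0), (1,1), (1,0) and (0,1) of e and f):

```python
def double_exp(a, e, b, f, n):
    """Simultaneous exponentiation: (a^e * b^f) mod n in a single
    left-to-right pass over both exponents."""
    ab = (a * b) % n                       # precomputed once
    r = max(e.bit_length(), f.bit_length())
    c = 1
    for i in range(r - 1, -1, -1):         # scan both exponents MSB first
        c = (c * c) % n
        ei, fi = (e >> i) & 1, (f >> i) & 1
        if ei and fi:
            c = (c * ab) % n
        elif ei:
            c = (c * a) % n
        elif fi:
            c = (c * b) % n
    return c
```

On average, three of every four bit pairs require a multiplication, which is the source of the 1.25 factor claimed in the exercise.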
3.18 Let m ∈ ℕ, m ≠ 1. An addition chain for m of length l is a sequence 1 = a1, a2, . . . , al = m of natural numbers such that for every index i, 2 ≤ i ≤ l, there exist indices i1, i2 < i with ai = ai1 + ai2. (It is allowed to have i1 = i2.)
  1. If 1 = a1, a2, . . . , al = m is an addition chain for m and if j1, j2, . . . , jl is a permutation of 1, 2, . . . , l with aj1 ≤ aj2 ≤ · · · ≤ ajl, show that aj1, aj2, . . . , ajl is also an addition chain for m. It, therefore, suffices to consider sorted addition chains only.

  2. Show that m has an addition chain of length ≤ 2 ⌈lg m⌉. [H]

  3. Let G be a (multiplicative) group and g ∈ G. Design an algorithm for computing gm given an addition chain for m. What is the complexity of the algorithm (in terms of the length of the given addition chain)?

  4. Show that Algorithms 3.9 and 3.10 use addition chains for e of lengths ≤ 2 ⌈lg e⌉.

3.4. Elementary Number-theoretic Computations

Now that we know how to work in ℤ and in the residue class rings ℤn, n ∈ ℕ, we address some important computational problems associated with these rings. In this chapter, we restrict ourselves to those problems that are needed for setting up various cryptographic protocols.

3.4.1. Primality Testing

One of the simplest and oldest questions in algorithmic number theory is to decide if a given integer n ∈ ℕ, n > 1, is prime or composite. Practical primality testing algorithms are based on randomization techniques. In this section, we describe the Monte Carlo algorithm due to Miller and Rabin. The obvious question that comes next is to find one (or all) of the prime factors of an integer, deterministically or probabilistically proven to be composite. This is the celebrated integer factorization problem and will be formally introduced in Section 4.2. In spite of the apparent proximity between the primality testing and the integer factoring problems, they currently have widely different (known) complexities. Primality testing is easy and thereby promotes efficient setting up of cryptographic protocols. On the other hand, the difficulty of factoring integers protects these protocols against cryptanalytic attacks.

Definition 3.2.

Let n be an odd integer greater than 1 and let a ∈ ℤ with gcd(a, n) = 1. Then n is called a pseudoprime to the base a, if a^(n–1) ≡ 1 (mod n).

By Fermat’s little theorem, a prime p is a pseudoprime to every base a with gcd(a, p) = 1. However, the converse of this is not true. By Exercise 3.19, n is not a pseudoprime to at least half of the bases in ℤn*, provided that there is at least one such base in ℤn*. Unfortunately, there exist composite integers m, known as Carmichael numbers, such that m is a pseudoprime to every base a ∈ ℤm*. The smallest Carmichael number is 561 = 3 × 11 × 17. Exercises 3.21 and 3.22 investigate some properties of these numbers. Though Carmichael numbers are not very abundant in nature, they are still infinite in number. So a robust primality test requires n to satisfy certain constraints in addition to being a pseudoprime to one or more bases. The following constraint is due to Solovay and Strassen.

Definition 3.3.

Let n be an odd integer > 1 and let a ∈ ℤ with gcd(a, n) = 1. Then n is called an Euler pseudoprime or a Solovay–Strassen pseudoprime to the base a, if a^((n–1)/2) ≡ (a/n) (mod n), where (a/n) is the Jacobi symbol (Definition 2.32). Clearly, an Euler pseudoprime to the base a is also a pseudoprime to the base a.

By Euler’s criterion (Proposition 2.21), if p is a prime and gcd(a, p) = 1, then p is an Euler pseudoprime to the base a. The converse is not true, in general, but if n is composite, then n is an Euler pseudoprime to at most φ(n)/2 bases in ℤn* (Exercise 3.20). This, in turn, implies that if n is an Euler pseudoprime to t randomly chosen bases in ℤn*, then the chance that n is composite is no more than 1/2^t. This observation leads to a Monte Carlo algorithm for testing the primality of an integer, where the probability of error (1/2^t) can be made arbitrarily small by choosing large values of t. A more efficient algorithm can be developed using the following concept due to Miller and Rabin.

Definition 3.4.

Let n be an odd integer > 1 with n – 1 = 2^r n′, r := v2(n – 1) > 0, n′ odd, and let a ∈ ℤ with gcd(a, n) = 1. Then n is called a strong pseudoprime to the base a, if either a^n′ ≡ 1 (mod n) or a^(2^i n′) ≡ –1 (mod n) for some i, 0 ≤ i < r. It is clear that if n is a strong pseudoprime to the base a, then n is also a pseudoprime to the base a. What is less evident but still true is that if n is a strong pseudoprime to the base a, then n is also an Euler pseudoprime to the base a.

The rationale behind this definition is the following. If for some a ∈ ℤn* we have a^(n–1) ≢ 1 (mod n), we conclude with certainty that n is composite. So assume that a^(n–1) ≡ 1 (mod n) and consider the powers bi := a^(2^i n′) (mod n) for i = 0, 1, . . . , r to see how the sequence b0, b1, . . . eventually reaches br ≡ 1 (mod n). If b0 ≡ 1 (mod n) already, the dynamics is clear. If, on the other hand, we have an i such that bi ≢ 1 (mod n), whereas bi+1 ≡ 1 (mod n), then bi is a square root of 1 modulo n. If n is a prime, the only square roots of 1 modulo n are ±1 and so n must be a strong pseudoprime to the base a. On the other hand, if n is composite but not the power of a prime, then 1 has at least two non-trivial square roots (that is, square roots other than ±1) modulo n (Exercise 3.30). We hope to find one such non-trivial square root of 1 in the sequence b0, b1, . . . , br–1 and if we are successful, the compositeness of n is proved with certainty.

A complete residue system modulo an odd composite n contains at most n/4 bases to which n is a strong pseudoprime. The proof of this fact is somewhat involved (though elementary) and can be found elsewhere, for example, in Chapter V of Koblitz [153]. Here, we concentrate on the Monte Carlo Algorithm 3.13 known as the Miller–Rabin primality test and based on this observation.

Algorithm 3.13. Miller–Rabin primality test

Input: An odd integer n > 1 and an acceptable probability δ of failure.

Output: A certificate that either “n is composite” or “n is prime”.

Steps:

Find n′ and r such that n – 1 = 2^r n′ with r ≥ 1 and n′ odd.
Determine the number t of iterations, so that the probability of failure is ≤ δ.
for (j = 1, . . . , t) {
   Choose a random base a, 1 < a < n.
   b := a^n′ (mod n).   /* Compute b0 */
   if (b ≢ 1 (mod n)) {
      i := 0.
      while (i < r – 1) and (b ≢ –1 (mod n)) {
         i++, b := b2 (mod n).    /* Compute bi by squaring bi–1 */
         if (b ≡ 1 (mod n)) { Return “n is composite”. }
      }
      if (b ≢ –1 (mod n)) { Return “n is composite”. }
   }
}
Return “n is prime”.
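A direct Python transcription of Algorithm 3.13 may look as follows (illustrative only; for tiny n the random range is degenerate, so the base is fixed there):

```python
import random

def miller_rabin(n, t):
    """Miller-Rabin test: returns False ("n is composite") with certainty,
    True ("n is prime") with error probability at most (1/4)**t."""
    if n % 2 == 0:
        return n == 2
    r, n1 = 0, n - 1
    while n1 % 2 == 0:                 # n - 1 = 2^r * n' with n' odd
        r += 1
        n1 //= 2
    for _ in range(t):
        a = random.randrange(2, n - 1) if n > 4 else 2   # base, 1 < a < n
        b = pow(a, n1, n)              # b_0 = a^{n'} mod n
        if b != 1:
            i = 0
            while i < r - 1 and b != n - 1:
                i += 1
                b = (b * b) % n        # b_i from b_{i-1} by squaring
                if b == 1:
                    return False       # non-trivial square root of 1 found
            if b != n - 1:
                return False
    return True
```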

Whenever Algorithm 3.13 outputs “n is composite”, it is correct. On the other hand, if it certifies n as prime, there is a probability of at most δ that n is composite. This probability can be made very small by choosing a suitably large value of the iteration count t. For cryptographic applications, δ ≤ 1/2^80 is considered sufficiently safe. In view of the first statement of the last paragraph, we can take t = 40 to meet this error bound. In practice, much smaller values of t offer the desired confidence. For example, if n is of bit length 250, 500, 750 or 1000, the respective values t = 12, 6, 4 and 3 suffice.

Although, in Algorithm 3.13, we have chosen a to be an arbitrary integer between 2 and n – 2, there is apparently no harm if we choose a randomly in the interval 2 ≤ a < 2^32. In fact, such a choice of single-precision bases is desirable, because it makes the exponentiation a^n′ (mod n) more efficient (see Algorithm 3.9). A typical cryptographic application loads at start-up a precalculated table of small primes (say, the first thousand primes). Choosing the bases randomly from this list of small primes is indeed a good idea.

Deterministic primality proving

While the Miller–Rabin algorithm settles the primality testing problem in a practical sense, it is, after all, a randomized algorithm. It is interesting, at the minimum theoretically, to investigate the deterministic complexity of primality testing. There has been a good amount of research in this line. Let us sketch here the history of deterministic primality proving, without going to rigorous mathematical details.

One natural strategy to check for the primality of a positive integer n is to factor it. However, factoring integers is a computationally difficult problem. Primality proving has been found to be a much easier computational exercise. That is, one need not factorize n explicitly in order to claim about the primality of n.

The (seemingly) first modern primality testing algorithm is due to Miller [204]. This algorithm is deterministic polynomial-time, provided that the extended Riemann hypothesis or ERH (Conjecture 2.3) is true. Since the ERH is still an unsolved problem in mathematics, it cannot be claimed with certainty that Miller’s test is really a polynomial-time algorithm. Rabin [248] provided a version of Miller’s test which is unconditionally polynomial-time, but is, at the same time, randomized. This is what we have discussed earlier under the name Miller–Rabin primality test. It is a Monte Carlo algorithm which produces the answer no (composite) with certainty, but the answer yes (prime) with some (small) probability of error. Solovay and Strassen’s test [287], based on Definition 3.3, is another no-biased randomized polynomial-time primality test and can be made deterministic polynomial-time under the ERH.

Adleman and Huang [3], using the work of Goldwasser and Kilian [116], provide a yes-biased randomized primality-proving algorithm that runs in expected polynomial time unconditionally. Adleman et al. [4] propose the first deterministic algorithm that runs unconditionally in time less than fully exponential (in log n). Its (worst-case) running time is (ln n)^O(ln ln ln n), which is still not polynomial. (The exponent ln ln ln n grows very slowly with n, but still is not a constant.)

In August 2002, Agrawal, Kayal and Saxena came up with the first deterministic primality testing algorithm that runs in polynomial time unconditionally, that is, under no unproven assumptions. This algorithm, popularly abbreviated as the AKS algorithm, is based on the observation that n is prime if and only if (X + a)^n ≡ X^n + a (mod n) for every a coprime to n (Exercise 3.26). A naive application of this observation requires computing an exponential number of coefficients in the binomial expansion of (X + a)^n. The AKS algorithm gets around this difficulty by checking the new congruence

Equation 3.2


for some polynomial h(X) of small degree. Here, the notation (mod n, h(X)) means modulo the ideal of ℤ[X] generated by n and h(X). If deg h(X) is bounded by a polynomial in log n, then (X + a)^n (and also X^n + a) can be computed modulo n, h(X) in polynomial time. However, reduction modulo h(X) may allow a composite n to satisfy the new congruence. Agrawal et al. took h(X) := X^r – 1 for some prime r = O(ln^6 n) with r – 1 having a sufficiently large prime divisor. By a result in analytic number theory due to Fouvry, such a prime r always exists. Congruence (3.2) is verified for this h(X) and for a number of values of a that is polynomial in ln n. An elementary proof presented in Agrawal et al. [5] demonstrates that this suffices to conclude deterministically and unconditionally about the primality of n. The AKS algorithm in this form runs in time O~(ln^12 n).

Lenstra and Pomerance [175] have reduced the running time of the AKS algorithm to O~(ln^6 n). The AKS paper comes with another conjecture which, if true, yields an O~(ln^3 n) deterministic primality-proving algorithm.

Conjecture 3.1. AKS conjecture

Let n be an odd integer > 1, and let r be a prime with r ∤ n. If

(X – 1)^n ≡ X^n – 1 (mod n, X^r – 1),

then either n is prime or n2 ≡ 1 (mod r).

It remains an open question whether a future version of the AKS algorithm will supersede the Miller–Rabin test in terms of performance. As long as the answer is not favourable to the AKS algorithm, these new theoretical endeavours do not seem to have sufficient impact on cryptography. Primes certified by the Miller–Rabin test are at present secure enough for all applications. Nonetheless, the AKS breakthrough has solid theoretical implications and deserves mention in a prime context.

3.4.2. Generating Random Primes

If a random prime of a given bit length t is called for, we can keep on generating random odd integers of bit length t and check these integers for primality using the Miller–Rabin test. The prime number theorem (Theorem 2.20) ascertains that after O(t) iterations we expect to find a prime. A somewhat similar but reasonably faster algorithm is discussed in Exercise 4.14. We will henceforth call random primes of a given bit length with no additional imposed properties naive primes. Naive primes are often not cryptographically secure, because the primes used in many protocols should satisfy certain properties in order to preclude some known cryptanalytic attacks.

Definition 3.5.

Let p be an odd prime. Then p is called a safe prime, if (p – 1)/2 is also a prime, whereas p is called a strong prime, if

  1. p – 1 has a large prime divisor, say, q,

  2. p + 1 has a large prime divisor, say, q′, and

  3. q – 1 has a large prime divisor, say, q″.

In cryptography, a large prime divisor typically refers to one with bit length ≥ 160.

A random safe prime of a given bit length t can be found by generating a random sequence of natural numbers n congruent to 3 modulo 4 and of bit length t, until one is found for which both n and (n – 1)/2 are primes (as certified by the Miller–Rabin primality test). The prime number theorem once again implies that this search is expected to terminate after O(t2) iterations.
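The safe-prime search just described can be sketched in Python, with a compact Miller–Rabin helper standing in for Algorithm 3.13 (all names are ours):

```python
import random

def is_probable_prime(n, t=25):
    """Miller-Rabin used as a black box (cf. Algorithm 3.13)."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13):     # trial division by a few small primes
        if n % q == 0:
            return n == q
    r, m = 0, n - 1
    while m % 2 == 0:                  # n - 1 = 2^r * m, m odd
        r, m = r + 1, m // 2
    for _ in range(t):
        b = pow(random.randrange(2, n - 1), m, n)
        if b in (1, n - 1):
            continue
        for _ in range(r - 1):
            b = (b * b) % n
            if b == n - 1:
                break
        else:
            return False               # no -1 encountered: n is composite
    return True

def random_safe_prime(t):
    """Draw t-bit n with n ≡ 3 (mod 4) until n and (n-1)/2 are both
    (probable) primes; expected O(t^2) draws by the prime number theorem."""
    while True:
        n = (1 << (t - 1)) | random.getrandbits(t - 1) | 3
        if is_probable_prime(n) and is_probable_prime((n - 1) // 2):
            return n
```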

For generating a random strong prime p of bit length t, we first generate q′ and q″ and then q and finally p. (See the notations of Definition 3.5.) Algorithm 3.14 describes Gordon’s algorithm in which the bit lengths l and l′ of q and q′ are nearly t/2 and the bit length l″ of q″ is slightly smaller than l′. In our concrete implementation of the algorithm, we choose l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20 and l″ := ⌈t/2⌉ – 22. If t is sufficiently large (say, t ≥ 400), the prime divisors q, q′ and q″ are then cryptographically large.

The simple check that Gordon’s algorithm correctly computes a strong prime of bit length t with q, q′ and q″ as in Definition 3.5 is based on Fermat’s little theorem and is left to the reader. Note that with our choice of l, l′ and l″, the loop variables i and j run through single-precision values only, thereby making arithmetic involving them efficient. Also note that the ranges over which i and j vary are sufficiently large so that we expect the (outer) while loop to be executed only once. This implementation has a tendency to generate smaller values of q and p (with the given bit sizes). In practice, this is not a serious problem and can be avoided, if desired, by choosing random values of i and j from the indicated ranges.

Algorithm 3.14. Gordon’s strong-prime generator

Input: t ∈ ℕ, t ≥ 400.

Output: A strong prime p of bit length t.

Steps:

l := ⌈t/2⌉ – 2, l′ := ⌊t/2⌋ – 20, l″ := ⌈t/2⌉ – 22.

while (1) {
    Find a (random) naive prime q′ of bit length l′.
    Find a (random) naive prime q″ of bit length l″.
    for (i = ⌈(2^(l–1) – 1)/2q″⌉, . . . , ⌊(2^l – 2)/2q″⌋) {                 /* Search for q */
       q := 2iq″ + 1.
       if (q is prime) {
          p′ := 2((q′)^(q–2) mod q)q′ – 1.
          for (j = ⌈(2^(t–1) – p′)/2qq′⌉, . . . , ⌊(2^t – 1 – p′)/2qq′⌋) {     /* Search for p */
             p := p′ + 2jqq′.
             if (p is prime) { Return p. }
          }
       }
    }
}

Gordon’s algorithm takes only nominally more expected running time than that needed by the algorithm discussed at the beginning of Section 3.4.2 for generating naive primes of the same bit length. On the other hand, safe primes are much costlier to generate and may be avoided, unless the situation specifically demands their usage.

3.4.3. Modular Square Roots

Determination of square roots modulo a prime p is frequently needed in cryptographic applications. In this section, we assume that p is an odd prime and want to compute the square roots of a, gcd(a, p) = 1, modulo p, provided that a is a quadratic residue modulo p, that is, if (a/p) = 1. Using the Jacobi symbol, the value (a/p) can be computed efficiently, as Algorithm 3.15 suggests.

The correctness of Algorithm 3.15 follows from the properties of the Jacobi symbol (Proposition 2.22 and Theorem 2.19). The value of (–1)^((b^2–1)/8) is determined by the value of b modulo 8, that is, by the three least significant bits of b: it equals +1 for b ≡ ±1 (mod 8) and –1 for b ≡ ±3 (mod 8).

Similarly, (–1)^((a–1)(b–1)/4) can be computed using only the second least significant bits of a and b: it equals –1 if and only if a ≡ b ≡ 3 (mod 4).

If (a/p) = 1, our next task is to compute x with x^2 ≡ a (mod p). If one such x is found, the other square root of a modulo p is –x ≡ p – x (mod p). If p ≡ 3 (mod 4) or p ≡ 5 (mod 8), we have explicit formulas for a square root x. The remaining case, namely p ≡ 1 (mod 8), is somewhat complicated. In this case, we use the probabilistic algorithm due to Tonelli and Shanks. The details are given in Algorithm 3.16. The explicit formulas for the first two cases are easy to verify. We now prove the correctness of the algorithm in the remaining case.

Algorithm 3.15. Computation of the Legendre symbol

Input: An odd prime p and an integer a, 1 ≤ a < p.

Output: The Legendre symbol .

Steps:

b := p, k := 1./* Initialize */

/* The Euclidean loop */
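The Euclidean loop of Algorithm 3.15 can be sketched in Python using exactly the two word-level observations quoted above: the factor (–1)^((b^2–1)/8) depends only on b mod 8, and the reciprocity sign (–1)^((a–1)(b–1)/4) is –1 precisely when both arguments are ≡ 3 (mod 4). The function name is ours:

```python
def jacobi(a, b):
    """Jacobi symbol (a/b) for odd b > 0; returns 0 when gcd(a, b) > 1."""
    a %= b
    k = 1
    while a != 0:
        while a % 2 == 0:              # pull out factors of 2
            a //= 2
            if b % 8 in (3, 5):        # (2/b) = -1 exactly when b ≡ ±3 (mod 8)
                k = -k
        a, b = b, a                    # quadratic reciprocity: swap ...
        if a % 4 == 3 and b % 4 == 3:  # ... with a sign change iff both ≡ 3 (mod 4)
            k = -k
        a %= b
    return k if b == 1 else 0
```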

Since ℤp* is cyclic and has order p – 1 = 2^v q, the 2-Sylow subgroup G of ℤp* has order 2^v and is also cyclic. Let g be a generator of G. By Euler’s criterion, a^q is a square in G and, therefore, a^q g^e = 1 (in G) for some even integer e, 0 ≤ e < 2^v, and x ≡ a^((q+1)/2) g^(e/2) (mod p) is a square root of a modulo p.

A generator g of G can be obtained by choosing random elements b from ℤp* and computing the Legendre symbol (b/p). It is easy to see that b^q ∈ G. Furthermore, b^q is a generator of G if and only if (b/p) = –1. Finding a quadratic non-residue in ℤp* is the probabilistic part of the algorithm. Since exactly half of the elements of ℤp* are quadratic non-residues, one expects to find one after a few random trials. In order to make the exponentiation b^q efficient, b should be chosen as a single-precision integer. The while loop of the algorithm computes the multiplier g^(e/2) in x using O(v) iterations by successively locating the 1 bits of e starting from the least significant end.

To sum up, square roots modulo a prime can be computed in probabilistic polynomial time. Computing square roots modulo a composite integer n is, on the other hand, a very difficult problem, unless the complete factorization of n is known (see Section 4.2 and Exercise 3.29).

Exercise Set 3.4

3.19 Let n ∈ ℕ be odd and composite and suppose that there exists (at least) one a ∈ ℤn* with a^(n–1) ≢ 1 (mod n). Show that b^(n–1) ≢ 1 (mod n) for at least half of the bases b ∈ ℤn*. [H]

Algorithm 3.16. Modular square root

Input: An odd prime p and an integer a, 1 ≤ a < p.

Output: A square root of a modulo p (if existent).

Steps:

if ((a/p) = –1) { Return “a does not have a square root modulo p”. }

if (p ≡ 3 (mod 4)) { Return a^((p+1)/4) (mod p). }

if (p ≡ 5 (mod 8))
   if (a^((p–1)/4) ≡ 1 (mod p)) { Return a^((p+3)/8) (mod p). }
   else { Return 2a(4a)^((p–5)/8) (mod p). }

/* The case p ≡ 1 (mod 8) */
v := v2(p – 1), q := (p – 1)/2^v.    /* q is odd */
Find a random quadratic non-residue b modulo p and set g := b^q (mod p).
x := a^((q+1)/2) (mod p).
Precompute a^(–1) (mod p).
while (1) {
   find the smallest i ≥ 0 for which (x^2 a^(–1))^(2^i) ≡ 1 (mod p).
   if (i = 0) { Return x. }
   x := x g^(2^(v–i–1)) (mod p).
}
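A Python rendering of Algorithm 3.16 may look as follows; the explicit formulas for p ≡ 3 (mod 4) and p ≡ 5 (mod 8) are the standard ones (our reconstruction), and the last branch is the Tonelli–Shanks loop:

```python
import random

def sqrt_mod(a, p):
    """Square root of a modulo an odd prime p (1 <= a < p).
    Returns x with x*x ≡ a (mod p), or None if a is a non-residue."""
    if pow(a, (p - 1) // 2, p) != 1:
        return None                        # Euler's criterion: no square root
    if p % 4 == 3:
        return pow(a, (p + 1) // 4, p)
    if p % 8 == 5:
        x = pow(a, (p + 3) // 8, p)
        if (x * x) % p != a:               # fix up with a square root of -1
            x = (x * pow(2, (p - 1) // 4, p)) % p
        return x
    # the case p ≡ 1 (mod 8): Tonelli-Shanks
    v, q = 0, p - 1
    while q % 2 == 0:
        v, q = v + 1, q // 2               # p - 1 = 2^v * q, q odd
    b = 2
    while pow(b, (p - 1) // 2, p) != p - 1:
        b = random.randrange(2, p)         # find a quadratic non-residue
    g = pow(b, q, p)                       # generator of the 2-Sylow subgroup
    x = pow(a, (q + 1) // 2, p)
    a_inv = pow(a, -1, p)
    while True:
        t, i = (x * x * a_inv) % p, 0
        while t != 1:                      # smallest i with (x^2/a)^(2^i) = 1
            t, i = (t * t) % p, i + 1
        if i == 0:
            return x
        x = (x * pow(g, 1 << (v - i - 1), p)) % p
```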

3.20 Let n ∈ ℕ be odd and composite.
  1. Show that there exists a ∈ ℤn* such that a^((n–1)/2) ≢ (a/n) (mod n). [H]

  2. Show that b^((n–1)/2) ≢ (b/n) (mod n) for at least half of the bases b ∈ ℤn*. [H]

3.21 Let n ∈ ℕ be a Carmichael number, that is, a composite integer for which a^(n–1) ≡ 1 (mod n) for all a coprime to n, that is, ordn(a) | (n – 1) for all a ∈ ℤn*. Prove that:
  1. (p – 1)|(n – 1) for every prime divisor p of n. [H]

  2. n is odd. [H]

  3. n is square-free. [H]

  4. n has at least three distinct prime divisors.

3.22
  1. Let n ∈ ℕ be a square-free composite integer, such that (p – 1)|(n – 1) for every prime divisor p of n. Show that n is a Carmichael number.

  2. Demonstrate that 561 = 3 × 11 × 17; 2,821 = 7 × 13 × 31; and 172,081 = 7 × 13 × 31 × 61 are Carmichael numbers.

  3. Assume that for some k ∈ ℕ the integers p1 := 6k + 1, p2 := 12k + 1 and p3 := 18k + 1 are prime. Prove that p1p2p3 is a Carmichael number.

  4. Deduce that 1,729 = 7 × 13 × 19 and 294,409 = 37 × 73 × 109 are Carmichael numbers.

3.23

Fermat’s test for prime numbers Let n ∈ ℕ and let n – 1 = p1^e1 · · · pr^er, with p1, . . . , pr distinct primes, be the prime factorization of n – 1. Suppose that there exist integers a1, . . . , ar such that for each i we have ai^(n–1) ≡ 1 (mod n) and ai^((n–1)/pi) ≢ 1 (mod n). Show that n is prime.

3.24

Pépin’s test for Fermat numbers Show that the Fermat number n := 2^(2^k) + 1 is prime if and only if 3^((n–1)/2) ≡ –1 (mod n).

3.25 Write an algorithm that, given natural numbers t, l with l < t, outputs a (probable) prime p of bit length t such that p – 1 has a (probable) prime divisor q of bit length l.
3.26 Let n ∈ ℕ.
  1. Show that the ring ℤn[X] is (canonically) isomorphic to the ring ℤ[X]/⟨n⟩. In view of this, we write f(X) ≡ g(X) (mod n) to mean either that the coefficients of f are congruent modulo n to the respective coefficients of g or that the polynomials f(X) and g(X) are congruent modulo the principal ideal of ℤ[X] generated by n.

  2. Prove that if n is a prime, then (X + a)^n ≡ X^n + a (mod n) for every a ∈ ℤ.

  3. Prove that for composite n there exists k, 1 < k < n, with the binomial coefficient C(n, k) ≢ 0 (mod n). Deduce that in this case (X + a)^n ≢ X^n + a (mod n) for some a ∈ ℤ.

  4. Let h(X) ∈ ℤ[X] and let h̄(X) be the canonical image of h(X) in ℤn[X]. Show that the ring ℤ[X]/⟨n, h(X)⟩ is isomorphic to the ring ℤn[X]/⟨h̄(X)⟩.

3.27 Modify Algorithm 3.15 to compute the (generalized) Jacobi symbol (a/b) for odd b ∈ ℕ and for arbitrary a ∈ ℤ.
3.28A Implement the Chinese remainder theorem for integers, that is, write an algorithm that takes as input pairwise relatively prime moduli n1, . . . , nr and integers a1, . . . , ar and that outputs a ∈ ℤ with a ≡ ai (mod ni) for all i = 1, . . . , r. [H]
3.29 Let f(X) be a non-constant polynomial in ℤ[X].
  1. Let the congruence f(x) ≡ 0 (mod p^e), p a prime and e ∈ ℕ, have a solution x ≡ a (mod p^e). Show that if an integer a′ := a + kp^e solves the congruence f(x) ≡ 0 (mod p^(e+1)), then k satisfies the congruence

    f′(a)k ≡ –f(a)/p^e (mod p).

    Here f(a)/p^e means integer division. Demonstrate that this congruence may have 0, 1 or p solutions (for k) depending on the values of f′(a) and f(a)/p^e. Each such k gives a solution a′ of f(x) ≡ 0 (mod p^(e+1)) with a′ ≡ a (mod p^e). We say that the solution a′ (modulo p^(e+1)) is obtained from the solution a (modulo p^e) by (Hensel) lifting.

  2. Lifting together with the Chinese remainder theorem allows us to reduce the problem of solving a polynomial congruence modulo an arbitrary modulus n to the problem of solving the same congruence modulo the prime divisors of n. More precisely, if the prime factorization of n and all the solutions of the congruences f(x) ≡ 0 (mod pi) for all i = 1, . . . , r are given, design an algorithm to compute all the solutions of the congruence f(x) ≡ 0 (mod n).

3.30 Let n ∈ ℕ be odd and let a ∈ ℤn* be such that the congruence x^2 ≡ a (mod n) is solvable. Deduce that this congruence has exactly 2^ω solutions modulo n, where ω is the number of distinct prime divisors of n.
3.31 Show that Algorithm 3.17 correctly computes ⌊√n⌋ for n ∈ ℕ. Specify a strategy to initialize a before the while loop. Determine how Algorithm 3.17 can be used to check if a given n ∈ ℕ is a perfect square. [H]
Algorithm 3.17. Integer square root

Input: n ∈ ℕ.

Output: ⌊√n⌋.

Steps:

Using bit operations, initialize a to an integral value ≥ ⌊√n⌋.
while (1) {    /* Newton’s iteration loop */
   b := ⌊(a + ⌊n/a⌋)/2⌋.
   if (a ≤ b) { Return a. }
   a := b.
}
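In Python the algorithm reads as follows; the initializer 2^⌈(bit length of n)/2⌉ is one convenient choice satisfying the requirement a ≥ ⌊√n⌋ (an assumption of this sketch):

```python
def isqrt(n):
    """Newton's iteration of Algorithm 3.17: floor(sqrt(n)) for n >= 1."""
    a = 1 << ((n.bit_length() + 1) // 2)   # 2^ceil(len/2) >= floor(sqrt(n))
    while True:
        b = (a + n // a) // 2              # one Newton step, rounded down
        if a <= b:
            return a
        a = b
```

n is then a perfect square exactly when the returned a satisfies a·a = n.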

3.32
  1. Design an algorithm that, given n, k ∈ ℕ, computes ⌊n^(1/k)⌋. [H]

  2. Design an algorithm to check if a given n ∈ ℕ is an integral power of another integer.

3.5. Arithmetic in Finite Fields

Many cryptographic protocols are based on the (apparent) intractability of the discrete logarithm problem (Section 4.2) in the multiplicative group of a finite field. The arithmetic of the finite fields 𝔽p, p prime, and 𝔽2^n, n ∈ ℕ, is easy to implement and runs efficiently. In view of this, these two kinds of finite fields are the most popular in cryptography and we concentrate our algorithmic study on these fields only.

A prime field 𝔽p is the quotient ring ℤ/pℤ = ℤp. In Section 3.3.4, we have already made a thorough study of the arithmetic of the rings ℤn, n ∈ ℕ. We recall that the elements of 𝔽p are represented as integers from the set {0, 1, . . . , p – 1} and the arithmetic in 𝔽p is the modulo p integer arithmetic. Since p is typically multiple-precision, the characteristic p of 𝔽p is odd. The fields of even characteristic that we will study are the non-prime fields 𝔽2^n.

Section 2.9.3 explains several representations of extension fields. The most common one is the polynomial-basis representation 𝔽2^n = 𝔽2[X]/⟨f(X)⟩ for an irreducible polynomial f(X) of degree n in 𝔽2[X]. In that case, an element of 𝔽2^n has the canonical representation as a polynomial a0 + a1X + · · · + an–1X^(n–1), ai ∈ 𝔽2, of degree < n. An arithmetic operation on two elements of 𝔽2^n is the same operation in 𝔽2[X] followed by reduction modulo the defining polynomial f(X). So we start with the implementation of the polynomial arithmetic over 𝔽2.

3.5.1. Arithmetic in the Ring 𝔽2[X]

A polynomial over 𝔽2 (or any field) is identified by its coefficients, of which only finitely many are non-zero. Thus for storing a polynomial g(X) = adX^d + ad–1X^(d–1) + · · · + a1X + a0 it is sufficient to store the finite ordered sequence ad ad–1 . . . a1a0. It is not necessary to demand ad ≠ 0, but the shortest sequence representing a non-zero polynomial corresponds to ad ≠ 0 and in this case deg g = d. On the other hand, as we see later, it is often useful to pad such a sequence with leading zero coefficients. As an example, the polynomial X^2 + 1 is representable as 101 or as 0101 or as 00101 and so on.

Since 𝔽2 can be viewed as the set {0, 1} with operations modulo 2, a polynomial in 𝔽2[X] is essentially a bit string, unique up to insertion (and deletion) of leading zero bits. As in the case of multiple-precision integers, we pack these coefficients in an array of 32-bit words and maintain the number of coefficients belonging to the polynomial. For example, the polynomial g(X) = X^64 + X^31 + X^7 + 1 can be stored in an array w2w1w0 of three 32-bit words. w0 consists of the coefficients of X^0, X^1, . . . , X^31, w1 consists of the coefficients of X^32, X^33, . . . , X^63, and w2 consists of the coefficient of X^64. It is up to the implementation scheme to decide whether the coefficients are to be stored from left to right or from right to left in the bits of a word. We assume that less significant coefficients go to the less significant bits of a word. For the polynomial g above, the word w0 viewed as an unsigned integer will then be w0 = 2^31 + 2^7 + 1, whereas we have w1 = 0. The least significant bit of w2 would be 1. The remaining 31 bits of w2 are not important and can be assigned any value as long as we maintain the information that only the coefficients of X^i, 0 ≤ i ≤ 64, need to be considered. On the other hand, if we want to store the coefficients of g up to that of X^80, then the bits of w2 at locations 1, . . . , 16 must be zero, whereas those at locations 17, . . . , 31 may be of any value. We, however, always recommend the use of leading zero bits to fill the portion of the leading word not belonging to the polynomial.

Such a representation of elements of 𝔽2[X], in addition to being compact, facilitates efficient implementation of arithmetic functions. As we will shortly see, we often need not extract the individual coefficients of a polynomial but can apply bit operations on entire words to process 32 coefficients simultaneously per operation. We usually do not need polynomials of degrees > 4096 for cryptographic applications. It is, therefore, sufficient to declare a static array capable of storing all the 8193 coefficients of a product of two such largest polynomials. The zero polynomial may be represented as one with zero word size, whereas the degree of the zero polynomial is taken to be –∞, which may be represented as –1.

We now describe the arithmetic functions on two non-zero polynomials

Equation 3.3

a(X) = arX^r + · · · + a1X + a0 and b(X) = bsX^s + · · · + b1X + b0, with ar ≠ 0 and bs ≠ 0.
Under our implementation, a and b demand ρ := ⌈(r + 1)/32⌉ and σ := ⌈(s + 1)/32⌉ machine words αρ – 1 . . . α1α0 and βσ – 1 . . . β1β0. We also assume paddings with leading zero bits in the areas not belonging to the operands.

Note that addition in F_2 is the same as the XOR (⊕) of two bits. Applying this bit operation on the words αi and βi adds 32 coefficients of the operand polynomials simultaneously (see Algorithm 3.18). Finally, note that –1 = 1 in any field of characteristic 2, that is, subtraction is the same as addition in such a field.

The product a(X)b(X) can be computed as in Algorithm 3.19. Once again, using wordwise operations yields faster implementation. By AND and OR, we denote the bit-wise and and or operations on 32-bit words. The easy verification of the correctness of this algorithm is left to the reader. As in the case of addition, one might want to make the polynomial c compact after its words γτ – 1, . . . , γ0 are computed.

Algorithm 3.18. Polynomial addition

Input: a(X) and b(X), as in Equation (3.3).

Output: c(X) = a(X) + b(X) (to be stored in the array γτ – 1 . . . γ1γ0).

Steps:

τ := max(ρ, σ).
for (i = 0, . . . , min(ρ, σ) – 1) γi := αi ⊕ βi.
if (ρ > σ) for (i = σ, . . . , ρ – 1) γi := αi,
else if (ρ < σ) for (i = ρ, . . . , σ – 1) γi := βi.
while (τ > 0) and (γτ – 1 = 0) τ – –.       /* Make c compact (optional) */
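As an illustration (ours, not the book's), Algorithm 3.18 can be sketched in Python, with a polynomial stored as a little-endian list of 32-bit words; the names poly_add, g and h are our own choices:

```python
def poly_add(a, b):
    """Algorithm 3.18: add two GF(2)[X] polynomials stored as
    little-endian lists of 32-bit words (bit i of word j holds the
    coefficient of X^(32*j + i))."""
    rho, sigma = len(a), len(b)
    tau = max(rho, sigma)
    c = [0] * tau
    for i in range(min(rho, sigma)):
        c[i] = a[i] ^ b[i]            # one XOR adds 32 coefficients
    for i in range(min(rho, sigma), tau):
        c[i] = a[i] if rho > sigma else b[i]
    while c and c[-1] == 0:           # make c compact (optional)
        c.pop()
    return c

# (X^64 + X^31 + X^7 + 1) + (X^31 + 1) = X^64 + X^7
g = [(1 << 31) | (1 << 7) | 1, 0, 1]
h = [(1 << 31) | 1]
assert poly_add(g, h) == [1 << 7, 0, 1]
```

Note that the zero polynomial comes out as the empty word list, consistent with representing it by zero word size.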

Algorithm 3.19. Polynomial multiplication

Input: a(X) and b(X), as in Equation (3.3).

Output: c(X) = a(X)b(X) (to be stored in the array γτ – 1 . . . γ1γ0).

Steps:

τ := ρ + σ.     /* An upper bound on the word size of the product; the loop below may touch γρ+σ–1, so we allocate ρ + σ words (the top word may turn out to be zero) */
for (i = 0, . . . , τ – 1) γi := 0.     /* Initialize the product */

/* The quadratic multiplication loop */
for (k = 0, . . . , 31) {    /* For each bit position in a word */
   for (j = 0, . . . , σ – 1) {     /* For each word of b */
      if (βj AND 2^k) {     /* if the k-th bit of the word βj is 1 */
         for (i = 0, . . . , ρ – 1) {    /* For each word of a */
            set γi+j := γi+j ⊕ (αi ≪ k) and, if k > 0, γi+j+1 := γi+j+1 ⊕ (αi ≫ (32 – k)).
         }
      }
   }
}
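The quadratic loop above can be sketched in Python as follows (an illustration of ours, with the same word-list representation as before; the product array is allocated with ρ + σ words so that the carry word always exists, and k = 0 is treated specially because a right shift by 32 is undefined for 32-bit words in C):

```python
M32 = 0xFFFFFFFF                              # mask to 32 bits

def poly_mul(a, b):
    """Algorithm 3.19: product in GF(2)[X], operands as little-endian
    lists of 32-bit words."""
    rho, sigma = len(a), len(b)
    c = [0] * (rho + sigma)
    for k in range(32):                       # each bit position in a word
        for j in range(sigma):                # each word of b
            if b[j] & (1 << k):               # k-th bit of b[j] set?
                for i in range(rho):          # add a shifted by 32*j + k
                    c[i + j] ^= (a[i] << k) & M32
                    if k:
                        c[i + j + 1] ^= a[i] >> (32 - k)
    while c and c[-1] == 0:                   # make c compact
        c.pop()
    return c

# (X + 1)^2 = X^2 + 1 over GF(2)
assert poly_mul([0b11], [0b11]) == [0b101]
# (X^31 + 1)(X + 1) = X^32 + X^31 + X + 1: the product spills into a second word
assert poly_mul([(1 << 31) | 1], [0b11]) == [(1 << 31) | 0b11, 1]
```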

The square of a(X) ∈ F_2[X] can be computed very easily using the fact that

a(X)^2 = (arX^r + · · · + a1X + a0)^2 = arX^2r + · · · + a1X^2 + a0.

This gives us a linear-time (in terms of r or ρ) algorithm instead of the quadratic general-purpose multiplication Algorithm 3.19. We leave the implementational details to the reader.
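One such implementation (ours, as an illustration) simply spreads the bits of a(X), the coefficient of X^i becoming the coefficient of X^2i; a production version would typically spread 8 bits at a time through a precomputed 256-entry table:

```python
def poly_sqr(a):
    """Square a GF(2)[X] polynomial held in a Python int
    (bit i = coefficient of X^i): bit i moves to position 2i."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)   # X^i -> X^(2i)
        a >>= 1
        i += 1
    return r

# (X^2 + X + 1)^2 = X^4 + X^2 + 1 over GF(2)
assert poly_sqr(0b111) == 0b10101
```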

Division with remainder in F_2[X] is implemented in Algorithm 3.20. As before, we continue to work with the operands a(X) and b(X) as in Equation (3.3). But now we make the further assumptions that bs = 1, so that βσ–1 ≠ 0, and that s ≤ r. When the Euclidean division loop of Algorithm 3.20 terminates, the array locations δσ–1, . . . , δ1, δ0 contain the remainder. The arrays γ and δ may be made compact to discard the leading zero bits, if any.

Algorithm 3.20. Euclidean division of polynomials

Input: a(X) and b(X), as in Equation (3.3), with bs = 1 and s ≤ r.

Output: c(X) = a(X) quot b(X) (to be stored in the array γτ–1 . . . γ1γ0) and d(X) = a(X) rem b(X) (to be stored in the array δρ–1 . . . δ1δ0).

Steps:

τ := ⌈(r – s + 1)/32⌉.    /* The size of the quotient */
for i = 0, . . . , τ – 1 { γi := 0 }    /* Initialize c(X) to 0 */

for i = 0, . . . , ρ – 1 { δi := αi }   /* Copy a(X) to d(X) */

/* Euclidean division loop */
for i = r, r – 1, . . . , s {    /* i goes down from r to s */
   if (the coefficient of X^i in d(X) is 1) {
       j := (i – s) quot 32, k := (i – s) rem 32.

       /* Set the coefficient of X^(i–s) of c(X) */
       γj := γj OR 2^k.

       /* Update d(X) := d(X) – X^(i–s)b(X) */
       for l = 0, . . . , σ – 1 {
          δl+j := δl+j ⊕ (βl ≪ k).
          if (k > 0) δl+j+1 := δl+j+1 ⊕ (βl ≫ (32 – k)).
       }
    }
}
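For brevity, our illustrative Python sketch of Euclidean division uses a single Python integer per polynomial (bit i = coefficient of X^i); Algorithm 3.20 performs exactly the same operations word by word:

```python
def poly_divmod(a, b):
    """Euclidean division in GF(2)[X]: return (quotient, remainder)."""
    assert b != 0
    r, s = a.bit_length() - 1, b.bit_length() - 1
    q, d = 0, a
    for i in range(r, s - 1, -1):        # i = r down to s
        if d & (1 << i):                 # coefficient of X^i in d is 1
            q |= 1 << (i - s)            # set coefficient of X^(i-s) in q
            d ^= b << (i - s)            # d := d - X^(i-s) * b
    return q, d

# X^3 + X + 1 = (X + 1)(X^2 + X) + 1: quotient X^2 + X, remainder 1
q, d = poly_divmod(0b1011, 0b11)
assert q == 0b110 and d == 1
```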

Computing modular inverses requires computation of extended gcds of polynomials in F_2[X]. We again start with non-zero polynomials a(X), b(X) ∈ F_2[X], and compute polynomials d(X), u(X) and v(X) in F_2[X] with d(X) = gcd(a(X), b(X)) = u(X)a(X) + v(X)b(X), deg u < deg b and deg v < deg a. For polynomials, we do not have an equivalent of the binary gcd algorithm (Algorithm 3.8). We use repeated Euclidean divisions instead.

The proof of the correctness of Algorithm 3.21 is similar to that for Algorithm 3.8. Here, we introduce the variables r_k, U_k and V_k for k = 0, 1, 2, . . . . The initialization goes as: r_0 := a, r_1 := b, U_0 := 1, U_1 := 0, V_0 := 0 and V_1 := 1. During the k-th iteration (k = 1, 2, . . .), we first use Euclidean division to get r_{k–1} = q_kr_k + r_{k+1}, which gives r_{k+1} = r_{k–1} – q_kr_k. We also compute U_{k+1} = U_{k–1} – q_kU_k and V_{k+1} = V_{k–1} – q_kV_k using the values available from the previous two iterations, so as to maintain the relation r_{k+1} = U_{k+1}r_0 + V_{k+1}r_1 for all k = 1, 2, . . . . In Algorithm 3.21, the k-th iteration of the while loop begins with x = r_{k–1}, y = r_k, u1 = U_k and u2 = U_{k–1}, and ends after updating the values to x = r_k, y = r_{k+1}, u1 = U_{k+1} and u2 = U_k. It is not necessary to maintain the values V_k in the main loop. After the loop terminates with x = r_k and y = 0, one takes u = U_k = u2 and computes V_k = (r_k – U_kr_0)/r_1.

Modular arithmetic in F_2[X] is very similar to modular arithmetic over the integers. If f(X) is a non-constant polynomial of F_2[X] (not necessarily irreducible) of degree n, we represent the elements of F_2[X]/〈f(X)〉 as polynomials in F_2[X] of degrees < n. Given two such polynomials a and b, we compute the sum a + b simply as the sum in F_2[X]. The product ab is computed by first computing the product ab in F_2[X] and then computing the remainder of Euclidean division of this product by f. The inverse of a modulo f exists if and only if gcd(a, f) = 1 (in F_2[X]). In that case, extended gcd computation gives us polynomials u, v such that 1 = ua + vf, so that ua ≡ 1 (mod f). If a ≠ 0, then Algorithm 3.21 computes u with deg u < deg f = n, so that we take this u to be the canonical representative of a^(–1). Finally, the modular exponentiation a^e (mod f) can be done using an algorithm very similar to Algorithm 3.9 or Algorithm 3.10. We leave the details to the reader.

Algorithm 3.21. Extended gcd of polynomials

Input: Non-zero polynomials a, b ∈ F_2[X].

Output: Polynomials d, u, v ∈ F_2[X] satisfying

d = gcd(a, b) = ua + vb, deg u < deg b, deg v < deg a.

Steps:

/* Initialize */
x := a, y := b, u1 := 0, u2 := 1.    /* u1 = U_1, u2 = U_0 */

/* Repeated Euclidean division */
while (y ≠ 0) {
   Simultaneously compute q := x quot y and r := x rem y (Algorithm 3.20).
   u := u2 – qu1, u2 := u1, u1 := u,
   x := y, y := r.
}
d := x, u := u2, v := (d – ua)/b.
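Algorithm 3.21 translates to Python as follows (an illustrative sketch of ours, with Python ints as GF(2) polynomials; over F_2, subtraction is XOR, so u2 – q·u1 becomes u2 ⊕ q·u1):

```python
def mul(a, b):
    """Schoolbook product in GF(2)[X] (ints as polynomials)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def divmod2(a, b):
    """Euclidean division in GF(2)[X]: return (quotient, remainder)."""
    q, s = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        sh = a.bit_length() - 1 - s
        q |= 1 << sh
        a ^= b << sh
    return q, a

def xgcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) = u*a + v*b in GF(2)[X]."""
    x, y, u1, u2 = a, b, 0, 1            # u1 = U_1, u2 = U_0
    while y:
        q, r = divmod2(x, y)
        u1, u2 = u2 ^ mul(q, u1), u1     # U_{k+1} = U_{k-1} - q_k U_k
        x, y = y, r
    d, u = x, u2
    v, rem = divmod2(d ^ mul(u, a), b)   # v = (d - u*a)/b
    assert rem == 0
    return d, u, v

# X^4 + X + 1 is irreducible, so its gcd with X^2 + X is 1:
d, u, v = xgcd(0b10011, 0b110)
assert (d, u, v) == (1, 1, 0b111)        # 1 = 1*(X^4+X+1) + (X^2+X+1)(X^2+X)
assert mul(u, 0b10011) ^ mul(v, 0b110) == 1
```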

3.5.2. Finite Fields of Characteristic 2

For the polynomial-basis representation of F_2^n, we need an irreducible polynomial of F_2[X] of degree n. We shortly present a probabilistic algorithm that generates a random monic irreducible polynomial in F_q[X] of a given degree n. Although we are interested only in the case q = 2, this algorithm works for any prime or prime power q.

First, we describe a deterministic polynomial-time algorithm for checking the irreducibility of a non-constant polynomial f ∈ F_q[X] of degree n. If f is reducible, it has an irreducible factor of some degree i ≤ ⌊n/2⌋. Also recall (Theorem 2.40, p 82) that X^(q^i) – X is the product of all monic irreducible polynomials of F_q[X] of degrees dividing i. Therefore, if f has an irreducible factor of degree i, then gcd(f, X^(q^i) – X) = gcd(f, (X^(q^i) rem f) – X) is a non-constant polynomial. Algorithm 3.22 employs these simple observations.

Now, recall from Section 2.9.2 that a random monic polynomial of F_q[X] of degree n is irreducible with probability approximately 1/n. Therefore, if we keep on checking random monic polynomials in F_q[X] of degree n for irreducibility, then after O(n) checks we expect to find an irreducible polynomial. This leads to the Las Vegas probabilistic Algorithm 3.23.

Algorithm 3.22. Check for irreducibility of a polynomial

Input: A non-constant polynomial f ∈ F_q[X].

Output: A (deterministic) certificate whether f is irreducible or not.

Steps:

n := deg f, g := X.
for i = 1, . . . , ⌊n/2⌋ {
   g := g^q (mod f).   /* Here g = X^(q^i) rem f */
   if (deg(gcd(f, g – X)) > 0) { Return “f is reducible”. }
}
Return “f is irreducible”.

Algorithm 3.23. Generation of a random irreducible polynomial

Input: A degree n ≥ 2.

Output: A random monic irreducible polynomial f ∈ F_q[X] of degree n.

Steps:

while (1) {
   f := a random monic polynomial in F_q[X] of degree n.
   if (f is irreducible) { Return f. }   /* Algorithm 3.22 */
}
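Algorithms 3.22 and 3.23 can be sketched for q = 2 as follows (our illustration, with Python ints as GF(2) polynomials; for q = 2 the step g := g^q (mod f) is a single modular squaring):

```python
import random

def mul(a, b):                       # schoolbook product in GF(2)[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def mod(a, f):                       # remainder of division by f
    s = f.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        a ^= f << (a.bit_length() - 1 - s)
    return a

def gcd2(a, b):                      # Euclidean gcd in GF(2)[X]
    while b:
        a, b = b, mod(a, b)
    return a

def is_irreducible(f):
    """Algorithm 3.22 for q = 2 (f non-constant)."""
    n = f.bit_length() - 1
    g = 0b10                         # g = X
    for _ in range(n // 2):
        g = mod(mul(g, g), f)        # g := g^2 (mod f); now g = X^(2^i) rem f
        if gcd2(f, g ^ 0b10).bit_length() - 1 > 0:
            return False             # non-constant gcd(f, g - X)
    return True

def random_irreducible(n, rng=random):
    """Algorithm 3.23: random monic irreducible of degree n over GF(2)."""
    while True:
        f = (1 << n) | rng.getrandbits(n)   # monic, random lower coefficients
        if is_irreducible(f):
            return f

assert is_irreducible(0b1011) and is_irreducible(0b1101)   # X^3+X+1, X^3+X^2+1
assert not is_irreducible(0b10101)   # X^4 + X^2 + 1 = (X^2 + X + 1)^2
f = random_irreducible(8, random.Random(1))
assert f.bit_length() == 9 and is_irreducible(f)
```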

Once the defining irreducible polynomial f is available, we carry out the arithmetic in F_2^n as modular polynomial arithmetic with respect to the modulus f. This is described at the end of Section 3.5.1. Since this modular arithmetic involves taking the remainder of Euclidean division by f, it is sometimes expedient to choose f to be an irreducible polynomial of certain special types. The randomized algorithm described above gives a random monic irreducible polynomial f of degree n having, on average, ≈ n/2 non-zero coefficients. The division algorithm (Algorithm 3.20) in that case takes time O(n^2). On the other hand, if f is a sparse polynomial (like a trinomial), the Euclidean division loop can be rewritten to exploit this sparsity, thereby bringing down the running time of the division procedure to O(n). (See Exercise 3.34. Also see Exercise 3.38 for computing isomorphisms between different polynomial-basis representations of the same field.)

Let p be a prime and let q = p^n, n ∈ N. We have seen how to implement the arithmetic in F_p and hence, by Exercise 3.35, that in F_p[X] too. If f ∈ F_p[X] is an irreducible polynomial of degree n and if q = p^n, then F_q = F_p[X]/〈f(X)〉, and we implement the arithmetic of F_q as the polynomial arithmetic of F_p[X] modulo f. Again by Exercise 3.35, this gives us the arithmetic of F_q[Y]. Now, for m ∈ N and a monic irreducible polynomial g(Y) ∈ F_q[Y] of degree m, we have a representation F_(q^m) = F_q[Y]/〈g(Y)〉. Instead of having such a two-way representation of F_(q^m), we may also represent F_(q^m) as F_p[X]/〈h(X)〉, where h(X) ∈ F_p[X] is a monic irreducible polynomial of degree nm. It usually turns out that the second representation of F_(q^m) is more efficient. However, there are some situations where the two-way representation performs better. This is, in particular, the case when the arithmetic of F_q can be made more efficient than the modular polynomial arithmetic of F_p[X]. For example, we might precompute tables of the arithmetic operations of F_q and use table lookups for performing the coefficient arithmetic of F_q[Y]. This demands O(q^2) storage and is feasible only when q is small. On the other hand, if we find a primitive element γ of F_q^* and precompute a table that maps i ↦ γ^i and another that maps γ^i ↦ i, then products in F_q^* can be computed in time O(1) using table lookups. If, in addition, we store the Zech logarithm table (Section 2.9.3) for F_q, then addition in F_q can also be performed in O(1) time with table lookups. These three tables together take O(q) memory, which (though better than the O(q^2) storage of the previous scheme) is feasible only for small q.
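The log/antilog scheme can be illustrated on a small field (our example, not the book's): GF(16) represented as GF(2)[X]/(X^4 + X + 1), where X happens to be a primitive element, so we tabulate i ↦ X^i and its inverse map; a product then costs one addition modulo 15 and two lookups:

```python
F, N = 0b10011, 15                 # modulus X^4 + X + 1, group order 2^4 - 1

exp = [0] * N                      # exp[i] = X^i as a 4-bit polynomial
log = {}                           # log[g] = i with X^i = g
g = 1
for i in range(N):
    exp[i], log[g] = g, i
    g <<= 1                        # multiply by X ...
    if g & 0b10000:
        g ^= F                     # ... and reduce modulo X^4 + X + 1

def field_mul(a, b):
    """Multiply two non-zero-or-zero elements of GF(16) via table lookups."""
    if a == 0 or b == 0:
        return 0
    return exp[(log[a] + log[b]) % N]

# (X + 1)(X^2 + X) = X^3 + X
assert field_mul(0b011, 0b110) == 0b1010
assert len(log) == N               # X really generates the whole group
```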

3.5.3. Selecting Suitable Finite Fields

Not all finite fields are suitable for cryptographic applications. In this section, we discuss the desirable properties of a field F_q so that secure protocols over F_q can be developed. We first note that such protocols are usually based on the apparent intractability of the so-called discrete logarithm problem (DLP) (Section 4.2). As a result, the selection of suitable fields is dictated by the known cryptanalytic algorithms to solve the DLP (see Section 4.4). We shall mostly concentrate on F_q with either q = p a prime or q = 2^n for some n ∈ N. By the bit size of q, denoted |q|, we mean the number of bits in the binary representation of q, that is, |q| = ⌈lg q⌉. As we have seen, each element of F_q is representable using O(|q|) bits and, therefore, |q| is often also called the size of F_q.

The first requirement on a cryptographically suitable field is that the size |q| should be sufficiently large. Recent cryptanalytic studies show that sizes |q| ≤ 512 are not secure enough. Sizes |q| ≥ 768 are recommended for secure applications. For long-term security, one might even require |q| ≥ 2048.

Not every field of the recommended size is, however, adequately secure. The cardinality #F_q = q must be such that q – 1 has at least one large prime divisor q′ (see the Pohlig–Hellman method in Section 4.4). By large, we usually mean |q′| ≥ 160. In addition, this prime factor q′ of q – 1 should be known to us. If q = p is a prime, then a safe prime or a strong prime serves our purpose (Definition 3.5, Algorithm 3.14). Also see Exercise 3.25. On the other hand, if q = 2^n, the only way to obtain q′ is by factoring the Mersenne number M_n := q – 1 = 2^n – 1. Factoring M_n for n ≥ 768 is a very difficult task. Luckily, extensive tables of complete or partial factorizations of M_n are available. For example, for n = 769 (a prime number), we have

M_769 = 2^769 – 1 = 1,591,805,393 × 6,123,566,623,856,435,977,170,641 × q′,

where q′ is a 657-bit prime. These tables should be consulted for choosing a suitable value of n.

The multiplicative group F_q^* is cyclic (Theorem 2.38). If the complete integer factorization of q – 1 is known, then it is possible to find, in polynomial time (in |q|), a primitive element of F_q^*. Algorithm 3.24 performs r = O(lg m) exponentiations in G in order to conclude whether a given element is a generator of G. For G = F_q^*, we have polynomial-time exponentiation algorithms, so Algorithm 3.24 runs in deterministic polynomial time. By Exercise 2.47, the probability of a randomly chosen element of G being primitive is φ(m)/m. In view of the lower bound on φ(m)/m given in Theorem 3.1, proved by Rosser and Schoenfeld [253], Algorithm 3.25 is expected to return a random primitive element of G after O(ln ln m) iterations.

Theorem 3.1.

Let m ∈ N, m ≥ 5. Then φ(m)/m ≥ 1/(6 ln ln m).

Algorithm 3.24. Check for primitive element

Input: A cyclic group G of cardinality #G = m with known factorization m = p1^e1 · · · pr^er and an element a ∈ G.

Output: A deterministic certificate of whether or not a is a generator of G.

Steps:

/* We assume that G is multiplicatively written and has the identity e */
for i = 1, . . . , r {
   if (a^(m/pi) = e) { Return “a is not a generator of G”. }
}
Return “a is a generator of G”.

Algorithm 3.25. Computation of a generator of a finite cyclic group

Input: A cyclic group G of cardinality #G = m with known factorization m = p1^e1 · · · pr^er.

Output: A generator g of G.

Steps:

while (1) {
    g := a random element of G.
    if (g is a generator of G) { Return g. }   /* Algorithm 3.24 */
}
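Algorithms 3.24 and 3.25, specialized to the cyclic group G = Z_p^*, look as follows in Python (our illustration; the small prime p = 31, with p – 1 = 30 = 2 · 3 · 5, is a toy choice):

```python
import random

def is_generator(a, p, prime_factors):
    """Algorithm 3.24 for G = Z_p^*: a generates the group iff
    a^((p-1)/q) != 1 for every prime q dividing p - 1."""
    return all(pow(a, (p - 1) // q, p) != 1 for q in prime_factors)

def find_generator(p, prime_factors, rng):
    """Algorithm 3.25: keep sampling until a generator is found."""
    while True:
        g = rng.randrange(2, p)
        if is_generator(g, p, prime_factors):
            return g

p, factors = 31, [2, 3, 5]
assert is_generator(3, p, factors)        # 3 is a primitive root mod 31
assert not is_generator(2, p, factors)    # 2 has order 5 mod 31 (2^5 = 32)
g = find_generator(p, factors, random.Random(7))
assert is_generator(g, p, factors)
```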

If, however, the factorization of #G = m is not known, there are no known (deterministic or probabilistic) efficient algorithms for finding a random generator of G or even for checking whether a given element of G is primitive. This is indeed one of the intractable problems of computational algebraic number theory. This problem for G = F_q^* can be bypassed as follows.

Recall that we have chosen q in such a way that q – 1 has a large known prime factor q′. Let H be the unique subgroup of G = F_q^* of order q′. Then H is also cyclic, and we choose to work in H (using the arithmetic of G). It turns out that if q′ ≥ 2^160 and if H is not contained in a proper subfield of F_q, the security of cryptographic protocols over F_q does not degrade too much by the use of H (instead of the full G) as the ground group. But we now face a new problem, that is, the problem of finding a generator of H. Since #H = q′ is a prime, every element of H \ {1} is a generator of H. So the problem essentially reduces to that of finding any non-identity element of H. This latter problem has a simple probabilistic solution. First of all, if q – 1 = q′ is itself prime, choosing any random non-identity element of G will do. So assume q′ < q – 1. Choose a random a ∈ F_q^* and let b := a^((q – 1)/q′). By Lagrange’s theorem (Theorem 2.2, p 24), b^q′ = a^(q–1) = 1 and, therefore, by Proposition 2.5, b ∈ H. Now, F_q being a field, the polynomial X^((q–1)/q′) – 1 can have at most (q – 1)/q′ roots in F_q (that is, in F_q^*), and hence the probability that b = 1 is ≤ ((q – 1)/q′)/(q – 1) = 1/q′. This justifies the randomized polynomial running time of the Las Vegas Algorithm 3.26. Indeed, if q′ ≥ 2^160, the while loop of the algorithm is almost always executed only once.

Algorithm 3.26. Computation of an element of given order

Input: A finite field F_q and an (odd) prime factor q′ of q – 1 with q′ < q – 1.

Output: An element b ∈ F_q^* of multiplicative order q′.

Steps:

while (1) {
   a := a random element of F_q \ {0, ±1}.
   b := a^((q – 1)/q′).
   if (b ≠ 1) { Return b. }
}
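Algorithm 3.26 for a small prime field looks as follows (our toy illustration: p = 23, so p – 1 = 22 = 2 · 11, and we seek an element of order q′ = 11):

```python
import random

def element_of_order(p, qp, rng):
    """Algorithm 3.26 for F_p: return an element of prime order qp,
    where qp is a prime factor of p - 1 with qp < p - 1."""
    while True:
        a = rng.randrange(2, p - 1)          # avoid 0, 1 and -1
        b = pow(a, (p - 1) // qp, p)
        if b != 1:
            return b

p, qp = 23, 11
b = element_of_order(p, qp, random.Random(5))
# qp is prime, so b != 1 with b^qp = 1 means ord(b) is exactly qp
assert b != 1 and pow(b, qp, p) == 1
```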

3.5.4. Factoring Polynomials over Finite Fields

Polynomial factorization over finite fields is an interesting computational problem. All deterministic algorithms known for this purpose are quite poor, that is, fully exponential in the size of the field. However, if randomization is allowed, we have reasonably efficient (polynomial-time) algorithms. In this section, we outline the basic working of the modern probabilistic algorithms for polynomial factorization over finite fields. We assume that a non-constant polynomial f ∈ F_q[X] is to be factored. Without loss of generality, we can take f to be monic. We assume further that the arithmetic of F_q and that of F_q[X] is available. We work with a general value of q = p^n, p prime and n ∈ N, though in some cases we have to treat the case p = 2 separately. Irreducibility (or otherwise) in this section always means irreducibility over F_q.

The factorization algorithm we are going to discuss is a generalization of the root finding algorithm (see Exercise 3.36) and consists of three steps:

Square-free factorization (SFF) Decompose the input polynomial f into a product of square-free polynomials.

Distinct-degree factorization (DDF) Given a square-free polynomial f of degree d, compute f = f1 . . . fd with each fi being a product of irreducible polynomials of degree i.

Equal-degree factorization (EDF) Given a product f of irreducible polynomials of the same degree, find out the irreducible factors of f.

We now provide a separate detailed discussion for each of these three steps.

Square-free factorization

Theorem 3.2 is at the very heart of the square-free factorization algorithm and is a generalization of Exercise 2.61.

Theorem 3.2.

Let K be a field and f ∈ K[X] a non-constant monic polynomial. Then the polynomial f / gcd(f, f′) is square-free, where f′ is the formal derivative of f. In particular, f is square-free if and only if gcd(f, f′) = 1.

Proof

Let f = f1^α1 · · · fr^αr be the factorization of f with pairwise distinct monic irreducible polynomials f1, . . . , fr ∈ K[X] and with multiplicities αi ≥ 1. In order to determine vf1(f′), we employ the usual rules for derivatives to get f′ = α1f1^(α1–1)f1′(f2^α2 · · · fr^αr) + f1^α1 h for some h ∈ K[X]. If α1 = 0 in K (that is, if char K divides α1), then vf1(f′) ≥ α1. Otherwise, vf1(f′) = α1 – 1, since f1 divides neither f1′ (note that deg f1′ < deg f1) nor fi, i > 1. Similar is the case for vfi(f′) for i = 2, . . . , r. It follows that gcd(f, f′) = f1^β1 · · · fr^βr, where each βi ∈ {αi – 1, αi}, so that f / gcd(f, f′) = f1^γ1 · · · fr^γr, γi ∈ {0, 1}, is square-free.

The algorithm for SFF over F_q is now almost immediate, except for one subtlety, namely, the consideration of the case f / gcd(f, f′) = 1, or equivalently, f′ = 0. In order to see when this case can occur, let us write the non-zero terms of f as f = a1X^e1 + · · · + atX^et with distinct exponents e1, . . . , et and ai ∈ F_q^*. Then f′ = a1e1X^(e1–1) + · · · + atetX^(et–1) = 0 if and only if ei ≡ 0 (mod p) for all i, that is, if p divides all of e1, . . . , et. But then f(X) = h(X)^p, where h(X) = b1X^(e1/p) + · · · + btX^(et/p), since every ai ∈ F_q is a p-th power, ai = bi^p, for all i. These observations motivate the recursive Algorithm 3.27. It is easy to check that this (deterministic) algorithm runs in time polynomially bounded by deg f and log q.

Algorithm 3.27. Square-free factorization

Input: A monic non-constant polynomial f ∈ F_q[X], q = p^n, p prime, n ∈ N.

Output: A square-free factorization of f.

Steps:

Compute f′.
if (f′ = 0) {
    Compute h ∈ F_q[X] such that f = h^p.
    Recursively compute a SFF h = h1 · · · hs of h.
    Return the SFF of f as f = (h1 · · · hs)(h1 · · · hs) · · · (h1 · · · hs)  (p times).
} else {
    Recursively compute a SFF gcd(f, f′) = g1 · · · gs of gcd(f, f′).
    Return the SFF of f as f = (f / gcd(f, f′))g1 · · · gs.
}
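For q = 2, the recursion is particularly simple (our illustrative sketch, ints as GF(2) polynomials): the formal derivative keeps exactly the odd-degree terms, f′ = 0 means f = h^2, and the square root h is read off the even-position bits.

```python
def mod2(a, b):
    """Remainder of Euclidean division in GF(2)[X]."""
    s = b.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        a ^= b << (a.bit_length() - 1 - s)
    return a

def gcd2(a, b):
    while b:
        a, b = b, mod2(a, b)
    return a

def quo2(a, b):
    """Quotient a / b (exact division assumed where we use it)."""
    q, s = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        sh = a.bit_length() - 1 - s
        q |= 1 << sh
        a ^= b << sh
    return q

def deriv(f):
    """Formal derivative over GF(2): only odd-degree terms survive."""
    d, i = 0, 1
    while f >> i:
        if (f >> i) & 1:
            d |= 1 << (i - 1)
        i += 2
    return d

def sqrt2(f):
    """For f with f' = 0 (only even-degree terms), return h with h^2 = f."""
    h, i = 0, 0
    while f >> (2 * i):
        if (f >> (2 * i)) & 1:
            h |= 1 << i
        i += 1
    return h

def sff(f):
    """Algorithm 3.27 for q = 2: square-free factors of a non-constant f,
    listed with multiplicity."""
    d = deriv(f)
    if d == 0:                   # f' = 0 means f = h^2 over GF(2)
        return 2 * sff(sqrt2(f))
    g = gcd2(f, d)
    if g == 1:
        return [f]               # f itself is already square-free
    return [quo2(f, g)] + sff(g)

# (X + 1)^2 (X^2 + X + 1) = X^4 + X^3 + X + 1
assert sorted(sff(0b11011)) == [0b11, 0b11, 0b111]
```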

Distinct-degree factorization

Let f ∈ F_q[X] be a square-free polynomial of degree d. We can write f = f1 · · · fd, where for each i the polynomial fi is the product of all the irreducible factors of f of degree i. If f does not have an irreducible factor of degree i, then we take fi = 1 as usual.[5] In order to compute the polynomials fi, we make use of the fact that X^(q^i) – X is the product of all monic irreducible polynomials in F_q[X] whose degrees divide i (see Theorem 2.40 on p 82). It immediately follows that fi = gcd(f/(f1 · · · fi–1), X^(q^i) – X). Thus a few (at most d) gcd computations give us all the fi. The polynomials X^(q^i) – X are, however, of rather large degrees. But since gcd(g, X^(q^i) – X) = gcd(g, (X^(q^i) rem f) – X) for any factor g of f, keeping polynomials reduced modulo f implies that we take gcds of polynomials of degrees ≤ d. This, in turn, implies that the DDF can be performed in (deterministic) polynomial time (in d and ln q).

[5] Conventionally, an empty product is taken to be the multiplicative identity and an empty sum to be the additive identity.

Algorithm 3.28 shows an implementation of the DDF. Though the algorithm does not require f to be monic, there is no harm in assuming so.

Algorithm 3.28. Distinct-degree factorization

Input: A (non-constant) square-free polynomial f ∈ F_q[X] of degree d.

Output: The DDF of f, that is, the polynomials f1, . . . , fd as explained above.

Steps:

g := f.   /* Make a local copy of f */
h := X, i := 1.
while (deg g ≠ 0) {
   h := h^q (mod f).   /* Modular exponentiation: h = X^(q^i) rem f */
   fi := gcd(h – X, g).
   g := g/fi.    /* Factor out fi from g */
   i++.
}
if (i ≤ d) { fi := 1, . . . , fd := 1. }
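For q = 2, Algorithm 3.28 can be sketched as follows (our illustration, ints as GF(2) polynomials; we return only the non-trivial fi, the omitted ones being 1):

```python
def mulg(a, b):                      # schoolbook product in GF(2)[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def modg(a, f):                      # remainder modulo f
    s = f.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        a ^= f << (a.bit_length() - 1 - s)
    return a

def gcdg(a, b):
    while b:
        a, b = b, modg(a, b)
    return a

def quog(a, b):                      # exact quotient a / b
    q, s = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        sh = a.bit_length() - 1 - s
        q |= 1 << sh
        a ^= b << sh
    return q

def ddf(f):
    """Algorithm 3.28 for q = 2: f square-free; return {i: f_i} for the
    non-trivial degree-i parts f_i only."""
    out, g, h, i = {}, f, 0b10, 1
    while g.bit_length() - 1 > 0:
        h = modg(mulg(h, h), f)      # h := h^2 (mod f) = X^(2^i) rem f
        fi = gcdg(h ^ 0b10, g)       # gcd(h - X, g)
        if fi != 1:
            out[i] = fi
            g = quog(g, fi)
        i += 1
    return out

# X^4 + X = X (X + 1)(X^2 + X + 1): degree-1 part X^2 + X, degree-2 part X^2 + X + 1
assert ddf(0b10010) == {1: 0b110, 2: 0b111}
```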

This simple-minded implementation of the DDF is theoretically not the most efficient one known. In fact, it turns out that the DDF (and not the seemingly more complicated EDF) is the bottleneck of the entire polynomial factorization process. Therefore, making the DDF more efficient is important, and many improvements have been suggested in the literature. All these improved algorithms essentially do the same thing as above (that is, the computation of gcd(X^(q^i) – X, g)), but they optimize the computation of the polynomials X^(q^i) rem f. The best-known method (due to Kaltofen and Shoup) is based on the observation that, in general, most of the fi are 1. Therefore, instead of computing each fi individually, one may break the interval 1, . . . , d into several subintervals I1, I2, . . . , Il and compute the products Fj of the fi for i ∈ Ij, j = 1, . . . , l. Only those Fj that turn out to be non-constant are further decomposed.

For cryptographic purposes, we will, however, deal with rather small values of d = deg f. (Typically, d is at most a few thousand.) The asymptotically better algorithms usually do not outperform the simple Algorithm 3.28 for these values of d.

Equal-degree factorization

Equal-degree factorization, the last step of the polynomial factorization process, is the only probabilistic part of the algorithm. We may assume that f is a (monic) square-free polynomial of degree d and that each irreducible factor of f has the same (known) degree, say δ. If d = δ, then f is irreducible. So we assume that d > δ, that is, d = rδ for some r ≥ 2. Theorem 3.3 provides the basic foundation for the EDF.

Theorem 3.3.

Let g be any polynomial in F_q[X] and let δ ∈ N. Then X^(q^δ) – X divides g^(q^δ) – g.

Proof

If g = 0, there is nothing to prove. If g = alX^l + · · · + a1X + a0 ≠ 0 with ai ∈ F_q, then g^(q^δ) – g = al(X^(lq^δ) – X^l) + · · · + a1(X^(q^δ) – X). It is easy to verify that X^(q^δ) – X divides X^(iq^δ) – X^i for every i ∈ N.

Now, we have to separate two cases, namely, q odd and q even. Theorem 3.3 is valid for any q, even or odd, but taking q odd allows us to write g^(q^δ) – g = g(g^((q^δ–1)/2) – 1)(g^((q^δ–1)/2) + 1). With the above assumptions on f, we have f | (X^(q^δ) – X) and, therefore, f | (g^(q^δ) – g), so that f = gcd(g^(q^δ) – g, f) = gcd(g, f) gcd(g^((q^δ–1)/2) – 1, f) gcd(g^((q^δ–1)/2) + 1, f). If g is randomly chosen, then gcd(g^((q^δ–1)/2) – 1, f) is, with probability ≈ 1/2, a non-trivial factor of f. The idea is, therefore, to keep on choosing random g and computing f1 := gcd(g^((q^δ–1)/2) – 1, f) until one gets 1 ≤ deg f1 < deg f. One then recursively applies the algorithm to f1 and f/f1. It is sufficient to choose g with deg g < 2δ. Obviously, the exponentiation g^(q^δ) has to be carried out modulo f. We leave the details to the reader, but note that trying O(1) random polynomials g is expected to split f and, therefore, the EDF runs in expected polynomial time.

For the case q = 2^n, essentially the same algorithm works, but we have to use the split g^(q^δ) + g = g^(2^nδ) + g = (g^(2^(nδ–1)) + g^(2^(nδ–2)) + · · · + g^2 + g)(g^(2^(nδ–1)) + g^(2^(nδ–2)) + · · · + g^2 + g + 1). Once again, computing gcd(g^(2^(nδ–1)) + g^(2^(nδ–2)) + · · · + g^2 + g, f) for a random g ∈ F_q[X] splits f with probability ≈ 1/2 and, thus, we get an EDF algorithm that runs in expected polynomial time.
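The trace-based splitting for q = 2 can be sketched as follows (our illustration, ints as GF(2) polynomials; the toy input is the product of the two irreducible cubics X^3 + X + 1 and X^3 + X^2 + 1, and the random generator is seeded only to make the run reproducible):

```python
import random

def mule(a, b):                      # schoolbook product in GF(2)[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def mode(a, f):                      # remainder modulo f
    s = f.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        a ^= f << (a.bit_length() - 1 - s)
    return a

def gcde(a, b):
    while b:
        a, b = b, mode(a, b)
    return a

def quoe(a, b):                      # exact quotient a / b
    q, s = 0, b.bit_length() - 1
    while a and a.bit_length() - 1 >= s:
        sh = a.bit_length() - 1 - s
        q |= 1 << sh
        a ^= b << sh
    return q

def edf(f, delta, rng):
    """EDF for q = 2: f square-free with all irreducible factors of
    degree delta; return the list of irreducible factors."""
    d = f.bit_length() - 1
    if d == delta:
        return [f]
    while True:
        g = rng.getrandbits(2 * delta)        # random g, deg g < 2*delta
        t, gi = 0, mode(g, f)
        for _ in range(delta):                # t = g + g^2 + ... + g^(2^(delta-1)) mod f
            t ^= gi
            gi = mode(mule(gi, gi), f)
        f1 = gcde(t, f)
        if 0 < f1.bit_length() - 1 < d:       # non-trivial split found
            return edf(f1, delta, rng) + edf(quoe(f, f1), delta, rng)

fs = edf(0b1111111, 3, random.Random(2))      # X^6 + X^5 + ... + X + 1
assert sorted(fs) == [0b1011, 0b1101]         # X^3+X+1 and X^3+X^2+1
```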

Exercise Set 3.5

3.33Find a (polynomial-basis) representation of . Compute a primitive element in this representation.
3.34
  1. Show that the running time of Algorithm 3.20 is O(s(r – s)), which reaches the maximum order of O(r^2) = O(s^2) when s ≈ r/2.

  2. Suppose b is known to have e non-zero coefficients. Modify the Euclidean division loop of Algorithm 3.20 so that the algorithm runs in time O((r – s)e). [H] In particular, if e = O(1), the running time of Algorithm 3.20 becomes linear, namely O(r).

3.35Implement the polynomial arithmetic of F_q[X], given the arithmetic of F_q.
3.36Let q = p^n (p prime and n ∈ N), f ∈ F_q[X] a non-constant polynomial, and let g := gcd(f, X^q – X).
  1. If S is the set of all roots of f in F_q, show that g is the product of the linear factors X – β, β ∈ S. Thus, g is a square-free polynomial which splits over F_q and has the same roots (over F_q) as f. If deg g = 0 or 1, then we know all the roots of g and hence of f. So, for the rest of this exercise, we assume that deg g ≥ 2.

  2. Consider the case that p is odd. Let b ∈ F_q be arbitrary. Show that

    (X + b)((X + b)^((q–1)/2) – 1)((X + b)^((q–1)/2) + 1) = X^q – X

    and that

    g = gcd(g, X + b) gcd(g, (X + b)^((q–1)/2) – 1) gcd(g, (X + b)^((q–1)/2) + 1).

    Explain how Algorithm 3.29 produces two non-trivial factors of g (over F_q) in probabilistic polynomial time. [H] Write an algorithm to compute all the roots of f in F_q.

    Algorithm 3.29. Computing roots of a polynomial: odd characteristic

    Input: A square-free polynomial g ∈ F_q[X] with deg g ≥ 2 that splits over F_q.

    Output: Polynomials g1, g2 ∈ F_q[X] with g = g1g2 and deg gi ≥ 1 for i = 1, 2.

    Steps:

    if (g(0) = 0) { (g1, g2) := (X, g(X)/X), return. }
    while (1) {
      Select a random element b ∈ F_q.
      h := (X + b)^((q–1)/2) – 1 (mod g).
      g1 := gcd(g, h).
      if (1 ≤ deg g1 < deg g) { g2 := g/g1, return. }
    }

  3. Now, assume that p = 2 (so that q = 2^n) and define the polynomial

    H(X) := X + X^2 + X^4 + · · · + X^(2^(n–1)).

    Let b ∈ F_q be arbitrary. Show that

    H(X + b)(H(X + b) + 1) = X^q – X

    [H] and that

    g(X) = gcd(g(X), H(X + b)) gcd(g(X), H(X + b) + 1).

Explain how Algorithm 3.30 produces two non-trivial factors of g (over F_q) in probabilistic polynomial time. Write an algorithm to compute all the roots of f in F_q.

Algorithm 3.30. Computing roots of a polynomial: characteristic 2

Input: A square-free polynomial g ∈ F_q[X] with deg g ≥ 2 that splits over F_q, q = 2^n.

Output: Polynomials g1, g2 ∈ F_q[X] with g = g1g2 and deg gi ≥ 1 for i = 1, 2.

Steps:

if (g(0) = 0) { (g1, g2) := (X, g(X)/X), return. }
while (1) {
   Select a random element b ∈ F_q.
   h := (X + b) + (X + b)^2 + (X + b)^4 + · · · + (X + b)^(2^(n–1)) (mod g).
   g1 := gcd(g, h).
   if (1 ≤ deg g1 < deg g) { g2 := g/g1, return. }
}

3.37Use Exercise 3.36 to compute all the roots of the following polynomials:
  1. X^6 + 6X^4 + 4X^2 + 6 in .

  2. X^3 + (α^2 + α)X^2 + (α^2 + α + 1) in F_8[X], where F_8 is represented as F_2(α), α being a root of the polynomial X^3 + X + 1.

3.38Let f and g be two monic irreducible polynomials over F_q, both of the same degree n ∈ N. Consider the two representations F_(q^n) = F_q[X]/〈f(X)〉 and F_(q^n) = F_q[Y]/〈g(Y)〉. In this exercise, we study how we can compute an isomorphism between these two representations. The polynomial f(Y) splits into linear factors over F_q[Y]/〈g(Y)〉. Consider a root α = α(Y) of f(Y) in F_q[Y]/〈g(Y)〉. Show that 1, α, α^2, . . . , α^(n–1) is an F_q-basis of (the F_q-vector space) F_q[Y]/〈g(Y)〉. For i = 0, . . . , n – 1, write (uniquely) α^i = αi0 + αi1Y + · · · + αi,n–1Y^(n–1) with αij ∈ F_q, and consider the matrix A = (αij)0≤i≤n–1, 0≤j≤n–1. Show that the map that maps (the equivalence class of) a0 + a1X + · · · + an–1X^(n–1) to (the equivalence class of) b0 + b1Y + · · · + bn–1Y^(n–1), where (b0 b1 . . . bn–1) = (a0 a1 . . . an–1)A, is an F_q-isomorphism.
3.39Let q = p^n for a prime p and n ∈ N. We have seen that the elements of F_p can be represented as integers between 0 and p – 1, whereas the elements of F_q can be represented as polynomials modulo some irreducible polynomial of degree n, that is, as polynomials of F_p[X] of degrees < n. Show that the substitution X = p in the polynomial representation of elements of F_q gives a representation of the elements of F_q as integers between 0 and q – 1. We call this latter representation of elements of F_q the packed representation. Compare the advantages and disadvantages of the packed representation over the polynomial representation.
3.40Let G be a cyclic multiplicatively written group of order m (and with the identity element e). Assume that the factorization m = p1^e1 · · · pr^er of m is known. Devise an algorithm that computes the order of an arbitrary element in G. [H]
3.41

Berlekamp’s Q-matrix factorization Let f(X) ∈ F_q[X] be a monic square-free polynomial of degree d that admits a factorization f(X) = f1(X) · · · fr(X) with each fi(X) ∈ F_q[X] monic, non-constant and irreducible. (Note that the fi are pairwise distinct, since f is square-free.) Let di be the degree of fi.

  1. Consider the ring

    A := F_q[X]/〈f(X)〉.

    Show that A ≅ F_q[X]/〈f1(X)〉 × · · · × F_q[X]/〈fr(X)〉 ≅ F_(q^d1) × · · · × F_(q^dr). [H] A is an F_q-vector space of dimension d.

  2. Write x := X + 〈f(X)〉 and consider the map β : A → A that maps a to a^q – a. Show that β is an F_q-linear transformation with Ker β ≅ F_q × · · · × F_q (r copies), and so the nullity of β equals the number r of irreducible factors of f.

  3. Let Q be the matrix of β with respect to the basis 1, x, . . . , x^(d–1) of A. Describe an algorithm to compute Q. Also design an algorithm to compute a basis of Ker β.

  4. Show that if h(X) + 〈f(X)〉 ∈ Ker β, then

    f(X) is the product of the polynomials gcd(f(X), h(X) – c) over all c ∈ F_q.

    For a suitable h(X), this is a non-trivial factorization of f. This procedure is efficient when q is small.

  5. Use Berlekamp’s method to factor X6 + X5 + X2 + 1 over .

*3.6. Arithmetic on Elliptic Curves

The recent popularity of cryptographic systems based on elliptic curve groups over finite fields stems from two considerations. First, discrete logarithms in the multiplicative group F_q^* can be computed in subexponential time. This demands q to be sufficiently large, typically of length 768 bits or more. On the other hand, if the elliptic curve E over F_q is carefully chosen, the only known algorithms for solving the discrete logarithm problem in E(F_q) are fully exponential in lg q. As a result, smaller values of q suffice to achieve the desired level of security. In practice, the length of q is required to be between 160 and 400 bits. This leads to smaller key sizes for elliptic curve cryptosystems. The second advantage of using elliptic curves is that for a given prime power q, there is only one group F_q^*, whereas there are many elliptic curve groups E(F_q) (over the same field F_q) with orders ranging over the interval from q + 1 – 2√q to q + 1 + 2√q. If a particular group E(F_q) is compromised, we can switch to another curve without changing the base field F_q.

In this section, we start with the description of efficient implementation of the arithmetic in the groups E(F_q). Then we concentrate on some algorithms for computing the order #E(F_q). Knowledge of this order is necessary to find out cryptographically suitable elliptic curves. We consider only prime fields F_p or fields of characteristic 2. So we assume that the curve is defined by Equation (2.8) or Equation (2.9) on p 100 (supersingular curves are not used in cryptography) instead of by the general Weierstrass Equation (2.6) on p 98.

3.6.1. Point Arithmetic

Let us first see how we can efficiently represent points on an elliptic curve E over F_q. Since a finite point P = (h, k) on E corresponds to two elements h, k ∈ F_q, and since each element of F_q can be represented using ≤ s = ⌈lg q⌉ bits, 2s bits suffice to represent P. We can do better than this. Substituting X = h in the equation for E leaves us with a quadratic equation in Y. This equation has two roots, of which k is one. If we adopt a convention (for example, see Section 6.2.1) that identifies, using a single bit, which of the two roots the coordinate k is, the storage requirement for P drops to s + 1 bits. During an on-line computation this compressed representation incurs some overhead and may be avoided. However, for off-line storage and transmission (of public keys, for example), this compression may be helpful.

Explicit formulas for the sum of two points and for the opposite of a point on an elliptic curve E are given in Section 2.11.2. These operations in E(F_q) can be implemented using a few operations in the ground field F_q.

Computation of mP for m ∈ N and P ∈ E(F_q) (or, more generally, for m ∈ Z) can be performed using a repeated double-and-add algorithm similar to the repeated square-and-multiply Algorithm 3.9. We leave out the trivial modifications and urge the reader to carry out the details.
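By way of illustration (ours, not the book's), here is a Python sketch of repeated double-and-add in affine coordinates, using the chord-and-tangent formulas of Section 2.11.2; the curve Y^2 = X^3 + 2X + 3 over F_97 and the point P = (3, 6), which happens to have order 5, are toy choices, and the point at infinity is represented as None:

```python
def ec_add(P, Q, a, p):
    """Add two points on Y^2 = X^3 + aX + b over F_p (odd p)."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                   # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def ec_mul(m, P, a, p):
    """Repeated double-and-add, the elliptic-curve analogue of
    the square-and-multiply Algorithm 3.9."""
    R = None
    while m:
        if m & 1:
            R = ec_add(R, P, a, p)
        P = ec_add(P, P, a, p)
        m >>= 1
    return R

p, a, b, P = 97, 2, 3, (3, 6)
assert (P[1] ** 2 - (P[0] ** 3 + a * P[0] + b)) % p == 0   # P lies on the curve
assert ec_mul(5, P, a, p) is None                          # P has order 5 here
assert ec_mul(6, P, a, p) == P
```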

Finding a random point P ∈ E(F_q) is another useful problem. If q = p is an odd prime and we use the short Weierstrass Equation (2.8), we first choose a random h ∈ F_p and substitute X by h to get Y^2 = h^3 + ah + b. This equation has 2, 0 or 1 solution(s), depending on whether h^3 + ah + b is a quadratic residue, a non-residue, or 0 modulo p. Quadratic residuosity can be checked by computing the Legendre symbol (Algorithm 3.15), whereas square roots modulo p can be computed using Tonelli and Shanks’ Algorithm 3.16.
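A sketch of this procedure (our illustration) for the special case p ≡ 3 (mod 4), where a square root of a quadratic residue c is simply c^((p+1)/4) mod p and no full Tonelli–Shanks computation is needed; Euler's criterion plays the role of the Legendre-symbol test, and the curve parameters below are toy choices:

```python
import random

def random_point(a, b, p, rng):
    """Random point on Y^2 = X^3 + aX + b over F_p, assuming p = 3 (mod 4)."""
    assert p % 4 == 3
    while True:
        h = rng.randrange(p)
        c = (h * h * h + a * h + b) % p
        if c == 0:
            return (h, 0)
        if pow(c, (p - 1) // 2, p) == 1:       # quadratic residue? (Euler)
            k = pow(c, (p + 1) // 4, p)        # a square root of c
            return (h, k)                      # the other solution is (h, p - k)

p, a, b = 103, 1, 18                           # toy parameters, p = 3 (mod 4)
x, y = random_point(a, b, p, random.Random(11))
assert (y * y - (x ** 3 + a * x + b)) % p == 0
```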

For a non-supersingular curve E over F_{2^n} defined by Equation (2.9), a random point is chosen by first choosing a random h ∈ F_{2^n}. Substituting X = h in the defining equation gives Y^2 + hY + (h^3 + ah^2 + b) = 0. If h = 0, then the unique solution for k is b^{2^{n−1}}. If h ≠ 0, replacing Y by hY and dividing by h^2 transforms the equation to the form Y^2 + Y + α = 0 for some α ∈ F_{2^n}. This equation has two or zero solutions depending on whether the absolute trace Tr(α) is 0 or 1. If k is a solution, the other solution is k + 1. In order to find a solution (if it exists), one may use the (probabilistic) root-finding algorithm of Exercise 3.36. Another possibility is discussed now.

We consider two separate cases. First, if n is odd, then k = Σ_{i=0}^{(n−1)/2} α^{2^{2i}} is a solution, since k^2 + k + α = Tr(α) = 0. On the other hand, if n is even, we first find a β ∈ F_{2^n} with Tr(β) = 1. Since Tr is a surjective homomorphism of the additive groups F_{2^n} → F_2, exactly half of the elements of F_{2^n} have trace 1. Therefore, a desired β can be quickly found by selecting elements of F_{2^n} at random and computing their traces. Now, it is easy to check that k = Σ_{i=0}^{n−2} (β^{2^{i+1}} + β^{2^{i+2}} + · · · + β^{2^{n−1}}) α^{2^i} gives a solution of Y^2 + Y + α = 0.
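For odd n, the solution above (the so-called half-trace) costs only about n squarings. A self-contained sketch in F_{2^5}, with elements stored as 5-bit integers; the reduction polynomial X^5 + X^2 + 1 is an arbitrary irreducible choice made for this illustration:

```python
N, MOD = 5, 0b100101      # F_{2^5} = F_2[X]/(X^5 + X^2 + 1); MOD is illustrative

def gf_mul(a, b):
    # carry-less multiplication with reduction modulo the defining polynomial
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= MOD
    return r

def trace(a):
    # Tr(a) = a + a^2 + a^4 + ... + a^(2^(N-1)); always 0 or 1
    t, x = 0, a
    for _ in range(N):
        t ^= x
        x = gf_mul(x, x)
    return t

def half_trace(a):
    # k = sum_{i=0}^{(N-1)/2} a^(2^(2i)) solves Y^2 + Y = a when N is odd
    k, x = 0, a
    for _ in range((N - 1) // 2 + 1):
        k ^= x
        x = gf_mul(x, x)
        x = gf_mul(x, x)              # x := x^4
    return k

for alpha in range(1, 2 ** N):
    if trace(alpha) == 0:             # a solution exists iff Tr(alpha) = 0
        k = half_trace(alpha)
        assert gf_mul(k, k) ^ k == alpha    # k^2 + k = alpha
```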

3.6.2. Counting Points on Elliptic Curves

Counting points on elliptic curves is a challenging problem, both theoretically and computationally. The first polynomial-time (in log q) algorithm, invented by Schoof and later made efficient by Elkies and Atkin (and many others), is popularly called the SEA algorithm. Unfortunately, even the most efficient implementations of this algorithm remain rather slow, but it is the only known reasonable strategy, in particular, when q = p is a large (odd) prime of a size of cryptographic interest. The more recent Satoh–FGH algorithm, named after its discoverer Satoh and after Fouquet, Gaudry and Harley who proposed its generalized and efficient versions, is a remarkable breakthrough for the case q = 2^n. Both the SEA and the Satoh–FGH algorithms are mathematically quite sophisticated. We now present a brief overview of these algorithms.

The SEA algorithm

We assume that q = p is a large odd prime, this being the typical situation when we apply the SEA algorithm. We also assume that E is given by the short Weierstrass equation Y^2 = X^3 + aX + b. Let q1 = 2, q2 = 3, q3 = 5, . . . be the sequence of prime numbers and t the Frobenius trace of E at p. By Hasse’s theorem (Theorem 2.48, p 106), #E(F_p) = p + 1 − t with |t| ≤ 2√p. A knowledge of t modulo sufficiently many small primes l allows us to reconstruct t using the Chinese remainder theorem. Because of the Hasse bound on t, it is sufficient to choose l from the primes q1, q2, . . . in succession, until the product q1q2 · · · qr exceeds 4√p. By the prime number theorem (Theorem 2.20, p 53), we have r = O(ln p) and also qi = O(ln p) for each i = 1, . . . , r.

The most innovative idea of Algorithm 3.31 is the determination of the integers ti. For l = q1 = 2, the process is easy. We have t1 ≡ t ≡ 0 (mod 2) if and only if E(F_p) contains a point of order 2 (a point of the form (h, 0)), or equivalently, if and only if the polynomial X^3 + aX + b has a root in F_p. We compute the polynomial gcd g(X) := gcd(X^3 + aX + b, X^p − X) over F_p and conclude that t1 = 0 if and only if deg g ≥ 1.

Algorithm 3.31. SEA algorithm for elliptic curve point counting

Input: A prime field F_p, p odd, and an elliptic curve E defined over F_p.

Output: The order of the group E(F_p).

Steps:

Find (the smallest) r such that the product q1q2 · · · qr > 4√p.
for i = 1, 2, . . . , r { Compute ti ∈ {0, 1, . . . , qi − 1} with t ≡ ti (mod qi). }
Compute t by combining t1, t2, . . . , tr using the Chinese Remainder Theorem.
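Two ingredients of Algorithm 3.31 are easy to see in code: the case l = 2 (the gcd computation described above) and the final CRT combination. The following sketch checks them on a toy curve Y^2 = X^3 + X + 1 over F_23, which has t = −4; the residues of t modulo 3, 5 and 7 are supplied by hand, since computing them needs the full torsion machinery:

```python
def polrem(f, g, p):
    # remainder of f modulo g; polynomials are coefficient lists, low degree first
    f, inv = f[:], pow(g[-1], p - 2, p)
    while len(f) >= len(g):
        c, d = f[-1] * inv % p, len(f) - len(g)
        for i, gi in enumerate(g):
            f[i + d] = (f[i + d] - c * gi) % p
        while len(f) > 1 and f[-1] == 0:
            f.pop()
        if len(f) == 1 and f[0] == 0:
            break
    return f

def polgcd(f, g, p):
    while g != [0]:
        f, g = g, polrem(f, g, p)
    return f

def polmulmod(f, g, m, p):
    r = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            r[i + j] = (r[i + j] + fi * gj) % p
    while len(r) > 1 and r[-1] == 0:
        r.pop()
    return polrem(r, m, p)

def t_mod_2(a, b, p):
    # t ≡ 0 (mod 2) iff gcd(X^3 + aX + b, X^p - X) is non-trivial
    m = [b % p, a % p, 0, 1]
    xp, base, e = [1], [0, 1], p
    while e:                            # X^p mod m by square-and-multiply
        if e & 1:
            xp = polmulmod(xp, base, m, p)
        base = polmulmod(base, base, m, p)
        e >>= 1
    h = xp + [0] * (2 - len(xp))
    h[1] = (h[1] - 1) % p               # subtract X
    while len(h) > 1 and h[-1] == 0:
        h.pop()
    return 0 if len(polgcd(m, h, p)) > 1 else 1

def crt(residues, moduli):
    x, M = 0, 1
    for r, m in zip(residues, moduli):
        x += M * ((r - x) * pow(M, -1, m) % m)
        M *= m
    return x, M

p, a, b = 23, 1, 1                      # toy curve; #E = 28, so t = 23 + 1 - 28 = -4
t2 = t_mod_2(a, b, p)
x, M = crt([t2, 2, 1, 3], [2, 3, 5, 7]) # residues mod 3, 5, 7 given by hand
t = x if x <= 9 else x - M              # Hasse: |t| <= 2*sqrt(23) < 10
assert t2 == 0 and t == -4
```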

Determination of ti for i > 1 involves more work. We explain here the original idea due to Schoof. We denote by l the i-th prime qi and by E[l] the set of all l-torsion points of E (Definition 2.78, p 105). The Frobenius endomorphism Φ that fixes O and maps (h, k) to (h^p, k^p) satisfies the relation Φ^2 − tΦ + p = 0. If we restrict our attention only to the group E[l], then this relation reduces to Φ^2 − tiΦ + pi = 0, where ti = t rem l and pi = p rem l, that is, Φ^2(P) − tiΦ(P) + piP = O for all P ∈ E[l].

In terms of polynomials, the last relation is equivalent to

Equation 3.4

(X^{p^2}, Y^{p^2}) − ti(X^p, Y^p) + pi(X, Y) = O,
where the sum and difference follow the formulas for the elliptic curve E. Now, one has to calculate symbolically rather than numerically, since X and Y are indeterminates. These computations can be carried out in the ring F_p[X, Y]/(f, fl) (instead of in F_p[X, Y]), where f(X, Y) = Y^2 − (X^3 + aX + b) is the defining polynomial of E and fl = fl(X) is the l-th division polynomial of E (Section 2.11.2 and Theorem 2.47, p 106). Reduction of a polynomial modulo f makes its Y-degree ≤ 1, whereas reduction modulo fl makes the X-degree less than deg fl, which is O(l^2). We can try the values ti = 0, 1, . . . , l − 1 successively until the desired value satisfying Equation (3.4) is found.

It is not difficult to verify that Schoof’s algorithm runs in time O(log^8 p) (under standard arithmetic in F_p) and is thus a deterministic polynomial-time algorithm for the point-counting problem. Essentially the same algorithm works for fields F_q with q = 2^n and has the same running time. Unfortunately, the big exponent (8) in the running time makes Schoof’s algorithm quite impractical. Numerous improvements have been suggested to bring down this exponent. Elkies and Atkin’s modification for the case q = p gives rise to the SEA algorithm, which has a running time of O(log^6 p) under standard arithmetic in F_p. This speed-up is achieved by working in the ring F_p[X, Y]/(f, gl), where gl is a suitable factor of fl and has degree O(l). Couveignes suggests improvements for the fields of characteristic 2. Efficient implementations of the SEA algorithm are reported by Morain, Müller, Dewaghe, Vercauteren and many others. At the time of writing this book, the largest values of q for which the algorithm has been successfully applied are 10^499 + 153 (a prime) and 2^1999 (a power of 2).

The Satoh–FGH algorithm

The Satoh–FGH algorithm is well suited for fields of small characteristic p and, in particular, for the fields of characteristic 2. This algorithm has enabled point counting over considerably larger fields than the SEA algorithm can handle. A generic description of the Satoh–FGH algorithm now follows after the introduction of some mathematical notions. Though our practical interest concentrates on the fields F_{2^n} only, we consider curves over a general F_q with q = p^n, p a prime.

Recall from Section 2.14 that the ring Z_p of p-adic integers is a discrete valuation ring (Exercises 2.133 and 2.148) with the unique maximal ideal generated by p, and the residue field Z_p/pZ_p is isomorphic to F_p.

We represent F_q as a polynomial algebra over F_p, say F_q = F_p[X]/(f̄), where f̄ is an irreducible polynomial of degree n in F_p[X]. We analogously define the p-adic ring Z_q := Z_p[X]/(f), where f is an irreducible polynomial of degree n in Z_p[X] reducing to f̄ modulo p. The elements of Z_q can be viewed as polynomials of degrees < n and with p-adic integers as coefficients. The arithmetic operations in Z_q are polynomial operations in Z_p[X] modulo the defining polynomial f. The ring Z_p is canonically embedded in the ring Z_q (consider constant polynomials).

Z_q turns out to be a discrete valuation ring with maximal ideal pZ_q, and the residue field Z_q/pZ_q is isomorphic to F_q.

Definition 3.6.

The projection map π : Z_p → F_p is defined as the map that takes a p-adic integer α = (a1, a2, . . .) to a1, and can be canonically extended to polynomials by π(α0 + α1X + · · · + αdX^d) := π(α0) + π(α1)X + · · · + π(αd)X^d. In particular, this defines a projection map π : Z_q → F_q.

The (Teichmüller) lift ω : F_q → Z_q is the map that takes 0 ↦ 0 and a ↦ ω(a) for 0 ≠ a, where ω(a) is the unique (q − 1)-th root of unity in Z_q satisfying π(ω(a)) = a (cf. Exercise 2.160).

The semi-Witt decomposition of α ∈ Z_q is defined to be the unique sequence a0, a1, . . . with ai ∈ F_q such that α has the p-adic expansion α = Σ_{i≥0} ω(ai)p^i.

The p-th power Frobenius endomorphism σ̄ : F_q → F_q, a ↦ a^p, can now be extended to an endomorphism σ : Z_q → Z_q as follows. Let α ∈ Z_q have the semi-Witt decomposition a0, a1, . . . with ai ∈ F_q. Then, σ(α) is the unique element of Z_q having the semi-Witt decomposition a0^p, a1^p, . . . . One can show that σ is a ring endomorphism of Z_q. We have π ∘ σ = σ̄ ∘ π and similarly ω ∘ σ̄ = σ ∘ ω.

Now, let E = E0 be an elliptic curve defined over F_q. Application of σ̄ to the coefficients of E0 gives another elliptic curve E1 over F_q whose rational points are (σ̄(h), σ̄(k)) = (h^p, k^p), where (h, k) ∈ E0(F_q), together with the point at infinity. We may apply σ̄ to E1 to get another curve E2 over F_q and so on. Since σ̄^n is the identity on F_q, we get a cycle of elliptic curves defined over F_q:

Equation 3.5

E0 → E1 → E2 → · · · → En−1 → E0,

where each arrow denotes application of σ̄ to the coefficients.
Similarly, if ε = ε0 is an elliptic curve defined over Z_q, application of σ leads to a cycle of elliptic curves defined over Z_q:

Equation 3.6

ε0 → ε1 → ε2 → · · · → εn−1 → ε0.
We need the canonical lifting of an elliptic curve E over F_q to a curve ε over Z_q. Explaining that requires some more mathematical concepts:

Definition 3.7.

Let K be a field and let E and E′ be two elliptic curves defined over K. A morphism ψ : E → E′ (Definition 2.72, p 95) that maps the point at infinity O of E to the point at infinity O′ of E′ is called an isogeny. The zero isogeny E → E′ maps every point of E to O′. A non-zero isogeny is also called a non-constant isogeny. Two curves E and E′ are called isogenous, if there exists a non-constant isogeny E → E′.

The kernel ker ψ of an isogeny ψ is defined to be the set {P ∈ E | ψ(P) = O′}. For every non-constant isogeny ψ, the kernel ker ψ is a finite subgroup of E.

The set Hom(E, E′) of all isogenies E → E′ is an Abelian group under the operation (ψ1 + ψ2)(P) := ψ1(P) + ψ2(P). If E = E′, then End(E) := Hom(E, E) becomes a ring with multiplication defined by composition and is called the endomorphism ring of E.

The multiplication-by-m map of E is an isogeny. If End(E) contains an isogeny not of this type, we call E an elliptic curve with complex multiplication.

Theorem 3.4.

For each prime i, there exists a unique polynomial Φi(X, Y) ∈ Z[X, Y], symmetric and of degree i + 1 in each of X and Y, such that two curves E and E′ (defined over a field K) with j-invariants j and j′ satisfy Φi(j, j′) = 0 if and only if there is an isogeny E → E′ whose kernel is cyclic of order i.

Definition 3.8.

The polynomials Φi(X, Y) of Theorem 3.4 are called modular polynomials. As an example,

Φ2(X, Y) = X^3 + Y^3 − X^2Y^2 + 1488(X^2Y + XY^2) −
  162,000(X^2 + Y^2) + 40,773,375XY + 8,748,000,000(X + Y) −
  157,464,000,000,000.

The next theorem establishes the foundation for lifting curves from F_q to Z_q.

Theorem 3.5. Lubin–Serre–Tate

Let E be an elliptic curve defined over F_q, q = p^n, p a prime, and with j-invariant j(E) ∉ F_{p^2}. There exists an elliptic curve ε defined over Z_q with a unique j-invariant J ∈ Z_q such that π(J) = j(E) and End(ε) ≅ End(E). The curve ε is called the canonical lift of E and is unique up to isomorphism.

With this definition of lifting of elliptic curves, Cycles (3.5) and (3.6) satisfy a commutative diagram, where εi is the canonical lift of Ei for each i = 0, 1, . . . , n.

Algorithm 3.32 outlines the Satoh–FGH algorithm. In order to complete the description of the algorithm, one should specify how to lift curves (that is, a procedural equivalent of Theorem 3.5) and their p-torsion points and how the lifted data can be used to compute the Frobenius trace t. We leave out the details here.

Algorithm 3.32. Satoh–FGH algorithm for elliptic curve point counting

Input: An elliptic curve E over F_q, q = p^n, p prime, with j-invariant j(E) ∉ F_{p^2}.

Output: The cardinality #E(F_q) or, equivalently, the trace t of Frobenius.

Steps:

Compute the curves E0, . . . , En−1 and their j-invariants j0, . . . , jn−1.
Compute the lifted j-invariants J0, . . . , Jn−1.
Compute the lifted curves ε0, . . . , εn−1.
Lift the p-torsion groups Ei[p] for i = 0, . . . , n − 1.
Compute t and hence #E(F_q) from the lifted data.

The elements of Z_p (and hence of Z_q) are infinite sequences and hence cannot be represented in computer memory. However, we make an approximate representation by considering only the first m terms of the sequences representing elements of Z_p. Working in Z_q with this approximate representation is then essentially the same as working in Z_q/p^mZ_q. For the Satoh–FGH algorithm, we need m ≈ n/2.

For small p (for example, p = 2) and with standard arithmetic in F_q, the Satoh–FGH algorithm has a deterministic running time of O(n^5) and a space requirement of O(n^3). With Karatsuba arithmetic the exponent in the running time drops from 5 to nearly 4.17. In addition, this algorithm is significantly easier to implement than optimized versions of the SEA algorithm. These facts are responsible for the superior performance of the Satoh–FGH algorithm over the SEA algorithm (for small p).

3.6.3. Choosing Good Elliptic Curves

Choosing cryptographically suitable elliptic curves is more difficult than choosing good finite fields. First, the order of the elliptic curve group must have a suitably large prime divisor, say, of bit length 160 or more. In addition, the MOV attack applies to supersingular curves and the anomalous attack to anomalous curves (Definition 2.80 and Section 4.5). So a secure curve must be non-supersingular and non-anomalous. Checking all these criteria for a random curve E over F_q requires the group order #E(F_q). One may use either the SEA algorithm or the Satoh–FGH algorithm to compute #E(F_q). Once #E(F_q) is known, it is easy to check whether E is supersingular or anomalous. But factoring #E(F_q) to find its largest prime divisor may be a difficult task and is not recommended. One may instead extract all the small prime factors of #E(F_q) by trial divisions with the primes q1 = 2, q2 = 3, q3 = 5, . . . , qr for a predetermined r and write #E(F_q) = m1m2, where m1 has all prime factors ≤ qr and m2 has all prime factors > qr. If m2 is prime and of the desired size, then E is treated as a good curve. Algorithm 3.33 illustrates these steps.

The computation of the group orders takes up most of the execution time of the above algorithm. It is, therefore, of utmost importance to employ good algorithms for point counting. The best algorithms known till date (the SEA and the Satoh–FGH algorithms) are only reasonable. Further research in this area may lead to better algorithms in future.

Algorithm 3.33. Selecting cryptographically suitable elliptic curves

Input: A suitably large finite field F_q.

Output: A cryptographically good elliptic curve E over F_q.

Steps:

while (1) {
   Generate a random elliptic curve E over F_q.
   Determine #E(F_q).
   if (E is neither supersingular nor anomalous) {
      Try to factorize #E(F_q) using trial division by small primes.
      if (#E(F_q) has a suitably large prime divisor) { Return E }
   }
}

There are ways of generating good curves that do not require point-counting algorithms over large finite fields. One possibility is to use the so-called subfield curves. If F_q has a subfield F_{q′} of relatively small cardinality, one can choose a random curve E over F_{q′} and compute #E(F_{q′}). Since E is also a curve defined over F_q and #E(F_q) can be easily obtained from #E(F_{q′}) using Theorem 2.51 (p 107), we save the lengthy direct computation of #E(F_q). However, the drawback of this method is that since E is now chosen with coefficients from a small field F_{q′}, we do not have many choices. The second drawback is that we must have a small divisor q′ of q. If q is already a prime, this strategy does not work at all. If q = p^n, p a small prime, we need n to have a small divisor n′ that corresponds to q′ = p^{n′}. Sometimes small odd primes p are suggested, but the arithmetic in a non-prime field of some odd characteristic is inherently much slower than that in a field of nearly equal size but of characteristic 2.

Specific curves with complex multiplication (Definition 3.7) over large prime fields have also been suggested in the literature. Finding good curves with complex multiplication involves less computational overhead than Algorithm 3.33, but (like subfield curves) offers limited choice. However, it is important to mention that no special attacks are currently known for subfield curves and also for those chosen by the complex multiplication strategy.

3.7. Arithmetic on Hyperelliptic Curves

Let K = F_q be a finite field and C a hyperelliptic curve of genus g defined over K by Equation (2.13), that is, by

C : Y2 + u(X)Y = v(X)

for suitable polynomials u, v ∈ K[X]. We want to implement the arithmetic in the Jacobian J := J_C(K). Recall from Section 2.12 that an element of J can be represented uniquely as a reduced divisor Div(a, b) for a pair of polynomials a(X), b(X) ∈ K[X] with a monic, deg a ≤ g, deg b < deg a and a | (b^2 + bu − v). Thus, each element of J requires O(g log q) storage.

3.7.1. Arithmetic in the Jacobian

We first present Algorithm 3.34 that, given two elements Div(a1, b1), Div(a2, b2) of J, computes the reduced divisor Div(a, b) which satisfies Div(a, b) ~ Div(a1, b1) + Div(a2, b2). The algorithm proceeds in two steps:

  1. Compute a semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).

  2. Compute the reduced divisor Div(a, b) ~ Div(a′, b′).

Both these steps can be performed in (deterministic) polynomial time (in the input size, that is, g log q). Algorithm 3.34 implements the first step and continues to work even when the input divisors are semi-reduced (and not completely reduced).

Algorithm 3.34. Sum of semi-reduced divisors

Input: (Semi-)reduced divisors Div(a1, b1) and Div(a2, b2) defined over K.

Output: A semi-reduced divisor Div(a′, b′) ~ Div(a1, b1) + Div(a2, b2).

Steps:

d1 := gcd(a1, a2) = u1a1 + u2a2./* Extended gcd in K[X] */
d2 := gcd(d1, b1 + b2 + u) = v1d1 + v2(b1 + b2 + u)./* Extended gcd in K[X] */
a′ := (a1a2)/(d2^2).
b′ := ((v1u1a1b2 + v1u2a2b1 + v2(b1b2 + v))/d2) rem a′.

It is an easy check that the two expressions appearing between pairs of big parentheses in Algorithm 3.34 are polynomials. This algorithm does only a few gcd calculations and some elementary arithmetic operations on polynomials of K[X]. If the input polynomials (a1, a2, b1, b2) correspond to reduced divisors, then their degrees are ≤ g and hence this algorithm runs in polynomial time in the input size. Furthermore, in that case, the output polynomials a′ and b′ are of degrees ≤ 2g.

We now want to compute the unique reduced divisor Div(a, b) equivalent to the semi-reduced divisor Div(a′, b′). This can be performed using Algorithm 3.35. If the degrees of the input polynomials a′ and b′ are O(g) (as is the case with those output by Algorithm 3.34), Algorithm 3.35 takes a time polynomial in g log q. To sum up, two elements of can be added in polynomial time. The correctness of the two algorithms is not difficult to establish, but the proof is long and involved and hence omitted. Interested readers might look at the appendix of Koblitz’s book [154].

For an element α ∈ J and n ∈ N, one can easily write an algorithm (similar to Algorithm 3.9) to compute nα using O(log n) additions and doublings in J.

3.7.2. Counting Points in Jacobians of Hyperelliptic Curves

For a hyperelliptic curve C of genus g defined over a field F_q, we are interested in the order of the Jacobian J = J_C(F_q) rather than in the cardinality of the curve C(F_q). Algorithmic and implementational studies of counting #J have not received enough research endeavour till date, and though polynomial-time algorithms are known to this effect (at least for curves of small genus), these algorithms are far from practical for hyperelliptic curves of cryptographic sizes. In this section, we look at some of these algorithms.

Algorithm 3.35. Reduction of a semi-reduced divisor

Input: A semi-reduced divisor Div(a′, b′) defined over K.

Output: The reduced divisor Div(a, b) ~ Div(a′, b′).

Steps:

(a, b) := (a′, b′).
while (deg a > g) {
  a′ := (v − ub − b^2)/a.  /* a′ is a polynomial */
  b′ := –(u + b) rem a′.
  (a, b) := (a′, b′).
}
a := [lc(a)]^{−1}a.   /* Make a monic */
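The reduction loop can be sketched over an odd-characteristic field, where one may take u = 0 and C : Y^2 = v(X). The genus-2 curve Y^2 = X^5 + 1 over F_23 and the input divisor (built from four points on the curve, so that a | (b^2 − v) holds) are illustrative values:

```python
p, g = 23, 2
v = [1, 0, 0, 0, 0, 1]                   # X^5 + 1, low degree first

def pmul(f, h):
    r = [0] * (len(f) + len(h) - 1)
    for i, fi in enumerate(f):
        for j, hj in enumerate(h):
            r[i + j] = (r[i + j] + fi * hj) % p
    return r

def psub(f, h):
    n = max(len(f), len(h))
    r = [((f[i] if i < len(f) else 0) - (h[i] if i < len(h) else 0)) % p
         for i in range(n)]
    while len(r) > 1 and r[-1] == 0:
        r.pop()
    return r

def pdivmod(f, h):
    q, r, inv = [0] * max(1, len(f) - len(h) + 1), f[:], pow(h[-1], p - 2, p)
    while len(r) >= len(h) and any(r):
        c, d = r[-1] * inv % p, len(r) - len(h)
        q[d] = c
        for i, hi in enumerate(h):
            r[i + d] = (r[i + d] - c * hi) % p
        while len(r) > 1 and r[-1] == 0:
            r.pop()
    return q, r

def reduce_divisor(a, b):
    # Algorithm 3.35 with u = 0:  a' = (v - b^2)/a,  b' = -b rem a'
    while len(a) - 1 > g:
        a2, rem = pdivmod(psub(v, pmul(b, b)), a)
        assert not any(rem)              # a' is a polynomial
        b = pdivmod([(-c) % p for c in b], a2)[1]
        a = a2
    inv = pow(a[-1], p - 2, p)
    return [c * inv % p for c in a], b   # make a monic

# semi-reduced divisor through the points (0,1), (1,5), (4,6), (6,7) on C
a = [0, 22, 11, 12, 1]                   # X(X-1)(X-4)(X-6) mod 23
b = [1, 9, 5, 13]                        # cubic interpolating the four y-values
A, B = reduce_divisor(a, b)
assert len(A) - 1 <= g and A[-1] == 1 and len(B) < len(A)
assert not any(pdivmod(psub(pmul(B, B), v), A)[1])   # A | (B^2 - v)
```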

We start with some theoretical results which are generalizations of those for elliptic curves. The Frobenius endomorphism σ : x ↦ x^q is a (non-trivial) F_q-automorphism of the algebraic closure of F_q. The map naturally (that is, coordinate-wise) extends to the points on C and also to divisors and, in particular, to the Jacobian. For a reduced divisor Div(a, b), we have σ(Div(a, b)) = Div(σ(a), σ(b)), where for a polynomial h the polynomial σ(h) is obtained by applying the map σ to the coefficients of h. It is known that σ satisfies a monic polynomial χ(X) of degree 2g with integer coefficients. For example, for g = 1 (elliptic curves) we have

χ(X) = X^2 − tX + q,

where t is the trace of Frobenius at q. For g = 2, we have

Equation 3.7

χ(X) = X^4 − t1X^3 + t2X^2 − qt1X + q^2
for integers t1, t2. The cardinality n := #J is related to the polynomial χ(X) as

n = χ(1),

and satisfies the inequalities

Equation 3.8

(√q − 1)^{2g} ≤ n ≤ (√q + 1)^{2g}.

Thus n lies in a rather narrow interval, called the Hasse–Weil interval, of width w := (√q + 1)^{2g} − (√q − 1)^{2g} = O(q^{(2g−1)/2}).

Theorem 2.50 can be generalized as follows:

Theorem 3.6. Structure theorem for J_C(F_q)

The Jacobian J = J_C(F_q) is the direct sum of at most 2g cyclic groups, that is, J ≅ Z_{n1} ⊕ · · · ⊕ Z_{nr} with r ≤ 2g, n1, . . . , nr ≥ 2 and n_{i+1} | n_i for each i = 1, 2, . . . , r − 1.

Let m := Exp J be the exponent of J (see Exercise 3.42). Since m | n, there are at most ⌈(w + 1)/m⌉ possibilities for n for a given m (where w is the width of the Hasse–Weil interval). In particular, n is uniquely determined by m, if m > w. By Theorem 3.6, m can be as small as roughly n^{1/(2g)}, so it is possible to have m ≤ w, though such curves are relatively rare. In the more frequent case (m > w), Algorithm 3.36 determines n.

Algorithm 3.36. Hyperelliptic curve point counting

Input: A hyperelliptic curve C of genus g defined over F_q.

Output: The cardinality n of the Jacobian J = J_C(F_q).

Steps:

m := 1.
while (m ≤ w) {
   Choose a random element x ∈ J.
   Determine ν := ord x.
   m := lcm(m, ν).
}
n := the unique multiple of m in the Hasse–Weil interval.

If Exp J > w, the above algorithm eventually (in practice, after a few executions of the while loop) terminates with m = Exp J. However, if Exp J ≤ w, the algorithm never terminates. Thus, we may forcibly terminate the algorithm by reporting failure, after sufficiently many random elements x are tried (and we continue to have m ≤ w). In order to complete the description of the algorithm, we must specify a strategy to compute ν := ord x for a randomly chosen x ∈ J. Instead of computing ν directly, we compute an (integral) multiple μ of ν, factorize μ and then determine ν. Since nx = 0, we search for a desired multiple μ in the Hasse–Weil interval. This search can be carried out using a baby-step–giant-step (Section 4.4) or a birthday-paradox (Exercise 2.172) method, and the algorithm achieves an expected running time of O(√w) = O(q^{(2g−1)/4}) group operations, which is exponential in the input size. This method, therefore, cannot be used except when n is small.

For hyperelliptic curves of small genus g, generalizations of Schoof’s algorithm (Algorithm 3.31) can be used. Gaudry and Harley [106] describe the case g = 2. One computes the polynomial χ(X) of Equation (3.7), that is, the values of t1 and t2, modulo sufficiently many small primes l. Since the roots of χ(X) are of absolute value √q, we have |t1| ≤ 4√q and |t2| ≤ 6q. Therefore, determination of t1 and t2 modulo O(log q) small primes l uniquely determines χ(X) (as well as n = χ(1)).

Let J[l] be the set of l-torsion points of J. The Frobenius map σ restricted to J[l] satisfies

Equation 3.9

σ^4(D) − t1,lσ^3(D) + t2,lσ^2(D) − qlt1,lσ(D) + ql^2 D = 0 for all D ∈ J[l],
where t1,l := t1 rem l, t2,l := t2 rem l and ql := q rem l. By exhaustively trying all (that is, ≤ l^2) possibilities for t1,l and t2,l, one can find their actual values, that is, those values that cause the left side of Equation (3.9) to vanish (symbolically).

A result by Kampkötter [144] allows us to consider only the reduced divisors of the form D = Div(a, b) with a(X) = X^2 + a1X + a0 and b(X) = b1X + b0. There exists an ideal I of the polynomial ring F_q[A1, A0, B1, B0] such that a reduced divisor D of this special form lies in J[l] if and only if f(a1, a0, b1, b0) = 0 for all f ∈ I. Thus the computation of the left side of Equation (3.9) may be carried out in the ring F_q[A1, A0, B1, B0]/I. An explicit set of generators for I can be found in Kampkötter [144]. To sum up, we get a polynomial-time algorithm.

Working (modulo I) in the 4-variate polynomial ring is, indeed, expensive. Use of Cantor’s division polynomials [43] essentially reduces the arithmetic to a single variable (instead of four). We do not explore further along this line, but only mention that for g = 2 Schoof’s algorithm employing division polynomials runs in time O(log^9 q). Although this is a theoretical breakthrough, the prohibitively large exponent (9) in the running time precludes the feasibility of using the algorithm in the range of interest in cryptography.

Exercise Set 3.7

3.42Let G be a multiplicative group (not necessarily Abelian and/or finite) with identity e.

Let S := {m ∈ Z | x^m = e for all x ∈ G}.

  1. Show that S is a subgroup of the additive group Z.

  2. Show that every subgroup of Z is generated by a single element. In particular, S = 〈m〉 for some integer m. Without loss of generality, we can take m ≥ 0. This m is called the exponent of the group G and is denoted by Exp G.

  3. If G is finite, show that Exp G | ord G.

  4. If G is finite and Abelian, show that Exp G = lcm{ord x | x ∈ G}. Deduce that in this case there exists x ∈ G such that ord x = Exp G.

3.8. Random Numbers

So far we have met several situations where we needed random elements from a (finite) set S, for example, the set Z_n (or Z_n*) or the field F_q (or F_q*) or the set of F_q-rational points on an elliptic (or hyperelliptic) curve. By randomness, we here mean that each element is equally likely to be selected, that is, if #S = n, then each element of S is selected with probability 1/n. Since elements of a set S of cardinality n can be represented as bit strings of length ≤ ⌈lg(n + 1)⌉, the problem of selecting a random element of S essentially reduces to the problem of generating (finite) random sequences of bits. A random sequence of bits is one in which every bit has a probability of 1/2 of being either 0 or 1 (irrespective of the other bits in the sequence).
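This reduction can be sketched directly: draw ⌈lg n⌉ random bits and reject out-of-range values, so that each of the n elements is equally likely. Python's secrets module stands in here for any source of random bits:

```python
import secrets

def random_index(n):
    # draw ceil(lg n) bits; values >= n are rejected and redrawn, so every
    # element of a set of cardinality n is chosen with probability exactly 1/n
    k = (n - 1).bit_length() or 1
    while True:
        r = secrets.randbits(k)
        if r < n:
            return r

samples = [random_index(5) for _ in range(2000)]
assert all(0 <= s < 5 for s in samples)
assert set(samples) == {0, 1, 2, 3, 4}   # all values occur (overwhelmingly likely)
```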

3.8.1. Pseudorandom Bit Generators

Generating a (truly) random sequence of bits seems to be an impossible task. Some natural phenomena, such as electronic noise from a specifically designed integrated circuit, can be used to generate random bit sequences. However, such systems are prone to malfunctioning, often influenced by observations and are, of course, costly. A software solution is definitely the more practical alternative. Phenomena, like the system clock, the work load or memory usage of a machine, that can be captured by programs may be used to generate random bit sequences. But this strategy also suffers from various drawbacks. First of all the sequences generated by these methods would not be (truly) random. Moreover they are vulnerable to attacks by adversaries (for example, if a random bit generator is based on the system clock and if the adversary knows the approximate time when a bit sequence is generated using that generator, she will have to try only a few possibilities to generate the same sequence).

In order to obviate these difficulties, pseudorandom bit generators (PRBG) are commonly used. A bit string a0a1a2 . . . is generated by a PRBG following a specific strategy, which is more often than not a (mathematical) algorithm. The first bit a0 is based on a certain initial value, called a seed, whereas for i ≥ 1 the bit ai is generated as a predetermined function of some or all of the previous bits a0, . . . , ai−1. Since the resulting bit ai is now functionally dependent on the previous bits, the sequence is not at all random (but deterministic); we are, however, happy if the sequence a0a1a2 . . . looks or behaves random. The random behaviour of a sequence is often examined by certain well-known statistical tests. If a generator generates bit sequences that pass these tests, we call it a PRBG and sequences available from such a generator pseudorandom bit sequences. Various kinds of PRBGs are used for generating pseudorandom bit sequences. We won’t describe them here, but concentrate on a particular kind of generator that has a special significance in cryptography.

3.8.2. Cryptographically Strong Pseudorandom Bit Generators

A PRBG is called a cryptographically strong (or secure) pseudorandom bit generator, or a CSPRBG in short, if no polynomial-time algorithm exists (provably or otherwise) that, from a knowledge of the previous bits of a sequence generated by the PRBG (but without the knowledge of the seed), predicts the next bit with probability significantly larger than 1/2. Usually, an intractable computational problem (see Section 4.2) is at the heart of the security of a CSPRBG. As an example, we now explain the Blum–Blum–Shub (or BBS) generator.

Algorithm 3.37. Blum–Blum–Shub pseudorandom bit generator

Input: A positive integer m (the generator outputs the m + 1 bits a0, a1, . . . , am).

Output: A cryptographically strong pseudorandom bit sequence a0a1a2 . . . .

Steps:

Generate two (distinct) large primes p and q each ≡ 3 (mod 4).
n := pq.
Generate a (random) seed s ∈ Z_n*.
x0 := s^2 (mod n).
for i = 0, . . . , m {
   ai := the least significant bit of xi.
   x_{i+1} := xi^2 (mod n).
}
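A direct rendering of Algorithm 3.37 in Python; the primes and seed below are toy values, whereas real use needs secret primes of hundreds of bits:

```python
def bbs(p, q, s, m):
    # p, q: distinct primes, each ≡ 3 (mod 4); s: seed coprime to n = p*q
    assert p % 4 == 3 and q % 4 == 3
    n = p * q
    x = s * s % n                  # x_0 := s^2 (mod n)
    bits = []
    for _ in range(m + 1):
        bits.append(x & 1)         # a_i := least significant bit of x_i
        x = x * x % n              # x_{i+1} := x_i^2 (mod n)
    return bits

out = bbs(19, 23, 101, 7)          # 8 pseudorandom bits from toy parameters
assert len(out) == 8 and set(out) <= {0, 1}
```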

In Algorithm 3.37, we have used indices for the sequence xi for the sake of clarity. In an actual implementation, all indices may be removed, that is, one may use a single variable x to store and update the sequence xi. Furthermore, if there is no harm in altering the value of s, one might even use the same variable for s and x.

The cryptographic security of the BBS generator stems from the presumed intractability of factoring integers or of computing square roots modulo a composite integer (here n = pq) (see Exercise 3.43). Note that p, q and s have to be kept secret, whereas n can be made public. A knowledge of x_{m+1} is also not expected to help an opponent and may likewise be made public. For achieving the desired level of secrecy, p and q should be of nearly equal size and the size of n should be sufficiently large (say, 768 bits or more). Generating each bit by the BBS generator involves a modular squaring and is, therefore, somewhat slow (compared to the traditional PRBGs, which do not guarantee cryptographic security). However, the BBS generator can be used for moderately infrequent purposes, for example, for the generation of a session key. Moreover, a maximum of lg lg n (least significant) bits (instead of 1 as in the above snippet) can be extracted from each xi without degrading the security of the generator.

It is evident that any (infinite) sequence a0a1 · · · generated by the BBS generator must be periodic. As an extreme example, if s = 1, then the BBS generator outputs a sequence of one-bits only. We are interested in rather short (sub)sequences (of such infinite sequences). Therefore, it suffices if the length of the period is reasonably large (for a random seed s). This is guaranteed if one uses strong primes (Definition 3.5).

3.8.3. Seeding Pseudorandom Bit Generators

The way we have defined PRBG (or CSPRBG) makes it evident that the unpredictability of a pseudorandom bit sequence essentially reduces to that of the seed. Care should, therefore, be taken in order to choose the values of the seed. The seed need not be randomly or pseudorandomly generated, but should have a high degree of unpredictability, so that it is infeasible for an adversary to have a reasonably quick guess of it. As an example, assume that we intend to generate a suitable seed s for the BBS generator with a 1024-bit modulus n. If we employ for that purpose a specific algorithm (known to the opponent) using only the built-in random number generator of a standard compiler and if this built-in generator has a 32-bit seed σ, then there are only 232 possibilities for s, even when s itself is 1024 bits long. Thus an adversary has to try at most 232 (231 on an average) values of σ in order to guess the correct value of s. So we must add further unpredictability to the resulting seed value s. This can be done by setting the bits of s depending on several factors, like the system clock, the system load, the memory usage, keyboard inputs from a human user and so on. Each of such factors might not be individually completely unpredictable, but their combined effect should preclude the feasibility of an exhaustive search by the opponent. After all, we have 1024 bits of s to fill up and even if the total search space of possible values of s is as low as 2160, it would be impossible for the opponent to guess s in a reasonable span of time. Note that more often than not the values of the seed need not be remembered: that is, need not be regenerated afterwards. As a result, there is no harm in introducing unpredictability in s caused by certain factors that we would not ourselves be able to reproduce in future.
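The mixing described above can be sketched as follows; hashing with SHA-256 and the particular sources shown (OS entropy, the clock, the process id) are our illustrative choices, not a prescription from the text:

```python
import hashlib, os, time

def make_seed(nbytes=128):
    # combine several hard-to-predict sources; an attacker must now guess
    # their joint state rather than a single 32-bit quantity
    material = (os.urandom(32)
                + str(time.time_ns()).encode()     # high-resolution clock
                + str(os.getpid()).encode())       # process-specific value
    out, counter = b"", 0
    while len(out) < nbytes:                       # stretch to the seed length
        out += hashlib.sha256(material + counter.to_bytes(4, "big")).digest()
        counter += 1
    return int.from_bytes(out[:nbytes], "big")

s = make_seed()
assert 0 <= s < 2 ** (128 * 8)
assert make_seed() != make_seed()   # distinct calls differ (overwhelmingly likely)
```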

Exercise Set 3.8

3.43With the notations of Algorithm 3.37 show that:
  1. Every quadratic residue x modulo n has four distinct square roots modulo n, of which exactly one, say y, is a quadratic residue modulo n. [H]

  2. The square root y of x can be obtained by solving the simultaneous congruences y ≡ x^{(p + 1)/4} (mod p) and y ≡ x^{(q + 1)/4} (mod q).

  3. The bit sequence a0a1 . . . am is uniquely determined by (n and) x_{m+1}.

  4. One can compute in polynomial (in log n and m) time the bit sequence a_0 a_1 . . . a_m from the knowledge of n and x_(m+1), if either

    1. the primes p and q are known, or

    2. one can check in polynomial (in log n) time whether an arbitrary element is a quadratic residue modulo n and, if so, compute in polynomial time its square roots modulo n.

Chapter Summary

This chapter deals with the algorithmic details needed for setting up public-key cryptosystems. We study algorithms for selecting public-key parameters and for carrying out the basic cryptographic primitives. Algorithms required for cryptanalysis are dealt with in Chapters 4 and 7.

We start the chapter with a discussion on algorithms. Time and space complexities of algorithms are discussed first, and the standard order notations are explained. Next we study the class of randomized algorithms, which provide practical solutions to many computational problems that have no known efficient deterministic algorithms. In the worst case, a randomized algorithm may take exponential running time and/or may output an incorrect answer. However, the probability of these bad behaviours can be made arbitrarily low. We finally discuss reductions between computational problems. A reduction helps us draw conclusions about the complexity of one problem relative to that of another.

Many popular public-key cryptosystems are based on arithmetic modulo big integers. These integers have sizes up to several thousand bits. One cannot represent such integers with full precision by the built-in data types supplied by common programming languages. So we require efficient ways of representing and doing arithmetic on big integers. We carefully deal with the implementation of arithmetic on multiple-precision integers. We provide a special treatment of the computation of gcds and extended gcds of integers. We utilize these arithmetic functions in order to implement modular arithmetic. Most public-key primitives involve modular exponentiations as their most time-consuming steps. In addition to the standard square-and-multiply algorithm, certain special tricks (including Montgomery exponentiation) that help speed up modular exponentiation are described at length in this section.
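The square-and-multiply algorithm mentioned above can be illustrated with a short Python sketch (Python integers are arbitrary-precision, so the multiple-precision layer comes for free here):

```python
def square_and_multiply(a, e, n):
    """Left-to-right square-and-multiply: compute a^e mod n.

    The bits of e are scanned from the most significant end; each step
    squares the accumulator, and multiplies in a when the bit is 1.
    """
    result = 1
    for bit in bin(e)[2:]:          # bits of e, most significant first
        result = (result * result) % n
        if bit == "1":
            result = (result * a) % n
    return result
```

The number of modular multiplications is at most twice the bit length of e, which is what makes exponentiation with thousand-bit operands feasible at all.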

In the next section, we deal with some other number-theoretic algorithms. One important topic is the determination of whether a given integer is prime. The Miller–Rabin primality test is an efficient algorithm for primality testing. This algorithm is, however, randomized in the sense that it may declare some composite integers as primes. Using suitable choices of the relevant parameters, the probability of this error may be reduced to very low values (≤ 2^–80). We also briefly introduce the deterministic polynomial-time AKS algorithm for primality testing. Since we can easily check the primality of integers, we can generate random primes by essentially searching in a pool of randomly generated odd integers of a given size. Security in some cryptosystems requires such random primes to possess special properties. We present Gordon’s algorithm for generating cryptographically strong primes. The section ends with a study of the Tonelli–Shanks algorithm for computing square roots modulo a big prime.
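As an illustration of the randomized test mentioned above, here is a compact Python sketch of the Miller–Rabin test; with t = 40 independent rounds the error probability is at most 4^–40 ≤ 2^–80, the bound quoted above.

```python
import random

def miller_rabin(n, t=40):
    """Miller-Rabin probabilistic primality test.

    Returns False if n is certainly composite, and True if n is prime
    with error probability at most 4^(-t).
    """
    if n < 2:
        return False
    for p in (2, 3, 5, 7, 11, 13):      # dispose of small cases quickly
        if n % p == 0:
            return n == p
    # Write n - 1 = 2^s * d with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(t):
        a = random.randrange(2, n - 1)  # random base for this round
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = (x * x) % n
            if x == n - 1:
                break
        else:
            return False                # a witnesses compositeness
    return True
```

Note that the test correctly rejects Carmichael numbers such as 561, on which the simpler Fermat test fails for many bases.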

Next, we concentrate on the implementation of finite field arithmetic. The arithmetic of a field of prime cardinality p is the same as integer arithmetic modulo p and is discussed in detail earlier. The other finite fields that are of interest to cryptology are extension fields of characteristic 2. In order to study the arithmetic in these fields, one first requires the arithmetic of the polynomial ring F_2[X]. We discuss the basic operations in this ring. Next we talk about algorithms for checking irreducibility of polynomials and for obtaining (random) irreducible polynomials in F_2[X]. If f(X) is such a polynomial of degree d, the arithmetic of the field F_(2^d) is the same as the arithmetic of F_2[X] modulo the defining polynomial f(X). In order that a finite field F_q is cryptographically safe, we require q – 1 to have a prime factor of sufficiently big size (160 bits or more). Suppose that the factorization of q – 1 is provided. We discuss algorithms that compute the order of elements of F_q^*, that check if a given element is a generator of the cyclic group F_q^*, and that produce random generators of F_q^*. We end the study of finite fields by discussing a way to factor polynomials over finite fields. The standard algorithm comprising the three steps square-free factorization, distinct-degree factorization and equal-degree factorization is explained in detail. The exercises cover the details of an algorithm to compute the roots of polynomials over finite fields.

The arithmetic of elliptic curves over finite fields is dealt with next. Each operation in the elliptic curve group can be realized by a sequence of operations over the underlying field. The multiple of a point on an elliptic curve can be computed by a repeated double-and-add algorithm which is the same as the square-and-multiply algorithm for modular exponentiation, applied to an additive setting. We also discuss ways of selecting random points on elliptic curves. We then present two algorithms for counting points in an elliptic curve group. The SEA algorithm is suitable for curves over prime fields, whereas the Satoh–FGH algorithm works efficiently for curves over fields of characteristic 2. Once we can determine the order of an elliptic curve group, we can choose good elliptic curves for cryptographic usage.
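The double-and-add computation of a point multiple can be sketched in Python using affine coordinates over a prime field. This is a minimal sketch: the curve y^2 = x^3 + 2x + 3 over F_97 and the point (3, 6) used in the test are toy values chosen for illustration, and no curve validation or point counting is attempted.

```python
def ec_add(P, Q, a, p):
    """Add two affine points on y^2 = x^3 + a*x + b over F_p.

    None represents the point at infinity (the group identity); the
    coefficient b never appears in the addition formulas.
    """
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                                        # P + (-P) = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p   # tangent slope
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p          # chord slope
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

def ec_multiply(k, P, a, p):
    """Compute the scalar multiple kP by double-and-add: scan the bits
    of k from the most significant end, doubling at each step and adding
    P when the bit is 1 (the additive mirror of square-and-multiply)."""
    R = None
    for bit in bin(k)[2:]:
        R = ec_add(R, R, a, p)
        if bit == "1":
            R = ec_add(R, P, a, p)
    return R
```

Each group operation above costs a handful of field operations (one inversion, a few multiplications), which is exactly the reduction of curve arithmetic to field arithmetic described in the text.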

In the next section, we study the arithmetic of hyperelliptic curves. We describe ways to represent elements of the Jacobian by pairs of polynomials and to do arithmetic on elements in this representation. We also discuss two algorithms for counting points in a Jacobian.

In the last section, we address the issue of generation of pseudorandom bits. We define the concept of cryptographically strong pseudorandom bit generator and provide an example, namely the Blum–Blum–Shub generator, which is cryptographically strong under the assumption that taking square roots modulo a big composite integer is computationally intractable.

Suggestions for Further Reading

The basic algorithmic issues discussed in Section 3.2 can be found in any textbook on data structures and algorithms. One can, for example, look at [7, 8, 61]. However, most of these elementary books do not talk about randomization and parallelization issues. We refer to [214] for a recent treatise on randomized algorithms. Also see Rabin’s papers [247, 248].

Complexity theory deals with classifying computational problems based on the known algorithms for solving them and on reduction of one problem to another. A simple introduction to complexity theory is the book [280] by Sipser. Chapter 2 of Koblitz’s book [154] is also a compact introduction to computational complexity meant for cryptographers. Also see [113].

Knuth’s book [147] is seemingly the best resource for a comprehensive treatment of multiple-precision integer arithmetic. The proofs of correctness of many algorithms that we omitted in Section 3.3 can be found in this book. This can be supplemented by the more advanced algorithms and important practical tips compiled in the book [56] by Cohen, who designed a versatile computational number theory package known as PARI. Montgomery’s multiplication algorithm appeared in [210]. Also see Chapter 14 of Menezes et al. [194] for more algorithms and implementation issues.

Most of the important papers on primality testing [3, 4, 5, 116, 175, 204, 248, 287] have been referred to in Section 3.4.1. Also see the survey [164] due to Lenstra and Lenstra. Gordon’s algorithm for generating strong primes appeared in [118]. The book [69] by Crandall and Pomerance is an interesting treatise on prime numbers, written with a computational perspective. The modular square-root Algorithm 3.16 is essentially due to Tonelli (1891). Algebraic number theory is treated from a computational perspective in Cohen [56] and Pohst and Zassenhaus [235].

Arithmetic on finite fields is discussed in many books, including [179, 191]. Finite fields find important applications in cryptography and coding theory, and as such it is necessary to have efficient software and hardware implementations of finite field arithmetic. A huge number of papers addressing these implementation issues have appeared in the last two decades. Chapter 5 of Menezes [191] talks about optimal normal bases (Section 2.9.3 of the current book), which speed up exponentiation in finite fields.

Factoring univariate polynomials over finite fields is a topic that has attracted a lot of research attention. Berlekamp’s Q-matrix method [21] is the first modern algorithm for this purpose. Computationally efficient versions of the algorithm discussed in Section 3.5.4 have been presented in Gathen and Shoup [104] and Kaltofen and Shoup [143]. The best-known running time for a deterministic algorithm for univariate factorization over finite fields is due to Shoup [272]. Shparlinski shows [274] that Shoup’s algorithm on a polynomial of degree d over F_q uses O(q^(1/2) (log q) d^(2+ε)) bit operations. This is fully exponential in log q.

The book [103] by von zur Gathen and Gerhard is a detailed treatise on many topics discussed in Sections 3.2 to 3.5 of the current book. Mignotte’s book [203] and the book [108] by Geddes et al. also have interesting coverage. Also see Chapter 1 of Das [72] for a survey of algorithms for various computational problems on finite fields.

For elliptic curve arithmetic, look at Blake et al. [24], Hankerson et al. [123] and Menezes [192]. The first polynomial-time algorithm for counting points on elliptic curves over a finite field was proposed by Schoof. The original version of this algorithm runs in time O(log^8 q). Later Elkies improved this running time to O(log^6 q) for most elliptic curves. Further modifications due to Atkin gave rise to what we call the SEA algorithm. Schoof’s paper [264] talks about this point-counting algorithm and includes the modifications due to Elkies and Atkin. Also look at the article [85] by Elkies.

The Satoh–FGH algorithm is originally due to Satoh [256]. Fouquet et al. [94] have proposed a modification of Satoh’s algorithm to work for fields of characteristic 2. They also report large-scale implementations of the modified algorithm. Also see Fouquet et al. [95] and Skjernaa [281].

Recently, there has been a lot of progress in point counting algorithms, in particular for fields of characteristic 2. The most recent account of this can be found in Lercier and Lubicz [177]. The authors of this paper later reported an implementation of their algorithm for counting points on an elliptic curve over a very large field of characteristic 2. This computation took nearly 82 hours on a 731 MHz Alpha EV6 processor. With these new developments, the point counting problem is practically solved for fields of small characteristic. However, for prime fields the known algorithms require further enhancements in order to be useful on a wide scale.

Finding good random elliptic curves for cryptographic purposes has also been an area of active research recently. With the current status of solving the elliptic curve discrete-log problem, the strategy we mentioned in Algorithm 3.33 is quite acceptable as long as good point-counting algorithms are at our disposal (they are now). For further discussions on this topic, we refer the reader to two papers [95, 176].

The appendix in Koblitz’s book [154] is seemingly the best source for learning hyperelliptic curve arithmetic. This is also available as a CACR technical report [195]. Gaudry and Harley’s paper [106] has more on the hyperelliptic curve point-counting algorithms we discussed in Section 3.7.2. Hess et al. [126] discuss methods for computing hyperelliptic curves for cryptographic usage.

Chapter 5 of Menezes et al. [194] is devoted to the generation of pseudorandom bits and sequences. This chapter lists the statistical tests for checking the randomness of a bit sequence. It also describes two cryptographically secure pseudorandom bit generators other than the BBS generator (Algorithm 3.37). The BBS generator was originally proposed by Blum et al. [26]. Also see Chapter 3 of Knuth [147].

4. The Intractable Mathematical Problems

4.1 Introduction
4.2 The Problems at a Glance
4.3 The Integer Factorization Problem
4.4 The Finite Field Discrete Logarithm Problem
4.5 The Elliptic Curve Discrete Logarithm Problem
4.6 The Hyperelliptic Curve Discrete Logarithm Problem
4.7 Solving Large Sparse Linear Systems over Finite Rings
4.8 The Subset Sum Problem
 Chapter Summary
 Suggestions for Further Reading

It is insufficient to protect ourselves with laws; we need to protect ourselves with mathematics.

—Bruce Schneier

Most number theorists considered the small group of colleagues that occupied themselves with these problems as being inflicted with an incurable but harmless obsession.

—Arjen K. Lenstra and Hendrik W. Lenstra, Jr. [164]

All mathematics is divided into three parts: cryptography (paid for by CIA, KGB and the like), hydrodynamics (supported by manufacturers of atomic submarines) and celestial mechanics (financed by military and other institutions dealing with missiles, such as NASA).

—V. I. Arnold [13]

4.1. Introduction

Public-key cryptographic systems are based on the apparent intractability of solving certain computational problems. However, there is very little evidence (if any) to corroborate the belief that algorithmic solutions to these problems are really very difficult. In spite of intensive studies over a long period, mathematicians and cryptologists have not come up with good algorithms, and it is their failures that justify the attempts to go on building secure cryptographic protocols based on these problems. The inherent assumption is that it would be infeasible for an opponent having practical amounts of computing resources to break these cryptosystems in a reasonable amount of time. Of course, the fear remains that someone may devise a fast algorithm, and our cryptosystems may then fail to deliver the promised security guarantees. On the other extreme, it is also possible that someone proves the theoretical (and, hence, practical) impossibility of solving such a problem in a small (like polynomial) amount of time, and our cryptosystems become secure forever (well, at least until other paradigms of computing, like the yet practically unimplemented quantum computing, solve the problems efficiently).

Whether you are a cryptographer or a cryptanalyst, it is important, if not essential, to be aware of the best methods available to date for attacking the intractable problems of cryptography. In the first place, this knowledge quantifies the practical security margins of the protocols, for instance, by dictating the determination of the input sizes as a function of the security requirements. Let us take a specific example: With today’s computing power and known integer factorization algorithms, we assert that a message that needs to be kept secret for a day or two may be encrypted by a 768-bit RSA key, whereas if one wants to maintain the security for a year or more, much longer keys are needed. The second point in studying the known cryptanalytic algorithms is that though general-purpose algorithms for solving these problems are still unknown, there are good algorithms for specific cases—the cases to be avoided by the designers of cryptographic applications. For example, there is a linear-time algorithm to attack cryptographic systems based on anomalous elliptic curves. The moral is that one must not employ these curves for cryptographic applications. The third reason for studying cryptanalytic algorithms is sentimental. The fact that we are still unable to answer some simply stated questions even after spending a reasonable amount of collective effort is indeed humbling. To worsen matters, cryptography thrives by exploiting this scientific inadequacy. Cryptanalysis, though seemingly unlawful from a cryptographer’s viewpoint, turns out to be a deep and beautiful area of applied mathematics. Ironically enough, it is quite common that the proponents of cryptographic protocols are themselves most interested to see the end. The journey goes on. . . Read on!

It may appear somewhat unusual to discuss the cryptanalytic algorithms prior to the cryptographic ones (see Chapter 5). We find this order convenient in that one must first know the intractable problems before applying them in cryptographic protocols. Moreover, the known attacks help one fix the parameters for use in the cryptographic algorithms. We defer till Chapter 7 other cryptanalytic techniques which do not directly involve solving these mathematical problems. The full power of the mathematical machinery of Chapters 2 and 3 is felt here in the science of cryptology. Understanding the various aspects of cryptology hence becomes easier.

4.2. The Problems at a Glance

Let us first introduce the intractable problems of cryptology. In the rest of this chapter, we describe some known methods to solve these problems.

The integer factorization problem (IFP) is perhaps the most studied one in the lot. We know that ℤ is a unique factorization domain (UFD) (Definition 2.25, p 40), that is, given a natural number n there are (pairwise distinct) primes p1, . . . , pr (unique up to rearrangement) such that n = p1^α1 · · · pr^αr for some α1, . . . , αr ∈ ℕ. Broadly speaking, the IFP is the determination of these pi and αi from the knowledge of n. Note that once the prime divisors pi of n are known, it is rather easy to compute the multiplicities αi = v_pi(n) by trial divisions. It is, therefore, sufficient to find out the primes pi only. It is easy (Algorithm 3.13) to check if n is composite. If n is already prime, then its prime factorization is known. On the other hand, if n is known to be composite, an algorithm that splits n into two non-trivial factors, that is, that outputs n1, n2 with n = n1n2, n1 < n and n2 < n, can be repeatedly used to compute the complete factorization of n. Once a non-trivial factor n1 of n is made available, the cofactor n2 = n/n1 is obtained by a single division. Finally, it is sometimes known a priori that n is the product of two (distinct odd) primes (as in the RSA protocols). In this case, the non-trivial split of n immediately gives the desired factorization of n. To sum up, the IFP can be stated in various versions, the presumed difficulty of all these versions being essentially the same.
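The observation that the multiplicities αi = v_pi(n) are easy to recover once the prime divisors are known can be made concrete in a few lines of Python:

```python
def multiplicity(p, n):
    """Compute alpha = v_p(n), the exponent of the prime p in n,
    by repeated trial division."""
    alpha = 0
    while n % p == 0:
        n //= p
        alpha += 1
    return alpha

def exponents(primes, n):
    """Given the distinct prime divisors of n, recover the full
    factorization n = prod p_i^alpha_i as a dictionary."""
    return {p: multiplicity(p, n) for p in primes}
```

This costs only O(log n) divisions per prime, which is why the hard part of the IFP is finding the primes, not their exponents.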

Problem 4.1

General integer factorization problem Given an integer n ≥ 2, determine all the prime divisors of n.

Problem 4.2

Integer factorization problem (IFP) Given a composite integer n, find a non-trivial divisor n1 of n (that is, a divisor n1 of n in the range 1 < n1 < n).

Problem 4.3

RSA integer factorization problem Given a product n = pq of two (distinct odd) primes p and q, find the prime divisors p and q of n.

Recall that if n = p1^α1 · · · pr^αr is the prime factorization of n, then the Euler totient function φ(n) of n is φ(n) = n(1 – 1/p1) · · · (1 – 1/pr). Thus, if the prime factorization of n is known, it is easy to compute φ(n). The converse is not known to be true in general. However, if n = pq is the product of two primes, factoring n is polynomial-time equivalent to computing φ(n) (Exercise 3.6).
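One direction of this equivalence for n = pq can be demonstrated by a short Python sketch: knowing φ(n) reveals the sum p + q = n – φ(n) + 1, so p and q are the roots of a quadratic with known coefficients.

```python
from math import isqrt

def factor_from_totient(n, phi):
    """Recover p and q from n = p*q and phi = (p-1)(q-1).

    Since phi = n - (p + q) + 1, the sum s = p + q is known, and
    p, q are the roots of x^2 - s*x + n = 0.
    """
    s = n - phi + 1                 # p + q
    disc = s * s - 4 * n            # (p - q)^2
    root = isqrt(disc)
    assert root * root == disc, "n is not a product of two primes"
    p, q = (s - root) // 2, (s + root) // 2
    return p, q
```

This is exactly why, for RSA moduli, leaking φ(n) is as catastrophic as leaking the factorization itself.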
Problem 4.4

Totient problem Given a natural number n, compute φ(n).

Problem 4.5

RSA totient problem Given a product n = pq of two (distinct odd) primes p and q, compute φ(n).

Note that ℤ[X] is also a UFD. Quite interestingly, it is computationally easy to find a non-trivial factor g of a polynomial f ∈ ℤ[X] (that is, a factor g with 0 < deg g < deg f). One might, for example, use the polynomial-time deterministic L3 algorithm named after Lenstra, Lenstra and Lovász (Section 4.8.2).

Square roots modulo an integer n can be computed in probabilistic polynomial time if n is a prime (Algorithm 3.16). If n is composite, the situation is different. If the factorization of n is known, then the square roots can be computed modulo each prime divisor of n, lifted modulo the appropriate powers of the prime divisors and subsequently combined using the Chinese remainder theorem. On the other hand, if the factorization of n is not known, then computing square roots modulo n turns out to be a very difficult task. Recall that the Blum–Blum–Shub generator (Algorithm 3.37) exploits this fact to provide a cryptographically secure pseudorandom bit generator.
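The known-factorization case can be sketched in Python for the simplest situation, n = pq with p ≡ q ≡ 3 (mod 4) (the setting of the BBS generator), where the square root modulo each prime is a single exponentiation and no lifting is needed:

```python
def sqrt_mod_pq(a, p, q):
    """Compute a square root of a modulo n = p*q, assuming the
    factorization is known, p ≡ q ≡ 3 (mod 4), and a is a quadratic
    residue modulo both p and q.

    A root is found modulo each prime separately and the results are
    combined with the Chinese remainder theorem.
    """
    assert p % 4 == 3 and q % 4 == 3
    rp = pow(a, (p + 1) // 4, p)    # a square root of a mod p
    rq = pow(a, (q + 1) // 4, q)    # a square root of a mod q
    # CRT: find x with x ≡ rp (mod p) and x ≡ rq (mod q).
    n = p * q
    return (rp * q * pow(q, -1, p) + rq * p * pow(p, -1, q)) % n
```

Without p and q, no comparably efficient method is known, which is precisely the intractability assumption behind Problem 4.6.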

Problem 4.6

Modular square root problem (SQRTP) Given a composite integer n and an integer a, compute an integer x, if one exists, such that x^2 ≡ a (mod n).

Let us now look at another class of problems of an apparently distinct flavour. Let G be a finite cyclic group of order n := #G and let g be a generator of G. For a moment, let us assume that G is multiplicatively written. Any element a ∈ G can be written as a = g^x for some integer x unique modulo n. In this case, x is called the discrete logarithm or the index of a with respect to the base g and is denoted by indg a.
Problem 4.7

Discrete logarithm problem (DLP) Given a finite cyclic group G, a generator g of G and an element a ∈ G, compute indg a.

If we now remove the restrictions that G is cyclic and/or that g is a generator of G (if G is cyclic), then we arrive at a generalized version of the DLP. Let us continue to assume that G is Abelian and finite. The subgroup H of G generated by g is anyway cyclic. If a ∈ H, then the discrete logarithm or index of a with respect to the base g is an integer x unique modulo m := ord H such that a = g^x. In this case, we denote such an integer x by indg a. On the other hand, if a ∉ H, then we say that the discrete logarithm indg a is not defined. Recall from Proposition 2.5 that if G is cyclic and if m is known, then checking if a belongs to H amounts to computing an exponentiation in G (that is, a ∈ H if and only if a^m is the identity of G). If G is not cyclic (or if m is not known), then it is not easy, in general, to develop such a nice criterion.
Problem 4.8

Generalized discrete logarithm problem (GDLP) Given a finite Abelian group G and elements g, a ∈ G, determine if a belongs to the subgroup of G generated by g, and if so, compute indg a.

Note that the DLP (or the GDLP) need not be an inherently difficult problem. Its difficulty depends on the choice of the group G and also on the representation of elements of G. For example, if G is the additive (cyclic) group ℤn and g is an integer with gcd(g, n) = 1, then for every integer a we have indg a ≡ g^(–1) a (mod n), where the modular inverse g^(–1) (mod n) can be computed efficiently using the extended gcd algorithm (Algorithm 3.8) on g and n. Also note that if G is cyclic and if each element a of G is represented as indg a for a given generator g of G (see, for example, Section 2.9.3), then computing discrete logarithms in G to the base g is a trivial problem. In that case, it is also trivial to compute discrete logarithms (if existent) to any other base h (Exercise 4.3).
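The easy additive case just described can be written out in Python; here `pow(g, -1, n)` computes the modular inverse that the extended gcd algorithm would deliver:

```python
def additive_dlog(g, a, n):
    """Discrete logarithm in the additive group Z_n: find x with
    x*g ≡ a (mod n), assuming gcd(g, n) = 1.

    In additive notation, 'exponentiation' g^x becomes the multiple x*g,
    so the index is simply a times the modular inverse of g.
    """
    return (pow(g, -1, n) * a) % n
```

The lesson is that the hardness of the DLP lies entirely in the group and its representation, not in the abstract problem statement.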

On the other hand, there are certain groups G in which discrete logarithms cannot be computed so easily; that is, computing indices in G may demand time not bounded by any polynomial in log n, where n = ord G. However, if the group operation on any two elements of G can be performed in time bounded by a polynomial in log n, then cryptographic protocols can be based on G. Typical candidates for such groups are listed below together with the conventional names for the DLP over such groups.

Table 4.1. The discrete logarithm problem in various groups
Group | Name for the DLP
The (cyclic) multiplicative group F_q^* of a finite field F_q | The finite field discrete logarithm problem, or simply the DLP by an abuse of notation
The (not necessarily cyclic) additive group of points of an elliptic curve defined over a finite field | The elliptic curve discrete logarithm problem, or the ECDLP
The Jacobian of a hyperelliptic curve C defined over a finite field | The hyperelliptic curve discrete logarithm problem, or the HECDLP

Note that if we are interested in computing indices to a base g ∈ G, we may indeed replace, at least theoretically, G by the subgroup H of G generated by g and may assume, without loss of generality, that G is cyclic. Now, if we know an isomorphism G → ℤn (together with its inverse), computing discrete logarithms in G is rather easy (Exercise 4.4). However, computing such an isomorphism is, in general, not an easy task and may demand exponential time and/or storage requirements.

Another problem that is widely believed to be computationally equivalent to the DLP (at least for the groups mentioned in the above table) is called the Diffie–Hellman problem (DHP). Similar to the DLP, the DHP is presumably difficult to solve for these groups, and one may introduce the specific names DHP, ECDHP and HECDHP to designate this problem applied to these specific groups.

Problem 4.9

Diffie–Hellman problem (DHP) Let G be a multiplicative group and let g ∈ G. Given g^x and g^y for some (unknown) integers x and y, compute g^(xy).

Clearly, if a solution of the DLP is given, one may compute y = indg(g^y) and, subsequently, g^(xy) = (g^x)^y. That is, the DHP is no harder than the DLP. A proof for the validity or otherwise of the converse relation between these two problems is not known. It is also widely believed that the DLP is computationally equivalent to the IFP. A complete proof of this equivalence is not known, though certain partial results are available in the literature.

There are some other difficult problems on which cryptographic systems can be built. Problem 4.10 deserves specific mention in this regard.

Problem 4.10

Subset sum problem (SSP) Given a set A := {a1, . . . , an} of natural numbers and a natural number s, find out if there exist x1, . . . , xn ∈ {0, 1} such that x1a1 + · · · + xnan = s, that is, if there is a subset B of A whose elements sum to s. The integers a1, . . . , an are called the weights for the SSP.

The Knapsack problem is a related combinatorial optimization problem. In view of this, the set {a1, . . . , an} is often called a knapsack set, and the SSP is, by an abuse of notation, also referred to as the knapsack problem.
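For very small instances the SSP can be decided by exhaustive search over all 2^n subsets, which makes the problem statement concrete and shows why only small n are feasible this way. A minimal Python sketch (an illustration of the problem, not an efficient attack):

```python
from itertools import combinations

def subset_sum(weights, s):
    """Decide the SSP by brute force: return a tuple of weights summing
    to s, or None if no subset works.

    The search enumerates all subsets, so the running time grows as 2^n;
    cryptographic knapsack schemes rely on n being far too large for this.
    """
    for r in range(len(weights) + 1):
        for combo in combinations(weights, r):
            if sum(combo) == s:
                return combo
    return None
```

The cryptanalytically interesting algorithms (meet-in-the-middle, lattice reduction) improve on this dramatically for structured or low-density instances, which is the subject of Section 4.8.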

Some of the early cryptographic systems based on the SSP have succumbed to efficient (even polynomial-time) cryptanalytic attacks. However, some schemes proposed in recent years seem to be resistant to such attacks, or, in other words, good attacks on them are not yet known. As a result, it is important to study the SSP in some detail.

The SSP is often mapped to problems on lattices. Let v1, . . . , vn be linearly independent vectors in ℝ^m. Consider the set

L := {c1v1 + · · · + cnvn | c1, . . . , cn ∈ ℤ}

of integer linear combinations of these vectors. L is called the lattice generated by v1, . . . , vn.

Problem 4.11

Shortest vector problem (SVP) Find a non-zero vector v ∈ L whose length ‖v‖ is smallest.

Problem 4.12

Closest vector problem (CVP) Given a vector w ∈ ℝ^m, find a vector v ∈ L such that the length ‖v – w‖ is smallest over all choices of v ∈ L.

For some other difficult computational problems and their applications to cryptography, we refer the reader to the references suggested at the end of this chapter and of Chapter 5.

Exercise Set 4.2

4.1
  1. Let n ≥ 2 be a square-free integer (that is, a product of pairwise distinct primes) and let a ∈ ℕ. Show that the exponentiation map ℤn → ℤn, x ↦ x^a, is bijective if and only if gcd(a, φ(n)) = 1. [H]

  2. Show that if n ≥ 2 is not square-free, then for no integer a ≥ 2 is the exponentiation map ℤn → ℤn, x ↦ x^a, bijective. [H]

4.2 Show that the following problems are polynomial-time reducible to the IFP.
  1. RSA key inversion problem (RSAKIP) Let n = pq be a product of two (distinct odd) primes p and q. Given e with gcd(e, φ(n)) = 1, compute an integer d such that ed ≡ 1 (mod φ(n)).

  2. RSA problem (RSAP) Let n and e be as in Part (a). Given c ∈ ℤn, compute x ∈ ℤn such that c ≡ x^e (mod n). (By Exercise 4.1, such an x exists and is unique.)

  3. Quadratic residuosity problem (QRP) Given an odd integer n > 1 and an integer a with gcd(a, n) = 1, check if a is a quadratic residue modulo n. (Note that if n is a prime, then this problem reduces to the computation of the Legendre symbol (a/n). If, on the other hand, n is composite and the Jacobi symbol (a/n) equals 1, one cannot conclude that a is a quadratic residue modulo n.)

4.3 Let G be a finite cyclic group of order n and let g, g′ be two arbitrary generators of G.
  1. Show that indg g′ is invertible modulo n and that for every a ∈ G we have indg′ a ≡ (indg a)(indg g′)^(–1) (mod n).

  2. Let h ∈ G, m := ord(h) and y := indg h. Show that m = n/gcd(y, n), that y/gcd(y, n) is invertible modulo m, and that for an arbitrary element a ∈ G the index indh a exists if and only if gcd(y, n) | indg a, and in that case we have

    indh a ≡ (indg a/gcd(y, n))(y/gcd(y, n))^(–1) (mod m).

4.4 Let G be a finite cyclic multiplicatively written group of order n. An algorithm on G is said to be polynomial-time if it runs in time bounded above by a polynomial function of log n. Assume that the product of any two elements in G can be computed in polynomial time. Recall from Exercise 2.47 that G ≅ ℤn. Show that the computation of an isomorphism ψ : G → ℤn is polynomial-time equivalent to computing discrete logarithms in G. (That is, assuming that we are given a (two-way) black box that returns in polynomial time ψ(a) or ψ^(–1)(x) for every a ∈ G and x ∈ ℤn, discrete logarithms in G can be computed in polynomial time. Conversely, if discrete logarithms with respect to a primitive element can be computed in polynomial time, then such a black box can be realized.)
4.5 Let p be an (odd) prime and let g be a primitive root modulo p. Show that a ∈ ℤp^* is a quadratic residue modulo p if and only if the index indg a is even. Hence, conclude that there is a polynomial-time (in log p) algorithm that computes the least significant bit of indg a, given any a ∈ ℤp^*. More generally, let p – 1 = 2^r s, where r ∈ ℕ and s is odd. Show that there exists a polynomial-time algorithm that computes the r least significant bits of indg a given any a ∈ ℤp^*. (This exercise shows that the DLP has a polynomial-time solution for Fermat primes Fn := 2^(2^n) + 1. Note that Fn is prime for n = 0, 1, 2, 3, 4. No other Fermat primes are known.)
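The first part of Exercise 4.5 translates directly into code: by Euler's criterion, a^((p–1)/2) ≡ 1 (mod p) exactly when a is a quadratic residue, which reveals the least significant bit of the index. A Python sketch (p = 23 with primitive root g = 5 in the test is a toy choice for illustration):

```python
def dlog_lsb(g, a, p):
    """Least significant bit of ind_g(a), for a primitive root g mod p.

    By Euler's criterion, a^((p-1)/2) is 1 mod p exactly when a is a
    quadratic residue, i.e. exactly when the index is even.
    """
    return 0 if pow(a, (p - 1) // 2, p) == 1 else 1
```

This single exponentiation leaks one bit of the discrete logarithm for any prime p; the full DLP remains hard because the remaining bits are not accessible this way unless p – 1 is divisible by a high power of 2.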

4.3. The Integer Factorization Problem

The integer factorization problem (IFP) (Problems 4.1, 4.2 and 4.3) is one of the most easily stated and yet hopelessly difficult computational problems that have attracted researchers’ attention for ages, and most notably in the age of electronic computers. A huge number of algorithms, varying widely in basic strategy, mathematical sophistication and implementation intricacy, have been suggested, and, in spite of these, factoring a general integer having only 1000 bits seems to be an impossible task today, even using the fastest computers on earth.

It is important to note here that even proving rigorous bounds on the running times of the integer-factoring algorithms is quite often a very difficult task. In many cases, we have to be satisfied with clever heuristic bounds based on one or more reasonable but unprovable assumptions.

This section highlights human achievements in the battle against the IFP. Before going into the details of this account we want to mention some relevant points. Throughout this section we assume that we want to factor a (positive) integer n. Since such an integer can be represented by ⌈lg(n + 1)⌉ bits, the input size is taken to be lg n (or, ln n, or log n). Most modern factorization algorithms take time given by the following subexponential expression in ln n:

L(n, α, c) := exp((c + o(1)) (ln n)^α (ln ln n)^(1–α)),

where 0 < α < 1 and c > 0 are constants. As described in Section 3.2, the smaller the value of α is, the closer the expression L(n, α, c) is to a polynomial expression (in ln n). If n is understood from the context, we write L(α, c) in place of L(n, α, c). Although the current best-known algorithms correspond to α = 1/3, the algorithms with α = 1/2 are also quite interesting. In this case, we use the shorter notation L[c] := L(1/2, c).
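The growth of the subexponential expression can be explored numerically. The sketch below evaluates L(n, α, c) with the o(1) term dropped, so the values are indicative only:

```python
import math

def L(n, alpha, c):
    """Evaluate L(n, alpha, c) = exp(c * (ln n)^alpha * (ln ln n)^(1-alpha)),
    ignoring the o(1) term in the exponent."""
    ln_n = math.log(n)
    return math.exp(c * ln_n ** alpha * math.log(ln_n) ** (1 - alpha))

# For a 1024-bit n: alpha = 1 gives fully exponential cost n^c, alpha = 0
# gives polynomial cost (a power of ln n), and alpha = 1/3 sits far below
# alpha = 1/2 -- which is why the alpha = 1/3 algorithms dominate today.
```

For n around 2^1024, L(n, 1/2, 1) is roughly 10^30 while L(n, 1/3, 1) is roughly 10^13, illustrating how much the reduction from α = 1/2 to α = 1/3 buys.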

Henceforth we will use, without explicit mention, the notation q1 := 2, q2 := 3, q3 := 5, . . . to denote the sequence of primes. The concept of qt-smoothness (for some t ∈ ℕ) will often be referred to as B-smoothness, where B = {q1, . . . , qt}. Recall from Theorem 2.21 that smaller integers have a higher probability of being B-smooth for a given B. This observation plays an important role in designing integer factoring algorithms. The following special case of Theorem 2.21 is often useful.

Corollary 4.1.

Let α, β > 0 be constants, x = O(n^α) and y = L[β] = L(n, 1/2, β). Then the probability that a random positive integer ≤ x is y-smooth is L[−α/(2β)].

Before any attempt at factoring n is made, it is worthwhile to check for the primality of n. Since probabilistic primality tests (like Algorithm 3.13) are quite efficient, we should first run one such test to make sure that n is really composite. Henceforth, we will assume that n is known to be composite.

4.3.1. Older Algorithms

“Factoring in the dark ages” (a phrase attributed to Hendrik Lenstra) used fully exponential algorithms some of which are discussed now. Though the worst-case performances of these algorithms are quite poor, there are many situations when they might factor even a large integer quite fast. It is, therefore, worthwhile to spend some time on these algorithms.

Trial division

A composite integer n admits a factor ≤ √n, which can be found by trial divisions of n by integers ≤ √n. This demands O(√n) trial divisions and is clearly impractical, even when n contains only 30 decimal digits. It is also true that n has a prime divisor ≤ √n. So it suffices to carry out trial divisions by primes only. Though this modified strategy saves us many unnecessary divisions, the asymptotic complexity does not reduce much, since by the prime number theorem the number of primes ≤ √n is about 2√n/ln n. In addition, we need to have a list of primes ≤ √n or generate the primes on the fly, neither of which is really practical. A trade-off can be made by noting that an integer m ≥ 30 cannot be prime unless m ≡ 1, 7, 11, 13, 17, 19, 23, 29 (mod 30). This means that we need to perform the trial divisions only by those integers m congruent to one of these eight values modulo 30, which reduces the number of candidate divisors to about 27 per cent of all integers. Though trial division is not a practical general-purpose algorithm for factoring large integers, we recommend extracting all the small prime factors of n, if any, by dividing n by a predetermined set {q1, . . . , qt} of small primes. If n is indeed qt-smooth, or has all prime factors ≤ qt except only one, then the trial division method completely factors n quite fast. Even when n is not of this type, trial division might reduce its size, so that other algorithms run somewhat more efficiently.
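The strategy of this paragraph can be sketched in Python (a toy illustration; the function name and the default bound are ours):

```python
def trial_division(n, bound=10**6):
    """Extract small prime factors of n by trial division, dividing first
    by 2, 3, 5 and then only by candidates congruent to
    1, 7, 11, 13, 17, 19, 23, 29 (mod 30), up to the given bound.
    Returns (factors, cofactor), where cofactor is the unfactored part."""
    factors = []
    for q in (2, 3, 5):
        while n % q == 0:
            factors.append(q)
            n //= q
    wheel = (1, 7, 11, 13, 17, 19, 23, 29)
    base = 0
    while True:
        for r in wheel:
            m = base + r
            if m == 1:                    # skip the unit
                continue
            if m * m > n:
                if n > 1:
                    factors.append(n)     # the remaining cofactor is prime
                return factors, 1
            if m > bound:
                return factors, n         # give up on the remaining part
            while n % m == 0:
                factors.append(m)
                n //= m
        base += 30
```

Note that the wheel also tries a few composite candidates (such as 49), but these never divide the remaining cofactor, since all smaller primes have already been removed.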

Pollard’s rho method

Pollard’s rho method solves the IFP in an expected Õ(n^{1/4}) time and is based on the birthday paradox (Exercise 2.172).

Let p be an (unknown) prime divisor of n and let f : Z_n → Z_n be a random map. We start with an initial value x0 ∈ Z_n and generate a sequence xi+1 = f(xi), i ≥ 0, of elements of Z_n. Let yi denote the smallest non-negative integer satisfying yi ≡ xi (mod p). By the birthday paradox, after t = O(√p) iterates x1, . . . , xt are generated, we have a high chance that yi = yj, that is, xi ≡ xj (mod p), for some 1 ≤ i < j ≤ t. This means that p | (xi − xj), and computing gcd(xi − xj, n) splits n into two non-trivial factors with high probability. The method fails if this gcd is n. For a random n, this incident of having a gcd equal to n is of very low probability.

Algorithm 4.1 gives a specific implementation of this method. Computing gcds for all the pairs (xi − xj, n) is a massive investment of time. Instead we store (in the variable ξ) the values xr, r = 2^t, for t = 0, 1, 2, . . . , and compute only gcd(xr+s − xr, n) for s = 1, . . . , r. Since the sequence yi, i ≥ 0, is ultimately periodic with an expected period length τ = O(√p), we eventually reach a t with r = 2^t ≥ τ. In that case, the for loop detects a match. Typically, the update function f is taken to be f(x) = x^2 − 1 (mod n), which, though not a random function, behaves like one. Note that the iterates yi, i ≥ 0, may be visualized as being located on the Greek letter ρ as shown in Figure 4.1 (with a tail of the first μ iterates followed by a cycle of length τ). This is how the method derives its name.

Figure 4.1. Iterates in Pollard’s rho method


Algorithm 4.1 takes an expected running time Õ(√p). Since the smallest prime divisor p of n satisfies p ≤ √n, Pollard’s rho method runs in expected time Õ(n^{1/4}).

Algorithm 4.1. Pollard’s rho method

Input: A composite integer n.

Output: A non-trivial factor of n.

Steps:

Choose a random element x ∈ Z_n and set ξ := x and r := 1.

while (1) {
   for s = 1, . . . , r {
       x := f(x).
       d := gcd(x – ξ, n).
       if (1 < d < n) { Return d. }
   }
   ξ := x.
   r := 2r.
}
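A direct Python transcription of Algorithm 4.1 may look as follows (a sketch; the restart on the rare event gcd = n is our addition):

```python
from math import gcd
import random

def pollard_rho(n):
    """Return a non-trivial factor of the composite integer n,
    following Algorithm 4.1 with f(x) = x^2 - 1 (mod n)."""
    while True:                          # restart on the rare gcd == n event
        x = xi = random.randrange(2, n)  # xi plays the role of the stored ξ
        r = 1
        restart = False
        while not restart:
            for _ in range(r):           # for s = 1, ..., r
                x = (x * x - 1) % n      # x := f(x)
                d = gcd(x - xi, n)
                if 1 < d < n:
                    return d
                if d == n:
                    restart = True       # failure: try a fresh starting value
                    break
            xi = x                       # ξ := x
            r *= 2                       # r := 2r
```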

Many modifications of Pollard’s rho method have been proposed in the literature. Perhaps the most notable one is an idea due to R. P. Brent. All these modifications considerably speed up Algorithm 4.1, though they leave the complexity essentially the same, that is, Õ(n^{1/4}). We will not describe these modifications in this book.

Pollard’s p – 1 method

Pollard’s p − 1 method depends on the prime factors of p − 1 for a prime divisor p of n. Indeed, if p − 1 is rather smooth, this method may extract a (non-trivial) factor of n pretty fast, even when p itself is quite large. To start with, we extend the definition of smoothness as follows.

Definition 4.1.

Let y be a positive integer. An integer x is called y-power-smooth if, whenever a prime power p^e divides x, we have p^e ≤ y. Clearly, a y-power-smooth integer is y-smooth, but not necessarily conversely.

Let p be an (unknown) prime divisor of n. We may assume, without loss of generality, that p ≤ √n. Assume that p − 1 is M-power-smooth. Then (p − 1) | lcm(1, . . . , M) and, therefore, for an integer a with gcd(a, n) = 1 (and hence with gcd(a, p) = 1), we have a^{lcm(1,...,M)} ≡ 1 (mod p) by Fermat’s little theorem, that is, d := gcd(a^{lcm(1,...,M)} − 1, n) > 1. If d ≠ n, then d is a non-trivial factor of n. In case we have d = n (a very rare occurrence), we may try with another a or declare failure.

The problem with this method is that p, and so M, are not known in advance. One may proceed by guessing successively increasing values of M until the method succeeds. In the worst case, that is, when p is a safe prime, we have M = (p − 1)/2. Since p ≤ √n, this algorithm runs in a worst-case time of Õ(√n). However, if M is quite small, then this algorithm is rather efficient, irrespective of how large p itself is.

In Algorithm 4.2, we give a variant of the p − 1 method, where we supply a predetermined value of the bound M. We also assume that we have at our disposal a precalculated list of all primes q1, . . . , qt ≤ M.

There is a modification of this algorithm known as Stage 2 or the second stage. For this, we choose a second bound M′ larger than M. Assume that p − 1 = rq, where r is M-power-smooth and q is a prime in the range M < q ≤ M′. In this case, Stage 2 computes with high probability a factor of n after an expected O(√M′) additional operations, as follows. When Algorithm 4.2 returns “failure” at the last step, it has already computed the value A := a^m (mod n), where m = q1^e1 · · · qt^et, ei = ⌊ln M/ln qi⌋. In this case, A has multiplicative order q modulo p, that is, the subgroup H of Z_p* generated by A has order q. We choose s = O(√q) random integers l1, . . . , ls. By the birthday paradox (Exercise 2.172), we have with high probability A^li ≡ A^lj (mod p) for some i ≠ j. In that case, d := gcd(A^li − A^lj, n) is divisible by p and is a desired factor of n (unless d = n, a case that occurs with a very low probability). In practice, we do not know q and so we determine s and the integers l1, . . . , ls using the bound M′ instead of q.

Algorithm 4.2. Pollard’s p – 1 method

Input: A composite integer n, a bound M, and all primes q1, . . . , qt ≤ M.

Output: A non-trivial factor d of n or “failure”.

Steps:

Select a random integer a, 1 < a < n. /* For example, we may take a := 2 */

if ((d := gcd(a, n)) ≠ 1) { Return d. }
for i = 1, . . . , t {
    ei := ⌊ln M/ln qi⌋.
    a := a^(qi^ei) (mod n).
    d := gcd(a − 1, n).
    if (1 < d < n) { Return d. }
    if (d = n) { Return “failure”. }  /* Or repeat the for loop with another a */
}
Return “failure”.
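In Python, Stage 1 reads roughly as follows (a sketch; the sieve-based prime generation and the test parameters in the usage note are our choices):

```python
from math import gcd, isqrt, log

def small_primes(M):
    """All primes <= M by the sieve of Eratosthenes."""
    flags = [True] * (M + 1)
    flags[0] = flags[1] = False
    for i in range(2, isqrt(M) + 1):
        if flags[i]:
            for j in range(i * i, M + 1, i):
                flags[j] = False
    return [i for i, f in enumerate(flags) if f]

def pollard_p_minus_1(n, M, a=2):
    """Stage 1 of Pollard's p-1 method with power-smoothness bound M.
    Returns a non-trivial factor of n, or None on failure."""
    d = gcd(a, n)
    if d != 1:
        return d                       # lucky: a already shares a factor
    for q in small_primes(M):
        e = int(log(M) / log(q))       # e_i = floor(ln M / ln q_i)
        a = pow(a, q ** e, n)          # a := a^(q_i^e_i) (mod n)
        d = gcd(a - 1, n)
        if 1 < d < n:
            return d
        if d == n:
            return None                # failure; retry with another a
    return None
```

For instance, pollard_p_minus_1(29 · 2879, 50) finds the factor 29, because 29 − 1 = 2^2 · 7 is 50-power-smooth, whereas 2879 is a safe prime (2879 − 1 = 2 · 1439).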

In another variant of Stage 2, we compute the powers A^{q_{t+1}}, . . . , A^{q_{t′}} (mod n), where q_{t+1}, . . . , q_{t′} are all the primes qj satisfying M < qj ≤ M′. If p − 1 = rq is of the desired form, we would find q = qj for some t < j ≤ t′, and then gcd(A^q − 1, n), if not equal to n, would be a non-trivial factor of n.

In practice, one may try one’s luck using this algorithm for some M in the range 10^5 ≤ M ≤ 10^6 (and possibly also the second stage with 10^6 ≤ M′ ≤ 10^8) before attempting a more sophisticated algorithm like the MPQSM, the ECM or the NFSM.

Williams’ p + 1 method

As always, we assume that n is a composite integer and that p is an (unknown) prime divisor of n. Pollard’s p − 1 method works in the group Z_p*, whose order is p − 1. The idea of Williams’ p + 1 method is very similar, except that it works with an element a, this time in F_{p^2}*, whose multiplicative order divides p + 1. If p + 1 is M-power-smooth for a reasonably small bound M, then computing d := gcd(a^{lcm(1,...,M)} − 1, n) splits n with high probability.

In order to find an element of order dividing p + 1, we proceed as follows. Let α be an integer such that α^2 − 4 is a quadratic non-residue modulo p. Then the polynomial f(X) = X^2 − αX + 1 is irreducible in F_p[X] and F_{p^2} = F_p[X]/⟨f(X)⟩. Let a, b ∈ F_{p^2} be the two roots of f. Then ab = 1 and a + b = α. Since f(a^p) = 0 (check it!) and since a ∉ F_p, we have a^p = b = a^{−1}, that is, a^{p+1} = 1.

Unfortunately, p is not known in advance. Therefore, we represent elements of Z_n as integers modulo n and the elements of the ring Z_n[X]/⟨f(X)⟩ as polynomials c0 + c1X with c0, c1 ∈ Z_n. Multiplying two such elements is accomplished by multiplying the two polynomials representing these elements modulo the defining polynomial f(X), the coefficient arithmetic being that of Z_n. This gives us a way to do exponentiations in Z_n[X]/⟨f(X)⟩ in order to compute a^m − 1 for a suitable m (for example, m = lcm(1, . . . , M)).

However, the absence of knowledge of p has a graver consequence, namely, it is impossible to decide whether α^2 − 4 is a quadratic non-residue modulo p for a given integer α. The only thing we can do is to try several random values of α. This is justified, because if k random integers α are tried, then the probability that for all of these α the integers α^2 − 4 are quadratic residues modulo p is only 1/2^k.

The code for the p + 1 method is very similar to Algorithm 4.2. We urge the reader to complete the details. Since p^3 − 1 = (p − 1)(p^2 + p + 1), p^4 − 1 = (p^2 − 1)(p^2 + 1) and so on, we can work in higher extensions like F_{p^3}, F_{p^4} to find elements of order dividing p^2 + p + 1, p^2 + 1 and so on, and thereby generalize the p ± 1 methods. However, the integers p^2 + p + 1 and p^2 + 1, being large (compared to p ± 1), have a smaller chance of being M-smooth (or M-power-smooth) for a given bound M.
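The extension arithmetic described above can be sketched as follows (a toy version; the pair (c0, c1) represents c0 + c1X modulo f(X) = X^2 − αX + 1, and the function names and demonstration parameters are ours):

```python
from math import gcd

def ext_mul(u, v, alpha, n):
    """Multiply u = (c0, c1) and v = (d0, d1), representing c0 + c1*X and
    d0 + d1*X in Z_n[X]/(X^2 - alpha*X + 1): substitute X^2 = alpha*X - 1."""
    a0, a1 = u
    b0, b1 = v
    c2 = a1 * b1
    return ((a0 * b0 - c2) % n, (a0 * b1 + a1 * b0 + alpha * c2) % n)

def ext_pow(base, e, alpha, n):
    """Square-and-multiply exponentiation in the quadratic extension."""
    r = (1, 0)
    while e:
        if e & 1:
            r = ext_mul(r, base, alpha, n)
        base = ext_mul(base, base, alpha, n)
        e >>= 1
    return r

def williams_step(n, alpha, e):
    """Raise X (i.e. the root a of f) to the power e and take the gcd of
    the X-coefficient with n: if a^e ≡ 1 (mod p) for a prime p | n, that
    coefficient is divisible by p."""
    c0, c1 = ext_pow((0, 1), e, alpha, n)
    return gcd(c1, n)
```

For example, with n = 23 · 101, α = 3 (so that α^2 − 4 = 5 is a non-residue modulo 23) and e = lcm(1, . . . , 10) = 2520, the gcd comes out as 23, since 23 + 1 = 24 divides 2520.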

The reader should have recognized why we paid attention to strong primes and safe primes (Definition 3.5, p 199, and Algorithm 3.14, p 200). Let us now concentrate on the recent developments in the IFP arena.

4.3.2. The Quadratic Sieve Method

Carl Pomerance’s quadratic sieve method (QSM) is one of the (reasonably) successful modern methods of factoring integers. Though the number field sieve factoring method is the current champion, there was a time in the recent past when the quadratic sieve method and the elliptic curve method were known to be the fastest algorithms for solving the IFP.

The basic algorithm

We assume that n is a composite integer which is not a perfect square (it is easy to detect whether n is a perfect square and, if so, we can work with √n instead). The basic idea is to arrive at a congruence of the form

Equation 4.1

x^2 ≡ y^2 (mod n)

with x ≢ ±y (mod n). In that case, gcd(x − y, n) is a non-trivial factor of n.

We start with a factor base B = {q1, . . . , qt} comprising the first t primes, and let H := ⌈√n⌉ and J := H^2 − n. Then H and J are each O(√n) and hence, for a small integer c, the right side of the congruence

(H + c)2J + 2cH + c2 (mod n)

is also O(√n). We try to factor T(c) := J + 2cH + c^2 using trial divisions by elements of B. If the factorization is successful, that is, if T(c) is B-smooth, then we get a relation of the form

Equation 4.2

(H + c)^2 ≡ q1^α1 q2^α2 · · · qt^αt (mod n),

where all αi ≥ 0. (Note that T(c) ≠ 0, since n is assumed not to be a perfect square.) If all αi are even, say, αi = 2βi, then we get the desired Congruence (4.1) with x = q1^β1 · · · qt^βt and y = H + c. But this is rarely the case. So we keep on generating other relations. After sufficiently many relations are available, we combine these together (by multiplication) to get Congruence (4.1) and compute gcd(x − y, n). If this does not give a non-trivial factor, we try to recombine the collected relations in order to get another Congruence (4.1). This is how Pomerance’s QSM works.

In order to find suitable combinations yielding Congruence (4.1), we employ a method similar to Gaussian elimination. Assume that we have collected r relations of the form

(H + cj)^2 ≡ q1^α1j q2^α2j · · · qt^αtj (mod n),   j = 1, . . . , r.

We search for integers β1, . . . , βr ∈ {0, 1} such that the product

(H + c1)^(2β1) · · · (H + cr)^(2βr) ≡ q1^(α11β1 + · · · + α1rβr) · · · qt^(αt1β1 + · · · + αtrβr) (mod n)

is a desired Congruence (4.1). The left side of this congruence is already a square. In order to make the right side a square too, we essentially have to solve the following system of linear congruences modulo 2:

αi1β1 + αi2β2 + · · · + αirβr ≡ 0 (mod 2)   for i = 1, . . . , t.

This is a system of t equations over F_2 in the r unknowns β1, . . . , βr and is expected to have solutions if r is slightly larger than t. Note that only the values of αij modulo 2 are needed for solving this linear system. This means that we can have a compact representation of the coefficient matrix (αij) by packing 32 of the coefficients as bits per (32-bit) word. Gaussian elimination (over F_2) can then be done using bit operations only.
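The mod-2 linear algebra can be sketched with Python integers serving as packed bit vectors (a toy version; production code would use structured Gaussian elimination or a sparse iterative method on this system):

```python
def find_dependency(rows):
    """rows: exponent vectors modulo 2, packed as Python integers (bit i of
    a row is the parity of the exponent of the (i+1)-th factor-base prime).
    Returns an integer mask whose set bits index a subset of rows whose XOR
    is zero (i.e. whose product has all-even exponents), or None."""
    work = rows[:]                                # row values, reduced in place
    combos = [1 << i for i in range(len(rows))]   # which originals each row mixes
    pivots = {}                                   # leading-bit position -> row index
    for i in range(len(work)):
        while work[i]:
            b = work[i].bit_length() - 1          # leading bit of this row
            if b not in pivots:
                pivots[b] = i                     # new pivot: keep the row
                break
            j = pivots[b]
            work[i] ^= work[j]                    # eliminate the leading bit
            combos[i] ^= combos[j]
        if work[i] == 0:
            return combos[i]                      # a dependency has appeared
    return None
```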

The running time of this method can be derived using Corollary 4.1. Note that the integers T(c) that are tested for B-smoothness are O(√n), which corresponds to α = 1/2 in the corollary. We take qt = L[1/2] (so that t ≈ L[1/2]/ln L[1/2] = L[1/2] by the prime number theorem), which corresponds to β = 1/2. Assuming that the integers T(c) behave as random integers of magnitude O(√n), the probability that one such T(c) is B-smooth is L[−1/2]. Therefore, if L[1] values of c are tried, we expect to get L[1/2] relations involving the L[1/2] primes q1, . . . , qt. Combining these relations by Gaussian elimination is now expected to produce a non-trivial Congruence (4.1). Since each candidate is trial-divided by L[1/2] primes, this gives a running time of the order of L[3/2] for the relation collection stage. Gaussian elimination in L[1/2] unknowns also takes asymptotically the same time. However, each T(c) can have at most O(log n) distinct prime factors, implying that Relation (4.2) is necessarily sparse. This sparsity can be effectively exploited, and the Gaussian elimination can be done essentially in time L[1]. Nevertheless, the entire procedure runs in time L[3/2], a subexponential expression in ln n.

Sieving

In order to reduce the running time from L[3/2] to L[1], we employ what is known as sieving (from which the algorithm derives its name). Let us fix a priori the sieving interval, that is, the values of c for which T(c) is tested for B-smoothness, to be −M ≤ c ≤ M, where M = L[1]. Let q ∈ B be a small prime (that is, q = qi for some i = 1, . . . , t). We intend to find out the values of c such that q^h | T(c) for small exponents h = 1, 2, . . . . Since T(c) = J + 2cH + c^2 = (c + H)^2 − n, the solvability for c of the condition q^h | T(c) is equivalent to the solvability of the congruence (c + H)^2 ≡ n (mod q^h). If n is a quadratic non-residue modulo q, no c satisfies this condition. Consequently, the factor base B may comprise only those primes q for which n is a quadratic residue modulo q (instead of all primes ≤ qt). So we assume that q meets this condition. We may also assume that q ∤ n, because it is a good strategy to perform trial divisions of n by all the primes in B before we go for sieving. The sieving process makes use of an array A indexed by c. We initialize the array location A[c] := ln |T(c)| for each c, −M ≤ c ≤ M.

We explain the sieving process only for an odd prime q. The modifications for the case q = 2 are left to the reader as an easy exercise. The congruence x^2 − n ≡ 0 (mod q) has two distinct solutions for x, say, x1 and x1′ ≡ −x1 (mod q). These correspond to two solutions for c of (H + c)^2 ≡ n (mod q), namely, c1 ≡ x1 − H (mod q) and c1′ ≡ x1′ − H (mod q). For each value of c in the interval −M ≤ c ≤ M that is congruent either to c1 or to c1′ modulo q, we subtract ln q from A[c]. We then lift the solutions x1 and x1′ to the (unique) solutions x2 and x2′ of the congruence x^2 − n ≡ 0 (mod q^2) (Exercise 3.29), compute c2 ≡ x2 − H (mod q^2) and c2′ ≡ x2′ − H (mod q^2), and for each c in the range −M ≤ c ≤ M congruent to c2 or c2′ modulo q^2 subtract ln q from A[c]. We then again lift to obtain the solutions modulo q^3 and proceed as above. We repeat this process of lifting and subtracting ln q from appropriate locations of A until we reach a sufficiently large h for which neither ch nor ch′ corresponds to any value of c in the range −M ≤ c ≤ M. We then choose another q from the factor base and repeat the procedure explained in this paragraph for this q.

After the sieving procedure is carried out for all small primes q in the factor base B, we check for which c, −M ≤ c ≤ M, the array location A[c] is 0. These are precisely the values of c in the indicated range for which T(c) is B-smooth. For each smooth T(c), we then compute Relation (4.2) using trial division (by the primes of B).
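For toy sizes, the whole sieving loop fits in a few lines (a sketch with simplifications flagged in the comments: roots modulo prime powers are found by brute force instead of lifting, and the sign of T(c) is ignored, whereas a real implementation tracks it with an extra factor −1 in the factor base):

```python
from math import isqrt, log

def sieve_smooth(n, factor_base, M):
    """Return the c in [-M, M] for which |T(c)| = |(H+c)^2 - n| is smooth
    over factor_base.  Assumes n is not a perfect square and no factor-base
    prime divides n."""
    H = isqrt(n) + 1
    T = lambda c: (H + c) ** 2 - n
    Tmax = max(abs(T(c)) for c in range(-M, M + 1))
    A = {c: log(abs(T(c))) for c in range(-M, M + 1)}
    for q in factor_base:
        qh = q
        while qh <= Tmax:                 # powers q, q^2, ..., as in the text
            # roots of x^2 ≡ n (mod q^h), brute force (toy sizes only)
            for x in range(qh):
                if (x * x - n) % qh == 0:
                    c1 = (x - H) % qh
                    start = -M + ((c1 + M) % qh)   # least c >= -M, c ≡ c1 (mod q^h)
                    for c in range(start, M + 1, qh):
                        A[c] -= log(q)
            qh *= q
    # residual log near 0 means T(c) is smooth (up to sign)
    return [c for c in range(-M, M + 1) if A[c] < 0.5]
```

For n = 1649 with B = {2, 5} and M = 3, the routine reports c = −2, 0, 2, corresponding to T(c) = −128, 32 and 200.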

The sieving process replaces trial divisions (of every T(c) by every q) by subtractions (of ln q from appropriate A[c]). This is intuitively the reason why sieving speeds up the relation collection stage. For a more rigorous analysis of the running time, note that in order to get the desired ci and ci′ modulo q^i for each q ∈ B and for each i = 1, . . . , h, we have either to compute a square root modulo q (for i = 1) or to solve a congruence (during lifting, for i ≥ 2), each of which can be done in polynomial time. Also, the bound h on the exponent of q satisfies q^h = O(|T(c)|) = O(√n), that is, h = O(log n). Finally, there are L[1/2] primes in B. Therefore, the computation of the ci and ci′ for all q and i takes a total of L[1/2] time.

Now, we count the total number ν of subtractions of the various ln q values from the locations of the array A. The size of A is 2M + 1. For each qi and each exponent h, we need to subtract ln qi from at most 2⌈(2M + 1)/qi^h⌉ locations (for odd qi); summing over h gives about 2(2M + 1)/(qi − 1) subtractions for this qi. Therefore, ν is of the order of 2(2M + 1)HQ, where Q is the maximum of all the qi and is L[1/2], and where Hm, m ≥ 1, denote the harmonic numbers (Exercise 4.6). But Hm = O(ln m), and so ν = O(2(2M + 1) log n) = L[1], since M = L[1].

The logarithms ln q (as well as the initial array values ln |T(c)|) are irrational numbers and hence need infinite precision for storing. We, however, need to work with only crude approximations of these logarithms, say up to three places after the decimal point. In that case, we cannot take A[c] = 0 as the criterion for selecting smooth values of T(c), because the approximate representation of logarithms leads to truncation (and/or rounding) errors. In practice, this is not a severe problem, because T(c) is not smooth if and only if it has a prime factor at least as large as q_{t+1} (the smallest prime not in B). This implies that at the end of the sieving operation the values of A[c] for smooth T(c) are close to 0, whereas those for non-smooth T(c) are much larger (close to a number at least as large as ln q_{t+1}). Thus we may set the selection criterion for smooth integers as A[c] ≤ 1 or as A[c] ≤ 0.1 ln q_{t+1}. It is also possible to replace floating-point subtraction by integer subtraction by doing the arithmetic on 1000 times the logarithm values. To sum up, the ν = L[1] subtractions the sieving procedure does would be only single-precision operations and hence take a total of L[1] time.

As mentioned earlier, Gaussian elimination with sparse equations can also be performed in time L[1]. So Pomerance’s algorithm with sieving takes time L[1].

Incomplete sieving

Numerous modifications of this basic strategy speed up the algorithm considerably. One possibility is to sieve every time only for h = 1 and to ignore all higher powers of q. That is, for every q we check which of the integers T(c) are divisible by q and then subtract ln q from the corresponding locations of the array A. If some T(c) is divisible by a higher power of q, this strategy fails to subtract ln q the required number of times. As a result, this T(c), even if smooth, may fail to pass the smoothness criterion. This problem can be overcome by increasing the cut-off from 1 (or 0.1 ln q_{t+1}) to a value ξ ln qt for some ξ ≥ 1. But then some non-smooth T(c) will pass the selection criterion, in addition to some smooth ones that could not otherwise be detected. This is reasonable, because the non-smooth ones can later be filtered out from the smooth ones, and one might even use trial divisions to do so. Experiments show that values of ξ ≤ 2.5 work quite well in practice.

The reason why this strategy performs well is as follows. If q is small, for example q = 2, we should subtract only 0.693 from A[c] for every power of 2 dividing T(c). On the other hand, if q is much larger, say q = 1,299,709 (the 10^5-th prime), then ln q ≈ 14.078 is large, but T(c) would not, in general, be divisible by a high power of this q. This modification, therefore, leads to a situation where the probability that a smooth T(c) is actually detected as smooth is quite high. A few relations would still be missed even with the modified selection criterion, but that is more than compensated by the speed-up gained by the method. Henceforth, we will call this modified strategy incomplete sieving and the original strategy (of considering all powers of q) complete sieving.

Large prime variation

Another trick, known as large prime variation, also tends to give more usable relations than are available from the original (complete or incomplete) sieving. In this context, we call a prime q′ large if q′ ∉ B. A value of T(c) is often expected to be B-smooth except for a single large prime factor:

Equation 4.3

T(c) = q1^α1 q2^α2 · · · qt^αt q′
with q′ ∉ B. Such a value of T(c) can be easily detected. For example, incomplete sieving with the relaxed selection criterion is expected to give many such relations naturally, whereas for complete sieving, if the left-over of ln |T(c)| in A[c] at the end of the subtraction steps is < 2 ln qt, then it must correspond to a large prime factor < qt^2. Instead of throwing away an apparently unusable Equation (4.3), we may keep track of such relations. If a large prime q′ is not large enough (that is, not much larger than qt), then it might appear on the right side of Equation (4.3) for more than one value of c, and if that is the case, all these relations taken together become usable for the subsequent Gaussian elimination stage (after including q′ in the factor base). This means that for each large prime occurring more than once, the factor base size increases by 1, whereas the number of relations increases by at least 2. Thus, with a little additional effort, we enrich the factor base and the relations collected, and this, in turn, increases the probability of finding a useful Congruence (4.1), our ultimate goal. Viewed from another angle, the strategy of large prime variation allows us to start with smaller values of t and/or M and thereby speed up the sieving stage, and still end up with a system capable of yielding the desired Congruence (4.1). Note that an increased factor base size leads to a larger system to solve by Gaussian elimination. But this is not a serious problem in practice, because the sieving stage (and not the Gaussian elimination stage) is usually the bottleneck in the running time of the algorithm.
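The bookkeeping for matching large primes is a small exercise with a dictionary (a toy sketch; the names are ours):

```python
from collections import defaultdict

def group_partials(partials):
    """partials: iterable of (c, large_prime) pairs coming from relations
    of the form (4.3).  Returns {large_prime: [c values]} restricted to the
    large primes occurring at least twice, i.e. to the usable groups."""
    groups = defaultdict(list)
    for c, q in partials:
        groups[q].append(c)
    return {q: cs for q, cs in groups.items() if len(cs) >= 2}
```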

Naturally, the above discussion on handling one large prime applies also to situations where a T(c) value has more than one large prime factor, say q′ and q″. Such a T(c) value leads to usable relations if the large primes q′ and q″ can be matched with other partial relations. This situation can be detected by a compositeness test on the non-smooth part of T(c). Subsequently, we have to factor the non-smooth part to obtain the two large primes q′ and q″. This is called the two large prime variation. As the size of the integer n to be factored becomes larger, one may go for three and four large prime variations.

We will shortly encounter many other instances of sieving (for solving the IFP and the DLP). Both incomplete sieving and the use of large primes, if carefully applied, help speed up most of these sieving methods much in the same way as they do in connection with the QSM.

The multiple polynomial quadratic sieve

Easy computations (Exercise 4.11) show that the average and maximum of the integers |T(c)| checked for smoothness in the QSM are approximately MH and 2MH respectively. Though these values are theoretically O(√n L[1]), in practice the factor of M (or 2M) makes the integers |T(c)| somewhat large, leading to a poor yield of B-smooth integers for larger values of |c| in the sieving interval. The multiple-polynomial quadratic sieve method (MPQSM) applies a nice trick to reduce these average and maximum values. In the original QSM, we work with a single polynomial in c, namely,

T(c) = J + 2cH + c^2 = (H + c)^2 − n.

Now, we work with a more general quadratic polynomial

T(c) = U + 2Vc + Wc^2

with W > 0 and V^2 − UW = n. (The original T(c) corresponds to U = J, V = H and W = 1.) Then we have W T(c) = (V + Wc)^2 − n ≡ (V + Wc)^2 (mod n), that is, in this case a relation looks like

(V + Wc)^2 ≡ W q1^α1 q2^α2 · · · qt^αt (mod n).

This relation has an additional factor of W that was absent in Relation (4.2). However, if W is chosen to be a prime (possibly a large one), then the Gaussian elimination stage proceeds exactly as in the original method. Indeed, in this case W appears in every relation and hence poses no problem. Only the integers T(c) need to be checked for B-smoothness and hence should have small values. The sieving procedure (that is, computing the appropriate locations of A for subtracting ln q, q ∈ B) for the general polynomial is very similar to that for the original T(c). The details are left to the reader as an easy exercise.

Let us now explain how we can choose the parameters U, V, W. To start with, we fix a suitable sieving interval −M′ ≤ c ≤ M′ and then choose W to be a prime close to √(2n)/M′ such that n is a quadratic residue modulo W. Then we compute a square root V of n modulo W (Algorithm 3.16) and finally take U = (V^2 − n)/W. This choice clearly gives 0 < V < W and |U| ≈ n/W ≈ M′√(n/2). (Indeed one may choose 0 < V < W/2, but this is not an important issue.) Now, the maximum value of |T(c)| over the sieving interval becomes approximately M′√(n/2). Thus, even for M′ = M, this maximum value is smaller by a factor of 2√2 than the maximum value of |T(c)| in the original QSM. Moreover, we may choose somewhat smaller values of M′ (compared to M) by working with several polynomials corresponding to different choices of the prime W. This is why the MPQSM, despite having the same theoretical running time (L[1]) as the original QSM, runs faster in practice.
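This parameter selection can be sketched as follows (a toy version; the brute-force square root stands in for Algorithm 3.16, and the prime W is supplied by the caller rather than searched for):

```python
def mpqs_polynomial(n, W):
    """Given a prime W with n a quadratic residue modulo W, return
    (U, V, T) with V^2 - U*W = n and T(c) = U + 2*V*c + W*c^2, so that
    W*T(c) = (V + W*c)^2 - n."""
    V = next(v for v in range(W) if (v * v - n) % W == 0)  # sqrt of n mod W
    U = (V * V - n) // W                                   # exact by construction
    T = lambda c: U + 2 * V * c + W * c * c
    return U, V, T
```

For n = 1649 and W = 7 (note 1649 ≡ 4 (mod 7)), this yields V = 2 and U = −235.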

Parallelization

The QSM is highly parallelizable. More specifically, different processors can handle pairwise disjoint subsets of B during the sieving process. That is, each processor P maintains a local array A_P indexed by c, −M ≤ c ≤ M. The (local) sieving process at P starts with initializing all the locations A_P[c] to 0. For each prime q in the subset B_P of the factor base B assigned to P, the processor adds ln q to the appropriate locations (the appropriate numbers of times). After all these processors finish local sieving, a central processor computes, for each c in the sieving interval, the value ln |T(c)| − Σ_P A_P[c] (where the sum extends over all processors P which have done local sieving), based on which T(c) is recognized as smooth or not. For the multiple-polynomial variant of the QSM, different processors might handle different polynomials and/or different subsets of B.

TWINKLE: Shamir’s factoring device

Adi Shamir has proposed the complete design of a (hardware) device, TWINKLE (The Weizmann INstitute Key Location Engine), that can perform the sieving stage of the QSM a hundred to a thousand times faster than software implementations on the usual PCs available nowadays. This speed-up is obtained by using a high clock speed (10 GHz) and opto-electronic technology for detecting smooth integers. Each TWINKLE, if mass-produced, has an estimated cost of US $5,000.

The working of TWINKLE is described in Figure 4.2. It uses an opaque cylinder of a height of about 10 inches and a diameter of about 6 inches. At the bottom of the cylinder is an array of LEDs,[1] each LED representing a prime in the factor base. The i-th LED (corresponding to the i-th prime qi) emits light of intensity proportional to log qi. The device is clocked and the i-th LED emits light only during the clock cycles c for which qi|T(c). The light emitted by all the active LEDs at a given clock cycle is focused by a lens and a photo-detector senses the total emitted light. If this total light exceeds a certain threshold, the corresponding clock cycle (that is, the time c) is reported to a PC attached to TWINKLE. The PC then analyses the particular T(c) for smoothness over {q1, . . . , qt} by trial division.

[1] An LED (light emitting diode) is an electronic device that emits light, when current passes through it. A GaAs(Gallium arsenide)-based LED emits (infra-red) light of wavelength ~870 nano-meters. In the operational range of an LED, the intensity of emitted light is roughly proportional to the current passing through the LED.

Figure 4.2. Working of TWINKLE


Thus, TWINKLE implements incomplete sieving by opto-electronic means. The major difference between TWINKLE’s sieving and software sieving is that in the latter we used an array of times (the c values) and the iteration went over the set of small primes. In TWINKLE, we use an array of small primes and allow time to iterate over the different values of c in the sieving interval −M ≤ c ≤ M. An electronic circuit in TWINKLE computes for each LED the cycles c at which that LED is expected to emit light. That is to say, the i-th LED emits light only in the clock cycles c congruent modulo qi to one of the two solutions c1 and c1′ of T(c) ≡ 0 (mod qi). Shamir’s original design uses two LEDs for each prime qi, one corresponding to c1, the other to c1′. In that case, each LED emits light at regularly spaced clock cycles, and this simplifies the electronic circuitry (at the cost of having twice the number of LEDs).

Another difference of TWINKLE from software sieving is that here we add the log qi values (to zero) instead of subtracting them from log |T(c)|. By Exercise 4.11, the values |T(c)| typically vary by small constant factors. Taking logs reduces this variation further and, therefore, comparing the sum of the active log qi values for a given c with a fixed predefined threshold (say, log(MH)) independent of c is a neat way of bypassing the computation of all log |T(c)|, −M ≤ c ≤ M. (This strategy can also be used for software sieving.)

The reasons, why TWINKLE speeds up the sieving procedure over software implementations in conventional PCs, are the following:

  1. Silicon-based PC chips at present can withstand clock frequencies on the order of 1 GHz. By contrast, a GaAs-based wafer containing the LED array can be clocked faster than 10 GHz.

  2. There is no need to initialize the array (to log |T(c)| or zero). Similarly at the end, there is no need to compare the final values in all these array locations with a threshold.

  3. The addition of all the log qi values effective at a given c is done instantly by analog optical means. We do not require an explicit electronic adder.

Shamir [269] reports the full details of a VLSI[2] design of TWINKLE.

[2] very large-scale integration

*4.3.3. Factorization Using Elliptic Curves

H. W. Lenstra’s elliptic curve method (ECM) is another modern algorithm to solve the IFP. It runs in expected time L(p, 1/2, √2), where p is the smallest prime factor of n (the integer to be factored). Since p ≤ √n, this running time is at most L[1] = L(n, 1/2, 1), that is, the same as that of the QSM. However, if p is small (that is, if p = O(n^α) for some α < 1/2), then the ECM is expected to outperform the QSM, since the working of the QSM is incapable of exploiting smaller values of p.

As before, let n be a composite natural number having no small prime divisors and let p be the smallest prime divisor of n. For denoting subexponential expressions in ln p, we use the symbol Lp[c] := L(p, 1/2, c), whereas the unsubscripted symbol L[c] stands for L(n, 1/2, c). We work with random elliptic curves

E : Y^2 = X^3 + aX + b

and consider the group E(F_p) of rational points on E modulo p. However, since p is not known a priori, we intend to work modulo n. The canonical surjection Z_n → F_p allows us to view the points on E modulo n as points on E over F_p. We now define a bound M := Lp[1/√2] and let B = {q1, . . . , qt} be all the primes smaller than or equal to M, so that by the prime number theorem (Theorem 2.20) #B ≈ M/ln M. Of course, p is not known in advance, so M and B are also not known. We will discuss the choice of M and B later. For the time being, let us assume that we know some approximate value of p, so that M and B can be fixed, at least approximately, at the beginning of the algorithm.

By Hasse’s theorem (Theorem 2.48, p 106), the cardinality ν of the group of points modulo p satisfies |ν − p − 1| ≤ 2√p, that is, ν = O(p). If we make the heuristic assumption that ν is a random integer on the order O(p), then Corollary 4.1 tells us that ν is B-smooth with probability Lp[−1/√2]. This assumption is certainly not rigorous, but accepting it gives us a way to analyse the running time of the algorithm.

If Lp[1/√2] random curves are tried, then we expect to find one B-smooth value of ν. In this case, a non-trivial factor of n can be computed with high probability as follows. Define ei := ⌊ln n/ln qi⌋ for i = 1, . . . , t, and m := q1^e1 ··· qt^et, where t is the number of primes in B. If ν is B-smooth, then ν|m and, therefore, for any point P on E we have mP = O (the point at infinity). Computation of mP involves the computation of many sums P1 + P2 of points P1 := (h1, k1) and P2 := (h2, k2). At some point of time, we would certainly compute P1 + P2 = O, that is, P1 = −P2, that is, h1 ≡ h2 (mod p) and k1 ≡ −k2 (mod p). Since p is unknown, we work modulo n, that is, the values of h1, h2, k1 and k2 are known modulo n. Let d := gcd(h1 − h2, n). Then p|d, and if d ≠ n (the case d = n has a very small probability!), we have the non-trivial factor d of n. The computation of the coordinates of P1 + P2 (assuming P1 ≠ ±P2) demands computing the inverse of h1 − h2 modulo n (Section 2.11.2). However, if d = gcd(h1 − h2, n) ≠ 1, then this inverse does not exist, the computation of P1 + P2 fails, and we have a non-trivial factor of n. If ν is B-smooth, then the computation of mP is bound to fail. The basic steps of the ECM are then as shown in Algorithm 4.3.

Algorithm 4.3. Elliptic curve method (ECM)

Input: A composite integer n (with no small prime factors).

Output: A non-trivial divisor d of n.

Steps:

while (1) {
   Select a random curve E : Y^2 = X^3 + aX + b modulo n.
   Choose a point P on E modulo n.
   Try to compute mP.   /* where m is as defined in the text */
   if (the computation of mP fails) {
       /* We have found a divisor d > 1 of n */
       if (d ≠ n) { Return d. }
   }
}
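The trial loop above can be sketched in Python. This is our illustrative helper (function and variable names are ours, not the book’s), using the curve family Y^2 = X^3 + aX + 1 with P = (0, 1) and the cheaper exponents ei = ⌊ln M/ln qi⌋ discussed in the text; a trial “succeeds” exactly when a modular inverse fails to exist:

```python
from math import gcd, isqrt

class FactorFound(Exception):
    """Raised when an inverse modulo n does not exist: the gcd is a factor."""

def inv_mod(x, n):
    d = gcd(x % n, n)
    if d != 1:
        raise FactorFound(d)            # the failure the ECM waits for
    return pow(x, -1, n)

def ec_add(P, Q, a, n):
    """Add points on Y^2 = X^3 + a*X + b modulo n (None = point at infinity)."""
    if P is None: return Q
    if Q is None: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % n == 0:
        return None                     # P = -Q, so P + Q = O
    if P == Q:
        lam = (3 * x1 * x1 + a) * inv_mod(2 * y1, n) % n
    else:
        lam = (y2 - y1) * inv_mod(x2 - x1, n) % n
    x3 = (lam * lam - x1 - x2) % n
    return (x3, (lam * (x1 - x3) - y1) % n)

def ec_mul(k, P, a, n):
    R = None                            # repeated doubling and addition
    while k:
        if k & 1:
            R = ec_add(R, P, a, n)
        P = ec_add(P, P, a, n)
        k >>= 1
    return R

def ecm(n, M):
    """ECM trials with curves Y^2 = X^3 + a*X + 1, P = (0, 1), a = 0, 1, 2, ..."""
    s = bytearray([1]) * (M + 1); s[0:2] = b'\x00\x00'   # sieve the primes <= M
    for p in range(2, isqrt(M) + 1):
        if s[p]: s[p*p::p] = bytearray(len(s[p*p::p]))
    primes = [p for p in range(2, M + 1) if s[p]]
    for a in range(10000):
        P = (0, 1)
        try:
            for q in primes:
                e = 1
                while q ** (e + 1) <= M:   # e = floor(ln M / ln q)
                    e += 1
                P = ec_mul(q ** e, P, a, n)
        except FactorFound as exc:
            d = exc.args[0]
            if d != n:
                return d                   # non-trivial divisor of n
    return None
```

Calling, for instance, ecm(91, 20) returns one of the factors 7 or 13: some small value of a soon gives a curve whose group order modulo one prime factor is smooth while the computation of mP breaks down modulo the other.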

Before we derive the running time of the ECM, some comments are in order. A random curve E is chosen by selecting random integers a and b modulo n. It turns out that taking a to be a single-precision integer and b = 1 works quite well in practice. Indeed one can keep trying the values a = 0, 1, 2, . . . successively. Note that the curve E is an elliptic curve, that is, non-singular, if and only if δ := gcd(n, 4a^3 + 27b^2) = 1. However, having δ > 1 is an extremely rare occurrence, and one might skip the computation of δ before starting the trial with a curve. The choice b = 1 is attractive, because in that case we may take the point P = (0, 1). In Section 3.6, we have described a strategy to find a random point on an elliptic curve over a field K. This is based on the assumption that computing square roots in K is easy. The same method can be applied to curves over Zn, but n being composite, it is difficult to compute square roots modulo n. So taking b to be 1 (or the square of a known integer) is indeed a pragmatic decision. After all, we do not need P to be a random point on E.

Recall that we have taken m := q1^e1 ··· qt^et, where ei = ⌊ln n/ln qi⌋. If instead we take ei := ⌊ln M/ln qi⌋ (where M is the bound mentioned earlier), the cost of computing mP per trial drops considerably, whereas the probability of a successful trial (that is, of a failure to compute mP) does not decrease much. The integer m can be quite large. One, however, need not compute m explicitly, but may proceed as follows: first take Q0 := P and subsequently for each i = 1, . . . , t compute Qi := qi^ei Qi−1. One finally gets mP = Qt.

Now comes the analysis of the running time of the ECM. We have fixed the parameter M to be Lp[1/√2], so that B contains Lp[1/√2] small primes. The most expensive part of a trial with a random elliptic curve is the (attempted) computation of the point mP. This involves Lp[1/√2] additions of points. Since an expected number Lp[1/√2] of elliptic curves needs to be tried for finding a non-trivial factor of n, the algorithm performs an expected number Lp[√2] of additions of points on curves modulo n. Since each such addition can be done in polynomial time, the announced running time follows.

Note that Lp[√2] is the optimal running time of the ECM and can be shown to be achieved by taking M = Lp[1/√2]. But, in practice, p is not known a priori. Various ad hoc ways may be adopted to get around this difficulty. One possibility is to use the worst-case bound p ≤ √n. For example, for factoring integers of the form n = pq, where p and q are primes of roughly the same size, √n is a good approximation for p. Another strategy is to start with a small value of M and increase M gradually with the number of trials performed. For larger values of M, the probability of a successful trial increases, implying that fewer elliptic curves need to be tried, whereas the time per trial (that is, for the computation of mP) increases. On balance, the total running time of the ECM is apparently not very sensitive to the choice of M.

A second stage can be used for each elliptic curve in order to increase the probability of a trial being successful. A strategy very similar to the second stage of the p − 1 method can be employed. The reader is urged to fill in the details. Employing the second stage leads to a reasonable speed-up in practice, though it does not affect the asymptotic running time.

The ECM can be effectively parallelized, since different processors can carry out the trials, that is, computations of mP (together with the second stage) with different sets of (random) elliptic curves.

*4.3.4. The Number Field Sieve Method

The number field sieve method (NFSM) is to date the most successful of all integer-factoring algorithms. Under certain heuristic assumptions it achieves a running time of the form L(n, 1/3, c), which is better than the L(n, 1/2, c′) algorithms described so far. The NFSM was first designed for integers of a special form. This variant of the NFSM is called the special NFS method (SNFSM) and was later modified to the general NFS method (GNFSM) that can handle arbitrary integers. The running time of the SNFSM has c = (32/9)^(1/3) ≈ 1.526, whereas that of the GNFSM has c = (64/9)^(1/3) ≈ 1.923. For the sake of simplicity, we describe only the SNFSM in this book (see Cohen [56] and Lenstra and Lenstra [165] for further details).

We choose an integer m and a polynomial f(X) with integer coefficients such that f(m) ≡ 0 (mod n). We assume that f is irreducible in Z[X]; otherwise a non-trivial factor of f yields a non-trivial factor of n. Consider the number field K := Q(α) for a root α of f, and let d := deg f be the degree of the number field K. We use the complex embedding of K that sends α to a fixed complex root of f. The special NFS method makes certain simplifying assumptions:

  1. f is monic, so that α is an algebraic integer, that is, α lies in the ring OK of integers of K.

  2. OK = Z[α], that is, OK is monogenic.

  3. OK is a PID.

Consider the ring homomorphism

Φ : Z[α] → Zn, α ↦ m (mod n).

This is well-defined, since f(m) ≡ 0 (mod n). We choose small coprime (rational) integers a, b and note that Φ(a + bα) ≡ a + bm (mod n). Let B be a predetermined smoothness bound. Assume that for a given pair (a, b), both a + bm and a + bα are B-smooth. For the rational integer a + bm, this means

a + bm = ± ∏ p^vp,

the product running over the set of all rational primes p ≤ B. On the other hand, smoothness of the algebraic integer a + bα means that the principal ideal ⟨a + bα⟩ is a product of prime ideals of OK of prime norms ≤ B; that is, we have a factorization

⟨a + bα⟩ = ∏ 𝔭^w𝔭,

where 𝔭 runs over the set of all prime ideals of OK of prime norms ≤ B. By assumption, each such 𝔭 is a principal ideal. Let G denote a set of generators, one for each of these ideals. Further let U denote a set of generators of the multiplicative group of units of OK. The smoothness of a + bα can, therefore, be rephrased as

Equation 4.4

a + bα = ∏ u^su ∏ g^wg,

where u runs over the chosen generators of the unit group, g runs over the chosen generators of the prime ideals of small norm, and the exponents su, wg are integers. Applying Φ then yields

a + bm ≡ ∏ Φ(u)^su ∏ Φ(g)^wg (mod n).

This is a relation for the SNFSM. After sufficiently many relations are available, Gaussian elimination modulo 2 (as in the case of the QSM) is expected to give us a congruence of the form

x^2 ≡ y^2 (mod n),

and gcd(x − y, n) is possibly a non-trivial factor of n. This is the basic strategy of the SNFSM. We clarify some details now.

Selecting the polynomial f(X)

There is no clearly specified way to select the polynomial f for defining the number field K. We require f to have small coefficients. Typically, m is much smaller than n, and one writes the expansion of n in base m as n = btm^t + bt−1m^(t−1) + ··· + b1m + b0 with 0 ≤ bi < m. Taking f(X) = btX^t + bt−1X^(t−1) + ··· + b1X + b0 is often suggested.
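The base-m construction can be sketched in a few lines of Python (the helper name is ours, for illustration only):

```python
def base_m_poly(n, m):
    """Digits b0, ..., bt of n in base m; then f(X) = bt*X^t + ... + b1*X + b0
    satisfies f(m) = n, so that f(m) = 0 (mod n) as required."""
    digits = []
    while n > 0:
        digits.append(n % m)   # 0 <= bi < m
        n //= m
    return digits              # digits[i] is the coefficient of X^i

# Evaluating f at m recovers n exactly.
n, m = 2**64 + 1, 2**13
f = base_m_poly(n, m)
assert sum(b * m**i for i, b in enumerate(f)) == n
```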

For integers n of certain special forms, we have natural choices for f. The seminal paper on the NFSM by Lenstra et al. [167] assumes that n = r^e − s for a small positive integer r and a non-zero integer s with small absolute value. In this case, one first chooses a small extension degree d and sets m := r^⌈e/d⌉ and f(X) := X^d − s·r^(d⌈e/d⌉−e). Typically, d = 5 works quite well in practice. Lenstra et al. report the implementation of the SNFSM for factoring n = 3^239 − 1. The parameters chosen are d = 5, m = 3^48 and f(X) = X^5 − 3. In this case, OK = Z[α] is monogenic and a PID.

Construction of the prime ideals of small norm

Take a small rational prime p ≤ B. From Section 2.13, it follows that if f(X) ≡ f1(X)^d1 ··· fr(X)^dr (mod p) is the factorization of the canonical image of f(X) modulo p, then 𝔭i := ⟨p, fi(α)⟩, i = 1, . . . , r, are all the primes lying over p. We have also seen that the norm of 𝔭i is prime if and only if deg fi = 1, that is, fi(X) = X − cp for some cp ∈ Zp. Thus, each root cp of f(X) in Zp corresponds to a prime ideal of OK of prime norm p.

To sum up, a prime ideal in OK of prime norm p ≤ B is specified by a pair (p, cp) of values with f(cp) ≡ 0 (mod p). We denote this ideal by 𝔭p,cp. All these ideals can be precomputed by finding the roots of the defining polynomial f(X) modulo the small primes p ≤ B. One can use the root-finding algorithms of Exercise 3.29.

Construction of the ideal generators

Constructing a set of generators of the ideals 𝔭p,cp is a costly operation. We have just seen that each such prime ideal corresponds to a pair (p, cp) and is a principal ideal by assumption. A generator of such an ideal is an element of the form gp,cp = h(α), h(X) ∈ Z[X], with N(gp,cp) = ±p and h(cp) ≡ 0 (mod p). Algorithm 4.4 (quoted from Lenstra et al. [167]) computes the generators gp,cp for all relevant pairs (p, cp). The first for loop exhaustively searches over all small polynomials h(α) in order to locate for each (p, cp) an element of norm kp with |k| as small as possible. If the smallest k (stored in ap,cp) is ±1, the corresponding h(α) is already a generator gp,cp of 𝔭p,cp; else some additional adjustments need to be performed.

Algorithm 4.4. Construction of generators of ideals for the SNFSM

Choose two suitable positive constants aB and CB (depending on B and K).

Initialize an array ap,cp := aB indexed by the relevant pairs (p, cp).

for each h(X) ∈ Z[X] with coefficients bounded by CB and with N(h(α)) = kp,
     p ≤ B a prime, k ∈ Z \ {0}, |k| < min(p, aB) {
    Find cp such that h(cp) ≡ 0 (mod p).    /* Root finding */
    if (|k| < |ap,cp|) {
       /* Store the least k and the corresponding h found so far */
       ap,cp := k, hp,cp := h.
    }
}
for each relevant pair (p, cp) {
    if (ap,cp = ±1) { gp,cp := hp,cp(α). }    /* The more frequent case */
    else {
       Locate a g ∈ Z[α] with N(g) = ap,cp.
       gp,cp := hp,cp(α)/g.
    }
}

Construction of the unit generators

Let K have the signature (r1, r2), and write ρ := r1 + r2 − 1. By Dirichlet’s unit theorem, the group of units of OK is generated by an appropriate root u0 of unity and ρ multiplicatively independent[3] elements u1, . . . , uρ of infinite order. Each unit u of OK has norm N(u) = ±1. Thus, one may keep on generating elements h(α), h(X) ∈ Z[X] with small integer coefficients, of norm ±1, until ρ independent elements are found. Many elements of norm ±1 are available as a by-product during the construction of the ideal generators, which involves the computation of norms of many elements in Z[α]. For a more general exposition on this topic, see Algorithm 6.5.9 of Cohen [56].

[3] The elements u1, . . . , uρ in a (multiplicatively written) group are called (multiplicatively) independent if u1^n1 ··· uρ^nρ, ni ∈ Z, is the group identity only for n1 = ··· = nρ = 0.

Computing the factorization of a + bα

In order to compute the factorization of Equation (4.4), we first factor the integer N(a + bα) = ±b^d f(−a/b). If ⟨a + bα⟩ = 𝔭1^w1 ··· 𝔭s^ws is the prime factorization of ⟨a + bα⟩ with pairwise distinct prime ideals 𝔭i of OK, then by the multiplicative property of norms we obtain |N(a + bα)| = N(𝔭1)^w1 ··· N(𝔭s)^ws.

Now, let p ≤ B be a small prime. If p ∤ N(a + bα), it is clear that no prime ideal of OK of norm p (or a power of p) appears in the factorization of ⟨a + bα⟩. On the other hand, if p | N(a + bα), then 𝔭 | ⟨a + bα⟩ for some prime ideal 𝔭 lying over p. The assumption gcd(a, b) = 1 implies that the inertial degree of 𝔭 is 1: that is, N(𝔭) = p, that is, there is a cp with f(cp) ≡ 0 (mod p) such that the prime ideal 𝔭 corresponds to the pair (p, cp). In this case, we have a ≡ −cpb (mod p). Assume that another prime ideal 𝔭′ of norm p appears in the prime factorization of ⟨a + bα⟩. If 𝔭′ corresponds to the pair (p, c′p), then a ≡ −c′pb (mod p). Since cp and c′p are distinct modulo p, it follows that p | b and hence p | a, that is, p | gcd(a, b), a contradiction, since gcd(a, b) = 1. Thus, a unique ideal 𝔭p,cp of norm p appears in the factorization of ⟨a + bα⟩. Moreover, the multiplicity of 𝔭p,cp in the factorization of ⟨a + bα⟩ is the same as the multiplicity vp(N(a + bα)).

Thus, one may attempt to factorize N(a + bα) using trial divisions by primes ≤ B. If the factorization is successful, that is, if N(a + bα) is B-smooth, then for each prime divisor p of N(a + bα) we find the ideal 𝔭p,cp and its multiplicity in the factorization of ⟨a + bα⟩ as explained above. Since we know a generator of each 𝔭p,cp, we eventually compute a factorization a + bα = u ∏ g^wg, where g runs over the known ideal generators and u is a unit in OK. What remains is to factor u as a product of the unit generators. We do not discuss this step here, but refer the reader to Lenstra et al. [167].
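For a monic f, the integer N(a + bα) = ±b^d f(−a/b) can be computed without fractions by expanding the expression. The following helper (ours, not from [167]) does this, and is checked against the example f(X) = X^5 − 3 from the text, for which N(a + bα) = a^5 + 3b^5:

```python
def norm_a_plus_b_alpha(coeffs, a, b):
    """N(a + b*alpha) for alpha a root of the monic polynomial
    f(X) = sum_i coeffs[i]*X^i (coeffs[-1] = 1), using the identity
    N(a + b*alpha) = (-b)^d f(-a/b) = sum_i c_i (-1)^(d+i) a^i b^(d-i)."""
    d = len(coeffs) - 1
    return sum(c * (-1) ** (d + i) * a ** i * b ** (d - i)
               for i, c in enumerate(coeffs))

# f(X) = X^5 - 3 (the SNFSM example in the text): N(a + b*alpha) = a^5 + 3*b^5.
assert norm_a_plus_b_alpha([-3, 0, 0, 0, 0, 1], 2, 1) == 2**5 + 3
```

Trial division of this rational integer by the primes ≤ B then decides the smoothness of a + bα, as described above.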

Sieving

In the QSM, we check the smoothness of a single integer T(c) per trial, whereas for the NFS method we do so for two integers, namely, a + bm and N(a + bα). However, both these integers are much smaller than T(c), and the probability that they are simultaneously smooth is larger than the probability that T(c) alone is smooth. This accounts for the better asymptotic performance of the NFS method compared to the QSM.

One has to check the smoothness of a + bm and N(a + bα) for each coprime pair (a, b) in a predetermined interval. This check can be carried out efficiently using sieves. We have to use two sieves, one for filtering out the non-smooth a + bm values and the other for filtering out the non-smooth a + bα values. We should have gcd(a, b) = 1, but computing gcd(a, b) for all values of a and b is rather costly. We may instead use a third sieve to throw away, for a given b, those values of a for which gcd(a, b) is divisible by primes ≤ B. This still leaves us with some pairs (a, b) for which gcd(a, b) > 1. But this is not a serious problem, since such values are small in number and can later be discarded from the list of pairs (a, b) selected by the smoothness test.

We fix b and allow a to vary in the interval −M ≤ a ≤ M for a predetermined bound M. We use an array A indexed by a. Before the first sieve, we initialize the location Aa to ln |a + bm|. We may set Aa := +∞ for those values of a for which gcd(a, b) is known to be > 1 (where +∞ stands for a suitably large positive value). For each small prime p ≤ B and small exponent h, we compute a′ := −mb (mod p^h) and subtract ln p from Aa for each a, −M ≤ a ≤ M, with a ≡ a′ (mod p^h). Finally, for each value of a for which Aa is not (close to) 0, that is, for which a + mb is not B-smooth, we set Aa := +∞. For the other values of a, we set Aa := ln |N(a + bα)|. One may use incomplete sieving (with a liberal selection criterion) during the first sieve.
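A sketch of this first (rational-side) sieve in Python, with a floating-point infinity standing in for +∞; the function name and parameters are our own illustration:

```python
import math

INF = float('inf')

def rational_sieve(b, m, M, primes, eps=1e-9):
    """For fixed b, return the values a in [-M, M] with a + b*m smooth over
    `primes`, by subtracting ln p along the progressions a = -b*m (mod p^h)."""
    # A[a + M] plays the role of the array location A_a of the text.
    A = [INF if a + b * m == 0 else math.log(abs(a + b * m))
         for a in range(-M, M + 1)]
    bound = abs(b) * m + M                # |a + b*m| never exceeds this
    for p in primes:
        ph = p
        while ph <= bound:                # sieve by the prime powers p^h
            start = (-b * m + M) % ph     # first index with a = -b*m (mod p^h)
            for i in range(start, 2 * M + 1, ph):
                A[i] -= math.log(p)
            ph *= p
    # survivors: locations sieved (close) to zero, i.e. a + b*m is smooth
    return [i - M for i in range(2 * M + 1) if abs(A[i]) < eps]

# Example: b = 1, m = 31; among a + 31 for -10 <= a <= 10, the value
# 21 = 3*7 (a = -10) is {2,3,5,7}-smooth, while 31 (a = 0) is not.
smooth = rational_sieve(1, 31, 10, [2, 3, 5, 7])
assert -10 in smooth and 0 not in smooth
```

The second sieve described next reuses the same array, which is why the surviving locations are reset to ln |N(a + bα)|.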

The second sieve proceeds as follows. We continue to work with the value of b fixed before the first sieve and with the array A available from the first sieve. For each prime ideal 𝔭p,cp, we compute a″ := −bcp (mod p) and subtract ln p from each location Aa for which a ≡ a″ (mod p). For those a for which Aa ≤ ξ ln B for some real ξ ≥ 1, say ξ = 2, we try to factorize a + bα over the ideal and unit generators. If the attempt is successful, both a + bm and a + bα are smooth. This second sieve is an incomplete one and, therefore, we must use a liberal selection criterion.

The running time of the SNFSM

For deriving the running time of the SNFSM, take d ≈ (3 ln n/(2 ln ln n))^(1/3), m = L(n, 2/3, (2/3)^(1/3)), B = L(n, 1/3, (2/3)^(2/3)) and M = L(n, 1/3, (2/3)^(2/3)). From the prime number theorem and from the fact that d is small, it follows that both the number of rational primes ≤ B and the number of prime ideals of prime norm ≤ B have the same asymptotic bound as B. The number of ideal and unit generators also meets this bound. We then have L(n, 1/3, (2/3)^(2/3)) unknown quantities on which we have to do Gaussian elimination.

The integers a + mb have absolute values ≤ L(n, 2/3, (2/3)^(1/3)). If the coefficients of f are small, then

|N(a + bα)| = |b^d f(−a/b)| ≤ L(n, 1/3, d·(2/3)^(2/3)) = L(n, 2/3, (2/3)^(1/3)).

Under the heuristic assumption that a + mb and N(a + bα) behave as random integers of magnitude L(n, 2/3, (2/3)^(1/3)), the probability that both these are B-smooth turns out to be L(n, 1/3, −(2/3)^(2/3)), and so trying L(n, 1/3, 2(2/3)^(2/3)) pairs (a, b) is expected to give us L(n, 1/3, (2/3)^(2/3)) relations. The entire sieving process takes time L(n, 1/3, 2(2/3)^(2/3)), whereas solving a sparse system in L(n, 1/3, (2/3)^(2/3)) unknowns can be done essentially in the same time. Thus the running time of the SNFSM is L(n, 1/3, 2(2/3)^(2/3)) = L(n, 1/3, (32/9)^(1/3)).

Exercise Set 4.3

4.6 For m ∈ N, define the harmonic numbers Hm := 1 + 1/2 + ··· + 1/m. Show that for each m ∈ N we have ln(m + 1) ≤ Hm ≤ 1 + ln m. [H] Deduce that the sequence Hm, m ∈ N, is not convergent. (Note, however, that the sequence Hm − ln m, m ∈ N, converges to the constant γ = 0.57721566 . . . known as the Euler constant. It is not known whether γ is rational or not.)
4.7 Let k, c, c′, α be positive constants with α < 1. Prove the following assertions.
  1. .

  2. L(n, α, c)L(n, α, c′) is of the form L(n, α, c + c′).

  3. (ln n)kL(n, α, c) is again of the form L(n, α, c).

  4. L(n, α, c)·n^k is of the form n^(k+o(1)).

4.8 Let us assume that an adversary C has the computing power to carry out 10^12 floating point operations (flops) per second. Let A be an algorithm that computes a certain function P(n) using T(n) flops for an input n ∈ N. We say that it is infeasible for C to compute P(n) using algorithm A, if it takes ≥ 100 years for the computation or, equivalently, if T(n) ≥ 3.1536 × 10^21. Find, for the following expressions of T(n), the smallest values of n that make the computation of P(n) by Algorithm A infeasible: T(n) = (ln n)^3, T(n) = (ln n)^10, T(n) = n, T(n) = n^(1/2), T(n) = n^(1/4), T(n) = L[2], T(n) = L[1], T(n) = L[0.5], T(n) = L(n, 1/3, 2) and T(n) = L(n, 1/3, 1). (Neglect the o(1) terms in the definitions of L( ) and L[ ].)
4.9 Let n ∈ N be an odd integer and let r be the total number of distinct (odd) prime divisors of n. Show that for each integer a the congruence x^2 ≡ a^2 (mod n) has ≤ 2^r solutions for x modulo n. If gcd(a, n) = 1, show that this congruence has exactly 2^r solutions. [H]
4.10 Show that the problems IFP and SQRTP are probabilistic polynomial-time equivalent. [H]
4.11 In this exercise, we use the notations introduced in connection with the quadratic sieve method for factoring integers (Section 4.3.2). We assume that M ≪ H, since H ≈ n^(1/2), whereas M = L[1].
  1. Show that J ≤ 2H – 1.

  2. Prove that the average of the integers |T(c)|, −M ≤ c ≤ M, is ≈ MH and that the maximum of the same integers is |T(M)| = J + 2MH + M^2 ≈ J + 2MH.

  3. Prove that the average and the maximum of the integers |T(c)|, 0 ≤ c ≤ 2M, are respectively J + 2MH + M(4M + 1)/3 ≈ J + 2MH and |T(2M)| = J + 4MH + 4M^2 ≈ J + 4MH.

  4. Conclude that it is better to choose the sieving interval as −M ≤ c ≤ M instead of as 0 ≤ c ≤ 2M.

4.12

Reyneri’s cubic sieve method (CSM) Suppose that we want to factor an odd integer n. Suppose also that we know a triple (x, y, z) of integers satisfying x^3 ≡ y^2z (mod n) with x^3 ≠ y^2z (as integers). We assume further that |x|, |y|, |z| are all O(n^ξ) for some ξ, 1/3 < ξ < 1/2.

  1. Show that for integers a, b, c with a + b + c = 0 one has

    (x + ay)(x + by)(x + cy) ≡ y^2 T(a, b, c) (mod n),

    where T(a, b, c) := z + (ab + ac + bc)x + (abc)y = −b(b + c)(x + cy) + (z − c^2x). If x, y, z = O(n^ξ), then T(a, b, c) is O(n^ξ) for small values of a, b, c.

  2. Let α := (ξ/2)^(1/2). Choose a factor base comprising all primes q1, . . . , qt with t = L[α] together with the integers x + ay, −M ≤ a ≤ M, M = L[α]. The size of the factor base is then L[α].

    If T(a, b, c) with −M ≤ a, b, c ≤ M and a + b + c = 0 is qt-smooth, we get a relation for the CSM. Show that trying out the L[2α] triples (a, b, c) gives us a set of linear congruences of the desired size under the heuristic assumption that the T(a, b, c) values behave as random integers on the order of n^ξ.

  3. Propose a strategy for combining these linear congruences (by Gaussian elimination) to get a quadratic congruence of the form u^2 ≡ v^2 (mod n).

  4. Design a sieve for checking the smoothness of the expressions T(a, b, c). [H]

  5. Show that the running time of the CSM is L[(2ξ)^(1/2)]. Since ξ < 1/2, the CSM is more efficient than the QSM. For ξ ≈ 1/3, the running time is L[(2/3)^(1/2)] ≈ L[0.816].

    (Remark: It is not known how we can efficiently obtain a solution of x^3 ≡ y^2z (mod n) with x^3 ≠ y^2z and |x|, |y|, |z| = O(n^ξ), ξ being as small as possible. For some particular values of n, say, for n of the form x^3 − z with small |z|, a solution is naturally available.)

4.13 Sieve of Eratosthenes Two hundred years before Christ, Eratosthenes proposed a sieve (Algorithm 4.5) for computing all primes between 1 and a positive integer n. Prove the correctness of this algorithm and compute its running time. [H]
Algorithm 4.5. The sieve of Eratosthenes

Initialize to zero an array A indexed 2, . . . , n.
for k = 2, . . . , ⌊√n⌋ {
   if (Ak = 0) { for l = 2, . . . , ⌊n/k⌋ { Alk := 1. } }
}
for k = 2, . . . , n { if (Ak = 0) { Print “k is a prime”. } }
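Algorithm 4.5 translates directly into Python (our transcription):

```python
from math import isqrt

def eratosthenes(n):
    """The sieve of Eratosthenes as in Algorithm 4.5: primes in [2, n]."""
    A = bytearray(n + 1)                 # A[k] = 0 means k is not crossed out
    for k in range(2, isqrt(n) + 1):
        if A[k] == 0:                    # k is prime; cross out its multiples
            for l in range(2, n // k + 1):
                A[l * k] = 1
    return [k for k in range(2, n + 1) if A[k] == 0]

assert eratosthenes(30) == [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```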

4.14 This exercise proposes an adaptation of the sieve of Eratosthenes for computing a random prime of a given bit length l. In Section 3.4.2, we have described an algorithm for this computation that generates random (odd) integers of bit length l and checks the primality of each such integer, until a (probable) prime is found. An alternative strategy is to generate a random l-bit odd integer n and check the integers n, n + 2, n + 4, . . . for primality.
  1. Use sieving to design an algorithm that generalizes this second strategy in the sense that it checks for primality only those integers n + r, r = 0, 1, 2, . . . , M (n a random l-bit integer), which are not divisible by any of the first t primes. In practice, the values 100 ≤ t ≤ 10,000 and M = 10l work quite well. For cryptographic sizes, sieving typically speeds up the naive generation of primes 10 to 100 times.

  2. Generalize the sieve of Part (a) for the computation of safe and strong primes.

4.4. The Finite Field Discrete Logarithm Problem

The discrete logarithm problem (DLP) has attracted somewhat less attention from the research community than the IFP. Nonetheless, many algorithms exist to solve the DLP, most of which are direct adaptations of algorithms for solving the IFP. We start with the older algorithms collectively known as the square-root methods, since the worst-case running time of each of these is O~(√q) for the field Fq. The newer family of algorithms based on the index calculus method provides subexponential solutions to the DLP and is described next. For the sake of simplicity, we assume in this section that we want to compute the discrete logarithm indg a of an element a with respect to a primitive element g of Fq*. We concentrate only on the fields Fp, p an odd prime, and F2^n, n ∈ N, since non-prime fields of odd characteristic are only rarely used in cryptography.

4.4.1. Square Root Methods

Square-root methods are applicable to any finite (cyclic) group. To avoid repetition we provide here a generic description. That is, we assume that G is a multiplicatively written cyclic group of order n, generated by g, and that a ∈ G. The identity of G is denoted by 1. It is not strictly necessary to assume that G is cyclic or that g is a generator of G. However, these assumptions make the descriptions of the algorithms somewhat easier, and hence we stick to them. The necessary modifications for non-cyclic groups G or non-primitive elements g are rather easy, and the reader is requested to fill in the details. We assume that each element of G can be represented by O(lg n) bits (so that the input size is taken to be lg n) and that multiplications, exponentiations and inverses in G can be computed in time polynomially bounded by this input size.

Shanks’ baby-step–giant-step method

Let us assume that the elements of G can be (totally) ordered in such a way that comparing two elements of G with respect to this order can be done in time polynomial in the input size. For example, a natural order on Zp = {0, 1, . . . , p − 1} is the relation ≤ on Z. Note that k elements of G can be sorted (under the above order) using O(k log k) comparisons.

Let m := ⌈√n⌉. Then d := indg a is uniquely determined by two (non-negative) integers d0, d1 < m such that d = d0 + d1m (the base-m representation of d). In Shanks’ baby-step–giant-step (BSGS) method, we compute d0 and d1 as follows. To start with, we compute the pairs (d0, g^d0) for d0 = 0, 1, . . . , m − 1 and store these pairs in a table sorted with respect to the second coordinate (the baby steps). Now, for each d1 = 0, 1, . . . , m − 1, we compute ag^(−md1) (the giant steps) and search whether this element is the second coordinate of a pair (d0, g^d0) of some entry in the table mentioned above. If so, we have found the desired d0 and d1; otherwise we try the next value of d1. An optimized implementation of this strategy is given as Algorithm 4.6.

The computation of all the elements of T and the sorting of T can be done in time O~(m). If we use a binary search algorithm (Exercise 4.15), then the search for h in T can be performed using O(lg m) comparisons in G. Therefore, the giant steps also take a total running time of O~(m). Since m ≈ √n, the BSGS method runs in time O~(√n). The memory requirement of the BSGS (that is, of the table T) is O(√n) elements of G. Thus this method becomes impractical even when n contains as few as 30 decimal digits.

Pollard’s rho method

Pollard’s rho method for solving the DLP is similar in idea to the method of the same name for solving the IFP. Let f be a random map updating pairs of exponents, and let us generate a sequence of tuples (ri, si), i = 1, 2, . . . , starting with a random (r1, s1) and subsequently computing (ri+1, si+1) = f(ri, si) for each i = 1, 2, . . . . The elements bi := a^ri g^si for i = 1, 2, . . . can then be thought of as randomly chosen from G. By the birthday paradox (Exercise 2.172), we expect to get a match bi = bj for some i ≠ j, after O(√n) of the elements b1, b2, . . . are generated. But then we have a^(ri−rj) = g^(sj−si), that is, indg a ≡ (ri − rj)^(−1)(sj − si) (mod n), provided that the inverse exists, that is, gcd(ri − rj, n) = 1. The expected running time of this algorithm is O~(√n), the same as that of the BSGS method, but the storage requirement drops to only O(1) elements of G.
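The following Python sketch (our illustration; the names and parameters are ours) uses the common three-way partition of the group as the "random" map f and Floyd's cycle finding, so that only O(1) group elements are stored. When gcd(ri − rj, n) > 1, it enumerates the few candidate values of d instead of giving up:

```python
from math import gcd
import random

def rho_dlog(g, a, p, n, tries=50):
    """Pollard's rho for d = ind_g a in Z_p^*, where n is the order of g."""
    def f(state):
        b, r, s = state                # invariant: b = a^r * g^s (mod p)
        if b % 3 == 0:
            return (b * b % p, 2 * r % n, 2 * s % n)
        if b % 3 == 1:
            return (a * b % p, (r + 1) % n, s)
        return (g * b % p, r, (s + 1) % n)

    for _ in range(tries):
        r0, s0 = random.randrange(n), random.randrange(n)
        x = (pow(a, r0, p) * pow(g, s0, p) % p, r0, s0)
        y = f(x)
        while x[0] != y[0]:            # Floyd: tortoise one step, hare two
            x, y = f(x), f(f(y))
        (_, ri, si), (_, rj, sj) = x, y
        u, v = (ri - rj) % n, (sj - si) % n
        w = gcd(u, n)
        if u == 0 or v % w != 0:
            continue                   # useless collision; restart
        d0 = (v // w) * pow(u // w, -1, n // w) % (n // w) if n // w > 1 else 0
        for k in range(w):             # d is one of w candidates modulo n
            d = d0 + k * (n // w)
            if pow(g, d, p) == a:
                return d
    return None
```

For example, with p = 1019 and g = 2, rho_dlog(2, pow(2, 77, 1019), 1019, 1018) returns an exponent d with 2^d ≡ 2^77 (mod 1019).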

Algorithm 4.6. Shanks’ baby-step–giant-step method

Input: G, g and a as described above.

Output: d = indg a.

Steps:

n := ord G, m := ⌈√n⌉.

/* Baby steps */

Initialize T to an empty table.

Insert the pairs (0, 1) and (1, g) in T.

h := g.
for d0 = 2, . . . , m – 1 {
    h := hg.
    Insert (d0, h) in T.
}
Sort T with respect to the second coordinate.

/* Giant steps */
h := a, l := (g^(−1))^m.
for d1 = 0, . . . , m – 1 {
    if T contains an entry (d0, h) { Return d := d0 + d1m. }
    h := hl.
}
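Algorithm 4.6 can be transcribed into Python for the group G = Zp* (a dictionary stands in for the sorted table T, so the binary search is implicit; the names are ours):

```python
from math import isqrt

def bsgs(g, a, n, p):
    """Shanks' BSGS: returns d with g^d = a in Z_p^*, where n = ord G."""
    m = isqrt(n - 1) + 1                 # m = ceil(sqrt(n))
    T = {}                               # baby steps: g^d0 -> d0
    h = 1
    for d0 in range(m):
        T.setdefault(h, d0)
        h = h * g % p
    h, l = a % p, pow(g, -m, p)          # l = (g^(-1))^m
    for d1 in range(m):                  # giant steps
        if h in T:
            return d1 * m + T[h]
        h = h * l % p
    return None                          # a is not in the subgroup <g>

# The returned d always satisfies g^d = a:
p, g = 1019, 2
a = pow(g, 700, p)
d = bsgs(g, a, p - 1, p)
assert pow(g, d, p) == a
```

The dictionary holds O(√n) group elements, matching the memory bound discussed in the text.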

The Pohlig–Hellman method

The Pohlig–Hellman (PH) method assumes that the prime factorization n = p1^α1 ··· pr^αr of n = ord G is known. Since d := indg a is unique modulo n, we can easily compute d using the CRT from a knowledge of d modulo pj^αj, j = 1, . . . , r. So assume that p is a prime dividing n and that p^α is the largest power of p dividing n. Let d0 + d1p + ··· + dα−1p^(α−1), 0 ≤ di < p, be the p-ary representation of d modulo p^α. The p-ary digits d0, d1, . . . , dα−1 can be successively computed as follows.

Let H be the subgroup of G generated by h := g^(n/p). We have ord H = p (Exercise 2.44). For the computation of di, 0 ≤ i ≤ α − 1, from the knowledge of d0, . . . , di−1, consider the element

b := (ag^(−(d0 + d1p + ··· + di−1p^(i−1))))^(n/p^(i+1)).

But ord(g^(n/p^(i+1))) = p^(i+1) and d ≡ d0 + d1p + ··· + di−1p^(i−1) + dip^i (mod p^(i+1)), so that

b = g^((n/p^(i+1))·dip^i) = (g^(n/p))^di = h^di.

Thus, b ∈ H and di = indh b, that is, each di can be obtained by computing a discrete logarithm in the group H of order p (using the BSGS method or the rho method).

From the prime factorization of n, we see that the computations of d modulo pj^αj for all j = 1, . . . , r can be done in time O~(√q), q being the largest prime factor of n, since the αj and r are O(log n). Combining the values of d modulo pj^αj by the CRT can be done in polynomial time (in log n). In the worst case, q = O(n) and the PH method takes time O~(√n), which is fully exponential in the input size log n. But if q (or, equivalently, all the prime divisors p1, . . . , pr of n) is small, then the PH method runs quite efficiently. In particular, if q = O((log n)^c) for some (small) constant c, then the PH method computes discrete logarithms in G in polynomial time. This fact has an important bearing on the selection of a group G for cryptographic applications, namely, n = ord G is required to have a suitably large prime divisor, so that the PH method cannot compute discrete logarithms in G in feasible time.
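A Python sketch of the whole method for G = Zp* follows (our helpers; for clarity the small discrete logarithms in H are done by brute force, where the text would use the BSGS or rho method):

```python
from math import prod

def dlog_prime_order(h, b, q, p):
    """Brute-force ind_h b in the order-q subgroup of Z_p^* generated by h."""
    x = 1
    for d in range(q):
        if x == b:
            return d
        x = x * h % p
    raise ValueError("b is not in the subgroup generated by h")

def pohlig_hellman(g, a, p, factorization):
    """d = ind_g a in Z_p^*, given n = p - 1 factored as {q: alpha}:
    the digits of d are found in subgroups of order q, then combined by CRT."""
    n = p - 1
    residues, moduli = [], []
    for q, alpha in factorization.items():
        h = pow(g, n // q, p)              # generator of H, ord h = q
        d_mod = 0                          # d modulo q^i, built digit by digit
        for i in range(alpha):
            # b = (a * g^{-d_mod})^{n/q^{i+1}} = h^{d_i}
            b = pow(a * pow(g, -d_mod, p) % p, n // q ** (i + 1), p)
            d_mod += dlog_prime_order(h, b, q, p) * q ** i
        residues.append(d_mod)
        moduli.append(q ** alpha)
    # Chinese remainder theorem
    N = prod(moduli)
    return sum(r * (N // m) * pow(N // m, -1, m)
               for r, m in zip(residues, moduli)) % N

# 6 is primitive modulo 41 and 40 = 2^3 * 5:
assert pohlig_hellman(6, pow(6, 23, 41), 41, {2: 3, 5: 1}) == 23
```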

4.4.2. The Index Calculus Method

The index calculus method (ICM) is not applicable to all (cyclic) groups. But whenever it applies, it usually leads to the fastest algorithms to solve the DLP. Several variants of the ICM are used for prime finite fields and also for finite fields of characteristic 2. On such a field Fq they achieve subexponential running times of the order of L(q, 1/2, c) = L[c] or L(q, 1/3, c) for positive constants c. We start with a generic description of the ICM. We assume that g is a primitive element of Fq* and that we want to compute d := indg a for some a ∈ Fq*.

To start with, we fix a suitable subset B = {b1, . . . , bt} of Fq* of small cardinality, so that a reasonably large fraction of the elements of Fq* can be expressed easily as products of elements of B. We call B a factor base. In the ICM, we search for relations of the form

Equation 4.5

g^α a^β b1^δ1 ··· bt^δt = b1^γ1 ··· bt^γt

for integers α, β, γi and δi. This gives us a linear congruence

Equation 4.6

α + β·indg a ≡ (γ1 − δ1)·indg b1 + ··· + (γt − δt)·indg bt (mod q − 1).
The ICM proceeds in two[4] stages. In the first stage, we compute di := indg bi for each element bi in the factor base B. For that, we collect relations of the form (4.5) with β = 0. When sufficiently many relations are available, the corresponding system of linear congruences (4.6) is solved modulo q − 1 for the unknowns di. In the second stage, a single relation with gcd(β, q − 1) = 1 is found. Substituting the values of di available from the first stage then yields indg a.

[4] Some authors prefer to say that the number of stages in the ICM is actually three, because they decouple the congruence-solving phase from the first stage. This is indeed justified, since implementations by several researchers reveal that for large fields this linear algebra part often demands running time comparable to that needed by the relation collection part. Our philosophy is to call the entire precomputation work the first stage. Now, although it hardly matters, it is up to the reader which camp she wants to join.

Note that as long as q (and g) are fixed, we do not have to carry out the first stage every time the discrete logarithm of an element of Fq* is to be computed. If the values di, i = 1, . . . , t, are stored, then only the second stage needs to be carried out for computing the indices of any number of elements of Fq*. This is why the first stage of the ICM is often called the precomputation stage.

In order to make the algorithm more concrete, we have to specify:

  1. how to choose a factor base B;

  2. how to find Relation (4.5);

  3. how to solve a linear system of congruences modulo q – 1 (in particular, when the system is sparse).

In the rest of this section, we describe variants of the ICM based on their strategies for selecting the factor base and for collecting relations. We discuss the third issue in Section 4.7.

4.4.3. Algorithms for Prime Fields

Let Fp be a finite field of prime cardinality p. For cryptographic applications, p should be quite large, say, of length around a thousand bits or more, and so naturally p is odd. Elements of Fp are canonically represented as integers between 0 and p − 1 (inclusive). The equality x = y in Fp means equality of two integers in the range 0, . . . , p − 1, whereas x ≡ y (mod p) means that the two integers x and y may be different, but they represent the same element of Fp.

The basic ICM

In the basic version of the ICM, we choose the factor base B to comprise the first t primes q1, . . . , qt, where t = L[ζ]. (The optimal value of ζ is determined below.) In the first stage, we choose random values of α modulo p − 1 and compute g^α. Any integer representing g^α can be considered, but we think of g^α as an integer in {1, . . . , p − 1}. We then try to factorize g^α using trial divisions by elements of the factor base B. If g^α is found to be B-smooth, then we get a desired relation for the first stage, namely,

g^α = q1^γ1 ··· qt^γt, that is, α ≡ γ1·indg q1 + ··· + γt·indg qt (mod p − 1).

If g^α is not B-smooth, we try another random α and proceed as above. After sufficiently many relations are available, we solve the resulting system of linear congruences modulo p − 1. This gives us di := indg qi for i = 1, . . . , t.

In the second stage, we again choose random integers α and try to factorize a g^α completely over B. Once the factorization is successful, that is, we have a g^α = q1^f1 q2^f2 · · · qt^ft, we compute ind_g a ≡ −α + f1 d_1 + f2 d_2 + · · · + ft d_t (mod p − 1).
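As an illustration, the two stages can be sketched in a few lines of Python for a toy prime. All parameters here (p = 107, g = 2, the four-prime factor base) are assumptions chosen so that the brute-force index table used for checking stays tiny; a real implementation would obtain the values d_i by solving the linear system modulo p − 1 instead of looking them up.

```python
import random

def trial_factor(n, base):
    """Try to factor n completely over the factor base;
    return the exponent vector, or None if n is not smooth."""
    exps = []
    for q in base:
        e = 0
        while n % q == 0:
            n //= q
            e += 1
        exps.append(e)
    return exps if n == 1 else None

p, g = 107, 2                 # toy prime and a primitive root modulo it
base = [2, 3, 5, 7]           # factor base: first t primes

# brute-force index table -- only for verifying the toy example;
# the real algorithm gets these d_i from the linear system instead
ind = {pow(g, k, p): k for k in range(p - 1)}

random.seed(1)

# first stage: relations  alpha = e1*d1 + ... + et*dt  (mod p-1)
relations = []
while len(relations) < len(base) + 2:
    alpha = random.randrange(1, p - 1)
    exps = trial_factor(pow(g, alpha, p), base)
    if exps is not None:
        relations.append((alpha, exps))
for alpha, exps in relations:
    assert alpha % (p - 1) == sum(e * ind[q] for e, q in zip(exps, base)) % (p - 1)

# second stage: a single smooth a*g^alpha yields ind_g(a)
a = 29
while True:
    alpha = random.randrange(1, p - 1)
    exps = trial_factor(a * pow(g, alpha, p) % p, base)
    if exps is not None:
        break
ind_a = (sum(e * ind[q] for e, q in zip(exps, base)) - alpha) % (p - 1)
assert pow(g, ind_a, p) == a
print("ind_g(29) =", ind_a)
```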

In order to optimize the running time, we note that the relation collection phase of the first stage is usually the bottleneck of the algorithm. If ζ (or equivalently t) is chosen to be too small, then finding B-smooth integers would be very difficult. On the other hand, if ζ is too large, then we have to collect too many relations to have a solvable linear system of congruences. More explicitly, since the integers g^α can be regarded as random integers of the order of p, the probability that g^α is B-smooth is L[−1/(2ζ)] (Corollary 4.1). Thus we expect to get each relation after L[1/(2ζ)] random values of α are tried. Since for each α we need to carry out L[ζ] divisions by elements of the factor base B (the exponentiation g^α can be done in polynomial time and hence can be neglected for this analysis), each relation can be found in expected time L[ζ + 1/(2ζ)]. Now, in order to solve for d_i, i = 1, . . . , t, we must have (slightly more than) t = L[ζ] relations. Thus, the relation collection phase takes a total time of L[2ζ + 1/(2ζ)]. It can be easily checked that 2ζ + 1/(2ζ) is minimized for ζ = 1/2. This gives a running time of L[2] for the relation collection phase.
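A quick way to see the choice ζ = 1/2 is to collect the three factors estimated above into a single exponent; the following display is a sketch of the same computation in the L[·] notation, with all o(1) terms dropped.

```latex
% relations needed  x  trials per relation  x  trial divisions per trial
\[
  L[\zeta]\cdot L\!\left[\tfrac{1}{2\zeta}\right]\cdot L[\zeta]
  \;=\; L\!\left[2\zeta+\tfrac{1}{2\zeta}\right].
\]
% Minimizing the exponent:
%   d/d(zeta) ( 2*zeta + 1/(2*zeta) ) = 2 - 1/(2*zeta^2) = 0
% gives zeta = 1/2, whence the total is L[2*(1/2) + 1/(2*(1/2))] = L[2].
```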

Since each g^α is a positive integer less than p, it is evident that it can have at most O(log p) prime divisors. In other words, the congruences collected are necessarily sparse. As we will see later, such a system can be solved in time O~(t^2), that is, in time L[1] for ζ = 1/2.

In the second stage, it is sufficient to have a single relation to compute d = ind_g a. As explained before, such a relation can be found in expected time L[3/2]. Thus the total running time of the basic ICM is L[2].

The second stage of the basic ICM is much faster than the first stage. In fact, this is a typical phenomenon associated with most variants of the ICM. Speeding up the first stage is, therefore, our primary concern.

Each step in the search for relations consists of an exponentiation (g^α) modulo p followed by trial divisions by q1, . . . , qt. Now, g^α may be non-smooth, but g^α + kp (integer sum) may be smooth for some small integer k. Once g^α is computed and found to be non-smooth, one can check for the smoothness of g^α + kp for k = ±1, ±2, . . ., before another α is tried. Since these integers are available by addition (or subtraction) only (which is much faster than exponentiation), this strategy tends to speed up the relation collection phase. Moreover, information about the divisibility of g^α + kp by qi can be obtained from that of g^α + (k − 1)p by qi. So using suitable tricks one might reduce the cost of trial divisions. Two such possibilities are explored in Exercise 4.18. Though these modifications lead to some speed-up in practice, they have the disadvantage that as |k| increases, the size of |g^α + kp| also increases, so that the chance of getting smooth candidates reduces, and therefore using high values of k does not effectively help.

There are other heuristic modification schemes that help us gain some speed-up in practice. For example, the large prime variation as discussed in connection with the QSM applies equally well here. Another trick is to use the early abort strategy. A random B-smooth integer has a higher probability of having many small prime factors than a few large prime factors. This observation can be incorporated in the smoothness tests as follows. Let us assume that we do trial divisions by the small primes in the order q1, q2, . . . , qt. After we do trial divisions of a candidate x by the first t′ < t primes (say, t′ ≈ t/2), we check how far we have been able to reduce x. If the reduction of x is already substantial, we continue with the trial divisions by the remaining primes q(t′+1), . . . , qt. In the other case, we abort the smoothness test for x and try another candidate. Obviously, this strategy prematurely rejects some smooth candidates (which are anyway rather small in number), but since most candidates are expected to be non-smooth, it saves a lot of trial divisions in the long run. The determination of t′ and the quantification of a “substantial” reduction actually depend on practical experience. With suitable choices one may expect a speed-up of about 2. The drawback of the early abort strategy is that it often does not go well with sieving. Sieving, whenever applicable, should be given higher preference.
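The early abort test can be sketched as follows. The cutoff parameters t_prime and threshold_bits are the tunable knobs mentioned above (to be set experimentally); the six-prime factor base in the usage lines is an assumption for the demonstration.

```python
def early_abort_smooth(x, base, t_prime, threshold_bits):
    """Trial-divide x by the factor base; abort after the first t_prime primes
    unless the unfactored part of x has already dropped below 2**threshold_bits.
    Returns the exponent vector if x is smooth over base, else None."""
    exps = [0] * len(base)
    for i, q in enumerate(base):
        if i == t_prime and x.bit_length() > threshold_bits:
            return None              # early abort: not enough reduction so far
        while x % q == 0:
            x //= q
            exps[i] += 1
    return exps if x == 1 else None

base = [2, 3, 5, 7, 11, 13]
print(early_abort_smooth(1560, base, 3, 6))   # 1560 = 2^3 * 3 * 5 * 13 is smooth
print(early_abort_smooth(202, base, 3, 6))    # 202 = 2 * 101 gets aborted early
```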

To sum up, the basic ICM and all its modifications can be used for computing discrete logarithms only in small fields, say, of size ≤ 80 bits. For bigger fields, we need newer ideas.

The linear sieve method

The linear sieve method (LSM) is a direct adaptation of the quadratic sieve method for factoring integers (Section 4.3.2). In the basic ICM just discussed, we try to find smooth integers from candidates that are on an average as large as O(p). The LSM, on the other hand, finds smooth ones from a pool of integers each of which is of the order of √p (times a subexponential factor). As a result, we expect to have a higher density of smooth integers among the candidates tested in the LSM than those in the basic method. Furthermore, the LSM employs sieving techniques instead of trial divisions. All these help the LSM achieve a running time of L[1], a definite improvement over the L[2] performance of the basic method.

Let H := ⌈√p⌉ and J := H^2 − p. Then 0 < J ≤ 2H. Let us consider the congruence

Equation 4.7

(H + c1)(H + c2) ≡ J + (c1 + c2)H + c1c2 (mod p)
For small integers c1, c2, the right side of the above congruence, henceforth denoted as

T(c1, c2) := J + (c1 + c2)H + c1c2,

is of the order of √p (times a small multiplier determined by c1 and c2). If the integer T(c1, c2) is smooth with respect to the first t primes q1, q2, . . . , qt, that is, if we have a factorization like T(c1, c2) = q1^e1 q2^e2 · · · qt^et, then we have a relation

ind_g(H + c1) + ind_g(H + c2) ≡ e1 ind_g q1 + e2 ind_g q2 + · · · + et ind_g qt (mod p − 1).
For the linear sieve method, the factor base comprises primes less than L[1/2] (so that t = L[1/2] by the prime number theorem) and integers H + c for −M ≤ c ≤ M. The bound M on c is chosen to be of the order of L[1/2]. Each T(c1, c2), being of the order of √p L[1/2] in absolute value, has a probability of L[−1/2] for being qt-smooth. Thus once we check the factorization of T(c1, c2) for all (that is, for a total of L[1]) values of the pair (c1, c2) with −M ≤ c1 ≤ c2 ≤ M, we expect to get L[1/2] Relations (4.7) involving the unknown indices of the factor base elements. If we further assume that the primitive element g is a small prime which itself is in the factor base, then we get a free relation ind_g g = 1. The resulting system is then solved to compute the discrete logarithms of elements in the factor base. This is the basic principle for the first stage of the LSM.

If we compute all T(c1, c2) and use trial divisions by q1, . . . , qt to separate out the smooth ones, we achieve a running time of L[1.5], as can be easily seen. Sieving is employed to reduce the running time to L[1]. First one fixes a value of c1 and initializes to ln |T(c1, c2)| an array A indexed by c2 in the range c1 ≤ c2 ≤ M. One then computes for each prime power q^h, q being a small prime in the factor base and h a small positive exponent, a solution for c2 of the congruence (H + c1)c2 + (J + c1H) ≡ 0 (mod q^h).

If gcd(H + c1, q) = 1, that is, if H + c1 is not a multiple of q, then the solution is given by σ ≡ −(J + c1H)(H + c1)^(−1) (mod q^h). The inverse in the last congruence can be calculated by running the extended gcd algorithm (Algorithm 3.8) on H + c1 and q^h. Then for each value of c2 (in the range c1 ≤ c2 ≤ M) that is congruent to σ (mod q^h), ln q is subtracted from the array location A[c2].

If q | (H + c1), we find out h1 := v_q(H + c1) > 0 and h2 := v_q(J + c1H) ≥ 0. If h1 > h2, then for each value of c2, the expression T(c1, c2) is divisible by q^h2 and by no higher powers of q. So we subtract the quantity h2 ln q from A[c2] for all c2. Finally, if h1 ≤ h2, then we subtract h1 ln q from A[c2] for all c2 and, for h > h1, solve the congruence as σ ≡ −((J + c1H)/q^h1)((H + c1)/q^h1)^(−1) (mod q^(h−h1)).

Once the above procedure is carried out for each small prime q in the factor base and for each small exponent h, we check for which values of c2 the array value A[c2] is equal (that is, sufficiently close) to 0. These are precisely the values of c2 such that, for the given c1, the integer T(c1, c2) factors smoothly over the small primes in the factor base.
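The sieving loop for one fixed value of c1 can be sketched as follows. This is a simplified illustration: primes dividing H + c1 are simply skipped rather than handled via the v_q bookkeeping just described, the sign of T is ignored (a negative smooth T would carry an extra factor −1 in the factor base), and exact logarithms are used, so "sufficiently close to 0" becomes a tiny floating-point tolerance. The prime p = 10007 and the six-prime base in the usage line are assumptions for the demonstration.

```python
import math

def linear_sieve_row(p, c1, M, primes, tol=1e-6):
    """Fix c1 and sieve over c2 in [c1, M] for smooth values of
    T(c1, c2) = J + (c1 + c2)*H + c1*c2 = (H + c1)(H + c2) - p."""
    H = math.isqrt(p - 1) + 1        # H = ceil(sqrt(p)) for non-square p
    J = H * H - p
    # T is never 0: (H+c1)(H+c2) = p is impossible for prime p
    T = {c2: J + (c1 + c2) * H + c1 * c2 for c2 in range(c1, M + 1)}
    A = {c2: math.log(abs(t)) for c2, t in T.items()}
    Tmax = max(abs(t) for t in T.values())
    for q in primes:
        if (H + c1) % q == 0:
            continue                  # simplified: skip this (rare) case
        qh = q
        while qh <= Tmax:
            # c2 with q^h | T(c1,c2):  (H+c1)*c2 + (J+c1*H) = 0  (mod q^h)
            sigma = (-(J + c1 * H) * pow(H + c1, -1, qh)) % qh
            for c2 in range(c1 + (sigma - c1) % qh, M + 1, qh):
                A[c2] -= math.log(q)
            qh *= q
    return [c2 for c2, v in A.items() if abs(v) < tol]

smooth = linear_sieve_row(10007, 0, 30, [2, 3, 5, 7, 11, 13])
print(smooth)   # the c2 for which T(0, c2) = 194 + 101*c2 is smooth
```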

As in the QSM for integer factorization, it is sufficient to have some approximate representations of the logarithms (like ln q). Incomplete sieving and large prime variation can also be adopted as in the QSM.

Finally, we change c1 and repeat the sieving process described above. It is easy to see that the sieving operations for all c1 in the range −M ≤ c1 ≤ M take time L[1] as announced earlier. Gaussian elimination involving sparse congruences in L[1/2] variables also meets the same running time bound.

The second stage of the LSM can be performed in L[1/2] time. Using a method similar to the second stage of the basic ICM leads to a huge running time (L[3/2]), because we have only L[1/2] small primes in the factor base. We instead do the following. We start with a random j and try to obtain a factorization of the form a g^j ≡ (∏_q q^eq)(∏_u u^wu) (mod p), where q runs over the L[1/2] small primes in the factor base and u runs over medium-sized primes, that is, primes less than L[2]. One can use an integer factorization algorithm to this effect. Lenstra’s ECM is, in particular, recommended, since it can detect smooth integers fast. More specifically, about L[1/4] random values of j need to be tried, before we expect to get an integer with the desired factorization. Each attempt of factorization using the ECM takes time less than L[1/4].

Now, we have ind_g a ≡ −j + Σ_q eq ind_g q + Σ_u wu ind_g u (mod p − 1). The indices ind_g q are available from the first stage, whereas for each u (with wu ≠ 0) the index ind_g u is calculated as follows. First we sieve in an interval of size L[1/2] around p/(uH) and collect integers y in this interval which are smooth with respect to the L[1/2] primes in the factor base. A second sieve in an interval of size L[1/2] around H gives us a small integer c, such that (H + c)yu − p is smooth again with respect to the L[1/2] primes in the factor base. Since H + c is in the factor base, we get ind_g u. The reader can easily verify that computing individual logarithms ind_g a using this method takes time L[1/2] as claimed earlier.

There are some other L[1] methods (like the Gaussian integer method and the residue list sieve method) known for computing discrete logarithms in prime fields. We will not discuss these methods in this book. Interested readers may refer to Coppersmith et al. [59] to learn about these L[1] methods. A faster method (running time L[0.816]), namely the cubic sieve method, is covered in Exercise 4.21. Now, we turn our attention to the best method known to date.

** The number field sieve method

The number field sieve method (NFSM) for solving the DLP in a prime field is a direct adaptation of the NFSM used to factor integers (Section 4.3.4). As before, we let g be a generator of F_p* and are interested in computing the index ind_g a for some a ∈ F_p*.

We choose an irreducible polynomial f ∈ Z[X] with small integer coefficients and of small degree, and use the number field K = Q(α) for some root α of f. For the sake of simplicity, we consider the special case (SNFSM) that f is monic, that Z[α] is the ring of integers of K, and that Z[α] is a PID. We also choose an integer m such that f(m) ≡ 0 (mod p) and define the ring homomorphism

Φ : Z[α] → F_p taking α to m (mod p).
Finally, we predetermine a smoothness bound b, and consider the set P of (rational) primes ≤ b, the set I of prime ideals of Z[α] of prime norms ≤ b, a set Γ of generators of the (principal) ideals in I, and a set U of generators of the group of units of Z[α].

We try to find coprime integers c, d of small absolute values such that both c + dα and Φ(c + dα) = c + dm are smooth with respect to I and P, respectively; that is, we have factorizations of the forms c + dα = (∏_{u∈U} u^su)(∏_{γ∈Γ} γ^tγ) and c + dm = ∏_{q∈P} q^eq, or equivalently, (∏_{u∈U} Φ(u)^su)(∏_{γ∈Γ} Φ(γ)^tγ) ≡ ∏_{q∈P} q^eq (mod p). But then the indices of the two sides agree modulo p − 1, that is,

Equation 4.8

Σ_{u∈U} su ind_g Φ(u) + Σ_{γ∈Γ} tγ ind_g Φ(γ) ≡ Σ_{q∈P} eq ind_g q (mod p − 1).

This motivates us to define the factor base as

B := {Φ(u) | u ∈ U} ∪ {Φ(γ) | γ ∈ Γ} ∪ P.

We assume that g ∈ P, so that we have the free relation ind_g g ≡ 1 (mod p − 1).

Trying sufficiently many pairs (c, d) we generate many Relations (4.8). The resulting sparse linear system is solved for the unknown indices of the elements of B. This completes the first stage of the SNFSM.

In the second stage, we bring a to the scene in the following manner. First assume that a is small, such that either a is smooth over the (rational) primes in the factor base, that is,

a = q1^e1 q2^e2 · · · qk^ek with each qi a small prime,

or, for some γ ∈ Z[α] with Φ(γ) ≡ a (mod p), the ideal 〈γ〉 can be written as a product of prime ideals of small norms, that is,

〈γ〉 = p1^e1 p2^e2 · · · pk^ek with each p_i a prime ideal of small norm,

or, equivalently, γ is, up to a unit, a product of powers of the generators of these prime ideals.

In both the cases, taking logarithms and substituting the indices of the elements of the factor base (available from the first stage) yields the desired index ind_g a.

However, a is not small in general, and it is a non-trivial task to find a γ ∈ Z[α] with Φ(γ) ≡ a (mod p) such that 〈γ〉 is smooth (that is, a product of small-norm prime ideals). We instead write a as a product

Equation 4.9

a ≡ a1 a2 · · · ak (mod p),

where each ai is small enough so that ind_g ai can be computed using the method described above. This gives ind_g a ≡ ind_g a1 + ind_g a2 + · · · + ind_g ak (mod p − 1). In order to see how one can find a representation of a as a product of small integers as in Congruence (4.9), we refer the reader to Weber [300].

As in most variants of the ICM, the running time of the SNFSM is dominated by the first stage and under certain heuristic assumptions can be shown to be of the order of L(p, 1/3, (32/9)^(1/3)). Look at Section 4.3.4 to see how the different parameters can be set in order to achieve this running time. For the general NFS method (GNFSM), the running time is L(p, 1/3, (64/9)^(1/3)). The GNFSM has been implemented by Weber and Denny [301] for computing discrete logarithms modulo a particular prime having 129 decimal digits (see McCurley [189]).

4.4.4. Algorithms for Fields of Characteristic 2

We wish to compute the discrete logarithm ind_g a of an element a ∈ F_q*, q = 2^n, with respect to a primitive element g of F_q*. We work with the representation F_q = F_2[X]/〈f(X)〉 for some irreducible polynomial f ∈ F_2[X] with deg f = n. For certain algorithms, we require f to be of special forms. This does not create serious difficulties, since it is easy to compute isomorphisms between two polynomial basis representations of F_q (Exercise 3.38).

Recall that we have defined the smoothness of an integer x in terms of the magnitudes of the prime divisors of x. Now, we deal with polynomials (over F_2) and extend the definition of smoothness in the obvious way: that is, a polynomial is called smooth if it factors into irreducible polynomials of low degrees. The next theorem is an analog of Theorem 2.21 for polynomials. By an abuse of notation, we use ψ(·, ·) here also. The context should make it clear what we are talking about – smoothness of integers or of polynomials.

Theorem 4.1.

Let r, m ∈ N, r^(1/100) ≤ m ≤ r^(99/100), and let u := r/m. Then the number of polynomials f ∈ F_2[X], deg f = r, such that all irreducible factors of f have degrees ≤ m, equals 2^r u^(−u+o(u)) = 2^r e^(−(1+o(1))u ln u) as u → ∞. In particular, the probability that the degrees of all irreducible factors of a randomly chosen polynomial in F_2[X] of degree r are ≤ m is asymptotically equal to

ψ(r, m) := u^(−u+o(u)) = e^(−(1+o(1))u ln u).

The above expression for ψ(r, m), though valid asymptotically, gives good approximations for finite values of r and m. The condition r^(1/100) ≤ m ≤ r^(99/100) is met in most practical situations. The probability ψ(r, m) is a very sensitive function of u = r/m. For a fixed m, polynomials of smaller degrees have higher chances of being smooth (that is, of having all irreducible factors of degrees ≤ m).

Now, let us consider the field F_q with q = 2^n. The elements of F_q are represented as polynomials of degrees ≤ n − 1. For a given m, the probability that a randomly chosen element of F_q has all irreducible factors of degrees ≤ m is then approximately given by ψ(n − 1, m), as n, m → ∞ with n^(1/100) ≤ m ≤ n^(99/100). We can, therefore, approximate this probability by ψ(n, m).

For many algorithms that we will come across shortly, we have r ≈ n/α and m ≈ β√(n lg n) for some positive α and β, so that u = r/m is of the order of √(n/lg n) and, consequently, ψ(r, m) = e^(−(1+o(1))u ln u) is subexponential in n.

The basic ICM

The idea of the basic ICM for F_{2^n} is analogous to that for prime fields. Now, the factor base B comprises all irreducible polynomials of F_2[X] having degrees ≤ m. We choose m to be of the order of √(n lg n). (As in the case of the basic ICM for prime fields, this can be shown to be the optimal choice.) By Approximation (2.5) on p 84, we then have t = |B| ≈ 2^(m+1)/m.

In the first stage, we choose random α, 1 ≤ α ≤ q − 2, compute g^α and check if g^α is B-smooth. If so, we get a relation. For a random α, the polynomial g^α is a random polynomial of degree < n and hence has a probability of nearly ψ(n, m) of being smooth. Note that unlike integers, a polynomial over F_2 can be factored in probabilistic polynomial time (though for small m it may be preferable to do trial division by elements of B). Thus checking the smoothness of a random element of F_q can be done in (probabilistic) polynomial time, and each relation is available in expected time ψ(n, m)^(−1) (up to a polynomial factor). Since we need (slightly more than) t relations for setting up the linear system, the relation collection stage runs in expected time t ψ(n, m)^(−1). A sparse system with t unknowns can also be solved in time O~(t^2).
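For small parameters, the relation search by trial division can be sketched as follows. The bitmask encoding of F_2[X] (bit i = coefficient of X^i) and the toy field F_{2^7} with f = X^7 + X + 1 and g = X are illustrative assumptions, not the data structures of Section 3.5; the sketch also steps through the powers g^1, g^2, . . . deterministically instead of sampling α at random.

```python
def pdeg(a):                     # degree of a GF(2)[X] polynomial (bitmask form)
    return a.bit_length() - 1

def pdivmod(a, b):               # quotient and remainder in GF(2)[X]
    q = 0
    while a and pdeg(a) >= pdeg(b):
        s = pdeg(a) - pdeg(b)
        q ^= 1 << s
        a ^= b << s
    return q, a

def irreducibles_upto(m):
    """All irreducible polynomials over GF(2) of degree <= m, by trial division:
    f is irreducible iff no irreducible of degree <= deg(f)/2 divides it."""
    irr = []
    for f in range(2, 1 << (m + 1)):
        if all(pdivmod(f, g)[1] != 0 for g in irr if 2 * pdeg(g) <= pdeg(f)):
            irr.append(f)
    return irr

def smooth_factor(h, irr):
    """Factor h over the given irreducibles; return {poly: exponent} if h is
    smooth with respect to them, else None."""
    exps = {}
    for g in irr:
        while True:
            q, r = pdivmod(h, g)
            if r != 0:
                break
            h, exps[g] = q, exps.get(g, 0) + 1
    return exps if h == 1 else None

def pmulmod(a, b, f):            # product in GF(2)[X] modulo f
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if pdeg(a) == pdeg(f):
            a ^= f
    return r

# toy demo: F_{2^7} via f = X^7 + X + 1; g = X is primitive (2^7 - 1 = 127 is prime)
f = 0b10000011
irr = irreducibles_upto(3)       # factor base: irreducibles of degree <= 3
h, relations = 1, []
for alpha in range(1, 127):      # step through g^1, ..., g^126 once
    h = pmulmod(h, 0b10, f)
    e = smooth_factor(h, irr)
    if e is not None:
        relations.append((alpha, e))
print(len(relations), "relations collected")
```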

In the second stage, we need a single smooth polynomial of the form g^α a. If α is randomly chosen, we expect to get this relation in time ψ(n, m)^(−1). Therefore, the second stage is again faster than the first, and the basic method takes a total expected running time of t ψ(n, m)^(−1). Recall that the basic method for prime fields requires time L[2]. The difference arises because polynomial factorization is much easier than integer factorization.

We now explain a modification of the basic method, proposed by Blake et al. [23]. Let h ∈ F_q*: that is, h is a non-zero polynomial in F_2[X] of degree < n. If h is randomly chosen from F_q* (as in the case of g^α or g^α a for random α), then we expect the degree of h to be close to n. Let us write h ≡ h1/h2 (mod f) (f being the defining polynomial) with h1 and h2 each having degree ≈ n/2. Then the ratio of the probability that both h1 and h2 are smooth to the probability that h is smooth is ψ(n/2, m)^2/ψ(n, m) ≈ 2^(n/m) (neglecting the o( ) terms). For practical values of n and m, this ratio of probabilities can be substantially large, implying that it is easier to get relations by trying to factor both h1 and h2 instead of trying to factor h. This is the key observation behind the modification due to Blake et al. [23]. Simple calculations show that this modification does not affect the asymptotic behaviour of the basic method, but it leads to considerable speed-up in practice.

In order to complete the description of the modification of Blake et al. [23], we mention an efficient way to write h as h1/h2 (mod f). Since 0 ≤ deg h < n and since f is irreducible of degree n, we must have gcd(h, f) = 1. During the iteration of the extended gcd algorithm we actually compute a sequence of polynomials uk, vk, xk such that uk h + vk f = xk for all k = 0, 1, 2, . . . . At the start of the algorithm we have u0 = 1, v0 = 0 and x0 = h. As the algorithm proceeds, the sequence deg uk changes non-decreasingly, whereas the sequence deg xk changes non-increasingly, and at the end of the extended gcd algorithm we have xk = 1 and the desired Bézout relation uk h + vk f = 1 with deg uk ≤ n − 1. Instead of proceeding till the end of the gcd loop, we stop at the value k = k′ for which deg xk′ is closest to n/2. We will then usually have deg uk′ ≈ n/2, so that taking h1 = xk′ and h2 = uk′ serves our purpose.
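The half-way extended gcd can be sketched as follows; polynomials over F_2 are again encoded as bitmasks, and the toy modulus f = X^7 + X + 1 in the check is an assumption for illustration.

```python
def pdeg(a):            # degree of a GF(2)[X] polynomial stored as a bitmask
    return a.bit_length() - 1

def pdivmod(a, b):      # quotient and remainder in GF(2)[X]
    q = 0
    while a and pdeg(a) >= pdeg(b):
        s = pdeg(a) - pdeg(b)
        q ^= 1 << s
        a ^= b << s
    return q, a

def pmul(a, b):         # product in GF(2)[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def split(h, f):
    """Return (h1, h2) with h = h1/h2 (mod f) and deg h1 <= deg(f)//2,
    by stopping the extended Euclidean algorithm halfway.
    Invariant: u*h = x (mod f) at every step."""
    n = pdeg(f)
    x_prev, x = f, h          # remainder sequence x_k
    u_prev, u = 0, 1          # cofactor sequence u_k
    while pdeg(x) > n // 2:
        q, r = pdivmod(x_prev, x)
        x_prev, x = x, r
        u_prev, u = u, u_prev ^ pmul(q, u)
    return x, u               # h1 = x, h2 = u

def pmulmod(a, b, f):         # product in GF(2)[X] modulo f
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if pdeg(a) == pdeg(f):
            a ^= f
    return r

f = 0b10000011                # f = X^7 + X + 1 (toy modulus)
h = 0b1100101                 # some polynomial of degree 6
h1, h2 = split(h, f)
assert pmulmod(h2, h, f) == h1 and pdeg(h1) <= 3   # h2*h = h1 (mod f)
print(bin(h1), bin(h2))
```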

The concept of large prime variation is applicable for the basic ICM. Moreover, if trial divisions are used for smoothness tests, one can employ the early abort strategy. Despite all these modifications the basic variant continues to be rather slow. Our hunt for faster algorithms continues.

The adaptation of the linear sieve method

The LSM for prime fields can be readily adapted to the fields F_q, q = 2^n. Let us assume that the defining polynomial f is of the special form f(X) = X^n + f1(X), where deg f1 is small. The total number of choices for such f with deg f1 < k is 2^k. Under the assumption that irreducible polynomials (over F_2) of degree n are randomly distributed among the set of polynomials of degree n, we expect to find an irreducible polynomial f = X^n + f1 with deg f1 = O(lg n) (see Approximation (2.5) on p 84). In particular, we may assume that deg f1 ≤ n/2.

Let k := ⌈n/2⌉ and σ := 2k − n ∈ {0, 1}. For polynomials h1, h2 ∈ F_2[X] of small degrees, we then have

(X^k + h1)(X^k + h2) ≡ X^σ f1 + (h1 + h2)X^k + h1h2 (mod f).

The right side of the congruence, namely,

T(h1, h2) := X^σ f1 + (h1 + h2)X^k + h1h2,

has degree slightly larger than n/2. This motivates the following algorithm.

We take a suitable bound m (with 2^m of the order of L[1/2]) and let the factor base B be the (disjoint) union of B1 and B2, where B1 contains the irreducible polynomials of degrees ≤ m, and where B2 contains the polynomials of the form X^k + h, deg h ≤ m. Both B1 and B2 (and hence B) contain L[1/2] elements. For each pair X^k + h1, X^k + h2 ∈ B2, we then check the smoothness of T(h1, h2) over B1. Since deg T(h1, h2) ≈ n/2, the probability of finding a smooth candidate per trial is L[−1/2]. Therefore, trying L[1] values of the pair (h1, h2) is expected to give L[1/2] relations (in L[1/2] variables). Since factoring each T(h1, h2) can be performed in probabilistic polynomial time, the relation collection stage takes time L[1]. Gaussian elimination (with sparse congruences) can be done in the same time. As in the case of the LSM for prime fields, the second stage can be carried out in time L[1/2]. To sum up, the LSM for fields of characteristic 2 takes L[1] running time.

Note that the running time L[1] is achievable in this case without employing any sieving techniques. This is again because checking the smoothness of each T(h1, h2) can simply be performed in polynomial time. Application of polynomial sieving, though unable to improve upon the L[1] running time, often speeds up the method in practice. We will describe such a sieving procedure in connection with Coppersmith’s algorithm, which we take up next.

Coppersmith’s algorithm

Coppersmith’s algorithm is the fastest algorithm known to compute discrete logarithms in finite fields of characteristic 2. Theoretically it achieves the (heuristic) running time L(q, 1/3, c) and is, therefore, subexponentially faster than the L[c′] = L(q, 1/2, c′) algorithms described so far. Gordon and McCurley have made aggressive attempts to compute discrete logarithms in fields as large as F_{2^503} using Coppersmith’s algorithm in tandem with a polynomial sieving procedure and have thereby established the practicality of the algorithm.

In the basic method, each trial during the search for relations involves checking the smoothness of a polynomial of degree nearly n. The modification due to Blake et al. [23] replaces this by checking the smoothness of two polynomials of degree ≈ n/2. For the adaptation of the LSM, on the other hand, we check the smoothness of a single polynomial of degree ≈ n/2. In Coppersmith’s algorithm, each trial consists of checking the smoothness of two polynomials of degrees ≈ n^(2/3). This is the basic reason behind the improved performance of Coppersmith’s algorithm.

To start with, we make the assumption that the defining polynomial f of F_q is of the form f(X) = X^n + f1(X) with deg f1 = O(lg n). We have argued earlier that an irreducible polynomial f of this special form is expected to be available. We now choose three integers m, M, k such that

m ≈ αn^(1/3)(ln n)^(2/3), M ≈ βn^(1/3)(ln n)^(2/3) and 2^k ≈ γn^(1/3)(ln n)^(−1/3),

where the (positive real) constants α, β and γ are to be chosen appropriately to optimize the running time. The factor base B comprises the irreducible polynomials (over F_2) of degrees ≤ m. Let

l := ⌊n/2^k⌋ + 1,

so that l ≈ (1/γ)n^(2/3)(ln n)^(1/3). Choose relatively prime polynomials u1(X) and u2(X) (in F_2[X]) of degrees ≤ M and let

h1(X) := u1(X)X^l + u2(X) and h2(X) := (h1(X))^(2^k) rem f(X).

But then, since ind_g h2 ≡ 2^k ind_g h1 (mod q − 1), we get a relation if both h1 and h2 are smooth over B. By choice, deg h1 is clearly O~(n^(2/3)), whereas

h2(X) ≡ u1(X^(2^k))X^(l·2^k) + u2(X^(2^k)) ≡ u1(X^(2^k))X^(l·2^k − n)f1(X) + u2(X^(2^k)) (mod f)

and, therefore, deg h2 = O~(n^(2/3)) too.

For each pair (u1, u2) of relatively prime polynomials of degrees ≤ M, we compute h1 and h2 as above and collect all the relations corresponding to the smooth values of both h1 and h2. This gives us the desired (sparse) system of linear congruences in the unknown indices of the elements of B, which is subsequently solved modulo q – 1.
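The construction of the pair (h1, h2) rests on the fact that squaring over F_2 merely spreads the exponents, so (h1)^(2^k) rem f is cheap to compute. The sketch below checks the congruence displayed above on a toy field; all parameters (F_{2^7} with f = X^7 + X + 1, k = 1, the particular u1, u2) are assumptions for illustration only.

```python
def pdeg(a):                  # degree of a GF(2)[X] polynomial (bitmask form)
    return a.bit_length() - 1

def pmod(a, f):               # remainder of a modulo f in GF(2)[X]
    while a and pdeg(a) >= pdeg(f):
        a ^= f << (pdeg(a) - pdeg(f))
    return a

def pmul(a, b):               # product in GF(2)[X]
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def psq(a):
    """Squaring over GF(2) spreads the bits: (sum a_i X^i)^2 = sum a_i X^(2i)."""
    r, i = 0, 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)
        a >>= 1
        i += 1
    return r

def subst(u, k):              # u(X^(2^k)), by spreading the bits k times
    for _ in range(k):
        u = psq(u)
    return u

# toy parameters: f = X^7 + X + 1, so f1 = X + 1; k = 1, l = floor(7/2) + 1 = 4
n, f, f1, k = 7, 0b10000011, 0b11, 1
l = n // (1 << k) + 1

u1, u2 = 0b10, 0b11                       # u1 = X, u2 = X + 1 (coprime)
h1 = pmul(u1, 1 << l) ^ u2                # h1 = u1*X^l + u2

h2 = h1                                   # h2 = h1^(2^k) rem f, by k squarings
for _ in range(k):
    h2 = pmod(psq(h2), f)

# closed form from the text: u1(X^(2^k)) * X^(l*2^k - n) * f1 + u2(X^(2^k))
closed = pmod(pmul(pmul(subst(u1, k), 1 << (l * (1 << k) - n)), f1) ^ subst(u2, k), f)
assert h2 == closed
print(bin(h1), bin(h2))
```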

A suitable choice of α and β together with γ = α^(−1/2) gives the optimal running time of the first stage as

e^((2α ln 2 + o(1))n^(1/3)(ln n)^(2/3)) = L(q, 1/3, 2α(ln 2)^(2/3)) ≈ L(q, 1/3, 1.526).

The second stage of Coppersmith’s algorithm is somewhat involved. The factor base now contains only about L(q, 1/3, 0.763) elements. Therefore, finding a relation using a method similar to the second stage of the basic method requires time L(q, 2/3, c) for some c, which is much worse than even L[c′] = L(q, 1/2, c′). To work around this difficulty we start by finding a polynomial g^α a all of whose irreducible factors have degrees ≤ n^(2/3)(ln n)^(1/3). This takes time of the order of L(q, 1/3, c1) (where c1 ≈ 0.377) and gives us g^α a ≡ v1 v2 · · · vs (mod f), where the vi have degrees ≤ n^(2/3)(ln n)^(1/3). Note that the number of vi is less than n, since deg(g^α a) < n. We then have

ind_g a ≡ −α + ind_g v1 + ind_g v2 + · · · + ind_g vs (mod q − 1).

All these vi need not belong to the factor base, so we cannot simply substitute the values of ind_g vi. We instead reduce the problem of computing each ind_g vi to the problem of computing ind_g vi′ for several i′ with deg vi′ ≤ σ deg vi for some constant 0 < σ < 1. Subsequently, computing each ind_g vi′ is reduced to computing ind_g vi″ for several i″ with deg vi″ ≤ σ deg vi′. Repeating this process, we eventually end up with polynomials in the factor base. Because each reduction generates new polynomials with degrees reduced by at least the constant factor σ, it is clear that the recursion depth is O(ln n). Now, if for each i the number of i′ is ≤ n, and for each i′ the number of i″ is ≤ n, and so on, we have to carry out the reduction of ≤ n^(O(ln n)) = e^(O((ln n)^2)) = L(q, 1/3, 0) polynomials. Therefore, if each reduction can be performed in time L(q, 1/3, c2), the second stage will run in time L(q, 1/3, max(c1, c2)).

In order to explain how a polynomial v of degree d ≤ n^(2/3)(ln n)^(1/3) can be reduced in the desired time, we choose k ∈ N such that 2^k ≈ √(n/d), and let l := ⌊n/2^k⌋ + 1. As in the first stage, we fix a suitable bound M, choose relatively prime polynomials u1(X), u2(X) of degrees ≤ M and define

h1(X) := u1(X)X^l + u2(X)

and

h2(X) := (h1(X))^(2^k) rem f(X) = u1(X^(2^k))X^(l·2^k − n)f1(X) + u2(X^(2^k)).

The polynomials u1 and u2 should be so chosen that v | h1. We see that h1 and h2 have low degrees, and we try to factor h1/v and h2. Once we get a factorization of the form

h1/v = ∏_i vi and h2 = ∏_j wj

with deg vi, deg wj < σ deg v, we have the desired reduction of v, namely,

2^k (ind_g v + Σ_i ind_g vi) ≡ Σ_j ind_g wj (mod q − 1),

that is, the reduction of the computation of ind_g v to that of all ind_g vi and ind_g wj. With the choice M ≈ (n^(1/3)(ln n)^(2/3)(ln 2)^(−1) + deg v)/2 and σ = 0.9, reduction of each polynomial can be shown to run in time L(q, 1/3, (ln 2)^(−1/3)) ≈ L(q, 1/3, 1.130). Thus the second stage of Coppersmith’s algorithm runs in time L(q, 1/3, 1.130) and is faster than the first stage.

Large prime variation is a useful strategy to speed up Coppersmith’s algorithm. In the case of trial divisions for smoothness tests, the early abort strategy can also be applied. However, a more efficient idea (though seemingly non-collaborative with the early abort strategy) is to use polynomial sieving as introduced by Gordon and McCurley.

Recall that in the first stage we take relatively prime polynomials u1 and u2 of degrees ≤ M and check the smoothness of both h1(X) = u1(X)X^l + u2(X) and h2(X) = h1(X)^(2^k) rem f(X). We now explain the (incomplete) sieving technique for filtering out the (non-)smooth values of h1 = (h1)_{u1,u2} for the different values of u1 and u2. To start with, we fix u1 and let u2 vary. We need an array A indexed by u2, a polynomial of degree ≤ M. Clearly, u2 can assume 2^(M+1) values, and so A must contain 2^(M+1) elements. To be very concrete, we will denote by A_{u2} the location A[u2(2)], where u2(2) ≥ 0 is the integer obtained canonically by substituting 2 for X in u2(X), considered to be a polynomial in Z[X] with coefficients 0 and 1. We initialize all the locations of A to zero.

Let t = t(X) be a small irreducible polynomial in the factor base B (or a small power of such an irreducible polynomial) with δ := deg t. The values of u2 for which t divides (h1)_{u1,u2} satisfy the polynomial congruence u2(X) ≡ u1(X)X^l (mod t). Let u2* be the solution of this congruence with δ* := deg u2* < δ. If δ* > M, then no value of u2 corresponds to (h1)_{u1,u2} divisible by t. So assume that δ* ≤ M. If δ > M, then the only value of u2 for which t divides (h1)_{u1,u2} is u2 = u2*. So we may also assume that δ ≤ M. Then the values of u2 that make (h1)_{u1,u2} divisible by t are given by u2 = u2* + v(X)t(X) for all polynomials v(X) of degrees ≤ M − δ. For each of these 2^(M−δ+1) values of u2, we add δ = deg t to the location A_{u2}.

When the process mentioned in the last paragraph is completed for all such t, we find out for which values of u2 the array locations A_{u2} contain values close to deg (h1)_{u1,u2}. These values of u2 correspond to the smooth values of (h1)_{u1,u2} for the chosen u1. Finally, we vary u1 and repeat the sieving procedure.

In each sieving process described above, we have to find out all the values u2 = u2* + vt as v runs through all polynomials of degrees ≤ M − δ. We may choose the different possibilities for v in any sequence, compute the products vt and then add these products to u2*. While doing so serves our purpose, it is not very efficient, because computing each u2 involves performing a polynomial multiplication vt. Gordon and McCurley’s trick steps through all the possibilities of v in a clever sequence that helps one get each value of u2 from the previous one by a much reduced effort (compared to polynomial multiplication). The 2^(M−δ+1) choices of v can be naturally mapped to the bit strings of length (exactly) M − δ + 1 (with the coefficients of lower powers of X appearing later in the sequence). This motivates the following concept.

Definition 4.2.

Let d ∈ N. Then the (binary) Gray code of dimension d is a sequence G_d of all (that is, 2^d) bit strings of length d, defined inductively as follows. For d = 1, we define G_1 := (0, 1), whereas for d > 1 we obtain G_d from G_{d−1} = (g1, g2, . . . , g_{2^(d−1)}) as

G_d := (0g1, 0g2, . . . , 0g_{2^(d−1)}, 1g_{2^(d−1)}, . . . , 1g2, 1g1),

where juxtaposition denotes string concatenation.

For example, the Gray code of dimension 2 is 00, 01, 11, 10 and that of dimension 3 is 000, 001, 011, 010, 110, 111, 101, 100. Proposition 4.1 can be easily proved by induction on the dimension d.

Proposition 4.1.

Let d ∈ N and let G_d = (g1, g2, . . . , g_{2^d}) be the Gray code of dimension d. For any i, 1 ≤ i < 2^d, the bit strings g_i and g_{i+1} differ in exactly one bit position b(i) (positions being counted from the right, starting from 0). This position is given by b(i) = v2(i), where v2(i) denotes the multiplicity of 2 in i.

Back to our sieving business! Let us agree to step through the values of v in the sequence v1, v2, . . . , v_{2^(M−δ+1)}, where vi corresponds to the bit string gi of the (M − δ + 1)-dimensional Gray code. Let us also call the corresponding values of u2 as (u2)1, (u2)2, . . . . Now, v1 is 0, and the corresponding (u2)1 = u2* is available at the beginning. By Proposition 4.1, we have for 1 ≤ i < 2^(M−δ+1) the equality v_{i+1} = v_i + X^(v2(i)), so that (u2)_{i+1} = (u2)_i + X^(v2(i))t. Computing the product X^(v2(i))t involves shifting the coefficients of t and is done efficiently using bit operations only (assuming the data structures introduced in Section 3.5). Thus (u2)_{i+1} is obtained from (u2)_i by a shift followed by a polynomial addition. This is much faster than computing (u2)_{i+1} directly as u2* + v_{i+1}t.
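Definition 4.2 and Proposition 4.1 can be checked mechanically; the sketch below builds the Gray code by the reflect-and-prefix recursion of the definition and verifies b(i) = v2(i) for dimension 3.

```python
def gray_code(d):
    """The binary Gray code of dimension d, by the reflect-and-prefix
    recursion of Definition 4.2."""
    if d == 1:
        return ["0", "1"]
    prev = gray_code(d - 1)
    return ["0" + s for s in prev] + ["1" + s for s in reversed(prev)]

def v2(i):
    """Multiplicity of 2 in i."""
    e = 0
    while i % 2 == 0:
        i //= 2
        e += 1
    return e

g = gray_code(3)
assert g == ["000", "001", "011", "010", "110", "111", "101", "100"]

# consecutive strings differ in exactly the bit position b(i) = v2(i),
# counting positions from the right, starting at 0
for i in range(1, 8):
    diff = [b for b in range(3) if g[i - 1][2 - b] != g[i][2 - b]]
    assert diff == [v2(i)]
print("Proposition 4.1 verified for d = 3")
```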

We mentioned earlier that efficient implementations of Coppersmith’s algorithm allow one to compute, in feasible time, discrete logarithms in fields as large as F_{2^503}. However, for much larger fields, say for n ≥ 1024, this algorithm is still not a practical breakthrough. The intractability of the DLP continues to remain cryptographically exploitable.

Exercise Set 4.4

4.15

Binary search Let ≤ be a total order on a set S (finite or infinite) and let a1 ≤ a2 ≤ ··· ≤ am be a given sequence of elements of S. Devise an algorithm that, given an arbitrary element a ∈ S, determines using only O(lg m) comparisons in S whether a = ai for some i = 1, . . . , m and, if so, returns i. [H]

4.16
  1. Show that any map F_q → F_q can be represented uniquely as a polynomial in F_q[X] of degree < q. [H]

  2. The set S of all maps F_q → F_q is a ring under point-wise addition and multiplication. Prove the ring isomorphism S ≅ F_q[X]/〈X^q − X〉.

4.17 Let p be a prime and g a primitive element of . For a , prove the explicit formula (mod p). What is the problem in using this formula for computing indices in ?
4.18 In the basic ICM for the prime field , we try to factor random powers g^α over the factor base B = {q1, . . . , qt}. In addition to the canonical representative of g^α in the set {1, . . . , p – 1}, one can also check for the smoothness of the integers g^α + kp for –M ≤ k ≤ M, where M is a small positive integer (to be determined experimentally).
  1. Let ρk,i := (g^α + kp) rem qi for i = 1, . . . , t and for –M ≤ k ≤ M. How can one compute these remainders ρk,i efficiently? Devise an algorithm that checks the smoothness of all g^α + kp using the values ρk,i. [H]

  2. Devise an algorithm that uses a sieve over the interval –M ≤ k ≤ M.

  3. Explain how the above two strategies can be modified to work for the field .

4.19
  1. Show that for the LSM over the average T̄ and the maximum Tmax of |T(c1, c2)| over all values of c1, c2 (that is, for –M ≤ c1 ≤ c2 ≤ M) are approximately HM and 2HM, respectively. [H]

  2. For real 0 ≤ η ≤ 1, let S(η) := {(c1, c2) : –M ≤ c1 ≤ c2 ≤ M, |T(c1, c2)| ≤ ηTmax} and let t(η) := #S(η)/#S(1). Show that t(η) ≈ η(2 – η). (This shows that the distribution of T(c1, c2) is not really random.)

4.20 Consider the following modification of the LSM for . Define for 1 ≤ r ≤ s the integers Hr and Jr. Choose a small s ∈ N and repeat the linear sieve method for each r, 1 ≤ r ≤ s, that is, check the smoothness (over the first t = L[1] primes) of the integers Tr(c1, c2) := Jr + (c1 + c2)Hr + c1c2 for all 1 ≤ r ≤ s, –μ ≤ c1 ≤ c2 ≤ μ. Let T̄s be the average of |Tr(c1, c2)| over all choices of r, c1 and c2. Show that , where T̄ is as defined in Exercise 4.19. In particular, for both the choices: (1) and (2) μ = ⌊M/s⌋, that is, on an average we check smaller integers for smoothness under this modified strategy. Determine the size of the factor base and the total number of integers Tr(c1, c2) checked for smoothness for the two values of μ given above.
4.21

Cubic sieve method (CSM) for Let the integers x, y, z satisfy x³ ≡ y²z (mod p) with x³ ≠ y²z. Assume that each of x, y, z is O(p^ξ).

  1. Show that for integers a, b, c with a + b + c = 0 one has

    (x + ay)(x + by)(x + cy) ≡ y²T(a, b, c) (mod p),

    where T(a, b, c) := z + (ab + ac + bc)x + (abc)y = –b(b + c)(x + cy) + (z – c²x). Since x, y, z are O(p^ξ), we have T(a, b, c) = O(p^ξ) for small values of a, b, c.

  2. For the CSM, the factor base B comprises all primes q1, . . . , qt with together with the integers x + ay, –M ≤ a ≤ M, . If T(a, b, c) factors completely over q1, . . . , qt, we get a relation. Show that if we check the smoothness of T(a, b, c) for all –M ≤ a ≤ b ≤ c ≤ M with a + b + c = 0, we expect to get enough relations to compute the discrete logarithms of elements of B.

  3. In order to carry out sieving, fix c and let b vary. Specify the details of the sieving process. [H]

  4. Specify an algorithm for the second stage of the CSM. [H]

  5. Show that the expected running time of the CSM is . Therefore, if ξ < 1/2, the CSM is asymptotically faster than the LSM, since the LSM runs in time L[1]. The best possible value ξ = 1/3 corresponds to the fastest running time of the CSM.
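The congruence of Part 1 of the exercise above is easy to verify numerically. A small Python check with illustrative parameters (the prime p and the values x, y are our choices; z is derived so that x³ ≡ y²z (mod p)):

```python
p = 1009                      # a small prime, for illustration only
x, y = 20, 3
# choose z with x^3 ≡ y^2 z (mod p): z = x^3 * (y^2)^(-1) mod p
z = pow(x, 3, p) * pow(y * y, p - 2, p) % p

def T(a, b, c):
    # T(a, b, c) = z + (ab + ac + bc)x + (abc)y
    return z + (a*b + a*c + b*c) * x + (a*b*c) * y

# For every triple with a + b + c = 0, the product of the three linear
# forms is congruent to y^2 T(a, b, c) modulo p.
for a in range(-5, 6):
    for b in range(-5, 6):
        c = -(a + b)
        assert ((x + a*y) * (x + b*y) * (x + c*y)) % p == (y * y * T(a, b, c)) % p
```

The check exercises nothing beyond the algebraic identity; an actual CSM would sieve the values T(a, b, c) for smoothness instead of merely recomputing them.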

4.22 The problem with the CSM is that it is not known how to efficiently compute a solution of the congruence

Equation 4.10

x³ ≡ y²z (mod p)
subject to the condition that x³ ≠ y²z and x, y, z = O(p^ξ) for 1/3 ≤ ξ < 1/2. In this exercise, we estimate the number of solutions of Congruence (4.10).

  1. Show that the total number of solutions of Congruence (4.10) modulo p with x, y, z ∈ {1, . . . , p – 1} is (p – 1)², which is Θ(p²).

  2. Show that the total number of solutions of Congruence (4.10) modulo p with x, y, z ∈ {1, . . . , p – 1} and x³ ≠ y²z is also Θ(p²).

  3. Under the heuristic assumption that the solutions (x, y, z) of Congruence (4.10) are randomly distributed in {1, . . . , p – 1}³, deduce that the expected number of solutions of Congruence (4.10) modulo p with x, y, z ∈ {1, . . . , p – 1}, x³ ≠ y²z, and 1 ≤ x, y, z ≤ p^ξ, 1/3 ≤ ξ ≤ 1, is nearly p^{3ξ–1}. (Therefore, if ξ is slightly larger than 1/3, we expect to get a solution. It is not known how to compute such a solution in polynomial (or even subexponential) time. However, for certain values of p a solution is naturally available, for example, if p (or a small multiple of p) is close to an integer cube.)

4.23

Adaptation of CSM for F_{2^n} Let F_{2^n} be represented as F_2[X]/⟨f(X)⟩, where the defining polynomial f is of the form f(X) = X^n + f1(X) with deg f1 ≤ n/3. Let k := ⌈n/3⌉. Show that for polynomials h1, h2 ∈ F_2[X] of small degrees, the remainder (X^k + h1(X))(X^k + h2(X))(X^k + h1(X) + h2(X)) rem f(X) is of degree slightly larger than n/3. Devise an ICM for solving the DLP in F_{2^n} based on this observation. What is the best running time of this method? [H]

*4.5. The Elliptic Curve Discrete Logarithm Problem (ECDLP)

Unlike the finite field DLP, there are no general-purpose subexponential algorithms to solve the ECDLP. Though good algorithms are known for certain specific types of elliptic curves, all known algorithms that apply to general curves take fully exponential time. The square root methods of Section 4.4 are the fastest known methods for solving the ECDLP over an arbitrary curve. As a result, elliptic curves are gaining popularity for building cryptosystems. The absence of subexponential algorithms implies that smaller fields can be chosen compared to those needed for cryptosystems based on the (finite field) DLP. This, in particular, results in smaller sizes of keys.

We start with Menezes, Okamoto and Vanstone’s (MOV) algorithm that reduces the ECDLP in a curve over F_q to the DLP over an extension field F_{q^k} for some suitable k ∈ N. Since the DLP can be solved in subexponential time, the ECDLP is also solved in that time, provided that the extension degree k is small. For supersingular curves, one can choose k ≤ 6. For non-supersingular curves, this k is quite large, in general, and the MOV reduction takes exponential time.

A linear-time algorithm is known to solve the ECDLP over anomalous curves (that is, curves with trace of Frobenius equal to 1). This algorithm is called the SmartASS method after its inventors Smart, Araki, Satoh and Semaev [257, 265, 282].

J. H. Silverman [277] has proposed an algorithm known as the xedni calculus method for solving the ECDLP over an arbitrary curve. Rigorous running-time estimates for this algorithm are not known; however, heuristic analysis and experiments suggest that the algorithm is not really practical.

Let E be an elliptic curve over a finite field F_q and let P ∈ E(F_q) be of order m. We want to compute indP Q (if it exists) for a point Q ∈ E(F_q). Unless it is necessary, we will not assume any specific defining equation for E or a specific value of q.

**4.5.1. The MOV Reduction

Let us first look at the structure of the group EK̄[m] of m-torsion points on an elliptic curve E defined over K. Here K̄ is the algebraic closure of K.

Theorem 4.2.

Let K be a field of characteristic p ≥ 0, and E an elliptic curve defined over K. We consider two separate cases:[5]

[5] For the MOV reduction, only the first case is important.

  1. If p = 0 or if p > 0 does not divide m, then EK̄[m] ≅ Zm ⊕ Zm. In particular, #EK̄[m] = m² in this case.

  2. If p > 0, then either EK̄[p^r] = {O} for all r ∈ N or EK̄[p^r] ≅ Z_{p^r} for all r ∈ N.

Now, let E be an elliptic curve defined over a finite field K of characteristic p. Let m ∈ N with gcd(m, p) = 1. We use the shorthand notation E[m] for EK̄[m] (and not for EK[m]). We want to define a function

em : E[m] × E[m] → μm,

where μm is the group of m-th roots of unity (Exercise 4.24). This function em, known as the Weil pairing, helps us reduce the ECDLP in E to the DLP in a suitable extension field. Let P, R ∈ E[m]. The definition of em(P, R) calls for using divisors on E. Recall from Exercise 2.125 that a divisor Σ nT[T] is the divisor of a rational function on E if and only if Σ nT = 0 and Σ nTT = O. Since the divisor m[R] – m[O] satisfies both conditions, there is a rational function f such that Div(f) = m[R] – m[O]. Now, gcd(m², p) = 1 as well. Hence, by Theorem 4.2 there exists a point R′ of order m² such that R = mR′. Since #E[m] = m², the divisor Σ_{S ∈ E[m]} ([R′ + S] – [S]) satisfies both conditions too and, therefore, there exists a rational function g with Div(g) = Σ_{S ∈ E[m]} ([R′ + S] – [S]). The functions f and g as introduced above are unique up to multiplication by non-zero constants. One can show that we can choose f and g in such a manner that f ∘ λm = g^m, where λm is the multiplication map Q ↦ mQ. Then for every point U we have g^m(P + U) = f(mP + mU) = f(mU) = g^m(U). Since g has only finitely many poles and zeros (whereas the set of points of E is infinite), we can choose U such that both g(U) and g(P + U) are defined and non-zero. For such a point U, we then have (g(P + U)/g(U))^m = 1 and define

em(P, R) := g(P + U)/g(U).

The right side can be shown to be independent of the choice of U. The relevant properties of the Weil pairing em are now listed.

Proposition 4.2.

Let P, P′, R, R′ ∈ E[m] and a, b ∈ Z. Then we have:

Identity: em(P, P) = 1.
Alternation: em(P, R) = em(R, P)^{–1}.
Bilinearity: em(P + P′, R) = em(P, R)em(P′, R),
 em(P, R + R′) = em(P, R)em(P, R′),
 em(aP, bR) = (em(P, R))^{ab}.
Non-degeneracy: em(P, O) = 1.
 If em(P, T) = 1 for all T ∈ E[m], then P = O.

The above definition of em is not computationally effective. We will see later how we can compute em(P, T) in probabilistic polynomial time using an alternative (but equivalent) definition.

Algorithm 4.7 shows how the MOV reduction algorithm makes use of Weil pairing. We now clarify the subtle details of this algorithm.

Algorithm 4.7. MOV reduction

Input: A point P ∈ E(F_q) of order m, gcd(m, q) = 1, and a multiple Q of P.

Output: The index indP Q, that is, an integer l with Q = lP.

Steps:

Choose the smallest k ∈ N such that E[m] ⊆ E(F_{q^k}).
while (1) {
   Choose a random point R ∈ E[m].
   α := em(P, R),   β := em(Q, R).  /* α, β ∈ μm ⊆ F_{q^k}* */
   l := indα β.   /* Discrete logarithm in F_{q^k}* */
   if (Q = lP) { Return l. }
}

The correctness of the algorithm

From the bilinearity of the Weil pairing, it follows that if Q = lP, 0 ≤ l < m, then β = em(Q, R) = em(lP, R) = em(P, R)^l = α^l. Thus, treating indα β as the least non-negative integer j with α^j = β, we conclude that l = indα β if and only if ord α = m, that is, α is a primitive m-th root of unity. That α is an m-th root of unity for every R ∈ E[m] is obvious from the definition of em. We now show that there exists some R ∈ E[m] for which α = em(P, R) is primitive.
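The endgame of Algorithm 4.7, computing l = indα β among the m-th roots of unity, can be mimicked in Python on a toy scale. The pairing values α, β are simulated directly in a small prime field; no curve or pairing arithmetic is performed, and all parameters are illustrative:

```python
# F_31* is cyclic of order 30 with generator 3; take the subgroup of order m = 5.
p_field, m = 31, 5
alpha = pow(3, (p_field - 1) // m, p_field)   # a primitive m-th root of unity
assert pow(alpha, m, p_field) == 1 and alpha != 1

l_secret = 4
beta = pow(alpha, l_secret, p_field)          # stands in for em(Q, R) = em(P, R)^l

# Recover l by a brute-force discrete logarithm in the subgroup <alpha>;
# a real implementation would use an index calculus method in F_{q^k}*.
l = next(j for j in range(m) if pow(alpha, j, p_field) == beta)
assert l == l_secret
```

Because α has exact order m here, the recovered exponent is unique modulo m, which is precisely the condition discussed above.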

Lemma 4.1.

Let P ∈ E[m] be of order m (so that P generates the subgroup 〈P〉 of order m in E[m]). Then for any R1, R2 ∈ E[m], the cosets R1 + 〈P〉 and R2 + 〈P〉 are equal if and only if em(P, R1) = em(P, R2).

Proof

If R1 + 〈P〉 = R2 + 〈P〉, then R1 = R2 + rP for some integer r and so by bilinearity and identity of Weil pairing em(P, R1) = em(P, R2)em(P, P)r = em(P, R2).

Conversely, let em(P, R1) = em(P, R2). By Theorem 4.2, E[m] is generated by two elements of order m. We can take one of these elements to be P; let P′ be the other element and write R1 – R2 = aP + a′P′ for some a, a′ ∈ Z. Then em(P, R1) = em(P, R2 + aP + a′P′) = em(P, R2)em(P, P)^a em(P, a′P′), whence it follows that em(P, a′P′) = 1. Finally, for an arbitrary T = bP + b′P′ ∈ E[m] with b, b′ ∈ Z, we have em(a′P′, T) = em(a′P′, bP + b′P′) = em(a′P′, P)^b em(P′, P′)^{a′b′} = em(P, a′P′)^{–b} = 1. By the non-degeneracy property of em, it then follows that a′P′ = O, that is, R1 – R2 = aP ∈ 〈P〉, so that R1 + 〈P〉 = R2 + 〈P〉.

As an immediate corollary to Lemma 4.1, the desired result follows.

Proposition 4.3.

Let P ∈ E[m] be of order m and let

S := {R ∈ E[m] : em(P, R) is a primitive m-th root of unity}.

Then #S/#E[m] = φ(m)/m. In particular, S is non-empty.

Proof

There are m distinct cosets of 〈P〉 in E[m]. Now, as R ranges over all points of E[m], the coset R + 〈P〉 ranges over all of these m possibilities and, accordingly, by Lemma 4.1 the value em(P, R) ranges over m distinct values, that is, over all of μm. Since μm is cyclic of order m and hence has φ(m) generators, the theorem follows.

By Theorem 3.1, one should try an expected number of O(ln ln m) random points before a primitive m-th root α = em(P, R) is found.

Choosing k

Since E[m] consists of finitely many (namely, m²) points, it is obvious that there exist finite values of k such that E[m] ⊆ E(F_{q^k}). It can also be shown that if E[m] ⊆ E(F_{q^k}), then m | q^k – 1, that is, μm ⊆ F_{q^k}*, so that em(P, R) ∈ F_{q^k}* for all P, R ∈ E[m]. The computation of the discrete logarithm indα β is then carried out in F_{q^k}*. For Algorithm 4.7 to be efficient, one requires k to be rather small. However, for most curves, k is rather large, implying that the MOV reduction is impractical for these curves. For a specific class of curves, the so-called supersingular curves, one can choose k to be rather small, namely k ≤ 6. We do not go into the details of the choices of k for the various cases of supersingular curves, but refer the reader to Menezes [192].

Computing em(P, R)

We start with an alternative definition of the Weil pairing for P, R ∈ E[m]. First note that if D = Σ nT[T] is a divisor and if f is a rational function on E such that for every pole or zero T of f one has nT = 0 (that is, such that Div(f) and D have disjoint supports), then one can define f(D) := ∏ f(T)^{nT}.

Choose points U, V ∈ E and consider the divisors DP := [P + U] – [U] and DR := [R + V] – [V]. Since E (over the algebraic closure) is infinite, one can choose both P + U and U distinct from R + V and V. Since P, R ∈ E[m], it follows that mDP and mDR are principal; namely, there are rational functions fP and fR such that Div(fP) = mDP = m[P + U] – m[U] and Div(fR) = mDR = m[R + V] – m[V]. One can show that

Equation 4.11

em(P, R) = fP(DR)/fR(DP),
independent of the choice of U and V as long as fP(DR) and fR(DP) are defined. Therefore, em(P, R) can be computed efficiently, if fP and fR can be computed efficiently. To this effect, we now describe an algorithm for computing the rational function f of a principal divisor D = m1[P1] + ··· + mr[Pr], where Σ mi = 0 and Σ miPi = O. Since deg D = 0, we can write D = Σ mi([Pi] – [O]). Suppose that we have an Algorithm A that, for a pair of reduced divisors

D1 = [P1] – [O] + Div(f1)

and

D2 = [P2] – [O] + Div(f2),

computes the sum (a reduced divisor)

D3 = D1 + D2 = [P3] – [O] + Div(f3).
Then, f can be computed by repeated application of Algorithm A as follows.

  1. Compute for each i = 1, . . . , r the reduced divisor Δi corresponding to mi([Pi] – [O]). Let 1 = ai1, ai2, . . . , aiti = |mi| be an addition chain for |mi| (Exercise 3.18). Clearly, ti – 1 applications of Algorithm A compute Δi. Since we can choose ti ≤ 2⌈lg |mi|⌉, each Δi can be computed using O(log |mi|) applications of Algorithm A.

  2. Compute f by computing D = Div(f) = Δ1 + ··· + Δr. This can be done by applying Algorithm A a total of r – 1 times.

What remains is the description of Algorithm A that computes P3 and f3 from a knowledge of P1, P2, f1 and f2. Clearly, if P1 = O, then we have P3 = P2 and f3 = f1f2. Similar is the case for P2 = O. So assume P1 ≠ O and P2 ≠ O. Let l1 be the line passing through P1 and P2, and P′ := –(P1 + P2). First, assume that P′ ≠ O. By Exercise 2.125, we have Div(l1) = [P1] + [P2] + [P′] – 3[O]. Let l2 be the (vertical) line passing through P′ and –P′. Again by Exercise 2.125, we have Div(l2) = [P′] + [–P′] – 2[O]. But then D1 + D2 = [P1 + P2] – [O] + Div(f1f2l1/l2), that is, we take P3 = –P′ = P1 + P2 and f3 = f1f2l1/l2. Finally, if P′ = O, then P2 = –P1 and, therefore, Div(l1) = [P1] + [P2] – 2[O]. Thus, in this case too, we take P3 = O and f3 = f1f2l1/l2 with l2 := 1.

Before we finish the description of the MOV reduction, some comments are in order. First note that if f1, f2 ∈ K(E) and P1, P2 ∈ E(K), then both l1 and l2 are in K(E), and the computation of f3 and P3 can be carried out by working in K only.

Second, consider the (general) case P3 ≠ O. Since l2 vanishes at ±P3, the representation f3 = f1f2l1/l2 cannot be evaluated naively everywhere: f3 is certainly defined at –P3, but l2(–P3) = 0 and, therefore, evaluating f3(–P3) as (f1f2l1)(–P3)/l2(–P3) fails. Of course, there is a rational function g such that both f1f2l1g and l2g are defined and non-zero at –P3, but finding such a rational function is an added headache. So we choose to continue with the representation f3 = f1f2l1/l2 and agree not to evaluate f3 at –P3. Recall from Equation (4.11) that we want to evaluate fP at DR (that is, at R + V and V) and also fR at DP (that is, at P + U and U). Let us assume that we use the addition chain 1 = a1, a2, . . . , at = m for m. This means that we cannot evaluate fP at the points ±ai(P + U) and ±aiU for all i = 1, . . . , t. Therefore, V should be chosen such that neither R + V nor V is one of these points. Similar constraints dictate the choice of U. However, if m is sufficiently large (m ≥ 1024) and if we choose an addition chain of length t ≤ 2⌈lg m⌉, then it can be easily seen that for a random choice of (U, V) the evaluation of fP(DR) or fR(DP) fails with a probability of no more than 1/2. Therefore, a few random choices of (U, V) are expected to make the algorithm work. This is the only place where probabilistic behaviour creeps into the algorithm. In practice, however, this is not a serious problem, since we have much larger values of m (than 1024), and accordingly the above probability of failure becomes negligibly small.

Finally, note that if we multiply the factors f1, f2 and l1 in the numerator, then the coefficients of the numerator grow very rapidly, when the algorithm is applied repeatedly. Thus we prefer to keep the numerator in the factored form. The same applies to the denominator as well.

**4.5.2. The SmartASS Method

The SmartASS method, named after its inventors Smart [282], Satoh and Araki [257] and Semaev [265], is also called the anomalous attack to solve the ECDLP, since it is applicable to anomalous elliptic curves. Let F_p be a finite field of odd prime cardinality p and E an elliptic curve defined over F_p. We assume that E is anomalous: that is, the trace of Frobenius of E at p is 1; that is, #E(F_p) = p. Since p is prime, the group E(F_p) is cyclic and, in particular, isomorphic to the additive group (Zp, +). This isomorphism is effectively exploited by the SmartASS method to give a polynomial-time algorithm for solving the ECDLP in the group E(F_p).

Before proceeding further, we introduce some auxiliary results. Recall (Exercise 2.133) that a local PID is called a discrete valuation ring (DVR). We now give an equivalent definition of a DVR, which justifies its name.

Definition 4.3.

A discrete valuation on a field K is a surjective group homomorphism

v : K* → (Z, +)

such that for every a, b ∈ K* we have v(a + b) ≥ min(v(a), v(b)). We extend the definition of v to a map K → Z ∪ {+∞} by setting v(0) = +∞. The set

Rv := {a ∈ K : v(a) ≥ 0}

is a ring, called the valuation ring of v.

A DVR can be characterized as follows:

Proposition 4.4.

Let R be an integral domain and let K := Q(R) be the field of fractions of R. Then R is a DVR if and only if there exists a discrete valuation of K such that R is the valuation ring of v.

Proof

[if] By definition, R = {a ∈ K : v(a) ≥ 0}. We have v(1) = v(1 · 1) = v(1) + v(1), so that v(1) = 0. If ab = 1 for some a, b ∈ R, then 0 = v(1) = v(ab) = v(a) + v(b). Since v(a), v(b) ≥ 0, it follows that v(a) = v(b) = 0. Conversely, let v(a) = 0 for some a ∈ R, a ≠ 0. Now, a^{–1} ∈ K and we have 0 = v(1) = v(aa^{–1}) = v(a) + v(a^{–1}) = v(a^{–1}): that is, a^{–1} ∈ R. We conclude that a ∈ R is a unit if and only if v(a) = 0. Any proper ideal of R consists only of non-units and hence is contained in the set m1 := {a ∈ R : v(a) ≥ 1}, which is easily seen to be an ideal of R. Thus R is a local domain with maximal ideal m1.

Let i ∈ N and define mi := {a ∈ R : v(a) ≥ i}. Clearly, each mi is an ideal of R. For an arbitrary non-zero ideal I of R, consider i := min{v(a) : a ∈ I, a ≠ 0}. If i = 0, then I contains a unit, that is, I = R. So assume i > 0. Clearly, I ⊆ mi. Conversely, let a ∈ mi, so that v(a) ≥ i. Choose b ∈ I with v(b) = i. But then i ≤ v(a) = v(ab^{–1}) + v(b) = v(ab^{–1}) + i: that is, v(ab^{–1}) ≥ 0; that is, ab^{–1} ∈ R; that is, a = (ab^{–1})b ∈ I. Thus, mi ⊆ I: that is, I = mi. In other words, mi, i ∈ N, are the only non-zero ideals of R. These ideals form the (infinite) descending chain m1 ⊇ m2 ⊇ m3 ⊇ ···.

By definition, v : K* → Z is surjective. Let x ∈ R be such that v(x) = 1. The principal ideal 〈x〉 is not the unit ideal, satisfies 〈x〉 ⊆ m1, and hence equals m1. One can likewise show that mi = 〈x^i〉 for all i ∈ N. Thus R is a PID. [only if] See Exercise 2.133.

Recall that the ring Zp of p-adic integers (Definition 2.111) is a DVR. The field of fractions Qp of Zp is called the field of p-adic numbers. We now explicitly describe a valuation v on Qp of which Zp is the valuation ring. Let the p-adic expansion (Exercises 2.144 and 2.145) of a p-adic integer α be

Equation 4.12

α = k0 + k1p + k2p² + ···,   0 ≤ ki ≤ p – 1.
A non-negative rational integer can be naturally viewed as a p-adic integer with finitely many non-zero terms, that is, one for which ki = 0 for all but finitely many i. However, a p-adic integer with infinitely many non-zero ki does not correspond to a non-negative rational integer. If in Expansion (4.12) we have k0 = k1 = ··· = kr–1 = 0, we can write

α = p^r(kr + k_{r+1}p + k_{r+2}p² + ···).

A p-adic integer is, in general, an infinite series and a representation with finite precision looks like

k0 + k1p + k2p² + ··· + ksp^s + O(p^{s+1}).

Arithmetic on p-adic numbers is done like arithmetic on integers written in base p, but from left to right (that is, starting with the digit k0). Thus, for example, if one wants to add two p-adic integers k0 + k1p + k2p² + ··· and k0′ + k1′p + k2′p² + ···, one may add the base-p integers . . . k2k1k0 and . . . k2′k1′k0′ in the usual manner till the desired level of precision. A p-adic integer α = k0 + k1p + k2p² + ··· is invertible (in Zp) if and only if k0 ≠ 0 (Proposition 2.52).
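This digit-by-digit arithmetic is easy to prototype in Python at a fixed precision s; the representation below (lists of the first s digits k0, . . . , k_{s–1}) and the function names are ours:

```python
p, s = 5, 6   # base and working precision: digits k0, ..., k5

def to_digits(n):
    """First s p-adic digits of a non-negative integer n."""
    digits = []
    for _ in range(s):
        digits.append(n % p)
        n //= p
    return digits

def padic_add(a, b):
    """Add two truncated p-adic integers digit by digit, propagating carries."""
    out, carry = [], 0
    for x, y in zip(a, b):
        t = x + y + carry
        out.append(t % p)
        carry = t // p
    return out

def valuation(digits):
    """v(α): position of the first non-zero digit (None if α ≡ 0 here)."""
    for i, k in enumerate(digits):
        if k:
            return i
    return None

assert padic_add(to_digits(117), to_digits(38)) == to_digits(117 + 38)
assert valuation(to_digits(75)) == 2   # 75 = 3 * 5^2
```

Note how the carries propagate from the digit k0 onwards, matching the left-to-right description above.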

An element β ∈ Qp also has a p-adic expansion, but in this case one has to allow terms involving a finite number of negative exponents of p. That is to say, we have an expansion of the form

β = k_{–t}p^{–t} + k_{–t+1}p^{–t+1} + ··· + k_{–1}p^{–1} + k0 + k1p + k2p² + ···

or

β = p^{–t}(k_{–t} + k_{–t+1}p + ··· + k_{–1}p^{t–1} + k0p^t + k1p^{t+1} + k2p^{t+2} + ···).

Of course, if k_{–t} = k_{–t+1} = ··· = k_{–1} = 0, then β is already in Zp.

From the arguments above, it follows that any non-zero γ ∈ Qp can be written uniquely as γ = p^δ(γ0 + γ1p + γ2p² + ···) with δ ∈ Z, 0 ≤ γi ≤ p – 1 and γ0 ≠ 0. We then set v(γ) := δ. It is easy to see that v defines a discrete valuation on Qp of which Zp is the valuation ring. Moreover, since γ0 + γ1p + γ2p² + ··· is a unit in Zp, the element p = 0 + 1 · p + 0 · p² + ··· plays the role of a uniformizer of the DVR Zp. As usual, we write v(0) = +∞.

Now, back to our ECDLP business. Let E be an elliptic curve defined over F_p. Here we consider the case that E is anomalous. We can naturally think of E as a curve over the field Qp as well and denote this curve by ε. The coordinate-wise application of the canonical surjection Zp → F_p induces the reduction homomorphism ε(Qp) → E(F_p). Now, we define the following subgroups of ε(Qp):

It can be shown that ε1 is a subgroup of ε(Qp) and ε2 is a subgroup of ε1. Furthermore, since E is anomalous, we have

Now, let P ∈ E(F_p) and let Q be a point in the subgroup of E(F_p) generated by P. Our purpose is to find an integer l such that Q = lP. Let P̃, Q̃ ∈ ε(Qp) be such that P̃ reduces to P and Q̃ reduces to Q. It is not difficult to find such points P̃ and Q̃. For example, if P = (a, b), we can take P̃ = (a, b0 + b1p + b2p² + ···), where b0 = b and b1, b2, . . . are successively obtained by Hensel lifting.

Since Q = lP, the point pQ̃ – l(pP̃) reduces to O; moreover, the points pP̃ and pQ̃ themselves lie in ε1 (because #E(F_p) = p). Now, if we take the so-called p-adic elliptic logarithm ψp on both sides, we get ψp(pQ̃) ≡ lψp(pP̃) (mod p²), whence it follows that

provided that is invertible modulo p. The function ψp can be easily calculated. Therefore, this gives a very efficient probabilistic algorithm for computing discrete logarithms over anomalous elliptic curves. Here the most time-consuming step is the linear-time computation of the points pP̃ and pQ̃. For further details on the algorithm (like the computation of P̃ and Q̃ from P and Q, and the definition of p-adic elliptic logarithms), see Blake et al. [24] and Silverman [275].

**4.5.3. The Xedni Calculus Method

Joseph Silverman’s xedni calculus method (XCM) is a recent algorithm for solving the ECDLP in an arbitrary elliptic curve over a finite field. The algorithm is based on some deep mathematical conjectures and heuristic ideas. However, its performance has been experimentally established to be poor. Here we give a sketchy description of the XCM. For simplicity, we concentrate on elliptic curves over prime fields only.

The basic idea of the XCM is to lift an elliptic curve E over F_p to a curve ε over Q. In view of this, we start with a couple of important results regarding elliptic curves over Q (or, more generally, over a number field). See Silverman [275], for example, for the proofs.

Let ε be an elliptic curve defined over a number field K.

Theorem 4.3. Mordell–Weil theorem

The group ε(K) is finitely generated.

The group structure of ε(K) is made explicit by the next theorem. Note that the elements of ε(K) of finite order form a subgroup εtors(K) of ε(K), called the torsion subgroup of ε(K) (Exercise 4.26).

Theorem 4.4.

ε(K) ≅ εtors(K) × Z^ρ for some non-negative integer ρ.

The non-negative integer ρ of Theorem 4.4 is called the rank of ε(K).

Now, let E be an elliptic curve defined over a prime field F_p, P ∈ E(F_p), and Q a multiple of P. Our task is to compute an integer l such that Q = lP. We assume that E is defined by a suitable Weierstrass equation. We consider the projective coordinates of points on E. Let n denote the cardinality of the subgroup 〈P〉.

The basic idea of the XCM is to select r points Rp,1, . . . , Rp,r ∈ E(F_p), compute an elliptic curve ε defined over Q and points S1, . . . , Sr on ε such that modulo p the curve ε reduces to E and the points S1, . . . , Sr reduce to Rp,1, . . . , Rp,r. If the rank of ε is small, then the points S2, . . . , Sr are expected to be linearly dependent. Computing a non-trivial linear dependency among S2, . . . , Sr gives a linear dependency among Rp,1, . . . , Rp,r, which in turn yields indP Q with high probability. The details are now explained. For r points Li := [hi, ki, li], i = 1, . . . , r, we use the notation:

We start by fixing an integer r, 4 ≤ r ≤ 9. We then choose r random pairs (si, ti) of integers and compute the points Rp,i := siP – tiQ, i = 1, . . . , r.

We now apply a change of coordinates of the form

Equation 4.13


so that the first four of the points Rp,i become Rp,1 = [1, 0, 0], Rp,2 = [0, 1, 0], Rp,3 = [0, 0, 1] and Rp,4 = [1, 1, 1]. This change of coordinates fails if some three of the four points Rp,1, Rp,2, Rp,3 and Rp,4 sum to O. But in that case the desired index indP Q can be computed with high probability. If, for example, Rp,1 + Rp,2 + Rp,3 = O, then we have (s1 + s2 + s3)P = (t1 + t2 + t3)Q and, therefore, if gcd(t1 + t2 + t3, n) = 1, then indP Q ≡ (t1 + t2 + t3)^{–1}(s1 + s2 + s3) (mod n). On the other hand, if gcd(t1 + t2 + t3, n) ≠ 1, we repeat with a different set of pairs (si, ti).

Henceforth, we assume that the change of coordinates, as given in Equation (4.13), is successful. This transforms the equation for E to a general cubic equation:

Cp : up,1X³ + up,2X²Y + up,3XY² + up,4Y³ + up,5X²Z + up,6XYZ + up,7Y²Z + up,8XZ² + up,9YZ² + up,10Z³ = 0.

Now, we carry out a step that heuristically ensures that the curve ε over (that we are going to construct) has a small rank. We choose a product M of small primes with pM, a cubic curve

CM : uM,1X³ + uM,2X²Y + uM,3XY² + uM,4Y³ + uM,5X²Z + uM,6XYZ + uM,7Y²Z + uM,8XZ² + uM,9YZ² + uM,10Z³ ≡ 0 (mod M)

over ZM and points RM,1, . . . , RM,r on CM with coordinates in ZM. The first four points should be RM,1 = [1, 0, 0], RM,2 = [0, 1, 0], RM,3 = [0, 0, 1] and RM,4 = [1, 1, 1]. We have to ensure also that for every prime divisor q of M, the matrix B(RM,1, . . . , RM,r) has maximal rank modulo q. In practice, it is easier to choose the points RM,1, . . . , RM,r first and then compute a curve CM passing through these points by solving a set of linear equations in the coefficients uM,1, . . . , uM,10 of CM. The curve CM should be so chosen that it has the minimum possible number of solutions modulo M. This, in conjunction with some deep conjectures in the theory of elliptic curves, guarantees that the curve ε that we will construct shortly will have a rank less than the expected value.

We now combine the curves Cp and CM as follows. Using the Chinese remainder theorem, we compute integers u′1, . . . , u′10 such that u′i ≡ up,i (mod p) and u′i ≡ uM,i (mod M) for each i = 1, . . . , 10. Similarly, we compute points R1, . . . , Rr with integer coefficients such that Ri ≡ Rp,i (mod p) and Ri ≡ RM,i (mod M) for each i = 1, . . . , r, where congruence of points stands for coordinate-wise congruence. Here we have R1 = [1, 0, 0], R2 = [0, 1, 0], R3 = [0, 0, 1] and R4 = [1, 1, 1].

Clearly, the points R1, . . . , Rr are lifts of the points Rp,1, . . . , Rp,r respectively, whereas the cubic curve

C′ : u′1X³ + u′2X²Y + u′3XY² + u′4Y³ + u′5X²Z + u′6XYZ + u′7Y²Z + u′8XZ² + u′9YZ² + u′10Z³ = 0

over Q is a lift of E. However, C′, treated as a curve over Q, need not pass through the points R1, . . . , Rr. In order to ensure this last condition, we modify the coefficients u′1, . . . , u′10 to the (small integer) coefficients u1, . . . , u10 by solving a system of linear equations that forces the modified curve to pass through R1, . . . , Rr, subject to the condition that ui ≡ u′i (mod pM) for each i = 1, . . . , 10. The resulting cubic curve

C : u1X³ + u2X²Y + u3XY² + u4Y³ + u5X²Z + u6XYZ + u7Y²Z + u8XZ² + u9YZ² + u10Z³ = 0

over Q evidently continues to be a lift of E.
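The Chinese-remainder combination of coefficients modulo p and modulo M used above is a routine computation. A minimal Python sketch (the coefficient values are illustrative):

```python
def crt(r1, m1, r2, m2):
    """Smallest non-negative x with x ≡ r1 (mod m1) and x ≡ r2 (mod m2),
    assuming gcd(m1, m2) = 1."""
    # Solve r1 + m1*t ≡ r2 (mod m2) using the inverse of m1 modulo m2.
    t = (r2 - r1) * pow(m1, -1, m2) % m2
    return (r1 + m1 * t) % (m1 * m2)

p, M = 101, 2 * 3 * 5 * 7        # p prime, M a product of small primes coprime to p
u_p, u_M = 42, 13                # one coefficient, known modulo p and modulo M
u = crt(u_p, p, u_M, M)
assert u % p == u_p and u % M == u_M
```

In the XCM this combination is applied to each of the ten coefficients and to every coordinate of the r points.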

Now, we apply a change of coordinates in order to transfer to the standard Weierstrass equation

ε : Y² + a1XY + a3Y = X³ + a2X² + a4X + a6

with integer coefficients ai. This transformation changes the points R1, . . . , Rr to the points S1, . . . , Sr. One should also ensure that ε is non-singular, that is, an elliptic curve.

Finally, we check if S2, . . . , Sr are linearly dependent. If so, we determine a (non-trivial) relation with coefficients n2, . . . , nr ∈ Z. This corresponds to a relation among the points Rp,1, . . . , Rp,r, where n1 := –(n2 + ··· + nr), that is, sP = tQ with s := n1s1 + ··· + nrsr and t := n1t1 + ··· + nrtr. If gcd(t, n) = 1, we have indP Q ≡ t^{–1}s (mod n).

On the other hand, if S2, . . . , Sr are linearly independent or if gcd(t, n) > 1, then the lifted data fail to compute indP Q. In that case, we repeat the entire process by selecting new pairs (si, ti) and/or new points RM,1, . . . , RM,r.

This completes our description of the XCM. See Silverman [277] for further details. No rigorous or heuristic analysis of the running time of the XCM is available in the literature. Practical experience (reported in Jacobson et al. [139]) shows that the algorithm is rather impractical. The predominant cause for failure of a trial of the XCM is that the probability that the points S2, . . . , Sr are linearly dependent is amazingly low. Suitable choices of the curve CM help us to construct curves ε of low rank, but not low enough, in general, to render S2, . . . , Sr linearly dependent. Larger values of r are expected to increase the probability of success in each trial, but it is not clear how to handle the values r > 9. Nevertheless, the XCM is a radically new idea to solve the ECDLP. As Joseph Silverman [277] says, “some of the ideas may prove useful in future work on ECDLP”.

Exercise Set 4.5

4.24 Let K be a field, m ∈ N, and let μm := {x ∈ K̄ : x^m = 1}, where K̄ is the algebraic closure of K. Elements of μm are called the m-th roots of unity. Prove the following assertions.
  1. μm is a subgroup of (K̄*, ·).

  2. If char K = 0, then #μm = m. [H]

  3. If p := char K > 0, then #μm = m/p^{vp(m)}. [H]

  4. μm is cyclic. [H]

  5. The set of all roots of unity in K̄ (that is, the union of the μm over all m ∈ N) is a subgroup of K̄*.

4.25 We use the notations of the last exercise and assume that #μm = m, that is, either char K = 0 or p := char K > 0 is coprime to m. In this case, a generator of μm is called a primitive m-th root of unity. If ω ∈ μm is a primitive m-th root of unity and ω^r = 1 for some r ∈ N, then evidently m | r. In particular, m is the smallest of the exponents r ∈ N for which ω^r = 1. The (monic) polynomial

Φm(X) := ∏ (X – ω),

where the product runs over all primitive m-th roots of unity ω, is called the m-th cyclotomic polynomial (over K). Clearly, deg Φm(X) = φ(m) (where φ is Euler’s totient function).

  1. Show that X^m – 1 = ∏_{d|m} Φd(X). [H] Use the Möbius inversion formula to deduce that Φm(X) = ∏_{d|m} (X^{m/d} – 1)^{μ(d)}, where μ is the Möbius function. Conclude that the coefficients of Φm(X) lie in the prime subfield of K.

  2. If m is a prime, show that Φm(X) = X^{m–1} + ··· + X + 1.

  3. Let m ≠ 1 be odd and char K ≠ 2. Show that Φ2m(X) = Φm(–X). [H]

  4. Show that if K = F_q is a finite field with gcd(q, m) = 1, l is the (multiplicative) order of q modulo m and ω is a primitive m-th root of unity, then [K(ω) : K] = l. [H] In particular, Φm is a product of φ(m)/l (distinct) irreducible polynomials over F_q, each of degree l.
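Part 1 above gives a direct way to compute cyclotomic polynomials: divide X^m – 1 by Φd for all proper divisors d of m. A short Python sketch over Z (coefficient lists, constant term first; names are ours):

```python
def poly_div(num, den):
    """Exact long division of integer polynomials; den must be monic."""
    num = num[:]
    quot = [0] * (len(num) - len(den) + 1)
    for i in range(len(quot) - 1, -1, -1):
        quot[i] = num[i + len(den) - 1]
        for j, c in enumerate(den):
            num[i + j] -= quot[i] * c
    return quot

def cyclotomic(m):
    """Coefficients of the m-th cyclotomic polynomial over Z."""
    num = [-1] + [0] * (m - 1) + [1]          # X^m - 1
    for d in range(1, m):
        if m % d == 0:
            num = poly_div(num, cyclotomic(d))
    return num

assert cyclotomic(5) == [1, 1, 1, 1, 1]       # X^4 + X^3 + X^2 + X + 1
assert cyclotomic(6) == [1, -1, 1]            # X^2 - X + 1 = Φ3(-X), as in Part 3
assert cyclotomic(12) == [1, 0, -1, 0, 1]     # X^4 - X^2 + 1
```

Every Φd is monic, so the divisions are exact over Z, in line with the conclusion of Part 1.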

4.26
  1. Let G be an (additive) Abelian group (not necessarily finite). Show that the subset

    Gtors := {a ∈ G : na = 0 for some n ∈ N}

    is a subgroup of G. Gtors is called the torsion subgroup of G and the elements of Gtors are called torsion elements of G. An element a ∈ G is a torsion element of G if and only if a is of finite order.

  2. Let ε be an elliptic curve defined over a number field K. Show that the torsion subgroup εtors(K) of ε(K) is finite. [H]

  3. Let ε and K be as in Part (b). Show that εtors(K̄) is not finite. [H]

**4.6. The Hyperelliptic Curve Discrete Logarithm Problem

The hyperelliptic curve discrete logarithm problem (HECDLP) has attracted less research attention than the ECDLP. Surprisingly, however, there exist subexponential (index calculus) algorithms for solving the HECDLP over curves of large genus. Adleman, DeMarrais and Huang first proposed such an algorithm [2] (which we will refer to as the ADH algorithm). Enge [86] suggested some modifications of the ADH algorithm and provided rigorous analysis of its running time. Gaudry [105] simplified the ADH algorithm and even implemented it. Gaudry’s experimentation suggests that it is feasible to compute discrete logarithms in Jacobians of almost cryptographic sizes, given that the genus of the underlying curve is high (say, ≥ 6). Enge and Gaudry [87] proved rigorously that as long as the genus g is greater than ln q (F_q being the field over which the curve is defined), the ADH algorithm (and its improvements) runs in time L(q^g, 1/2, ·).

In what follows, we outline Gaudry’s version of the ADH algorithm and refer to this as the ADH–Gaudry algorithm. Let C : Y^2 + u(X)Y = v(X) be a hyperelliptic curve of genus g defined over a finite field 𝔽_q. We assume that the cardinality of the Jacobian is known and has a suitably large prime divisor m. We assume further that a reduced divisor α of order m is available, and we want to compute the discrete logarithm indα β of a given reduced divisor β with respect to α.

4.6.1. Choosing the Factor Base

Recall that every reduced divisor can be written uniquely as D = P1 + ··· + Pl – l∞ with l ≤ g, where for i ≠ j the points Pi and Pj are not opposite of each other. Only ordinary points (not special points) may appear more than once in the list P1, . . . , Pl. We also know that such a divisor can be represented by a unique pair of polynomials a, b ∈ 𝔽_q[X] satisfying deg b < deg a ≤ g and a | (b^2 + bu – v). In that case, we write D = Div(a, b). What interests us is the fact that the roots of the polynomial a are precisely the X-coordinates of the points P1, . . . , Pl. This fact leads to the very useful concepts of prime divisors and smooth divisors.

Definition 4.4.

A divisor D = Div(a, b) is called prime, if the polynomial a is irreducible (that is, prime) over 𝔽_q.

For an arbitrary divisor D = Div(a, b), let a = a1 · · · ar be the factorization of a into irreducible polynomials ai over 𝔽_q. There exist polynomials bi ∈ 𝔽_q[X] such that D = D1 + ··· + Dr, where Di := Div(ai, bi). In that case, the (prime) divisors D1, . . . , Dr are called the prime divisors of D. Moreover, if deg ai ≤ δ for all i = 1, . . . , r and for some δ ∈ ℕ, then D is called δ-smooth. In particular, D = Div(a, b) is 1-smooth if and only if a splits completely over 𝔽_q.

In order to set up a factor base B, we predetermine a smoothness bound δ and let B consist of all the prime divisors Div(a, b) with deg a ≤ δ. For simplicity, we take δ = 1. This is indeed a practical choice, when the genus g is not too large (say, g ≤ 9). Let a = X – h ∈ 𝔽_q[X] be an (irreducible) polynomial of degree 1. In order to find b such that Div(a, b) is a prime divisor, we first note that deg b < deg a, that is, b ∈ 𝔽_q is a constant. Furthermore, a | (b^2 + bu – v): that is, b^2 + bu – v ≡ 0 (mod X – h); that is, b^2 + bu(h) – v(h) = 0. Thus, the desired values of b, if existent, can be found by solving a quadratic equation over 𝔽_q. There are q (monic) irreducible polynomials of degree 1, and for each such a there are either two or no solutions for b. Assuming that both these possibilities are equally likely, we conclude that the size of the factor base is ≈ q.
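The search for the degree-1 prime divisors can be sketched as follows; the toy curve Y^2 + Y = X^5 over 𝔽_31 and the brute-force root search are illustrative choices only (for realistic q one would solve the quadratic with a square-root algorithm instead):

```python
# Building the degree-1 factor base for a toy hyperelliptic curve
# C: Y^2 + u(X)Y = v(X) over F_p. Curve and prime are illustrative choices.

p = 31
u = [1]                        # u(X) = 1, coefficients lowest degree first
v = [0, 0, 0, 0, 0, 1]         # v(X) = X^5, so C: Y^2 + Y = X^5 (genus 2)

def ev(poly, x):
    # Evaluate a coefficient-list polynomial at x over F_p.
    return sum(c * pow(x, i, p) for i, c in enumerate(poly)) % p

factor_base = []
for h in range(p):                     # monic degree-1 candidates a = X - h
    uh, vh = ev(u, h), ev(v, h)
    # solve the quadratic b^2 + b u(h) - v(h) = 0 by exhaustive search
    roots = [b for b in range(p) if (b * b + b * uh - vh) % p == 0]
    for b in roots:                    # either two roots or none (or a double root)
        factor_base.append((h, b))     # the prime divisor Div(X - h, b)

print(len(factor_base))                # heuristically close to p
```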

4.6.2. Checking the Smoothness of a Divisor

In order to check the smoothness of a divisor D = Div(a, b) over the factor base B, we first factor a over 𝔽_q. Under the assumption that δ = 1, the divisor D is smooth if and only if a splits completely over 𝔽_q. Let us write a(X) = (X – h1) ··· (X – hl), hi ∈ 𝔽_q. Then for some k1, . . . , kl ∈ 𝔽_q we have D = D1 + ··· + Dl, where Di := Div(X – hi, ki). We may use trial divisions (that is, trial subtractions in this additive setting) by elements of B in order to determine the prime divisors D1, . . . , Dl of D. Proposition 4.5 establishes the probability that a randomly chosen element of the Jacobian is smooth.

Proposition 4.5.

For q ≫ 4g^2, there are approximately q^g/g! 1-smooth divisors in the Jacobian. In particular, the probability that a randomly chosen divisor in the Jacobian is smooth is approximately 1/g!.

The assumption q ≫ 4g^2 is practical, since we usually employ curves of (fixed) small genus g over finite fields of medium sizes. For example, Koblitz [154] proposed the curve Y^2 + Y = X^13 of genus g = 6 over the prime field 𝔽_2. An interesting consequence of the last proposition is that the proportion of smooth divisors in the Jacobian depends only on the genus g of C (and not on q).

4.6.3. The Algorithm

Now, we have all the machinery required to describe the basic version of the index calculus method for computing indα β in the Jacobian. In the first stage, we choose a random j ∈ {1, . . . , m – 1}, compute the (reduced) divisor jα and check whether jα is smooth over the factor base B. Every smooth jα gives a relation: that is, a linear congruence modulo m involving the (unknown) indices of the elements of B to the base α. After sufficiently many (say, ≥ 2(#B)) such relations are found, the system of linear congruences collected is expected to be of full rank and is solved modulo m. This gives us the indices of the elements of the factor base. Each congruence collected above contains at most g non-zero coefficients, so the system is necessarily sparse. In the second stage, we find a single random j for which β + jα is smooth. The database prepared in the first stage then immediately gives indα β.

The Hasse–Weil bounds (3.8) on p. 226 show that the cardinality of the Jacobian is approximately q^g. Thus O(g log q) bits are needed to represent an element of the Jacobian. This fact is consistent with the representation of reduced divisors by pairs of polynomials. Gaudry [105] calculates that this variant of the ICM performs O(q^2 + g!q) operations, each of which takes time polynomial in the input size g log q. If g is considered to be constant, the running time becomes O(q^2 log^t q) (that is, O~(q^2)) for some real t > 0. A square-root method on the Jacobian runs in (expected) time O~(q^{g/2}). Thus for g > 4 the index calculus method performs better than the square-root methods. Indeed, Gaudry’s implementation of this algorithm is capable of computing in a few days discrete logs in the Jacobian of the curve of genus 6 mentioned above. The Jacobian of this curve is of cardinality ≈ 10^40.

For cryptographic purposes, the Jacobian (of cardinality ≈ q^g) should be sufficiently large. If we want to take q small (so that multi-precision arithmetic can be avoided), we should choose large values of g. But this choice makes the ADH–Gaudry algorithm quite efficient. For achieving the desired level of security in cryptographic applications, hyperelliptic curves of genus 2, 3 and 4 only are recommended.

4.7. Solving Large Sparse Linear Systems over Finite Rings

So far we have seen many algorithms which require solving large systems of linear equations (or congruences). The number n of unknowns in such systems can be as large as several million. Standard Gaussian elimination on such a system takes time O(n^3) and space O(n^2). There are asymptotically faster algorithms, like Strassen’s method [292], which takes time O(n^2.807), and Coppersmith and Winograd’s method [60], which runs in time O(n^2.376). Unfortunately, these asymptotic gains do not show up in the range of practical interest. Moreover, the space requirements of these asymptotically faster methods are prohibitively high (though still O(n^2)).

Luckily enough, cryptanalytic algorithms usually deal with coefficient matrices that are sparse: that is, that have only a small number of non-zero entries in each row. For example, consider the system of linear congruences available from the relation collection stage of an ICM for solving the DLP over a finite field 𝔽_q. The factor base consists of a subexponential (in lg q) number of elements, whereas each relation involves at most O(lg q) non-zero coefficients. Furthermore, the sparsity of the resulting matrix A is somewhat structured, in the sense that the columns of A corresponding to larger primes in the factor base tend to have fewer non-zero entries. In this regard, we refer to the interesting analysis by Odlyzko [225] in connection with the Coppersmith method (Section 4.4.4). Odlyzko took m = 2n equations in n unknown indices and showed that about n/4 columns of A are expected to contain only zero coefficients, implying that the corresponding variables never occurred in any relation collected. Moreover, about 0.346n columns of A are expected to have only a single non-zero coefficient.

The sparsity (as well as the structure of the sparsity) of the coefficient matrix A can be effectively exploited, and the system can be solved in time O~(n^2). In this section, we describe some special algorithms for large sparse linear systems. In what follows, we assume that we want to compute the unknown n-dimensional column vector x from the given system of equations

Ax = b,

where A is an m × n matrix, m ≥ n, and where b is a non-zero m-dimensional column vector. Though this is not the case in general, we will often assume for the sake of simplicity that A has full rank (that is, rank n). We write vectors as column vectors; that is, an l-dimensional vector v with elements v1, . . . , vl is written as v = (v1 v2 . . . vl)^t, where the superscript t denotes matrix transpose.

Before we proceed further, some comments are in order. First note that our system of equations is often one over the finite ring ℤ_r, which is not necessarily a field. Most of the methods we describe below assume that ℤ_r is a field, that is, r is a prime. If r is composite, we can do the following. First, assume that the prime factorization r = p1^{α1} · · · ps^{αs}, αi > 0, of r is known. In that case, we first solve the system over the fields ℤ_{pi} for i = 1, . . . , s. Then for each i we lift the solution modulo pi to a solution modulo pi^{αi}. Finally, all these lifted solutions are combined using the CRT to get the solution modulo r.

Hensel lifting can be used to lift a solution of the system Ax ≡ b (mod p) to a solution of Ax ≡ b (mod p^α), where p is a prime and α ≥ 2. We proceed by induction on α. Let us denote the (or a) solution of Ax ≡ b (mod p) by x1, which can be computed by solving a system in the field ℤ_p. Now, assume that for some i ≥ 1 we know (integer) vectors x1, . . . , xi such that

Equation 4.14

A(x1 + px2 + ··· + p^{i–1}xi) ≡ b (mod p^i).
We then attempt to compute a vector xi+1 such that

Equation 4.15

A(x1 + px2 + ··· + p^{i–1}xi + p^ixi+1) ≡ b (mod p^{i+1}).
Congruence (4.14) shows that the elements of A, x1, . . . , xi, b can be so chosen (as integers) that for some vector yi we have the equality

A(x1 + px2 + ··· + p^{i–1}xi) = b – p^iyi

in ℤ. Substituting this in Congruence (4.15) gives Axi+1 ≡ yi (mod p). Thus the (incremental) vector xi+1 can be obtained by solving a linear system in ℤ_p.

It, therefore, suffices to know how to solve linear congruences modulo a prime p. However, problems arise when we do not know the factorization of r (while solving Ax ≡ b (mod r)). If r is large, it would be a heavy investment to attempt to factor r. What can be done instead is the following. First, we use trial divisions to extract the small prime factors of r. We may, therefore, assume that r has no small prime factors. We proceed to solve Ax ≡ b (mod r) assuming that r is a prime (that is, that ℤ_r is a field). In a field, every non-zero element is invertible. But if r is composite, there are non-zero elements a which are not invertible (that is, for which gcd(a, r) > 1). If, during the course of the computation, we never happen to meet (and try to invert) such non-zero non-invertible elements, then the computation terminates without any trouble. Otherwise, such an element a yields a non-trivial factor gcd(a, r) of r. In that case, we have a partial factorization of r and restart solving the system modulo each suitable factor of r.
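The lifting procedure of Congruences (4.14)–(4.15) can be sketched as follows; ordinary Gaussian elimination modulo p plays the role of the field solver, and for simplicity A is assumed square and invertible modulo p:

```python
# Hensel lifting for linear systems: solve Ax = b (mod p^alpha) by solving
# repeatedly modulo the prime p. Sketch; assumes A is invertible mod p.

def solve_mod_p(A, b, p):
    # Gaussian elimination over the field Z_p (Gauss-Jordan form).
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = next(r for r in range(c, n) if M[r][c] % p != 0)
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], p - 2, p)           # inverse via Fermat
        M[c] = [x * inv % p for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] % p:
                f = M[r][c]
                M[r] = [(x - f * y) % p for x, y in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def solve_mod_prime_power(A, b, p, alpha):
    n = len(A)
    x = [0] * n
    for i in range(alpha):
        # residual y_i = (b - A x)/p^i (exact by induction), reduced mod p
        r = [(b[k] - sum(A[k][j] * x[j] for j in range(n))) // p**i % p
             for k in range(n)]
        z = solve_mod_p(A, r, p)               # the increment: A z = y_i (mod p)
        x = [x[k] + p**i * z[k] for k in range(n)]
    return [v % p**alpha for v in x]
```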

Some of the algorithms we discuss below assume that A is a symmetric matrix. In our applications, A is usually not symmetric; indeed, it need not even be square. Both these problems can be overcome by trying to solve the modified system A^tAx = A^tb. If A has full rank, this leads to an equivalent system.

If r = 2 (as in the case of the QSM for factoring integers), using the special methods is often not recommended. In this case, the elements of A are bits and can be packed compactly in machine words, and addition of rows can be done word-wise (say, 32 bits at a time). This leads to an efficient implementation of ordinary Gaussian elimination, which usually runs faster than the more complicated special algorithms described below, at least for the sizes of practical systems.

In what follows, we discuss some well-known methods for solving large sparse linear systems over finite fields (typically prime fields). In order to simplify notations, we will refrain from writing the matrix equalities as congruences, but treat them as equations over the underlying finite fields.

4.7.1. Structured Gaussian Elimination

Structured Gaussian elimination is applied to a sparse system before one of the next three methods is employed to solve the system. If the sparsity of A has some structure (as discussed earlier), then structured Gaussian elimination tends to reduce the size of the system considerably, while maintaining its sparsity. We now describe the essential steps of structured Gaussian elimination. Let us define the weight of a row or column of a matrix to be the number of non-zero entries in that row or column.

First we delete all the columns (together with the corresponding variables) that have weight 0. These variables never occur in the system and need not be considered at all.

Next we delete all the columns that have weight 1 and the rows corresponding to the non-zero entries in these columns. Each such deleted column corresponds to a variable xi that appears in exactly one equation. After the rest of the system is solved, the value of xi is obtained by back substitution. Deleting some rows in this step may expose some new columns of weight 1. So this step should be repeated, until all the columns have weight > 1.

Now, consider each row with weight 1. Such a row gives a direct solution for the variable xi corresponding to its single non-zero entry. We then substitute this value of xi in all the equations where it occurs and subsequently delete the i-th column. We repeat this step, until all rows are of weight > 1.

At this point, the system usually has many more equations than variables. We may make the system a square one by throwing away some rows. Since subtracting multiples of rows of higher weights tends to increase the number of non-zero elements in the matrix, we should throw away the rows with higher weights. While discarding the excess rows, we should be careful to ensure that we are not left with a matrix having columns of weight 0. Some columns in the reduced system may again happen to have weight 1. Thus, we have to repeat the above steps again. And again and again and . . . , until we are left with a square matrix each row and column of which has weight ≥ 2.

This procedure leads to a system which is usually much smaller than the original system. In a typical example quoted in Odlyzko [225], structured Gaussian elimination reduces a system with 16,500 unknowns to one with less than 1,000 unknowns. The resulting reduced system may be solved using ordinary Gaussian elimination which, for smaller systems, appears to be much faster than the following sophisticated methods.
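A minimal sketch of the pruning passes above (weight-0 columns simply never occur in the sparse representation; the handling of excess rows and the final back substitution are omitted):

```python
# Structured Gaussian elimination, pruning passes only (sketch). Equations
# are sparse rows [ {column: coefficient}, rhs ] over Z_p. Deferred
# (column, equation) pairs are resolved later by back substitution.

p = 97

def col_weights(rows):
    w = {}
    for r, _ in rows:
        for c in r:
            w[c] = w.get(c, 0) + 1
    return w

def prune(rows):
    deferred = []
    changed = True
    while changed:
        changed = False
        # a column of weight 1: delete it together with its only equation
        for c, wt in col_weights(rows).items():
            if wt == 1:
                i = next(i for i, (r, _) in enumerate(rows) if c in r)
                deferred.append((c, rows.pop(i)))
                changed = True
                break
        if changed:
            continue
        # a row of weight 1: solve its variable and substitute everywhere
        for i, (r, rhs) in enumerate(rows):
            if len(r) == 1:
                (c, a), = r.items()
                val = rhs * pow(a, p - 2, p) % p
                rows.pop(i)
                for row in rows:
                    if c in row[0]:
                        row[1] = (row[1] - row[0].pop(c) * val) % p
                changed = True
                break
    return rows, deferred
```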

4.7.2. The Conjugate Gradient Method

The conjugate gradient method was originally proposed to solve a linear system Ax = b over ℝ for an n × n (that is, square) symmetric positive definite matrix A and for a non-zero vector b, and is based on the idea of minimizing the quadratic function f(x) := (1/2)x^tAx – b^tx. The minimum is attained when the gradient ∇f = Ax – b equals zero, which corresponds to the solution of the given system.

The conjugate gradient method is an iterative procedure. The iterations start with an initial minimizer x0, which can be any n-dimensional vector. As the iterations proceed, we obtain gradually improved minimizers x0, x1, x2, . . . , until we reach the solution. We also maintain and update two other sequences of vectors ei and di. The vector ei stands for the error b – Axi, whereas the vectors d0, d1, . . . constitute a set of mutually conjugate (that is, orthogonal) directions. We initialize e0 = d0 = b – Ax0 and for i = 0, 1, . . . repeat the steps of Algorithm 4.8, until ei = 0. We denote the inner product of two vectors v = (v1 v2 . . . vn)^t and w = (w1 w2 . . . wn)^t by 〈v, w〉 := v1w1 + ··· + vnwn.

Algorithm 4.8. An iteration in the conjugate gradient method

ai := 〈ei, ei〉/〈di, Adi〉.

xi+1 := xi + aidi.

ei+1 := ei – aiAdi.

bi := 〈ei+1, ei+1〉/〈ei, ei〉.

di+1 := ei+1 + bidi.

This method computes a set of mutually orthogonal directions d0, d1, . . . , and hence it has to stop after at most n – 1 iterations, since we run out of new orthogonal directions after n – 1 iterations. Provided that we work with infinite precision, we must eventually obtain ei = 0 for some i, 0 ≤ in – 1.

If A is sparse, that is, if each row of A has O(log^c n) non-zero entries, c being a positive constant, then the product Adi can be computed using O~(n) field operations. Other operations clearly meet this bound. Since at most n – 1 iterations are necessary, the conjugate gradient method terminates after performing O~(n^2) field operations.

We face some potential problems when we want to apply this method to solve a system over a finite field 𝔽_q. First, the matrix A is usually not symmetric and need not even be square. This problem can be avoided by solving the system A^tAx = A^tb. The new coefficient matrix A^tA may be non-sparse (that is, dense). So instead of computing and working with A^tA explicitly, we compute the product (A^tA)di as A^t(Adi); that is, we avoid multiplication by a (possibly) dense matrix at the cost of multiplications by two sparse matrices.

The second difficulty with a finite field 𝔽_q is that the question of minimizing an 𝔽_q-valued function makes hardly any sense (and so does positive definiteness of a matrix over 𝔽_q). However, the conjugate gradient method is essentially based on the generation of a set of mutually orthogonal vectors d0, d1, . . . . This concept continues to make sense in the setting of a finite field.

If A is a real positive definite matrix, we cannot have 〈di, Adi〉 = 0 for a non-zero vector di. But this condition need not hold for a matrix A over 𝔽_q. Similarly, we may have a non-zero error vector ei over 𝔽_q for which 〈ei, ei〉 = 0. (Again this is not possible for real vectors.) So for the iterations over 𝔽_q (more precisely, the computations of ai and bi) to proceed gracefully, all that we can hope for is that before reaching the solution we never hit a non-zero direction vector di for which 〈di, Adi〉 = 0, nor a non-zero error vector ei for which 〈ei, ei〉 = 0. If q is sufficiently large and if the initial minimizer x0 is chosen sufficiently randomly, then the probability of encountering such a bad di or ei is rather low, and as a result the method is very likely to terminate without problems. If, by a terrible stroke of bad luck, we have to abort the computation prematurely, we should restart the procedure with a new random initial vector x0. If q is small (say, q = 2 as in the case of the QSM), it is a neater idea to select the entries of the initial vector x0 from an extension field of 𝔽_q and to work in this extension. The eventual solution we reach will be in 𝔽_q, but working in the larger field decreases the possibility of an attempted division by 0.

There is, however, a brighter side of using a finite field 𝔽_q in place of ℝ: every calculation we perform in 𝔽_q is exact, and we do not have to bother about a criterion for determining whether an error vector ei is zero, or about the conditioning of the matrix A. One of the biggest headaches of numerical analysis is absent here.
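Algorithm 4.8 over a prime field, together with the restart strategy just described, can be sketched as follows; the prime and the test system are illustrative choices, and the symmetric matrix M (here M = A^tA) is assumed non-singular, since otherwise the restart loop may never terminate:

```python
import random

# Conjugate gradient over a prime field (sketch of Algorithm 4.8). We solve
# M x = c for symmetric non-singular M, restarting from a fresh random x0
# whenever <e, e> = 0 or <d, Md> = 0 is hit before convergence.

p = 10007

def dot(v, w):
    return sum(a * b for a, b in zip(v, w)) % p

def matvec(M, v):
    return [dot(row, v) for row in M]

def cg_mod_p(M, c, rng):
    n = len(M)
    while True:
        x = [rng.randrange(p) for _ in range(n)]      # random initial minimizer
        e = [(ci - mi) % p for ci, mi in zip(c, matvec(M, x))]
        d = e[:]
        for _ in range(n + 1):
            if all(v == 0 for v in e):
                return x                              # exact solution reached
            Md = matvec(M, d)
            ee, dMd = dot(e, e), dot(d, Md)
            if ee == 0 or dMd == 0:
                break                                 # bad luck: restart
            a = ee * pow(dMd, p - 2, p) % p
            x = [(xi + a * di) % p for xi, di in zip(x, d)]
            e = [(ei - a * mi) % p for ei, mi in zip(e, Md)]
            b = dot(e, e) * pow(ee, p - 2, p) % p
            d = [(ei + b * di) % p for ei, di in zip(e, d)]
```

A typical use solves A^tAx = A^tb for a small non-singular A; since A^t is invertible, the computed x also satisfies Ax = b.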

4.7.3. The Lanczos Method

The Lanczos method is another iterative method quite similar to the conjugate gradient method. The basic difference between these methods lies in the way by which the mutually conjugate directions d0, d1, . . . are generated. For the Lanczos method, we start with the initializations: d0 := b, , , x0 = a0d0. Then, for i = 1, 2, . . . , we repeat the steps in Algorithm 4.9 as long as .

Algorithm 4.9. An iteration in the Lanczos method

vi+1 := Adi.

.

.

xi := xi–1 + aidi.

If A is a real positive definite matrix, the termination criterion is equivalent to the condition di = 0. When this is satisfied, the vector xi–1 equals the desired solution x of the system Ax = b. Since d0, d1, . . . are mutually orthogonal, the process must stop after at most n – 1 iterations. Therefore, for a sparse matrix A, the entire procedure performs O~(n^2) field operations.

The problems we face with the Lanczos method applied to a system over 𝔽_q are essentially the same as those discussed in connection with the conjugate gradient method. The problem with a non-symmetric and/or non-square matrix A is solved by multiplying the system by A^t. Instead of working with A^tA explicitly, we prefer to multiply separately by A and A^t.

The more serious problem with a system over 𝔽_q is that of encountering a non-zero direction vector di with 〈di, Adi〉 = 0. If this happens, we have to abort the computation prematurely. In order to restart the procedure, we try to solve the system BAx = Bb, where B is a diagonal matrix whose diagonal elements are chosen randomly from the non-zero elements of the field or of some suitable extension (if q is small).

4.7.4. The Wiedemann Method

The Wiedemann method for solving a sparse system Ax = b over a finite field uses ideas different from those employed by the other methods discussed so far. For the sake of simplicity, we assume that A is a square non-singular matrix (not necessarily symmetric). The Wiedemann method tries to compute the minimal polynomial μA(X) = X^d + c_{d–1}X^{d–1} + ··· + c1X + c0, d ≤ n, of A. To that end, one selects a small positive integer l in the range 10 ≤ l ≤ 20. For i = 0, 1, . . . , 2n, let vi denote the column vector of length l consisting of the first l entries of the vector A^ib. For the working of the Wiedemann method, we need to compute only the vectors v0, . . . , v2n. If A is a sparse matrix, this computation involves a total of O~(n^2) field operations.

Since μA(A) = 0, we have μA(A)A^ib = 0 for every i ≥ 0. Therefore, for each k = 1, . . . , l the sequence v0,k, v1,k, . . . of the k-th entries of v0, v1, . . . satisfies the linear recurrence

vi+d,k + c_{d–1}vi+d–1,k + ··· + c1vi+1,k + c0vi,k = 0 for all i ≥ 0.

But then the minimal polynomial μk(X) of the k-th such sequence is a factor of μA(X). There are methods (most notably the Berlekamp–Massey algorithm) that compute each μk(X) using O(n^2) field operations. We then expect to obtain μA(X) = lcm(μk(X) | 1 ≤ k ≤ l).

The assumption that A is non-singular is equivalent to the condition that c0 ≠ 0. In that case, the solution vector x = –c0^{–1}(A^{d–1}b + c_{d–1}A^{d–2}b + ··· + c1b) can be computed using O~(n^2) arithmetic operations in the field.

If A is singular, we may find out linear dependencies among the rows of A and subsequently throw away suitable rows. Doing this repeatedly eventually gives us a non-singular A. For further details on the Wiedemann method, see [303].
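The whole method can be sketched compactly using a single random projection u in place of the first l coordinates, retrying with a fresh u whenever the recovered polynomial fails to produce a verified solution; the Berlekamp–Massey algorithm serves as the sequence-minimal-polynomial routine:

```python
import random

# Wiedemann's method (sketch): recover a solution of Ax = b over Z_p from the
# minimal polynomial of the scalar sequence s_i = <u, A^i b> for random u.

p = 10007

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) % p for row in A]

def berlekamp_massey(s):
    # connection polynomial C with s_n + sum_{i>=1} C[i] s_{n-i} = 0
    C, B, L, m, bb = [1], [1], 0, 1, 1
    for n in range(len(s)):
        d = (s[n] + sum(C[i] * s[n - i] for i in range(1, L + 1))) % p
        if d == 0:
            m += 1
            continue
        T = C[:]
        coef = d * pow(bb, p - 2, p) % p
        if len(B) + m > len(C):
            C += [0] * (len(B) + m - len(C))
        for i in range(len(B)):
            C[i + m] = (C[i + m] - coef * B[i]) % p
        if 2 * L <= n:
            L, B, bb, m = n + 1 - L, T, d, 1
        else:
            m += 1
    return C[:L + 1]          # mu(X) = X^L + C[1] X^(L-1) + ... + C[L]

def wiedemann(A, b, rng):
    n = len(A)
    while True:
        u = [rng.randrange(p) for _ in range(n)]
        seq, v = [], b[:]
        for _ in range(2 * n):                       # s_i = <u, A^i b>
            seq.append(sum(ui * vi for ui, vi in zip(u, v)) % p)
            v = matvec(A, v)
        C = berlekamp_massey(seq)
        L = len(C) - 1
        if L == 0 or C[L] == 0:                      # c0 = 0: unlucky u
            continue
        # x = -c0^{-1} (A^(L-1) b + C[1] A^(L-2) b + ... + C[L-1] b)
        w = b[:]
        for i in range(1, L):
            Aw = matvec(A, w)
            w = [(ai + C[i] * bi) % p for ai, bi in zip(Aw, b)]
        inv = pow(-C[L] % p, p - 2, p)
        x = [wi * inv % p for wi in w]
        if matvec(A, x) == [bi % p for bi in b]:     # verify; retry otherwise
            return x
```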

4.8. The Subset Sum Problem

In this section, we assume that A = {a1, . . . , an} ⊆ ℕ is a knapsack set. For s ∈ ℕ, we are required to find ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s, provided that a solution exists. In general, finding such a solution ∊1, . . . , ∊n is a very difficult problem.[6] However, if the weights ai satisfy some specific bounds, there exist polynomial-time algorithms for solving the SSP.

[6] In the language of complexity theory, the decision problem of determining whether a solution of the SSP exists is NP-complete.

Let us first define an important quantity associated with a knapsack set:

Definition 4.5.

The density of the knapsack set A = {a1, . . . , an} is defined to be the real number d(A) := n/lg(max(a1, . . . , an)).

If d(A) > 1, then, in general, the SSP has more than one solution (provided that it has one at all). This makes the corresponding knapsack set A unsuitable for cryptographic purposes. So we consider low densities: that is, the case d(A) ≤ 1.

There are certain algorithms that reduce in polynomial time the problem of finding a solution of the SSP to that of finding a shortest (non-zero) vector in a lattice. Assuming that such a vector is computable in polynomial time, Lagarias and Odlyzko’s reduction algorithm [157] solves the SSP in polynomial time with high probability, if d(A) ≤ 0.6463. An improved version of the algorithm works for densities d(A) ≤ 0.9408 (see Coster et al. [64] and Coster et al. [65]). The reduction algorithm is easy and will be described in Section 4.8.1. However, it is not known how to compute a shortest non-zero vector in a lattice efficiently. The Lenstra–Lenstra–Lovasz (L3) polynomial-time lattice-basis reduction algorithm [166] provably finds a non-zero vector whose length is at most the length of a shortest non-zero vector, multiplied by a power of 2. In practice, however, the L3 algorithm tends to compute a shortest vector quite often. Section 4.8.2 deals with the L3 lattice-basis reduction algorithm.

Before providing a treatment on lattices, let us introduce a particular case of the SSP, which is easily (and uniquely) solvable.

Definition 4.6.

A knapsack set {a1, . . . , an} with a1 < ··· < an is said to be superincreasing, if aj > a1 + a2 + ··· + aj–1 for all j = 2, . . . , n.

Algorithm 4.10 solves the SSP for a superincreasing knapsack set in deterministic polynomial time. The proof for the correctness of this algorithm is easy and left to the reader.

Algorithm 4.10. Solving the superincreasing knapsack problem

Input: A superincreasing knapsack set {a1, . . . , an} with a1 < ··· < an and s ∈ ℕ.

Output: The (unique) solution ∊1, . . . , ∊n ∈ {0, 1} of ∊1a1 + ··· + ∊nan = s, if it exists; “failure”, otherwise.

Steps:

for i = n, n – 1, . . . , 1 {
   if (s ≥ ai) { ∊i := 1, s := s – ai. } else { ∊i := 0. }
}
if (s = 0) { Return (∊1, . . . , ∊n). } else { Return “failure”. }
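In Python, Algorithm 4.10 is a few lines (with None standing in for “failure”):

```python
# Algorithm 4.10: solving a superincreasing knapsack by a greedy scan from
# the largest weight downwards.

def solve_superincreasing(a, s):
    # a must be sorted and superincreasing: a[j] > a[0] + ... + a[j-1]
    eps = [0] * len(a)
    for i in range(len(a) - 1, -1, -1):
        if s >= a[i]:
            eps[i] = 1
            s -= a[i]
    return eps if s == 0 else None     # None plays the role of "failure"
```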

4.8.1. The Low-Density Subset Sum Problem

We start by defining a lattice.

Definition 4.7.

Let n, d ∈ ℕ, d ≤ n, and let v1, . . . , vd ∈ ℝ^n be d linearly independent (non-zero) vectors (that is, n-tuples). The lattice L of dimension d spanned by v1, . . . , vd is the set of all ℤ-linear combinations of v1, . . . , vd, that is,

L = {r1v1 + ··· + rdvd | r1, . . . , rd ∈ ℤ}.
We say that v1, . . . , vd constitute a basis of L.

In general, a lattice may have more than one basis. We are interested in bases consisting of short vectors, where the concept of shortness is with respect to the following definition.

Definition 4.8.

Let v := (v1, . . . , vn)^t and w := (w1, . . . , wn)^t be two n-dimensional vectors in ℝ^n. The inner product of v and w is defined to be the real number

v, w〉 := v1w1 + ··· + vnwn,

and the length of v is defined as

‖v‖ := √〈v, v〉.
For the time being, let us assume the availability of a lattice oracle which, given a lattice, returns a shortest non-zero vector in the lattice. The possibilities for realizing such an oracle will be discussed in the next section.

Consider the subset sum problem with the knapsack set A := {a1, . . . , an} and let B be an upper bound on the weights (that is, each ai ≤ B). For s ∈ ℕ, we are supposed to find ∊1, . . . , ∊n ∈ {0, 1} such that ∊1a1 + ··· + ∊nan = s. Let L be the (n + 1)-dimensional lattice in ℝ^{n+1} generated by the vectors

v1 := (1, 0, . . . , 0, Na1)^t,
v2 := (0, 1, . . . , 0, Na2)^t,
. . .
vn := (0, 0, . . . , 1, Nan)^t,
vn+1 := (1/2, 1/2, . . . , 1/2, Ns)^t,

where N is an integer larger than (1/2)√n. The vector v := ∊1v1 + ··· + ∊nvn – vn+1 = (∊1 – 1/2, . . . , ∊n – 1/2, 0)^t is in the lattice L, where ∊1, . . . , ∊n is a solution of the given instance of the SSP. Involved calculations (carried out in Coster et al. [64, 65]) show that the probability P of the existence of a vector w ∈ L, w ∉ {0, ±v}, with ‖w‖ ≤ ‖v‖ satisfies P = O(2^{cn}/B), where c ≈ 1.0628. Now, if the density d(A) of A is less than 1/c ≈ 0.9408, then B = 2^{c′n} for some c′ > c and, therefore, P → 0 as n → ∞. In other words, if d(A) < 0.9408, then, with high probability, ±v are the shortest non-zero vectors of L. The lattice oracle then returns such a vector, from which the solution ∊1, . . . , ∊n can be readily computed.

4.8.2. The Lattice-Basis Reduction Algorithm

Let L be a lattice in ℝ^n specified by a basis of n linearly independent vectors v1, . . . , vn. We now construct a basis v1*, . . . , vn* of ℝ^n such that 〈vi*, vj*〉 = 0 (that is, vi* and vj* are orthogonal to each other) for all i, j, i ≠ j. Note that v1*, . . . , vn* need not be a basis for L. Algorithm 4.11 is known as the Gram–Schmidt orthogonalization procedure.

Algorithm 4.11. Gram–Schmidt orthogonalization

Input: A basis v1, . . . , vn of ℝ^n.

Output: The Gram–Schmidt orthogonalization v1*, . . . , vn* of v1, . . . , vn.

Steps:

v1* := v1.
for i = 2, . . . , n {
   vi* := vi – μi,1v1* – μi,2v2* – ··· – μi,i–1vi–1*, where μi,j := 〈vi, vj*〉/〈vj*, vj*〉.
}

One can easily verify that v1*, . . . , vn* constitute an orthogonal basis of ℝ^n. Using these notations, we introduce the following important concept:
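Algorithm 4.11 in exact rational arithmetic (a sketch; vectors are plain lists):

```python
from fractions import Fraction

# Gram-Schmidt orthogonalization (Algorithm 4.11) over the rationals.

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def gram_schmidt(basis):
    vstar, mu = [], []
    for v in basis:
        coeffs = [dot(v, w) / dot(w, w) for w in vstar]     # mu_{i,j}
        w = [Fraction(x) for x in v]
        for m, u in zip(coeffs, vstar):                     # subtract projections
            w = [wi - m * ui for wi, ui in zip(w, u)]
        vstar.append(w)
        mu.append(coeffs)
    return vstar, mu
```

For the basis (3, 1), (2, 2) one gets v1* = (3, 1), μ2,1 = 4/5 and v2* = (–2/5, 6/5), which is indeed orthogonal to v1*.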

Definition 4.9.

The basis v1, . . . , vn is called a reduced basis of L, if

Equation 4.16

|μi,j| ≤ 1/2 for all 1 ≤ j < i ≤ n,
and

Equation 4.17

‖vi* + μi,i–1vi–1*‖^2 ≥ (3/4)‖vi–1*‖^2 for all i = 2, . . . , n.
A reduced basis v1, . . . , vn of L is termed so, because the vectors vi are somewhat short. More precisely, we have Theorem 4.5, the proof of which is not difficult, but is involved, and is omitted here.

Theorem 4.5.

Let v1, . . . , vn be a reduced basis of a lattice L, and let m ∈ {1, . . . , n}. For any m linearly independent vectors w1, . . . , wm of L, we have

‖vi‖^2 ≤ 2^{n–1} max(‖w1‖^2, . . . , ‖wm‖^2)

for all i = 1, . . . , m. In particular, for any non-zero vector w of L we have

‖v1‖^2 ≤ 2^{n–1}‖w‖^2.

That is, for a reduced basis v1, . . . , vn of L the length of v1 is at most 2^{(n–1)/2} times that of a shortest non-zero vector in L.

Given an arbitrary basis v1, . . . , vn of a lattice L, the L3 basis reduction algorithm computes a reduced basis of L. The algorithm starts by computing the Gram–Schmidt orthogonalization of v1, . . . , vn. The rational numbers μi,j are also available from this step. We also obtain as byproducts the numbers Vi := ‖vi*‖^2 for i = 1, . . . , n.

Algorithm 4.12 enforces Condition (4.16) |μk,l| ≤ 1/2 for a given pair of indices k and l. The essential work done by this routine is subtracting a suitable multiple of vl from vk and updating the values μk,1, . . . , μk,l accordingly.

Algorithm 4.12. Subroutine for basis reduction

Input: Two indices k and l.

Output: An update of the basis vectors to ensure |μk,l| ≤ 1/2.

Steps:

r := the integer closest to μk,l.

vk := vk – rvl.

for h = 1, . . . , l – 1 {μk,h := μk,hrμl,h. }

μk,l := μk,lr.

If Condition (4.17) is not satisfied by some k, that is, if Vk < (3/4 – μk,k–1^2)Vk–1, then vk and vk–1 are swapped. The necessary changes in the values Vk, Vk–1 and certain μi,j’s should also be incorporated. This is explained in Algorithm 4.13.

Algorithm 4.13. Subroutine for basis reduction

Input: An index k.

Output: An update of the basis vectors to restore Condition (4.17) for the index k.

Steps:

μ := μk,k–1.   V := Vk + μ2Vk–1.
μk,k–1 := μVk–1/V.   Vk := Vk–1Vk/V.   Vk–1 := V.
Swap (vk ↔ vk–1).
for h = 1, . . . , k – 2 { Swap (μk,h ↔ μk–1,h). }
for h = k + 1, . . . , n {
   μ′ := μh,k–1 – μμh,k.   μh,k–1 := μh,k + μk,k–1μ′.   μh,k := μ′.
}

The main basis reduction algorithm is described in Algorithm 4.14. It is not obvious that this algorithm should terminate at all. Consider the quantity D := d1 · · · dn–1, where di := | det(〈vk, vl〉)1≤k,l≤i| for each i = 1, . . . , n – 1. At the beginning of the basis reduction procedure one has di ≤ B^i for all i, where B := max(‖vi‖^2 | 1 ≤ i ≤ n). It can be shown that an invocation of Algorithm 4.12 does not alter the value of D, whereas interchanging vk and vk–1 in Algorithm 4.13 decreases D by a factor < 3/4. It can also be shown that for any basis of L the value D is bounded from below by a constant which depends only on the lattice. Thus, Algorithm 4.14 stops after finitely many steps.

Algorithm 4.14. Basis reduction in a lattice

Input: A basis v1, . . . , vn of a lattice L.

Output: v1, . . . , vn converted to a reduced basis.

Steps:

Compute the Gram–Schmidt orthogonalization of v1, . . . , vn (Algorithm 4.11).

/* The initial values of μi,j and Vi are available at this point */
i := 2.
while (i < n) {
   if (|μi,i–1| > 1/2) { Call Algorithm 4.12 with k = i and l = i – 1. }
   if (Vi < (3/4 – μi,i–1^2)Vi–1) {
      Call Algorithm 4.13 with k = i.
      i := max(2, i – 1).
   }
   for j = i – 2, i – 3, . . . , 1 {
      if (|μi,j| > 1/2) { Call Algorithm 4.12 with k = i and l = j. }
   }
   i++.
}
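A compact sketch of the whole reduction in exact rational arithmetic; for clarity, the Gram–Schmidt data is recomputed after each change instead of being updated incrementally as in Algorithms 4.12 and 4.13 (which is what makes the real algorithm efficient):

```python
from fractions import Fraction

# The L3 basis reduction algorithm (sketch). The output basis satisfies
# Conditions (4.16) and (4.17).

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def gso(b):
    # Gram-Schmidt orthogonalization, returning (b*, mu) as in Algorithm 4.11.
    vstar, mu = [], []
    for v in b:
        coeffs = [dot(v, w) / dot(w, w) for w in vstar]
        w = [Fraction(x) for x in v]
        for m, u in zip(coeffs, vstar):
            w = [wi - m * ui for wi, ui in zip(w, u)]
        vstar.append(w)
        mu.append(coeffs)
    return vstar, mu

def lll(b):
    b = [list(map(Fraction, v)) for v in b]
    n, k = len(b), 1
    while k < n:
        _, mu = gso(b)
        for j in range(k - 1, -1, -1):         # enforce |mu_{k,j}| <= 1/2
            r = round(mu[k][j])
            if r:
                b[k] = [x - r * y for x, y in zip(b[k], b[j])]
                _, mu = gso(b)
        vstar, mu = gso(b)
        Vk = dot(vstar[k], vstar[k])
        Vk1 = dot(vstar[k - 1], vstar[k - 1])
        if Vk >= (Fraction(3, 4) - mu[k][k - 1] ** 2) * Vk1:   # Condition (4.17)
            k += 1
        else:
            b[k], b[k - 1] = b[k - 1], b[k]    # swap and step back
            k = max(k - 1, 1)
    return b
```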

For a more complete treatment of the L3 basis reduction algorithm, we refer the reader to Lenstra et al. [166] (or Mignotte [203]). It is important to note here that the L3 basis reduction algorithm is at the heart of the Lenstra–Lenstra–Lovasz algorithm for factoring polynomials in ℚ[X]. This factoring algorithm indeed runs in time polynomially bounded by the degree (and the coefficient size) of the polynomial to be factored and is one of the major breakthroughs in the history of symbolic computing.

Exercise Set 4.8

4.27 Let A = {a1, . . . , an} be a knapsack set. Show that:
  1. If A is superincreasing with a1 < ··· < an, then ai ≥ 2^{i–1} for all i = 1, . . . , n and hence d(A) ≤ n/(n – 1).

  2. If a1 + ··· + an < 2^n – 1, then there exist two different tuples (∊1, . . . , ∊n) and (∊′1, . . . , ∊′n) in {0, 1}^n such that ∊1a1 + ··· + ∊nan = ∊′1a1 + ··· + ∊′nan.

4.28 Let L be a lattice in ℝ^n and let v1, . . . , vn constitute a basis of L. The determinant of L is defined by

det L := | det(v1, . . . , vn)|.

  1. Show that det L is an invariant of the lattice L (that is, independent of the basis v1, . . . , vn of L).

    Let v1*, . . . , vn* be the Gram–Schmidt orthogonalization of the basis v1, . . . , vn.

  2. Show that det L = ‖v1*‖ · · · ‖vn*‖.

  3. Prove the Hadamard inequality: det L ≤ ‖v1‖ · · · ‖vn‖.

Chapter Summary

This chapter introduces the most common computationally intractable mathematical problems on which the security of public-key cryptosystems rests. We also describe the principal algorithms known to date for solving these difficult computational problems.

To start with, we enumerate these computational problems. The first is the integer factorization problem (IFP) and its several variants. Some problems that are provably or believably equivalent to the IFP are the totient problem, problems associated with the RSA algorithm, and the modular square root problem. The next class of problems includes the discrete logarithm problem (DLP) and its variants on elliptic curves (ECDLP) and hyperelliptic curves (HECDLP). The Diffie–Hellman problem (DHP) and its variants (ECDHP, HECDHP) are believed to be equivalent to the respective variants of the DLP. Finally, the subset sum problem (SSP) and two related problems, namely the shortest vector problem (SVP) and the closest vector problem (CVP) on lattices, are introduced.

The subsequent sections are devoted to an algorithmic study of these difficult problems. We start with IFP. We first present some fully exponential algorithms like trial division, Pollard’s rho method, Pollard’s p – 1 method and Williams’ p + 1 method. Next we describe the modern genre of subexponential algorithms. The quadratic sieve method (QSM) is discussed at length together with its heuristic improvements like incomplete sieving, large prime variation and the multiple polynomial variant. We also describe TWINKLE, a hardware device that efficiently implements the sieving stage of the QSM. We then discuss the elliptic curve method (ECM) and the number field sieve method (NFSM) for factoring integers. The NFSM turns out to be the asymptotically fastest known algorithm for factoring integers.

The (finite field) DLP is discussed next. The older square-root methods, such as Shanks’ baby-step–giant-step method (BSGS), Pollard’s rho method and the Pohlig–Hellman method (PHM), take exponential running times in the worst case. The PHM for a field F_q is, however, efficient if q – 1 has only small prime factors. Next we discuss the modern family of algorithms collectively known as the index calculus method (ICM). For prime fields, we discuss three variants of the ICM, namely the basic method, the linear sieve method (LSM) and the number field sieve method (NFSM). We also discuss three variants of the ICM for fields of characteristic 2: the basic method, the linear sieve method and Coppersmith’s algorithm. Another interesting variant is the cubic sieve method (CSM) covered in the exercises. We explain Gordon and McCurley’s polynomial sieving in connection with Coppersmith’s algorithm.

The next section deals with algorithms for solving the ECDLP. For a general elliptic curve, the exponential square-root methods are the only known algorithms. For some special classes of curves, more efficient methods are proposed in the literature. The MOV reduction based on the Weil pairing reduces the ECDLP on a curve over F_q to the DLP in the finite field F_q^k for some suitable k. This k is small and the reduction is efficient for supersingular curves. The SmartASS method (also called the anomalous method) reduces the ECDLP in an anomalous curve to the computation of p-adic discrete logarithms. This reduction solves the original DLP in polynomial time. In view of these algorithms, it is preferable to avoid supersingular and anomalous curves in cryptographic applications. Finally, the xedni calculus method (XCM) is discussed. This algorithm works by lifting a curve over F_p to a curve over Q. Experimental and theoretical evidence suggests that the XCM is not an efficient solution to the ECDLP.

We then devote a section to the study of an index calculus method to solve the HECDLP. For hyperelliptic curves of small genus, this method leads to a subexponential algorithm (the ADH–Gaudry algorithm).

Many of the above subexponential methods require solving a system of linear congruences over finite rings. This (inherently sequential) linear algebra part often turns out to be the bottleneck of the algorithms. However, the fact that these equations are necessarily sparse can be effectively exploited, and some faster algorithms can be used to solve these systems. We study four such algorithms: structured Gaussian elimination, the conjugate gradient method, the Lanczos method and the Wiedemann method.

In the last section, we study the subset sum problem. We first reduce the SSP to problems associated with lattices. We finally present the lattice-basis reduction algorithm due to Lenstra, Lenstra and Lovász.

Several other computationally intractable problems have been proposed in the literature for building cryptographic systems. Some of these problems are mentioned in the annotated references of Chapter 5. Due to space and time limitations, we will not discuss these problems in this book.

Suggestions for Further Reading

The integer factorization problem is one of the oldest computational problems. Though the exact notion of computational complexity took shape only after the advent of computers, the apparent difficulty of the factorization problem was noticed centuries ago. Crandall and Pomerance [69] call it the fundamental computational problem of arithmetic. Numerous books and articles discuss this subject at varying levels of coverage. Crandall and Pomerance [69] is perhaps the most extensive in this regard. The reader can also take a look at Bressoud’s (much simpler) book [36] or the (compact, yet reasonably detailed) Chapter 10 of Henri Cohen’s book [56]. The articles by Lenstra et al. [164] and by Montgomery [211] are also worth reading.

John M. Pollard has his name attached to three modern inventions in the arena of integer factorization. In [238, 239], he introduces the rho and p – 1 methods. (Later he was part of the team that designed the number-field sieve factoring algorithm.) Williams’ p + 1 method appears in 1982 in [305].

The continued fraction method (CFRAC) is apparently the first known subexponential-time integer factoring algorithm. It is based on the work of Lehmer and Powers [162] and first appears in its currently used form in Morrison and Brillhart’s paper [213]. CFRAC was the most widely used integer factoring algorithm during the late 1970s and early 1980s.

The quadratic sieve method, invented by Carl Pomerance [241] in 1984, supersedes the CFRAC method. The multiple-polynomial QSM appears in Silverman [279]. Hendrik Lenstra’s elliptic curve method [174] was proposed almost concurrently with the QSM. Nowadays, the QSM and the ECM are the most commonly used factoring methods. Reyneri’s cubic sieve method is described in Lenstra and Lenstra [165].

The theoretically superior number field sieve method follows from Pollard’s factoring method using cubic integers [240]. The initial proposal for the NFS method is that of the simple NFS and appears in Lenstra et al. [167]. It is later modified to the general NFS method in Buhler et al. [41]. Lenstra and Lenstra [165] is a compilation of papers on the NFS method. Though the NFS method is the asymptotically fastest factoring method, its fairly complicated implementation makes the algorithm superior to the QSM or the ECM only when the bit size of the integer to be factored is reasonably large.

Shamir’s factoring engine TWINKLE is proposed in [269]. A. K. Lenstra and Shamir analyse and optimize its design in [168]. Shamir and Tromer [270] have proposed a device called TWIRL (The Weizmann Institute Relation Locator) that is geared to the NFS factoring method. It is estimated that a TWIRL implementation costing US$10K can complete the sieving for a 512-bit RSA modulus in less than 10 minutes, whereas one that does the same for a 1024-bit RSA modulus costs US$10–50M and takes about one year. Lenstra et al. [163] provide a more detailed analysis of these estimates. See Lenstra et al. [169] for Bernstein’s factorization circuit, another implementation of the NFS factoring method.

The (finite field) discrete logarithm problem has also attracted much research in the last few decades. The older square-root methods are described well in the book [191] by Menezes. Donald Knuth attributes the baby-step–giant-step method to Daniel Shanks. See Stein and Teske [290] for various optimizations of the baby-step–giant-step method. Pollard’s rho method for discrete logarithms is an adaptation of his rho method for integer factorization. See Pohlig and Hellman [234] for the Pohlig–Hellman method.

The first idea of the index calculus method appears in Western and Miller [302]. Coppersmith et al. [59] describe three variants of the index calculus method: the linear sieve method, the residue list sieve method and the Gaussian integer method. The same paper also proposes the cubic sieve method (CSM). LaMacchia and Odlyzko [158] describe an implementation of the linear sieve and the Gaussian integer methods. Das and Veni Madhavan [73] make an implementation study of the CSM. Also look at the survey [189] by McCurley.

Gordon [119] uses number field sieves for computing discrete logarithms over prime fields. Weber et al. [261, 299, 300, 301] have implemented and proved the practicality of the number field sieve method. Also see Schirokauer’s paper [260].

Odlyzko [225] surveys the algorithms for computing discrete logs in the fields F_2^n. The best algorithm for these fields is Coppersmith’s algorithm [57]. No analog of this algorithm is known for prime fields. Gordon and McCurley [120] use Coppersmith’s algorithm for the computation of discrete logarithms in fields of this form.

The article [226] by Odlyzko and the one [242] by Pomerance are two recent surveys on the finite field discrete logarithm problem. Also see Buchmann and Weber [40].

The elliptic curve discrete logarithm problem seems to be a very difficult computational problem. A direct adaptation of the index calculus method is expected to lead to a running time worse than that of brute-force search (Silverman and Suzuki [278]; Blake et al. [24]). Menezes et al. [193] reduce the problem of computing discrete logs in an elliptic curve over F_q to computing discrete logs in the field F_q^k for some k. For supersingular elliptic curves, this k can be chosen to be small. For a general curve, the MOV reduction takes exponential time (Balasubramanian and Koblitz [16]). The SmartASS method is due to Smart [282], Satoh and Araki [257] and Semaev [265]. Joseph H. Silverman proposes the xedni calculus method in [277]. This method has been experimentally and heuristically shown to be impractical by Jacobson et al. [139].

Adleman et al. [2] propose the first subexponential algorithm for the hyperelliptic curve discrete log problem. This algorithm is applicable to curves of high genus over prime fields. The analysis of its running time is based on certain heuristic assumptions. Enge [86] provides a subexponential algorithm which has a rigorously provable running time and which works for curves over an arbitrary finite field F_q. Again, the algorithm demands curves of high genus. An implementation of the Adleman–DeMarrais–Huang algorithm is given by Gaudry [105]. Also see Enge and Gaudry [87].

Gaudry et al. [107] propose a Weil-descent attack for the hyperelliptic curve discrete log problem. This is modified in Galbraith [100] and Galbraith et al. [101].

Coppersmith et al. [59] describe sparse system solvers. LaMacchia and Odlyzko [159] implement these methods. For further details, see Montgomery [212], Coppersmith [58], Wiedemann [303], and Yang and Brent [306].

That public-key cryptosystems can be based on the subset-sum problem (or the knapsack problem) was recognized at the beginning of the era of public-key cryptography. Historically, the first realization of a public-key system follows this line and is due to Merkle and Hellman [196]. But the Merkle–Hellman system and several variants of it have been broken; see Shamir [266], for example. At present, most public-key systems based on the subset-sum problem are known to be insecure.

The lattice-basis reduction algorithm and the associated L3 algorithm for factoring polynomials appear in the celebrated work [166] of Lenstra, Lenstra and Lovász. Mignotte’s book [203] also describes these topics in good detail.

5. Cryptographic Algorithms

5.1Introduction
5.2Secure Transmission of Messages
5.3Key Exchange
5.4Digital Signatures
5.5Entity Authentication
 Chapter Summary
 Suggestions for Further Reading

An essential element of freedom is the right to privacy, a right that cannot be expected to stand against an unremitting technological attack.

—Whitfield Diffie

Mary had a little key (It’s all she could export), and all the email that she sent was opened at the Fort.

—Ronald L. Rivest

Treat your password like your toothbrush. Don’t let anybody else use it, and get a new one every six months.

—Clifford Stoll

5.1. Introduction

As we pointed out in Chapter 1, cryptography aims to guard sensitive data against unauthorized access. We now describe some algorithms that achieve this goal, restricting ourselves to public-key algorithms. In practice, however, public-key algorithms are used in tandem with secret-key algorithms. In this chapter, we describe only the basic routines, whose inputs are mathematical entities like integers, elements of finite fields, or points on curves. Message encoding will be dealt with in Chapter 6.

5.2. Secure Transmission of Messages

Consider the standard scenario: a party named Alice, called the sender, wishes to send a secret message m to a party named Bob, called the receiver or recipient, over a public communication channel. A third party, Carol, may intercept and read the message. In order to maintain the secrecy of the message, Alice uses a well-defined transform fe to convert the plaintext message m to the ciphertext message c, and sends c to Bob. Bob possesses some secret information with whose help he applies the reverse transform fd in order to get back m. Carol, who does not know the secret information, cannot retrieve m from c by applying the transformation fd.

In a public-key system, the realization of the transforms fe and fd is based on a key pair (e, d) predetermined by Bob. The public key e is made public, whereas the private key d is kept secret. The encryption transform generates c = fe(m, e). Since e is public knowledge, anybody can generate c from a given m, whereas the decryption transform m = fd(c, d) can be performed only by Bob, who possesses the knowledge of d. The key pair has to be chosen so that knowledge of e does not allow Carol to compute d in feasible time. The intractability of the computational problems discussed in Chapter 4 can be exploited to design such key pairs. The exact realization of the keys e, d and the transforms fe, fd depends on the choice of the underlying intractable problem and also on the way the problem is put to use. Since there are several intractable problems suitable for cryptography, there are several encryption schemes varying widely in algorithmic and mathematical details.

5.2.1. The RSA Public-key Encryption Algorithm

RSA has been the most popular encryption algorithm. Historically, it is also the first public-key encryption algorithm published in the literature (see Rivest et al. [252]). Its security is based on the intractability of the RSAP (or the RSAKIP) discussed in Exercise 4.2. Since both these problems are polynomial-time reducible to the IFP, we often say that the RSA algorithm derives its security from the intractability of the IFP. It may, however, be the case that breaking RSA is easier than factoring integers, though no concrete evidence seems to be available.

RSA key pair

Algorithm 5.1 generates a key pair for RSA.

Algorithm 5.1. RSA key generation

Input: A bit length l.

Output: A random RSA key pair.

Steps:

Generate two different random primes p and q each of bit length l.

n := pq.

Choose an integer e coprime to φ(n) = (p – 1)(q – 1).

d := e^(–1) (mod φ(n)).

Return the pair (n, e) as the public key and the pair (n, d) as the private key.

The length l of the primes p and q should be chosen large enough so as to make the factorization of n infeasible. For short-term security, values of l between 256 and 512 suffice. For long-term security, one may choose l as large as 2,048.

The random primes p and q can be generated using a probabilistic algorithm like those described in Section 3.4.2. Naive primes are normally considered to be sufficiently secure in this respect, since p ± 1 and q ± 1 are expected to have large prime factors in general. Gordon’s algorithm (Algorithm 3.14) can also be used for generating strong primes p and q. Since Gordon’s algorithm runs only nominally slower than the algorithm for generating naive primes, there is no harm in using strong primes. Safe primes, on the other hand, are difficult to generate and may be avoided.

The RSA modulus n is public knowledge. Determining d from n and e is easily doable, given the value of φ(n) = (p – 1)(q – 1) which, in turn, is readily computable, if p and q are known. If an adversary can compute φ(n) (with or without factoring n), the security of the RSA protocol based on the modulus n is compromised. However, computing φ(n) without the knowledge of p and q is (at least historically) a very difficult computational problem, and so, if n is reasonably large, RSA encryption is assumed to be sufficiently secure.
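In fact, knowledge of φ(n) is exactly as good as knowledge of p and q: since p + q = n – φ(n) + 1 and pq = n, the primes are the two roots of the quadratic X² – (p + q)X + n. A small Python illustration of this (the function name is ours):

```python
from math import isqrt

def factor_from_phi(n, phi):
    """Recover the prime factors of n = p*q from phi = (p-1)*(q-1)."""
    s = n - phi + 1          # s = p + q
    disc = s * s - 4 * n     # (p + q)^2 - 4pq = (p - q)^2
    r = isqrt(disc)
    assert r * r == disc     # must be a perfect square for a valid phi
    return (s - r) // 2, (s + r) // 2

# Toy example: p = 61, q = 53, so n = 3233 and phi = 3120.
assert factor_from_phi(3233, 3120) == (53, 61)
```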

RSA encryption is done by raising the plaintext message m to the power e modulo n. In order to speed up this (modular) exponentiation, it is often expedient to take a small value for e (like 3, 257 and 65,537). However, in that case one should adopt certain precautions, as Exercise 5.2 suggests. More specifically, if e entities share a common (small) encryption exponent e but different (pairwise coprime) moduli and if the same message m is encrypted using all these public keys, then an eavesdropper can reconstruct m easily from a knowledge of the e ciphertext messages. Another potential problem of using a small e is that if m is small, that is, if m < n^(1/e), then m can be retrieved by taking the integer e-th root of the ciphertext message.
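The common-exponent pitfall can be demonstrated concretely. The sketch below (a toy illustration with our own function names and parameters) combines the e ciphertexts by the Chinese remainder theorem into m^e modulo the product of the moduli; since m is smaller than every modulus, m^e is recovered exactly as an integer, and an integer e-th root then reveals m:

```python
from math import prod

def integer_nth_root(x, k):
    """floor(x**(1/k)) by binary search on integers."""
    lo, hi = 0, 1 << (x.bit_length() // k + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** k <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

def broadcast_attack(ciphertexts, moduli, e):
    """Recover m from c_i = m^e mod n_i for e pairwise coprime moduli."""
    N = prod(moduli)
    x = 0
    for c, n in zip(ciphertexts, moduli):
        Ni = N // n
        x = (x + c * Ni * pow(Ni, -1, n)) % N   # CRT combination
    return integer_nth_root(x, e)               # x = m^e exactly, since m^e < N

# Toy demonstration with e = 3 and three pairwise coprime moduli.
m, ns = 42, [77, 221, 437]
cs = [pow(m, 3, n) for n in ns]
assert broadcast_attack(cs, ns, 3) == m
```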

Although the pair (n, d) is sufficient for carrying out RSA decryption, maintaining some additional (secret) information significantly speeds up decryption. To this end, it is often recommended that some or all of the values n, e, d, p, q, d1, d2, h be stored, where d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p).

If n can be factored, then d can be easily computed from the public key (n, e). Conversely, if n, e, d are all known, there is an efficient probabilistic algorithm which factors n. This algorithm is based on the fact that if ed – 1 = 2^s t with t odd, then for at least half of the integers a ∈ Z_n*, there exists σ ∈ {0, 1, . . . , s – 1} such that a^(2^σ t) ≢ ±1 (mod n), whereas a^(2^(σ+1) t) ≡ 1 (mod n). But then the gcd of n and a^(2^σ t) – 1 is a non-trivial factor of n. For the details, solve Exercise 7.9.
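This factoring procedure is easy to prototype. The following Python sketch (our own rendering; the names are illustrative) strips the factor 2^s out of ed – 1 and then searches random bases for a square root of 1 other than ±1 modulo n:

```python
import random
from math import gcd

def factor_rsa_modulus(n, e, d):
    """Split n = p*q given a matching RSA exponent pair (e, d)."""
    k = e * d - 1                 # a multiple of the group order; k = 2^s * t
    t = k
    while t % 2 == 0:
        t //= 2
    while True:
        a = random.randrange(2, n - 1)
        g = gcd(a, n)
        if g > 1:                 # lucky hit: a already shares a factor with n
            return min(g, n // g), max(g, n // g)
        x = pow(a, t, n)
        while x != 1:
            y = x
            x = pow(x, 2, n)      # climb the chain a^t, a^(2t), a^(4t), ...
            if x == 1 and y != n - 1:
                p = gcd(y - 1, n) # y^2 = 1 with y != +-1 splits n
                return min(p, n // p), max(p, n // p)

# Toy key: n = 61 * 53 = 3233, e = 17, d = 2753 (e*d = 1 mod phi(n)).
assert factor_rsa_modulus(3233, 17, 2753) == (53, 61)
```

Each random base succeeds with probability at least 1/2, so only a few iterations are expected.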

Different entities in a given network should use different values of n. If two or more entities share a common n but different exponent pairs (ei, di), then each entity can first factor n and then use this factorization to compute the private keys of other entities. Primes are quite abundant in nature and so finding pairwise coprime RSA moduli for all entities is no problem at all. A common value of the encryption exponent e (for example, a small value of e) can, however, be shared by all entities. In that case, for pairwise different moduli ni, the corresponding decryption exponents di will also be pairwise different.

RSA encryption

RSA encryption is rather simple, as Algorithm 5.2 shows.

Algorithm 5.2. RSA encryption

Input: The RSA public key (n, e) of the recipient and the plaintext message m ∈ Z_n.

Output: The ciphertext message c ∈ Z_n.

Steps:

c := m^e (mod n).

By Exercise 4.1, the exponentiation function m ↦ m^e is bijective; so m can be uniquely recovered from c. It is clear why small encryption exponents e speed up RSA encryption. For a general exponent e, the routine takes time O(log^3 n), whereas for a small e (that is, e = O(1)) the running time drops to O(log^2 n).
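A complete toy run of Algorithms 5.1–5.3 can be written in a few lines of Python (the numbers are the usual textbook toy values, far too small for real use):

```python
# Key generation (Algorithm 5.1) with toy primes.
p, q = 61, 53
n = p * q                       # 3233
phi = (p - 1) * (q - 1)         # 3120
e = 17                          # coprime to phi
d = pow(e, -1, phi)             # modular inverse of e modulo phi

# Encryption (Algorithm 5.2) and decryption (Algorithm 5.3).
m = 65                          # plaintext in Z_n
c = pow(m, e, n)                # c = m^e mod n
assert pow(c, d, n) == m        # c^d mod n recovers m
```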

RSA decryption

RSA decryption (Algorithm 5.3) is analogous to RSA encryption.

Algorithm 5.3. RSA decryption

Input: The RSA private key (n, d) of the recipient and the ciphertext message c ∈ Z_n.

Output: The recovered plaintext message m ∈ Z_n.

Steps:

m := c^d (mod n).

The correctness of this decryption procedure follows from Exercise 4.1. As in the case of encryption, one might go for a small decryption exponent d. In general, both e and d cannot be small simultaneously. If e is small, the security of the RSA scheme is expected not to be affected, whereas small values of d are not desirable for several reasons. First, if d is very small, the adversary chooses some m, computes the corresponding ciphertext c (using public knowledge) and then keeps on computing c^x (mod n) for x = 1, 2, . . . until x = d is reached, that is, until the original message m is recovered.

Even when d is not very small so that the possibility of exhaustive search with x = 1, 2, . . . can be precluded, there are several attacks known for small private exponents. Wiener [304] proposes an efficient algorithm in this respect. Boneh and Durfee [32] improve Wiener’s algorithm. Sun et al. [294] propose three variants of the RSA scheme that are resistant to these attacks. Durfee and Nguyen [82] extend the Boneh–Durfee attack to break two of these three variants. To sum up, it is advisable not to use small secret exponents d, that is, the bit length of d should be close to that of n in order to achieve the desired level of security.

There are alternative ways to speed up RSA decryption. If the values p, q, d1 := d rem (p – 1), d2 := d rem (q – 1) and h := q^(–1) (mod p) are all available to the recipient, he can use Algorithm 5.4 for RSA decryption.

Algorithm 5.4. RSA decryption using CRT

Input: The RSA extended private key (p, q, d1, d2, h) of the recipient and the ciphertext message c ∈ Z_n.

Output: The recovered plaintext message m ∈ Z_n.

Steps:

m1 := c^d1 (mod p).

m2 := c^d2 (mod q).

t := h(m1 – m2) (mod p).

m := m2 + tq.

In this modified routine, m1 := m rem p and m2 := m rem q are first computed and then combined using the CRT to get m modulo n = pq. Algorithm 5.3 performs a single modular exponentiation modulo n, whereas in Algorithm 5.4 two exponentiations modulo p and q respectively take the major portion of the running time. Since an exponentiation modulo N to an exponent O(N) runs in time O(log^3 N), and since each of p and q has a bit length (about) half of that of n, Algorithm 5.4 runs about four times as fast as Algorithm 5.3.
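Algorithm 5.4 can be checked against Algorithm 5.3 on toy numbers. In this Python sketch (our own helper names), the step t := h(m1 – m2) (mod p) is exactly the Garner form of the CRT recombination:

```python
p, q = 61, 53                          # toy primes
n, phi = p * q, (p - 1) * (q - 1)
e = 17
d = pow(e, -1, phi)

d1, d2 = d % (p - 1), d % (q - 1)      # d rem (p - 1), d rem (q - 1)
h = pow(q, -1, p)                      # h = q^(-1) mod p

def decrypt_crt(c):
    """RSA decryption via two half-size exponentiations (Algorithm 5.4)."""
    m1 = pow(c, d1, p)
    m2 = pow(c, d2, q)
    t = (h * (m1 - m2)) % p            # Garner recombination step
    return m2 + t * q                  # m = m2 + t*q lies in [0, n)

c = pow(123, e, n)
assert decrypt_crt(c) == pow(c, d, n) == 123
```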

If only the values p, q, d are stored, then d1, d2 and h can be computed on the fly using relatively inexpensive operations and subsequently Algorithm 5.4 can be used. This leads to a decryption routine almost as fast as Algorithm 5.4, with somewhat smaller memory requirements for the storage of the private key.

5.2.2. The Rabin Public-key Encryption Algorithm

The Rabin public-key encryption algorithm is based on the intractability of computing square roots modulo a composite integer (SQRTP). By Exercise 4.10, the SQRTP is probabilistically polynomial-time equivalent to the IFP, that is, breaking the Rabin scheme is provably as hard as factoring integers. Breaking RSA, on the other hand, is only believed to be equivalent to factoring integers. Moreover, Rabin encryption is faster than RSA encryption (for moduli of the same size).

Rabin key pair

Like RSA, Rabin encryption requires a modulus of the form n = pq.

Algorithm 5.5. Rabin key generation

Input: A bit length l.

Output: A random Rabin key pair.

Steps:

Generate two different random primes p and q each of bit length l.

n := pq.

Return n as the public key and the pair (p, q) as the private key.

Here, the choice of the bit length l and the generation of the primes p and q follow the same guidelines as discussed in connection with RSA key generation.

Rabin encryption

Encryption in the Rabin scheme involves a single modular squaring.

Algorithm 5.6. Rabin encryption

Input: The Rabin public key n of the recipient and the plaintext message m ∈ Z_n.

Output: The ciphertext message c ∈ Z_n.

Steps:

c := m^2 (mod n).

Unfortunately, the Rabin encryption map m ↦ m^2 (mod n) is not injective. In general, a ciphertext c has four square roots modulo n.[1] This poses an ambiguity during decryption. In order to work around this difficulty, one adds some distinguishing feature or redundancy to the message m before encryption. One possibility is to duplicate a predetermined number of bits at the least significant end of m. This reduces the message space somewhat, but is rarely a serious issue. Only one of the (four) square roots of the ciphertext c is expected to have the desired redundancy. If none or more than one square root possesses the redundancy, decryption fails. However, this is a very rare phenomenon and can be ignored for all practical purposes.

[1] More specifically, if an element c ∈ Z_n is a square modulo both p and q, then the number of square roots of c equals 1 if c = 0; it is 2 if either c ≡ 0 (mod p) or c ≡ 0 (mod q) but not both; and it is 4 if c ≢ 0 (mod p) and c ≢ 0 (mod q). If c is not a square modulo either p or q, then c does not possess a square root modulo n. These assertions can be readily proved using the Chinese remainder theorem.

Rabin decryption

Rabin decryption (Algorithm 5.7) involves computing square roots modulo n. Since n is composite, this is a very difficult problem (for the eavesdropper). But the knowledge of the prime factors p and q of n allows the recipient to decrypt.

Algorithm 5.7. Rabin decryption

Input: The Rabin private key (p, q) of the recipient and the ciphertext message c ∈ Z_n.

Output: The recovered plaintext message m ∈ Z_n.

Steps:

if ((c/p) = –1 or (c/q) = –1) { Return “c is not a ciphertext message”. }        /* (c/p), (c/q) denote Legendre symbols */

Compute the square roots of c mod p.        /* Algorithm 3.17 */
Compute the square roots of c mod q.        /* Algorithm 3.17 */
Compute the square roots of c mod n from those mod p and q.        /* Use CRT */

if (c has exactly one distinguished square root m mod n) { Return m. }

else { Return “failure”. }
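The whole Rabin cycle can be traced on toy numbers. The sketch below is our own illustration: it specializes to primes p ≡ q ≡ 3 (mod 4), for which a square root of a quadratic residue c is simply c^((p+1)/4) mod p, and it uses a hypothetical redundancy rule in which the low byte of the message is duplicated; decryption keeps the unique square root carrying that pattern.

```python
p, q = 499, 547                  # toy primes with p = q = 3 (mod 4)
n = p * q
h = pow(q, -1, p)                # q^(-1) mod p, for CRT recombination

def pad(m0):
    """Append a duplicate of the low byte of m0 as redundancy."""
    return (m0 << 8) | (m0 & 0xFF)

def encrypt(m):
    return pow(m, 2, n)          # Algorithm 5.6: c = m^2 mod n

def decrypt(c):
    """Algorithm 5.7 for these primes: list all four roots, keep the padded one."""
    rp = pow(c, (p + 1) // 4, p) # a square root of c modulo p
    rq = pow(c, (q + 1) // 4, q) # a square root of c modulo q
    roots = set()
    for a in (rp, p - rp):
        for b in (rq, q - rq):
            t = (h * (a - b)) % p        # CRT: x = a (mod p), x = b (mod q)
            roots.add(b + t * q)
    good = [x for x in roots if (x >> 8) & 0xFF == x & 0xFF]
    return good[0] >> 8 if len(good) == 1 else None   # None models "failure"

m0 = 0xAB
assert decrypt(encrypt(pad(m0))) == m0
```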

5.2.3. The Goldwasser–Micali Encryption Algorithm

So far, we have encountered encryption algorithms that are deterministic in the sense that for a given public key of the recipient the same plaintext message encrypts to the same ciphertext message. In a probabilistic encryption algorithm, different calls of the encryption routine produce different ciphertext messages for the same plaintext message and public key.

The Goldwasser–Micali encryption algorithm is probabilistic and is based on the intractability of the quadratic residuosity problem (QRP) described in Exercise 4.2. If n is a composite integer and a an integer coprime to n, then a Jacobi symbol value (a/n) = –1 implies that a is a quadratic non-residue modulo n. The converse does not hold, that is, one may have (a/n) = +1 even when a is a quadratic non-residue modulo n. For example, if n is the product of two distinct odd primes p and q, then a is a quadratic residue modulo n if and only if a is a quadratic residue modulo both p and q. However, if (a/p) = (a/q) = –1, we continue to have (a/n) = (a/p)(a/q) = +1. There is no easy way to find out if a is a quadratic residue modulo n for an integer a with (a/n) = +1. If the factorization of n is available, the QRP is solvable in polynomial time. These observations lead to the design of the Goldwasser–Micali scheme.

Goldwasser–Micali key pair

The Goldwasser–Micali scheme works in the ring Z_n, where n is the product of two distinct sufficiently large primes. The integer a (resp. b) in Algorithm 5.8 can be found by randomly choosing elements of Z_p* (resp. Z_q*) and computing the Legendre symbol (a/p) (resp. (b/q)). Under the assumption that quadratic non-residues are randomly located in Z_p* and Z_q*, a and b can be found after only a few trials. The integer x is a quadratic non-residue modulo n with (x/n) = +1.

Goldwasser–Micali encryption

Goldwasser–Micali encryption (Algorithm 5.9) is probabilistic, since its output is dependent on a sequence of random elements ai of Z_n*. It generates a tuple (c1, . . . , cr) of elements of Z_n* such that each ci ≡ x^mi ai^2 (mod n). If mi = 0, then ci is a quadratic residue modulo n, whereas if mi = 1, ci is a quadratic non-residue modulo n. Therefore, if the quadratic residuosity of ci modulo n can be computed, the bit mi can be determined. If one (for example, the recipient) knows the factorization of n or equivalently the prime factor p of n, one can perform decryption easily. An eavesdropper, on the other hand, must solve the QRP (or the IFP) in order to find out the bits m1, . . . , mr. This is how Goldwasser–Micali encryption derives its security.

Algorithm 5.8. Goldwasser–Micali key generation

Input: A bit length l.

Output: A random Goldwasser–Micali key pair.

Steps:

Generate two (different) random primes p and q each of bit length l.

n := pq.

Find out integers a and b such that (a/p) = –1 and (b/q) = –1.

Compute an integer x with x ≡ a (mod p) and x ≡ b (mod q).          /* Use CRT */

Return the pair (n, x) as the public key and the prime p as the private key.

Algorithm 5.9. Goldwasser–Micali encryption

Input: The Goldwasser–Micali public key (n, x) of the recipient and the plaintext message m = m1 . . . mr, which is a bit string of length r.

Output: The ciphertext message (c1, . . . , cr) with each ci ∈ Z_n*.

Steps:

for i = 1, . . . , r {
   Select a random element ai ∈ Z_n*.
   ci := x^mi ai^2 (mod n).
}

Since randomly chosen non-zero elements of Z_n are with high probability coprime to n, it is sufficient to draw ai from Z_n \ {0} and skip the check whether gcd(ai, n) = 1. In fact, if an ai with gcd(ai, n) > 1 is somehow located, this gcd equals a non-trivial factor of n, and the security of the scheme is broken.

The Goldwasser–Micali scheme has the drawback that the length of the ciphertext message is much bigger than that of the plaintext message. Thus, for example, for a 1024-bit modulus n and a message m of bit length 64, the output requires a huge 65,536-bit space. This phenomenon is called message expansion and can be a serious limitation in certain circumstances.

Goldwasser–Micali decryption

Goldwasser–Micali decryption (Algorithm 5.10) recovers the bits of the plaintext message by computing Legendre symbols modulo the prime divisor p of n. The correctness of this decryption algorithm is evident from the discussion immediately following Algorithm 5.9.

Algorithm 5.10. Goldwasser–Micali decryption

Input: The Goldwasser–Micali private key p of the recipient and the ciphertext message (c1, . . . , cr) with each ci ∈ Z_n*.

Output: The recovered plaintext message m = m1 . . . mr with each mi ∈ {0, 1}.

Steps:

for i = 1, . . . , r {
   if ((ci/p) = 1) { mi := 0. } else { mi := 1. }
}
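Algorithms 5.8–5.10 fit in a few lines of Python. This toy sketch (our own function names; the modulus is hopelessly small for security) evaluates Legendre symbols by Euler's criterion; each bit mi is hidden as ci = x^mi ai^2 mod n, and the holder of the private prime p reads it back from the symbol (ci/p):

```python
import random
from math import gcd

p, q = 499, 547                        # toy primes only
n = p * q

def legendre(a, r):
    """Legendre symbol (a/r) for an odd prime r, by Euler's criterion."""
    s = pow(a, (r - 1) // 2, r)
    return -1 if s == r - 1 else s

# Key generation (Algorithm 5.8): x a non-residue mod both p and q, so (x/n) = +1.
while True:
    x = random.randrange(2, n)
    if legendre(x, p) == -1 and legendre(x, q) == -1:
        break

def encrypt(bits):                     # Algorithm 5.9, uses public data (n, x) only
    cs = []
    for m in bits:
        while True:
            a = random.randrange(1, n)
            if gcd(a, n) == 1:         # a in Z_n^*
                break
        cs.append((pow(x, m, n) * a * a) % n)
    return cs

def decrypt(cs):                       # Algorithm 5.10, needs the private prime p
    return [0 if legendre(c, p) == 1 else 1 for c in cs]

msg = [1, 0, 1, 1, 0, 0, 1, 0]
assert decrypt(encrypt(msg)) == msg
```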

5.2.4. The Blum–Goldwasser Encryption Algorithm

The Blum–Goldwasser algorithm is another probabilistic encryption algorithm and improves on the Goldwasser–Micali algorithm in that the message expansion is by only a constant number of bits, irrespective of the length of the plaintext message. The Blum–Goldwasser scheme is based on the intractability of the SQRTP (modulo a composite integer).

Blum–Goldwasser key pair

As in the case of the encryption algorithms discussed so far, the Blum–Goldwasser algorithm works in the ring Z_n, where n = pq is the product of two distinct primes p and q. Now, we additionally demand p and q to be both congruent to 3 modulo 4.

Algorithm 5.11. Blum–Goldwasser key generation

Input: A bit length l.

Output: A random Blum–Goldwasser key pair.

Steps:

Generate two (different) random primes p and q each of bit length l and each congruent to 3 mod 4.

n := pq.

Return n as the public key and the pair (p, q) as the private key.

Since p and q are two different primes, there exist integers u and v such that up + vq = 1. In order to speed up decryption, it is often expedient to store u and v along with p and q in the private key. Recall that the solution of the congruences x ≡ a (mod p) and x ≡ b (mod q) is given by x ≡ vqa + upb (mod n).

Blum–Goldwasser encryption

The Blum–Goldwasser encryption algorithm assumes that the input plaintext message m is in the form of a bit string, and breaks m into substrings of a fixed length t. A typical choice for t is t = ⌊lg lg n⌋, where n is the public key of the recipient. Write m = m1 . . . mr, where each mi is a bit string of length t. The ciphertext consists of r bit strings c1, . . . , cr, each of bit length t, and an element d ∈ Z_n.

Algorithm 5.12. Blum–Goldwasser encryption

Input: The Blum–Goldwasser public key n of the recipient and the plaintext message m = m1 . . . mr, where each mi is a bit string of length t.

Output: The ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ ℤn*.

Steps:

Choose a random element d ∈ ℤn*.

d := d² (mod n).
for i = 1, . . . , r {
   d := d² (mod n).
   δ := the t least significant bits of d.
   ci := mi ⊕ δ.                                            /* Here ⊕ denotes bit-wise XOR of t-bit strings */
}
d := d² (mod n).

Blum–Goldwasser encryption involves computation of r modular squares in ℤn and is quite fast (for example, faster than RSA encryption with a general encryption exponent). It makes sense to assume that the initial choice of d is from ℤn*, since finding a non-zero non-invertible element of ℤn is as difficult as factoring n.

For an intruder to determine the plaintext message m from the corresponding ciphertext message, the values of d inside the for loop are necessary. These can be obtained by taking repeated square roots modulo n. Since the factorization of n is not known to the intruder, this is a difficult problem. On the other hand, since the recipient knows the prime divisors p and q of n, taking square roots modulo n requires only polynomial-time effort.

Blum–Goldwasser decryption

Recall from Exercise 3.43 that a quadratic residue d ∈ ℤn* (where n is the public key of the recipient) has four distinct square roots, of which exactly one is again a quadratic residue modulo n. This distinguished square root y of d satisfies the congruences y ≡ d^{(p+1)/4} (mod p) and y ≡ d^{(q+1)/4} (mod q). In the decryption Algorithm 5.13, we assume that integers u, v with up + vq = 1 are available from the private key.

Algorithm 5.13 assumes that each value of d is a quadratic residue modulo n. This can be verified by inserting in the for loop a check whether (d/p) = (d/q) = 1, before an attempt is made to compute the square root of d modulo n. If (c1, . . . , cr, d) is a valid ciphertext message, this condition necessarily holds, and there is no point wasting time checking obvious things. However, if there is a possibility that d is altered by an (active) adversary (or corrupted during transmission), one may insert this check. In that case, the routine should report failure when the square root of a quadratic non-residue modulo n is about to be computed.

Algorithm 5.13. Blum–Goldwasser decryption

Input: The Blum–Goldwasser private key (p, q) of the recipient and the ciphertext message (c1, . . . , cr, d), where each ci is a bit string of length t and d ∈ ℤn*.

Output: The recovered plaintext message m = m1 . . . mr, where each mi is a bit string of length t.

Steps:

for i = r, r – 1, . . . , 1 {
   a := d^{(p+1)/4} (mod p) and b := d^{(q+1)/4} (mod q).
   Compute d ∈ ℤn with d ≡ a (mod p) and d ≡ b (mod q).  /* Use CRT: d := vqa + upb (mod n) */
   δ := the t least significant bits of d.
   mi := ci ⊕ δ.  /* XOR of t-bit strings */
}
}
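The whole Blum–Goldwasser cycle can be sketched as follows (the tiny primes are illustrative assumptions; the decryption loop recovers the keystream values of the encryption loop in reverse order by repeated extraction of distinguished square roots):

```python
import math, random

# Toy Blum–Goldwasser cycle following Algorithms 5.11-5.13 (assumed
# tiny primes for illustration; real moduli are far larger).
p, q = 499, 547                      # both primes are ≡ 3 (mod 4)
n = p * q
t = int(math.log2(math.log2(n)))     # block length t = floor(lg lg n)

def ext_gcd(a, b):
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

_, u, v = ext_gcd(p, q)              # u*p + v*q = 1, stored with the key

def encrypt(blocks):
    # blocks: t-bit integers m_1, ..., m_r
    d = random.randrange(2, n)
    while math.gcd(d, n) != 1:
        d = random.randrange(2, n)
    d = d * d % n                    # initial squaring
    cs = []
    for m in blocks:
        d = d * d % n
        cs.append(m ^ (d % (1 << t)))   # XOR with t least significant bits
    return cs, d * d % n             # final squaring is transmitted

def sqrt_qr(d):
    # distinguished (quadratic-residue) square root modulo n, via CRT
    a = pow(d, (p + 1) // 4, p)
    b = pow(d, (q + 1) // 4, q)
    return (v * q * a + u * p * b) % n

def decrypt(cs, d):
    ms = []
    for c in reversed(cs):
        d = sqrt_qr(d)               # recover the keystream backwards
        ms.append(c ^ (d % (1 << t)))
    return ms[::-1]
```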

5.2.5. The ElGamal Public-key Encryption Algorithm

The ElGamal encryption algorithm works in a group G in which it is difficult to solve the Diffie–Hellman problem (DHP). Typical candidates for G include the multiplicative group 𝔽q* of a finite field 𝔽q (usually q is a prime or a power of 2), the (additive) group of points on an elliptic curve over a finite field and the (additive) group (called the Jacobian) of reduced divisors on a hyperelliptic curve over a finite field. Here we assume that G is multiplicatively written and has order n. It is not necessary for G to be cyclic, but we should have at our disposal an element g ∈ G with a suitably large (preferably prime) order k. We essentially work in the cyclic subgroup H of G generated by g (but using the arithmetic of G). For the ElGamal scheme, G (together with its representation), g, n and k are made public and can be shared by different entities on a network.

ElGamal key pair

Generating a key pair for the ElGamal scheme (Algorithm 5.14) involves an exponentiation in G. In order to make the exponentiation efficient, the exponent (the private key) is often chosen to have a small number of 1 bits. However, if this number is too small, exhaustive search by an adversary may become feasible.

If the DLP can be solved in G, the private key d can be computed from the public key gd. This amounts to breaking a system based on this key pair. This is why we often say that the security of the ElGamal encryption scheme banks on the intractability of the DLP. But as we see shortly, the DHP is the more fundamental computational problem that dictates the security of ElGamal encryption.

Algorithm 5.14. ElGamal key generation

Input: G, g and k as defined above.

Output: A random ElGamal key pair.

Steps:

Generate a random integer d, 2 ≤ d ≤ k – 1.

Return gd as the public key and d as the private key.

ElGamal encryption

Given a message m ∈ G, the ElGamal encryption procedure (Algorithm 5.15) generates a pair (r, s) of elements of G as the ciphertext message and thus corresponds to message expansion by a factor of 2. Clearly, the sender has all the relevant information for computing (r, s). The need for using a different session key for each encryption is explained in Exercise 5.6.

Algorithm 5.15. ElGamal encryption

Input: (G, g, k and) the ElGamal public key g^d of the recipient and the plaintext message m ∈ G.

Output: The ciphertext message (r, s) ∈ H × G (where H = 〈g〉).

Steps:

Generate a (random) session key d′, 2 ≤ d′ ≤ k – 1.

r := g^{d′}.

s := m·g^{dd′} = m(g^d)^{d′}.

Notice that ElGamal encryption uses two exponentiations in G to exponents which are O(k). Therefore, the running time of Algorithm 5.15 reduces if smaller values of k are selected. On the other hand, if k is too small, the square-root methods in H = 〈g〉 may become efficient (see Section 4.4.1). In practice, it is recommended that k be taken as a prime of length 160 bits or more.

ElGamal decryption

ElGamal decryption involves an exponentiation in G to an exponent which is O(k). It is easy to verify that Algorithm 5.16 performs decryption correctly and that the recipient has the necessary information to carry out decryption.

Algorithm 5.16. ElGamal decryption

Input: (G, g, k and) the ElGamal private key d of the recipient and the ciphertext message (r, s) ∈ H × G (where H = 〈g〉).

Output: The recovered plaintext message m ∈ G.

Steps:

m := s·r^{–d} = s·r^{k–d}.
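The three algorithms can be sketched for a prime-order subgroup H = 〈g〉 of ℤp* (the values p = 1019 and k = 509 are illustrative assumptions; practical choices take k to be a prime of 160 or more bits):

```python
import random

# Toy ElGamal in a prime-order subgroup of Z_p^* (assumed tiny
# parameters for illustration only).
p = 1019                     # prime, p - 1 = 2 * 509
k = 509                      # prime order of the subgroup H = <g>
g = pow(2, (p - 1) // k, p)  # an element of order k
assert g != 1 and pow(g, k, p) == 1

# Key generation (Algorithm 5.14)
d = random.randrange(2, k)   # private key
y = pow(g, d, p)             # public key g^d

# Encryption (Algorithm 5.15): the plaintext m is an element of Z_p^*
def encrypt(m, y):
    d1 = random.randrange(2, k)          # session key d'
    return pow(g, d1, p), m * pow(y, d1, p) % p

# Decryption (Algorithm 5.16): m = s * r^(-d), and r^(-d) = r^(k-d)
# since the order of r divides k
def decrypt(r, s, d):
    return s * pow(r, k - d, p) % p
```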

An eavesdropper Carol knows the domain parameters G, g, k and n and also the recipient’s public key g^d. Determining the message m from a knowledge of the corresponding ciphertext (r, s) is then equivalent to computing the element g^{dd′}. This implies that a (quick) solution of the DHP permits Carol to decrypt a ciphertext. If a (quick) solution of the DLP is available, then the element g^{dd′} can be computed fast. The reverse implication is, however, not clear: it may be easier to solve the DHP than the DLP, though no concrete evidence is available to corroborate this possibility.

5.2.6. The Chor–Rivest Public-key Encryption Algorithm

The Chor–Rivest encryption algorithm is based on a variant of the subset sum problem. It selects a prime p and an integer h ≥ 2, uses a knapsack set A = {a0, . . . , ap–1} with 1 ≤ ai ≤ p^h – 2 for each i, and considers sums of the form s = ∊0a0 + · · · + ∊p–1ap–1 (mod p^h – 1) with each ∊i ∈ {0, 1}. In order to construct the set A for which the h-fold sum s is uniquely determined by the binary vector (∊0, . . . , ∊p–1) of weight h (that is, with exactly h bits equal to 1), we take the help of the finite field 𝔽p^h. We represent 𝔽p^h as 𝔽p[X]/〈f(X)〉, where f(X) ∈ 𝔽p[X] is irreducible of degree h and where x is the residue class of X in 𝔽p[X]/〈f(X)〉. The parameters p and h must be so chosen that p^h – 1 is reasonably smooth, so that the integer factorization of p^h – 1 can be easily computed. This helps us in two ways. First, a generator g(x) of the multiplicative group 𝔽p^h* can be made available quickly using Algorithm 3.25. Second, the Pohlig–Hellman method of Section 4.4.1 becomes efficient for computing discrete logarithms in 𝔽p^h*. We can then take ai := indg(x)(x + i), i = 0, 1, . . . , p – 1. If (∊0, . . . , ∊p–1) and (∊′0, . . . , ∊′p–1) are two binary vectors of weight h, then Σ ∊iai ≡ Σ ∊′iai (mod p^h – 1) implies Π (x + i)^{∊i} = Π (x + i)^{∊′i}, that is, Π (X + i)^{∊i} ≡ Π (X + i)^{∊′i} (mod f(X)), that is, ∊i = ∊′i for all i = 0, . . . , p – 1, since otherwise x would satisfy a non-zero polynomial of degree < h.

Chor–Rivest key pair

A randomly permuted version of a0, . . . , ap–1 shifted by a noise (that is, a random bias) d together with p and h constitute the public key of the Chor–Rivest scheme. The private key, on the other hand, comprises the polynomials f(X) and g(x), the permutation just mentioned and the noise d. Algorithm 5.17 elaborates the generation of such a key pair. The same values of p and h can be used by different entities on a network. So we assume that p and h are provided instead of generated by the recipient as a part of his public key. For brevity, we use the notation q := ph.

Key generation may be a long process in the Chor–Rivest scheme depending on how difficult it is to compute all the indexes indg(x)(x + i). Furthermore, the size of the public key is quite large, namely O(ph log p). Typically one may take p ≈ 200 and h ≈ 25. The original paper of Chor and Rivest [54] recommends the possibilities (197, 24), (211, 24), (243, 24) and (256, 25) for (p, h). Note that 243 = 3⁵ and 256 = 2⁸ are not primes, but the Chor–Rivest algorithm works even when p is a power of a prime. For the sake of simplicity, we here stick to the case that p is a prime.

Algorithm 5.17. Chor–Rivest key generation

Input: A prime p and an integer h ≥ 2 such that ph – 1 is smooth.

Output: A Chor–Rivest key pair.

Steps:

Choose an irreducible polynomial f(X) ∈ 𝔽p[X] of degree h.

Use the representation 𝔽p^h = 𝔽p[X]/〈f(X)〉, where x := X + 〈f(X)〉.

Choose a random generator g(x) of 𝔽p^h*.

Compute the indexes ai := indg(x)(x + i) for i = 0, 1, . . . , p – 1.

Select a random permutation π of {0, 1, . . . , p – 1}.

Select a random noise d in the range 0 ≤ d ≤ q – 2.

Compute αi := aπ(i) + d (mod q – 1) for i = 0, 1, . . . , p – 1.

Return (α0, α1, . . . , αp–1) as the public key and (f, g, π, d) as the private key.

Chor–Rivest encryption

The Chor–Rivest encryption procedure (Algorithm 5.18) assumes that the input plaintext message is represented as a binary vector (m0, . . . , mp–1) of weight (that is, number of one-bits) equal to h. Since there are (p choose h) such binary vectors, arbitrary binary strings of bit length ⌊lg (p choose h)⌋ can be encoded into binary vectors of the above special form. See Chor and Rivest [54] for an algorithm that describes how such an encoding can be done. Chor–Rivest encryption is quite fast, since it computes only h integer additions modulo q – 1.

Algorithm 5.18. Chor–Rivest encryption

Input: The Chor–Rivest public key (α0, . . . , αp–1) (together with p and h) and the plaintext message (m0, . . . , mp–1) which is a binary vector of weight h.

Output: The ciphertext message c ∈ {0, 1, . . . , q – 2}.

Steps:

c := m0α0 + m1α1 + · · · + mp–1αp–1 (mod q – 1).

Chor–Rivest decryption

The Chor–Rivest decryption procedure (Algorithm 5.19) generates a monic polynomial of degree h, the h (distinct) roots of which give the non-zero bits mi in the original plaintext message.

In order to prove that the decryption works correctly, note that s ≡ c – hd ≡ Σ{i : mi = 1} aπ(i) (mod q – 1), so that g(X)^s ≡ Π{i : mi = 1} (X + π(i)) (mod f(X)). The polynomial u(X) is computed as one of degree < h. Adding f(X) to u(X) gives a monic polynomial v(X) of degree h, which is congruent modulo f(X), and hence equal, to Π{i : mi = 1} (X + π(i)). The roots of v(X) can be obtained either by a root-finding algorithm or by trial divisions of v(X) by X + i, i = 0, 1, . . . , p – 1. Applying the inverse of π on these roots then reconstructs the plaintext message.

Algorithm 5.19. Chor–Rivest decryption

Input: The Chor–Rivest private key (f, g, π, d) (together with p and h) and the ciphertext message c ∈ {0, 1, . . . , q – 2}.

Output: The recovered plaintext message (m0, . . . , mp–1) which is a binary vector of weight h.

Steps:

s := c – hd (mod q – 1).

u(X) := g(X)^s (mod f(X)).

v(X) := f(X) + u(X).

Factorize v(X) as v(X) = (X + i1) · · · (X + ih), with each ij ∈ {0, 1, . . . , p – 1}.

For i = 0, 1, . . . , p – 1 set mi := 1 if π(i) ∈ {i1, . . . , ih}, and mi := 0 otherwise.
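The whole Chor–Rivest cycle can be sketched with assumed toy parameters p = 7, h = 2 and f(X) = X² + 1 (irreducible over 𝔽7, since –1 is a quadratic non-residue modulo 7). The discrete logarithms are found here by brute force, which is of course feasible only at this toy size; the text's Algorithm 3.25 and Pohlig–Hellman are what one would use in practice:

```python
import random

# Toy Chor–Rivest cycle (Algorithms 5.17-5.19); assumed tiny parameters.
p, h = 7, 2
q = p ** h                      # q - 1 = 48 is smooth

# GF(49) elements as pairs (c0, c1) standing for c0 + c1*x with x^2 = -1.
def mul(a, b):
    return ((a[0]*b[0] - a[1]*b[1]) % p, (a[0]*b[1] + a[1]*b[0]) % p)

def find_generator():
    for c0 in range(p):
        for c1 in range(p):
            cand, seen, t = (c0, c1), set(), (1, 0)
            if cand == (0, 0):
                continue
            for _ in range(q - 1):
                seen.add(t); t = mul(t, cand)
            if len(seen) == q - 1:
                return cand

g = find_generator()

# Brute-force discrete-log table gives a_i = ind_g(x + i).
log, t = {}, (1, 0)
for e in range(q - 1):
    log[t] = e; t = mul(t, g)
a = [log[(i, 1)] for i in range(p)]

# Key generation (Algorithm 5.17): random permutation pi and noise d.
pi = list(range(p)); random.shuffle(pi)
d = random.randrange(q - 1)
alpha = [(a[pi[i]] + d) % (q - 1) for i in range(p)]   # public key

def encrypt(m):               # m: 0/1 vector of weight h (Algorithm 5.18)
    return sum(alpha[i] for i in range(p) if m[i]) % (q - 1)

def decrypt(c):               # Algorithm 5.19
    s = (c - h * d) % (q - 1)
    u = (1, 0)
    for _ in range(s):        # u = g^s (naive power suffices for the toy)
        u = mul(u, g)
    # v(X) = f(X) + u(X) = X^2 + u1*X + (u0 + 1); its roots are -pi(i)
    roots = [j for j in range(p) if (j*j - u[1]*j + u[0] + 1) % p == 0]
    return [1 if pi[i] in roots else 0 for i in range(p)]
```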
An eavesdropper sees only the sum c = Σ miαi (mod q – 1) of the (known) knapsack weights α0, . . . , αp–1. In order to recover m0, . . . , mp–1, she should solve the SSP. By choosing p and h carefully, the density of the knapsack set can be adjusted to be high, that is, larger than what the cryptanalytic routines described in Section 4.8 can handle. Thus, the Chor–Rivest scheme is assumed to be secure. However, as discussed in Chor and Rivest [54], the security of the system breaks down when certain partial information on the private key is available.

*5.2.7. The XTR Public-key Encryption Algorithm

XTR, a phonetic abbreviation of efficient and compact subgroup trace representation, is designed by Arjen Lenstra and Eric Verheul as an attractive alternative to RSA (and similar cryptosystems including the ElGamal scheme over finite fields) and elliptic curve cryptosystems (ECC). The attractiveness of XTR arises mainly from two facts explained below: its compact representation of keys and its efficient arithmetic.

XTR, though not a fundamental breakthrough, deserves treatment in this chapter. The working of XTR is somewhat involved and we plan to present only a conceptual description of the algorithm, hiding the mathematical details.

XTR considers the following tower of field extensions:

𝔽p ⊂ 𝔽p² ⊂ 𝔽p⁶,

where p ≡ 2 (mod 3) is a prime, sufficiently large that computing discrete logs in 𝔽p⁶* using known algorithms is infeasible. We have p⁶ – 1 = (p – 1)(p + 1)(p² – p + 1)(p² + p + 1). Let q be a prime divisor of p² – p + 1 of bit length 160 or more. There is a unique subgroup G of 𝔽p⁶* with #G = q. G is called the XTR (sub)group, whereas the entire group 𝔽p⁶* is called the XTR supergroup. The XTR group G is cyclic (Lemma 2.1, p 27). Let g be a generator of G, that is, G = 〈g〉 = {1, g, g², . . . , g^{q–1}}.

The working of XTR is based on the discrete log problem in G. Since p² – p + 1, and hence q, is relatively prime to the orders of the multiplicative groups of all proper subfields of 𝔽p⁶, computing discrete logs in G is (seemingly) as difficult as computing them in 𝔽p⁶*, that is, one gets the same level of security by the use of G instead of the full XTR supergroup.

The main technical innovation of XTR is the proposal of a compact representation of the elements of G in place of the obvious representation using ⌈6 lg p⌉ bits inherited from that of 𝔽p⁶. This is precisely where the intermediate field 𝔽p² comes into the picture. We require a map Tr : 𝔽p⁶ → 𝔽p², so that we can represent elements of G by those of 𝔽p². This map offers two benefits. First, the elements of G can now be represented using ⌈2 lg p⌉ bits, leading to a three-fold reduction in the key size. Second, the arithmetic of 𝔽p² can be exploited to implement the arithmetic in G, thereby improving the efficiency of encryption and decryption routines (compared to those over the full XTR supergroup).

The map uses the traces of elements of 𝔽p⁶ over 𝔽p² (Definition 2.59). In this section, we use the shorthand notation Tr to stand for Tr_{𝔽p⁶/𝔽p²}. The conjugates of an element h ∈ 𝔽p⁶ over 𝔽p² are h, h^{p²} and h^{p⁴}, and so Tr(h) = h + h^{p²} + h^{p⁴}.

Let us now specialize to h = g^n ∈ G. Since p² ≡ p – 1 (mod p² – p + 1) and p⁴ ≡ –p (mod p² – p + 1), the conjugates of h are g^n, g^{(p–1)n} and g^{–pn}. Thus, Tr(g^n) = g^n + g^{(p–1)n} + g^{–pn}. Moreover, the product of the three conjugates equals g^{n + (p–1)n – pn} = 1, and the sum of their pairwise products equals g^{pn} + g^{(1–p)n} + g^{–n} = Tr(g^n)^p,

so the minimal polynomial of h = g^n over 𝔽p² is

X³ – Tr(g^n)·X² + Tr(g^n)^p·X – 1.

This minimal polynomial is determined uniquely by Tr(g^n), and so we can represent g^n by Tr(g^n) ∈ 𝔽p². Note, however, that this representation is not unique, that is, the map G → 𝔽p², g^n ↦ Tr(g^n), is not injective. More precisely, the only elements of G that map to Tr(g^n) are the conjugates g^n, g^{(p–1)n} and g^{–pn} of g^n. This is often not a serious problem, as we see below.

In order to complete the description of the implementation of the arithmetic of the group G, we need to address two further issues. This is necessary, since the trace representation defined above is not a homomorphism of groups. First, we specify how one can implement the arithmetic of 𝔽p². Since p ≡ 2 (mod 3), X² + X + 1 is irreducible over 𝔽p. If α ∈ 𝔽p² is a root of X² + X + 1, we have the standard representation 𝔽p² = {y0 + y1α | y0, y1 ∈ 𝔽p}. Since 1 + α + α² = 0, we have y0 + y1α = (–α – α²)y0 + y1α = (y1 – y0)α + (–y0)α². This leads to the non-standard representation

𝔽p² = {y1α + y2α² | y1, y2 ∈ 𝔽p}.
Since p ≡ 2 (mod 3) and α³ = 1 + (α – 1)(α² + α + 1) = 1, the 𝔽p-basis {α, α²} of 𝔽p² is the same as the normal basis {α, α^p}. Under this basis, the basic arithmetic operations in 𝔽p² can be implemented using only a few multiplications (and some additions/subtractions) in 𝔽p, as described in Table 5.1. Here, the operands are x = x1α + x2α², y = y1α + y2α² and z = z1α + z2α².

Table 5.1. Basic operations in 𝔽p²

Operation      Number of multiplications in 𝔽p
x^p            0  (since x^p = x2α + x1α²)
x²             2  (since x² = x2(x2 – 2x1)α + x1(x1 – 2x2)α²)
xy             3  (since xy = (x2y2 – x1y2 – x2y1)α + (x1y1 – x1y2 – x2y1)α²; it suffices to compute x1y1, x2y2 and (x1 + x2)(y1 + y2))
xz – yz^p      4  (since xz – yz^p = (z1(y1 – x2 – y2) + z2(x2 – x1 + y2))α + (z1(x1 – x2 + y1) + z2(y2 – x1 – y1))α²)
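The formulas of Table 5.1 can be verified mechanically. A minimal Python sketch (the prime p = 1907 and the operand values are illustrative assumptions) implements the table's rules for x^p, x² and xy in the basis {α, α²} and checks them against a direct computation in the standard basis {1, α}, where α² = –α – 1:

```python
p = 1907                       # any prime p ≡ 2 (mod 3) works here

# Elements in the basis {α, α²}: (x1, x2) stands for x1·α + x2·α².
def frob(x):                   # x^p: 0 multiplications (Table 5.1)
    return (x[1], x[0])

def sqr(x):                    # x²: 2 multiplications
    return (x[1] * (x[1] - 2 * x[0]) % p, x[0] * (x[0] - 2 * x[1]) % p)

def mul(x, y):                 # x·y: 3 multiplications suffice
    return ((x[1]*y[1] - x[0]*y[1] - x[1]*y[0]) % p,
            (x[0]*y[0] - x[0]*y[1] - x[1]*y[0]) % p)

# Reference arithmetic in the standard basis {1, α}, with α² = -α - 1.
def to_std(x):                 # x1·α + x2·α² = -x2 + (x1 - x2)·α
    return (-x[1] % p, (x[0] - x[1]) % p)

def to_basis(y):               # y0 + y1·α = (y1 - y0)·α + (-y0)·α²
    return ((y[1] - y[0]) % p, -y[0] % p)

def mul_std(a, b):             # (a0 + a1·α)(b0 + b1·α)
    return ((a[0]*b[0] - a[1]*b[1]) % p,
            (a[0]*b[1] + a[1]*b[0] - a[1]*b[1]) % p)
```

A quick sanity check: mul(x, x) must agree with sqr(x), converting operands to the standard basis and back must give the same product, and x^p computed by square-and-multiply must agree with the multiplication-free frob(x).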

Now, we explain how arithmetic operations in G translate to those in 𝔽p² under the representation of g^n by Tr(g^n). To start with, we show how the knowledge of Tr(h) and n allows one to compute Tr(h^n) for h ∈ G. This corresponds to an exponentiation in G. For c ∈ 𝔽p², define the polynomial

Fc(X) := X³ – cX² + c^p·X – 1 ∈ 𝔽p²[X],

where h1, h2, h3 are the three roots (not necessarily distinct) of Fc(X). For n ∈ ℤ, we use the notation

cn := h1^n + h2^n + h3^n.

Putting c = Tr(g) yields cn = Tr(g^n), or, more generally, for c = Tr(g^k) we have cn = Tr(g^{kn}). Algorithm 5.20 computes

Sn(c) := (cn–1, cn, cn+1)

given c ∈ 𝔽p² (for example, Tr(g^k)) and n ∈ ℤ. The correctness of the algorithm is based on the following identities, the derivations of which are left to the reader (alternatively, see Lenstra and Verheul [170]).

Equation 5.1


Equation 5.2


Equation 5.3

c–n = (cn)^p

Equation 5.4


Equation 5.5

cn+2 = c·cn+1 – c^p·cn + cn–1

Equation 5.6

c2n = (cn)² – 2(cn)^p

Equation 5.7

c2n–1 = cn–1·cn – c^p·(cn)^p + (cn+1)^p

Equation 5.8

c2n+1 = cn+1·cn – c·(cn)^p + (cn–1)^p

Algorithm 5.20. XTR exponentiation

Input: c ∈ 𝔽p² and n ∈ ℤ.

Output: Sn(c) = (cn–1, cn, cn+1).

Steps:

if (n < 0) {
   Compute S–n(c).
   Use Equation (5.3) to compute and return Sn(c).
}
if (n = 0) { Return (c^p, 3, c). }
if (n = 1) { Return (3, c, c² – 2c^p). }
if (n = 2) {
   Compute S1(c) and hence c3 using Equation (5.5).
   Return (c1, c2, c3).
}
/* Now n ≥ 3 */

/* Initialize */
m := ⌊(n – 1)/2⌋. Let m = (ml ml–1 . . . m1 m0)2 be the binary representation of m, with ml = 1.
k := 1.
Compute S2k+1(c) = S3(c) = (c2, c3, c4) from S2(c) using Equation (5.5).
/* Exponentiation loop */
for j = l – 1, l – 2, . . . , 0 {
   if (mj = 0) {
      Compute S4k+1(c) = (c4k, c4k+1, c4k+2) from S2k+1(c) = (c2k, c2k+1, c2k+2).
      /* Use Equation (5.6) for c4k and c4k+2 and Equation (5.7) for c4k+1 */
   } else {       /* mj = 1 */
      Compute S4k+3(c) = (c4k+2, c4k+3, c4k+4) from S2k+1(c) = (c2k, c2k+1, c2k+2).
      /* Use Equation (5.6) for c4k+2 and c4k+4 and Equation (5.8) for c4k+3 */
   }
   k := 2k + mj.
}

/* Now k = m and we have computed S2k+1(c) = S2m+1(c) = (c2m, c2m+1, c2m+2). */

if (n is even) {
   Compute Sn(c) = (cn–1, cn, cn+1) from Sn–1(c) = (cn–2, cn–1, cn).
   /* Use Equation (5.5) to compute cn+1 from Sn–1(c) */
}
Return Sn(c).

A careful analysis suggests that the computation of cn from c requires 8 lg n multiplications in 𝔽p. An exponentiation in 𝔽p⁶, on the other hand, requires an expected number of 23.4 lg n multiplications in 𝔽p (assuming that, in 𝔽p, the time for squaring is 80 per cent of that for multiplication). Thus, the XTR representation provides a speed-up of about 3.

XTR key pair

The domain parameters for an XTR cryptosystem include primes p and q satisfying the requirements stated at the beginning of this section: p ≡ 2 (mod 3), and q a prime divisor of p² – p + 1 of bit length 160 or more.

We require a generator g of the XTR group G. Since we have planned to replace working in G by working in 𝔽p², the element g is not needed explicitly. The trace Tr(g) suffices for our purpose. Lenstra and Verheul [170, 172] describe several methods for obtaining the domain parameters p, q, Tr(g). We describe here the simplest strategies. Algorithm 5.21 outputs the primes p, q with |p| = lp and |q| = lq for some given lp, lq.

Algorithm 5.21. Generation of XTR primes

Randomly choose r such that q := r² – r + 1 is a prime of size |q| = lq.

Randomly choose k such that p := r + kq is a prime with |p| = lp and p ≡ 2 (mod 3). (Since p ≡ r (mod q), we then have p² – p + 1 ≡ r² – r + 1 = q ≡ 0 (mod q), so that q divides p² – p + 1.)

Determination of Tr(g) for a suitable g requires some mathematics. First, notice that if the polynomial Fc(X) is irreducible (over 𝔽p²) for some c ∈ 𝔽p², then c = Tr(h) for some h ∈ 𝔽p⁶* with ord h | (p² – p + 1). Moreover, c(p²–p+1)/q, if not equal to 3, is the trace of an element (for example, h^{(p²–p+1)/q}) of order q. Thus, we may take Tr(g) = c(p²–p+1)/q. Although we do not need it explicitly, the corresponding g can be taken to be any root of the polynomial FTr(g)(X).

What remains to explain is how one can find an irreducible Fc(X). A randomized algorithm results from the fact that, for a randomly chosen c ∈ 𝔽p², the polynomial Fc(X) is irreducible with probability ≈ 1/3.

Once the domain parameters of an XTR system are set, the recipient chooses a random d, 2 ≤ d ≤ q – 2, and computes Tr(g^d) using Algorithm 5.20. The tuple (p, q, Tr(g), Tr(g^d)) is the public key and d the private key of the recipient.

XTR encryption

XTR encryption (Algorithm 5.22) is very similar to ElGamal encryption. The only difference is that now we work in under the trace representation of the elements of G, that is, one uses Algorithm 5.20 for computing exponentiations in G.

Algorithm 5.22. XTR encryption

Input: The public key (p, q, Tr(g), Tr(g^d)) of the recipient and the message m ∈ 𝔽p² to be encrypted.

Output: The ciphertext message (r, s) with r, s ∈ 𝔽p².

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ q – 2.

Compute r := Tr(g^{d′}) using Algorithm 5.20 with c := Tr(g) and n := d′.

Compute Tr(g^{dd′}) using Algorithm 5.20 with c := Tr(g^d) and n := d′.

Set s := m · Tr(g^{dd′}).

XTR decryption

XTR decryption (Algorithm 5.23) is again analogous to ElGamal decryption except that we have to incorporate the XTR representation of elements of G.

Algorithm 5.23. XTR decryption

Input: The private key d of the recipient and the ciphertext (r, s).

Output: The recovered plaintext message m.

Steps:

Compute Tr(g^{dd′}) using Algorithm 5.20 with c := r = Tr(g^{d′}) and n := d.

Set m := s · Tr(g^{dd′})^{–1}.

Note that XTR encryption and decryption use Algorithm 5.20 for performing exponentiations. Therefore, these routines run about three times faster than the corresponding ElGamal routines based on the standard arithmetic.

*5.2.8. The NTRU Public-key Encryption Algorithm

Hoffstein et al. [130] have proposed the NTRU encryption scheme in which encryption involves a mixing system using the polynomial algebra and reductions modulo two relatively prime integers α and β. The decryption involves an unmixing system and can be proved to be correct with high probability. The security of this scheme banks on the interaction of the mixing system with the independence of the reductions modulo α and β. Attacks against NTRU based on the determination of short vectors in certain lattices are known. However, suitable choices of the parameters make NTRU resistant to these attacks. The most attractive feature of the NTRU scheme is that encryption and decryption in this case are much faster than those in other known schemes (like RSA, ECC and even XTR).

NTRU key pair

NTRU parameters include three positive integers n, α and β with gcd(α, β) = 1 and with β considerably larger than α (see Table 5.2). Consider the polynomial algebra R := ℤ[X]/〈X^n – 1〉. An element of R is represented as a polynomial f = f0 + f1X + · · · + fn–1X^{n–1} or, equivalently, as a vector (f0, f1, . . . , fn–1) of the coefficients. Note that X^n – 1 is not irreducible in ℤ[X] (for n ≥ 2) and so R is not a field, but that does not matter for the NTRU scheme. For two polynomials f, g of degree < n and with integer coefficients, we denote by f g the product of f and g in ℤ[X], whereas f and g as elements of R multiply to fg = h with

hk = Σ{i+j ≡ k (mod n)} fi gj   for k = 0, 1, . . . , n – 1.
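The convolution rule for fg can be sketched directly (a minimal illustration; the example polynomials are arbitrary):

```python
# Product in R = Z[X]/<X^n - 1>: coefficient k of fg collects every
# f_i g_j with i + j ≡ k (mod n), since exponents wrap around (X^n = 1).
def conv(f, g):
    n = len(f)
    h = [0] * n
    for i in range(n):
        for j in range(n):
            h[(i + j) % n] += f[i] * g[j]
    return h

# Example with n = 3: (1 + X)(X + X^2) = X + 2X^2 + X^3 = 1 + X + 2X^2.
```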
Table 5.2. Recommended NTRU parameters

Security       n     α    β     νf    νg    νu
short-term     107   3    64    15    12    5
moderate       167   3    128   61    20    18
standard[*]    263   3    128   50    24    16
high           503   3    256   216   72    55

[*] Assumed to be equivalent to 1024-bit RSA

NTRU works with polynomials having small coefficients. More specifically, we define the following subsets of R. The message space 𝓜 (that is, the set of plaintext messages) consists of all polynomials of R with coefficients reduced modulo α. Unlike our representation of ℤα so far, we use the integers between –α/2 and +α/2 to represent the coefficients of polynomials in 𝓜, that is,

𝓜 := {m0 + m1X + · · · + mn–1X^{n–1} ∈ R | –α/2 < mi ≤ α/2 for all i}.

For ν1, ν2 ∈ ℕ, we also define the subset

𝓛(ν1, ν2) := {f ∈ R | f has ν1 coefficients equal to 1, ν2 coefficients equal to –1, and all other coefficients equal to 0}

of R. For suitably chosen parameters νf, νg and νu (see Table 5.2), we use the special notations:

𝓛f := 𝓛(νf, νf – 1),   𝓛g := 𝓛(νg, νg),   𝓛u := 𝓛(νu, νu).

With these notations we are now ready to describe the NTRU key generation routine. The subsets 𝓜, 𝓛f, 𝓛g and 𝓛u are assumed to be public knowledge (along with the parameters n, α and β).

Algorithm 5.24. NTRU key generation

Input: n, α, β and 𝓛f, 𝓛g as defined above.

Output: A random NTRU key pair.

Steps:

Choose f ∈ 𝓛f and g ∈ 𝓛g randomly.

/* f must be invertible modulo both α and β */

Compute fα and fβ satisfying fαf ≡ 1 (mod α) and fβf ≡ 1 (mod β).

h := fβg (mod β).

Return h as the public key and f (along with fα) as the private key.

The polynomial fα can be computed from f during decryption. However, for the sake of efficiency, it is recommended that fα be stored along with f.

The integers α and β are either small primes or small powers of small primes (Table 5.2). The most time-consuming step in the NTRU key generation procedure is the computation of the inverses fα and fβ. Suppose we want to compute the inverse of f in (ℤ/p^eℤ)[X]/〈X^n – 1〉, where p is a small prime and e is a small exponent (we may have e = 1). We first compute f(X)^{–1} in the ring 𝔽p[X]/〈X^n – 1〉. Since p is a prime, 𝔽p is a field, that is, 𝔽p[X] is a Euclidean domain (Exercise 2.31). We compute the extended Euclidean gcd of f(X) with X^n – 1. If f(X) and X^n – 1 are not coprime modulo p, then f(X) is not invertible in 𝔽p[X]/〈X^n – 1〉, else we get s(X)f(X) + t(X)(X^n – 1) ≡ 1 (mod p), and s(X) is the inverse of f(X) in 𝔽p[X]/〈X^n – 1〉. A randomly chosen f(X) with gcd(f(1), p) = 1 has a high probability of being invertible modulo p. Recall that we have chosen f ∈ 𝓛f, so that f(1) = 1.

If e = 1, we have already computed the desired inverse of f(X). If e > 1, we have to lift the inverse fp(X) = s(X) of f(X) modulo p to the inverse fp²(X) of f(X) modulo p², and then to the inverse fp³(X) of f(X) modulo p³, and so on. Eventually, we get the inverse fp^e(X) of f(X) modulo p^e. Here we describe the generic lift procedure of fp^k(X) to fp^{k+1}(X). In the ring (ℤ/p^{k+1}ℤ)[X]/〈X^n – 1〉, we have fp^k f ≡ 1 (mod p^k). We can write fp^{k+1}(X) = fp^k(X) + p^k a(X) for some a(X) ∈ 𝔽p[X] of degree < n. Substituting this value in fp^{k+1} f ≡ 1 (mod p^{k+1}) gives the unknown polynomial a(X) as

a(X) ≡ s(X)·(1 – fp^k(X)f(X))/p^k (mod p),

where s(X) = fp(X) is the inverse of f modulo p.
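The lifting step can be checked with a minimal sketch over the integers, where the same formula applies verbatim (the values p = 5, e = 4, f = 37 are arbitrary assumptions; in the NTRU setting the products below become convolutions in (ℤ/p^eℤ)[X]/〈X^n – 1〉):

```python
# Lift an inverse modulo p to an inverse modulo p^e, one power at a
# time, using a = s * (1 - f_k * f) / p^k mod p as in the text.
p, e = 5, 4
f = 37
s = pow(f, p - 2, p)       # inverse of f modulo p, by Fermat
fk = s                     # inverse modulo p^k, starting with k = 1
for k in range(1, e):
    pk = p ** k
    # (1 - fk*f) is divisible by p^k by construction, so // is exact
    a = s * (1 - fk * f) // pk % p
    fk = fk + pk * a       # now fk * f ≡ 1 (mod p^(k+1))
assert fk * f % p ** e == 1
```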

It is often recommended that f(X) be taken of the form f = 1 + αf1 for some f1 ∈ R with small coefficients. In this case, fα(X) = 1 is trivially available and need not be computed as mentioned above. Such a choice of f also speeds up NTRU decryption (see Algorithm 5.26) by reducing the number of polynomial multiplications from two to one. The inverse fβ, however, has to be computed (but need not be stored).

NTRU encryption

For NTRU encryption (Algorithm 5.25), the message is encoded as a polynomial in 𝓜. The costliest step in this algorithm is computing the product uh, which can be done in time O(n²). An asymptotically better running time (O(n log n)) is achievable by Algorithm 5.25 if one uses faster polynomial multiplication routines (like those based on fast Fourier transforms). However, for the cryptographic range of values of n, straightforward quadratic multiplication gives better performance. Most other encryption schemes (like RSA) take time O(n³), where n is the size of the modulus. This explains why NTRU encryption is much faster than conventional encryption routines.

Algorithm 5.25. NTRU encryption

Input: (n, α, β and) the NTRU public key h of the recipient and the plaintext message m ∈ 𝓜.

Output: The ciphertext c which is a polynomial in R, reduced modulo β.

Steps:

Randomly select u ∈ 𝓛u.

c := αuh + m (mod β).

NTRU decryption

NTRU decryption (Algorithm 5.26) involves two multiplications in R and runs in time O(n²). In order to prove the correctness of Algorithm 5.26, one needs to verify that v ≡ αug + fm (mod β). With an appropriate choice of the parameters, it can be ensured that almost always the polynomial αug + fm has all its coefficients in the interval between –β/2 and +β/2. In that case, we have the equality v = αug + fm in R. Multiplication of v by fα and reduction modulo α now clearly retrieves m.

Algorithm 5.26. NTRU decryption

Input: The NTRU private key f (and fα) of the recipient and the ciphertext message c.

Output: The recovered plaintext message m ∈ 𝓜.

Steps:

v := fc (mod β).

/* The coefficients of v are chosen to lie between –β/2 and +β/2 */

m := fαv (mod α).
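A complete toy cycle of key generation, encryption and decryption can be sketched as follows. Everything here is an illustrative assumption: the parameters (n = 11, α = 3, β = 128, νf = νg = νu = 3) are so small that the polynomial αug + fm always stays strictly inside (–β/2, β/2), the inverse modulo α is found by Gaussian elimination on the circulant matrix of f (rather than the extended Euclidean computation described above), and the inverse modulo β by Newton lifting of the inverse modulo 2:

```python
import random

n, alpha, beta = 11, 3, 128        # assumed toy parameters
NU_F, NU_G, NU_U = 3, 3, 3

def conv(a, b, m):
    # product in (Z/mZ)[X]/<X^n - 1>
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % m
    return c

def center(a, m):
    # coefficient representatives in [-m/2, m/2)
    return [((x + m // 2) % m) - m // 2 for x in a]

def small_poly(n1, n2):
    # n1 coefficients +1 and n2 coefficients -1, the rest 0
    f = [0] * n
    pos = random.sample(range(n), n1 + n2)
    for i in pos[:n1]: f[i] = 1
    for i in pos[n1:]: f[i] = -1
    return f

def inv_mod_prime(f, p):
    # solve conv(f, s) = 1 (mod p): Gauss-Jordan elimination on the
    # circulant matrix A[i][j] = f[(i - j) mod n], augmented with e_0
    A = [[f[(i - j) % n] % p for j in range(n)] + [1 if i == 0 else 0]
         for i in range(n)]
    for col in range(n):
        piv = next((r for r in range(col, n) if A[r][col] % p), None)
        if piv is None:
            return None                # f is not invertible modulo p
        A[col], A[piv] = A[piv], A[col]
        inv = pow(A[col][col], p - 2, p)
        A[col] = [x * inv % p for x in A[col]]
        for r in range(n):
            if r != col and A[r][col] % p:
                c = A[r][col]
                A[r] = [(x - c * y) % p for x, y in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]

def lift_to_beta(f, s):
    # Newton lifting: if f*s = 1 (mod m), then f*s(2 - f*s) = 1 (mod m^2)
    m = 2
    while m < beta:
        m = m * m
        fs = conv(f, s, m)
        s = [(2*si - ci) % m for si, ci in zip(s, conv(s, fs, m))]
    return [x % beta for x in s]       # valid since beta divides m

# Key generation (Algorithm 5.24)
while True:
    f = small_poly(NU_F, NU_F - 1)
    f_alpha, f2 = inv_mod_prime(f, alpha), inv_mod_prime(f, 2)
    if f_alpha and f2:
        break
g = small_poly(NU_G, NU_G)
f_beta = lift_to_beta(f, f2)
h = conv(f_beta, g, beta)              # public key

def encrypt(m):                        # Algorithm 5.25
    u = small_poly(NU_U, NU_U)
    return [(alpha*x + y) % beta for x, y in zip(conv(u, h, beta), m)]

def decrypt(c):                        # Algorithm 5.26
    v = center(conv(f, c, beta), beta)     # v = alpha*u*g + f*m exactly
    return center(conv(f_alpha, v, alpha), alpha)
```

With these tiny ν-values, every coefficient of αug + fm is at most 3·6 + 5 = 23 in absolute value, well below β/2 = 64, so this toy never suffers a decryption failure.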

If f is chosen to be of the special form f = 1 + αf1 (for some polynomial f1), then v = αug + αf1m + m. Thus, reduction of v modulo α straightaway gives m, that is, there is no need to multiply v by fα. Also fα (having the trivial value 1) need not be stored in the private key. To sum up, taking f to be of the above special form increases the efficiency of the NTRU scheme without (seemingly) affecting its security. But now f is no longer an element of 𝓛f and some care should be taken to choose suitable values of f.

NTRU decryption may fail, usually when v is not properly centred (around 0). In that case, representing v as a polynomial with coefficients in the range –β/2 + x to +β/2 + x for a small positive or negative value of x may result in correct decryption. If, on the other hand, no value of x works, NTRU decryption cannot recover m easily and is said to suffer from a gap failure. For suitable parameter values, gap failures are very unlikely and can be ignored for all practical purposes.

Now, let us see how the NTRU system can be broken. In order to find out the private key f from the public key h = fβg (mod β), one may keep on searching exhaustively for f′ ∈ 𝓛f, until f′h (mod β) has the shape of an element of 𝓛g. Alternatively, one may try all g′ ∈ 𝓛g, until h^{–1}g′ (mod β) lies in 𝓛f (provided that h is invertible modulo β). In a similar manner, m can be retrieved from c by trying all u′ ∈ 𝓛u, until c – αu′h (mod β) lies in 𝓜. Clearly, such an attack takes expected time proportional to the size of 𝓛f or 𝓛g or 𝓛u.

A baby-step–giant-step strategy reduces the running times to the square roots of the sizes of the above sets. For example, suppose we want to compute f from h. We split f = f1 + f2 into two nearly equal pieces f1 and f2. If n is odd, f1 may contain the (n + 1)/2 most significant terms and f2 the (n – 1)/2 least significant terms of f. Now, we compute (f2, –f2h (mod β)) for all possibilities of f2 and store the pairs sorted by the second component. Next, for each possibility of f1 (baby step) we compute f1h (mod β) and see if there is any f2 (giant step) for which f1h (mod β) and –f2h (mod β) have nearly equal values (this is expected, since f1h + f2h = fh ≡ g (mod β) has all its coefficients in {–1, 0, 1}). If a matching pair (f1, f2) is located, we take f = f1 + f2. A similar method works for guessing m from c.

It is necessary to take the sets 𝓛f, 𝓛g and 𝓛u big enough, so that exhaustive or square-root attacks are not feasible. Typically, choosing the sizes of these sets to be ≥ 2^160 is deemed sufficiently secure.

Another relevant attack is discussed in Exercise 5.11. By far the most sophisticated attack on the NTRU encryption scheme is based on finding short vectors in a lattice. We describe this attack in connection with the computation of the private key f from a knowledge of the public key h. Let L denote the lattice in ℤ^{2n} generated by the rows of the 2n × 2n matrix:

   ( λIn    H  )
   (  0    βIn ),

where In is the n × n identity matrix, H is the n × n circulant matrix with (i, j)-th entry h(j–i) mod n built from h = h0 + h1X + · · · + hn–1X^{n–1} = (h0, h1, . . . , hn–1), and λ is a parameter whose choice is discussed below. Since h ≡ gf^{–1} (mod β), multiplying the i-th row by fi–1 (i = 1, . . . , n), adding, and subtracting suitable multiples of the last n rows, we conclude that the vector v := (λf0, λf1, . . . , λfn–1, g0, g1, . . . , gn–1) is in L. By tuning the value λ, the attacker maximizes the chance for v to be a short vector in L. However, if the system parameters are appropriately selected, lattice reduction algorithms become rather ineffective in finding v. Heuristic evidence suggests that this attack runs in time exponential in n.

Exercise Set 5.2

5.1 Establish the correctness of Algorithm 5.4.
5.2
  1. Assume that the same message m is encrypted using the RSA algorithm and using the public keys (n1, e), . . . , (ne, e) of e entities each of which has the same encryption exponent e. Assume further that the moduli n1, . . . , ne are pairwise coprime. Specify a method by which an adversary can reconstruct the message m from a knowledge of the ciphertext messages c1, . . . , ce. [H]

  2. How can such an attack be prevented? [H]

5.3
  1. Let n, e ∈ ℕ. How many solutions does the polynomial X^e – X have in ℤn? [H]

  2. In particular, conclude that if n = pq is an RSA modulus and e is the encryption exponent, there exist gcd(e – 1, p – 1) × gcd(e – 1, q – 1) messages m for which m^e ≡ m (mod n). Such messages are often called unconcealed. The number of unconcealed messages for random parameters n and e is, in general, vanishingly small compared to n.

5.4 Assume that two parties Bob and Barbara share a common RSA modulus n but relatively prime encryption exponents e1 and e2. Alice encrypts the same message by (n, e1) and (n, e2) and sends the ciphertext messages to Bob and Barbara respectively. Suppose also that Carol intercepts both the ciphertexts. Describe a method by which Carol retrieves the (common) plaintext. [H]
5.5 Let n = pq be a Rabin public key and let c ∈ ℤ_n be a quadratic residue modulo n. Show that the knowledge of the four square roots of c modulo n breaks the Rabin system.
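The point of Exercise 5.5 is that two essentially different square roots x, y of the same residue (y ≢ ±x (mod n)) let one factor n, since n divides (x − y)(x + y) but neither factor alone. A toy sketch (the second root is found by brute force only to stage the attack):

```python
from math import gcd

p, q = 11, 19            # toy private primes
n = p * q
x = 24                   # a message whose square is the residue c
c = (x * x) % n
# Suppose the attacker learns another square root y with y != +-x (mod n);
# here we locate one by exhaustive search just for the demonstration.
y = next(z for z in range(1, n)
         if (z * z) % n == c and z % n not in (x % n, (-x) % n))
f = gcd(x - y, n)        # gcd splits n into its prime factors
assert f in (p, q)
```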
5.6 What is the disadvantage of using the same session key in the ElGamal encryption scheme for encrypting two different messages (for the same recipient)? [H]
5.7 Let p be an odd prime and g a generator of ℤ_p*.
  1. Show that the set S := {g^(2i) | i = 0, 1, . . . , (p − 3)/2} is precisely the set of all quadratic residues modulo p. Show also that S is a subgroup of ℤ_p*.

  2. Assume that y ≡ g^x (mod p) for some x ∈ {0, 1, . . . , p − 2}. Show that the least significant bit of x is 0 or 1 according as y^((p−1)/2) is congruent to 1 or −1 modulo p respectively. Thus, it is easy to determine from y the least significant bit of the discrete logarithm x = ind_g y.

  3. Assume that p ≡ 3 (mod 4) and that only p, g, y are known (but x is not known). Suppose further that there is an oracle (a black box) that, given z ∈ ℤ_p*, returns the second least significant bit of ind_g z. Show that x = ind_g y can be easily computed by making a polynomial (in log p) number of calls to this oracle. [H]
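Part 2 of Exercise 5.7 can be checked exhaustively for a small prime (toy parameters, for illustration only): Euler's criterion y^((p−1)/2) (mod p) reveals the parity of ind_g y.

```python
p, g = 23, 5             # 5 is a generator of Z_23^* (order 22)
for x in range(p - 1):
    y = pow(g, x, p)
    euler = pow(y, (p - 1) // 2, p)   # 1 for a residue, p - 1 (i.e. -1) otherwise
    lsb = 0 if euler == 1 else 1
    assert lsb == x % 2               # the parity of ind_g(y) leaks from y alone
```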

5.8 Show that if the private-key parameters f(X) and d are known to a cryptanalyst of the Chor–Rivest scheme, she can recover the other parts of the private key and thus break the system completely. [H]
5.9 Show that if f(X) is only known to a cryptanalyst of the Chor–Rivest scheme, then also she can recover the full private key. [H]
5.10
  1. Derive the identities of Equations (5.1) through (5.8) (p 325).

  2. With the notations of Section 5.2.7 deduce that:

    c_3 = c^3 − 3c^(p+1) + 3.
    c_4 = c^4 − 4c^(p+2) + 2c^(2p) + 4c.

5.11 In this exercise, we use the notations of Section 5.2.8. Assume that Alice encrypts the same message m several times using the NTRU public key h of Bob, but with different random polynomials u_i, i = 1, . . . , r, and sends the corresponding ciphertext messages c1, . . . , cr. Describe a strategy by which an eavesdropper Carol can recover a considerable part of u_1. [H] Trying all the possibilities for the (relatively small) unknown part of u_1 allows Carol to retrieve m with little effort.

5.3. Key Exchange

Consider the scenario wherein two parties Alice and Bob want to share some secret information (say, a DES key for future correspondence), but it is not possible to communicate this secret by personal contact or by conversing over a secure channel. In other words, Alice and Bob want to arrive at a common secret value by communicating over a public (and hence insecure) channel. A key-exchange or key-agreement protocol allows Alice and Bob to do so. The protocol should be such that an eavesdropper listening to the conversation between Alice and Bob cannot compute the secret value in feasible time.

Public-key technology is used to design a key-exchange protocol in the following way. Alice generates a key pair (eA, dA) and sends the public key eA to Bob. Similarly, Bob generates a random key pair (eB, dB) and sends the public key eB to Alice. Now, Alice and Bob respectively compute the values sA = f(eB, dA) and sB = f(eA, dB), each using her/his own private key, where f is a suitably chosen function. If sA = sB, then this value can be used as the shared secret between Alice and Bob. The intruder Carol can intercept eA and eB, but f should be such that knowledge of eA and eB alone does not allow Carol to compute sA = sB. She needs dA or dB for this computation. Since (eA, dA) and (eB, dB) are key pairs, we assume that it is infeasible to compute dA from eA or dB from eB.

In what follows, we describe some key-exchange protocols. The security of these protocols is dependent on the intractability of the DHP (or the DLP). We provide a generic description, where we work in a finite Abelian multiplicative group G of order n. We write the identity of G as 1. G need not be cyclic, but we assume that an element g ∈ G having suitably large (and preferably prime) multiplicative order m is provided. G, g, n and m may be made publicly available, but G should be a group in which one cannot compute discrete logarithms in feasible time. Typical examples of G are given in Section 5.2.5.

5.3.1. Basic Key-Exchange Protocols

Basic key-exchange protocols provide provable security against passive attacks under the intractability of the DHP. However, several models of active attacks are known for the basic protocols. One requires authentication (validation of the public keys) to eliminate these attacks.

The Diffie–Hellman key-exchange protocol

The Diffie–Hellman (DH) key-exchange algorithm [78] is one of the pioneering discoveries leading to the birth of public-key cryptography.

Algorithm 5.27. Diffie–Hellman key exchange

Input: G, g, n and m as defined above.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice generates a random dA ∈ {2, . . . , m − 1} and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random dB ∈ {2, . . . , m − 1} and computes eB := g^dB.

Bob sends eB to Alice.

Alice computes s := (eB)^dA = g^(dA·dB).

Bob computes s := (eA)^dB = g^(dA·dB).

if (s = 1) { Return “failure”. }

The DH scheme fails if the shared secret turns out to be a trivial element (like the identity) of G. In that case, Alice and Bob should re-execute the protocol with different key pairs. The probability of such an incident is, however, extremely low.

The intruder Carol learns the group elements g^dA and g^dB by listening to the conversation between Alice and Bob and intends to compute s = g^(dA·dB). Thus, she has to solve an instance of the DHP in the group G. By assumption, this is computationally infeasible. This is how the DH scheme derives its security.
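Algorithm 5.27 can be sketched in Python over the toy group G = ℤ_p* (the parameters below are illustrative; real deployments use much larger p and m):

```python
import secrets

p = 2579                 # prime with p - 1 = 2 * 1289
g = 4                    # 4 has prime order m = 1289 modulo p
m = 1289                 # order of g, used to bound the exponents

dA = 2 + secrets.randbelow(m - 2)      # Alice's private key, 2 <= dA <= m - 1
dB = 2 + secrets.randbelow(m - 2)      # Bob's private key
eA, eB = pow(g, dA, p), pow(g, dB, p)  # public keys, exchanged in the clear
sA = pow(eB, dA, p)                    # Alice's view of the secret: g^(dA*dB)
sB = pow(eA, dB, p)                    # Bob's view of the secret
assert sA == sB and sA != 1
```

Since m is prime and 2 ≤ dA, dB ≤ m − 1, the product dA·dB is never divisible by m, so the failure case s = 1 cannot occur with these parameters.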

Small-subgroup attacks

A small-subgroup attack on the DH protocol can be mounted by an active adversary. Assume that the order m of g in G is composite and has known factorization m = uv with u small. Carol intercepts the messages between Alice and Bob, replaces them by their respective v-th powers and retransmits the modified messages.

Algorithm 5.28. A small-subgroup attack by an active eavesdropper

Alice generates a random dA ∈ {2, . . . , m − 1} and computes eA := g^dA.

Alice transmits eA to Bob.

Carol intercepts eA, computes e′A := (eA)^v and sends e′A to Bob.

Bob generates a random dB ∈ {2, . . . , m − 1} and computes eB := g^dB.

Bob transmits eB to Alice.

Carol intercepts eB, computes e′B := (eB)^v and sends e′B to Alice.

Alice computes s′ := (e′B)^dA = g^(v·dA·dB).

Bob computes s′ := (e′A)^dB = g^(v·dA·dB).

if (s′ = 1) { Return “failure”. }

But ord g = uv and so (s′)^u = 1, that is, s′ has only u − 1 non-trivial values. Since u is small, the possibilities for s′ can be exhaustively searched by Carol. The best countermeasure against this attack is to take m to be a prime (of bit length ≥ 160).

Even when m is prime, it may be the case that the cofactor k := n/m has a small divisor u and it is possible that an active attacker intervenes in such a way that Alice and Bob agree upon a secret value of order (equal to or dividing) u. For example, Carol may replace both the transmitted public keys by an element h of order u. If dA and dB are congruent modulo u, the shared secret has only a few possible values and Carol can obtain the correct value by exhaustive search. On the other hand, if dA ≢ dB (mod u), Alice and Bob do not come up with the same secret. However, if Alice uses her secret to encrypt a message for Bob, it remains easy for Carol to decrypt the intercepted ciphertext by trying only a few choices for Alice’s key. Alice and Bob can prevent this attack by refusing to accept as the shared secret not only the trivial value s = 1 but also elements of small orders.

A small-subgroup attack can also be mounted by one of the communicating parties (say, Bob) in an attempt to gain information about the other’s (Alice’s) secret dA. Let us continue to assume that the cofactor k := n/m has a small divisor u. Bob finds an element h in G of order u. Instead of eB = g^dB, Bob now sends h·g^dB to Alice. Alice computes the shared secret as sA := (h·g^dB)^dA = sB·h^(dA rem u). Bob, on the other hand, can normally compute sB := g^(dA·dB). Now, suppose that Alice uses a symmetric cipher with the key sA (or some part of it) and sends the ciphertext to Bob. In order to decrypt, Bob tries all of the u possible keys sB·h^j for j = 0, 1, . . . , u − 1. The value of j for which decryption succeeds equals dA modulo u. A similar attack can be mounted by Bob, when eB is chosen to be an element (like h itself) of order u.
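A toy run of this insider attack can be sketched as follows, under the assumption that Bob transmits h·g^dB in place of g^dB. Here u = 2, so a single bit of dA leaks; the group parameters are illustrative only.

```python
import secrets

p = 2579                   # prime, p - 1 = 2 * 1289
g, m = 4, 1289             # g = 4 has prime order m modulo p
u = 2                      # small divisor of the cofactor k = (p - 1)/m
h = p - 1                  # -1 mod p: an element of order exactly u = 2

dA = 2 + secrets.randbelow(m - 2)          # Alice's secret
dB = 2 + secrets.randbelow(m - 2)          # Bob's secret
eA = pow(g, dA, p)
eB_mal = (h * pow(g, dB, p)) % p           # Bob's dishonest public value

sA = pow(eB_mal, dA, p)                    # Alice's key: g^(dA*dB) * h^dA
sB = pow(eA, dB, p)                        # the value Bob can compute honestly
# Bob tests the u candidate keys sB * h^j; the matching j equals dA mod u.
leaked = next(j for j in range(u) if (sB * pow(h, j, p)) % p == sA)
assert leaked == dA % u
```

In the real attack Bob does not see sA directly; he identifies the correct j as the one whose candidate key decrypts Alice's ciphertext.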

If G is cyclic and H is the subgroup generated by g, then an element a ∈ G is in H if and only if a^m = 1 (Proposition 2.5, p 27). Moreover, if gcd(k, m) = 1, each communicating party can check the validity of the other party’s public key by using an m-th power exponentiation. An element like h or h·g^dB of the last paragraph does not pass this test, and if the test fails, Alice should abandon the protocol. However, the validation of the public key requires a modular exponentiation and thereby slows down the protocol.

Cofactor exponentiation

We now present an efficient modification of the basic Diffie–Hellman scheme that prevents small-subgroup attacks (by a communicating party or an eavesdropper) without calculating an extra exponentiation. We continue with the notation k := n/m and assume that k is coprime to m. Now, the shared secret is computed as g^(dA·dB) or g^(k·dA·dB) depending on whether compatibility with the original DH scheme is desired or not. Algorithm 5.29 describes the modified DH algorithm. Solve Exercise 5.12 in order to establish the effectiveness of this algorithm against small-subgroup attacks.

5.3.2. Authenticated Key-Exchange Protocols

Other active attack models on the (basic or modified) DH protocol can be conceived of. One important class of attacks is now described.

Unknown key-share attacks

An unknown key-share attack on a key-exchange protocol makes a party believe that (s)he shares a secret with another party, whereas the secret is actually shared by a third party. Assume that Carol can monitor and modify every message between Alice and Bob. When Alice and Bob execute Algorithm 5.27 or 5.29, Carol can intervene and pretend to Alice that she is Bob and to Bob that she is Alice. At the end of the protocol, Alice and Carol come up with a shared secret sAC, and Bob and Carol with another shared secret sBC. Alice believes that she shares sAC with Bob, and Bob believes that he shares sBC with Alice.

Algorithm 5.29. Diffie–Hellman key exchange with cofactor exponentiation

Input: G, g, n, m and k as defined above and a flag indicating compatibility with the original DH scheme.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice generates a random dA ∈ {2, . . . , m − 1} and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random dB ∈ {2, . . . , m − 1} and computes eB := g^dB.

Bob sends eB to Alice.

if (compatibility with the original DH algorithm is desired) {
   Alice assigns δA := k^(−1)·dA (mod m).
   Bob assigns δB := k^(−1)·dB (mod m).
}
else {
   Alice assigns δA := dA (mod m).
   Bob assigns δB := dB (mod m).
}
Alice computes s := (eB)^(k·δA).
Bob computes s := (eA)^(k·δB).
if (s = 1) { Return “failure”. }
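Algorithm 5.29 with the compatibility flag set can be sketched as follows (toy parameters; gcd(k, m) = 1 is required for the inverse k^(−1) (mod m) to exist):

```python
import secrets

p = 2579
g, m = 4, 1289            # g of prime order m in Z_p^*
n = p - 1                 # group order
k = n // m                # cofactor, here 2, coprime to m

dA = 2 + secrets.randbelow(m - 2)
dB = 2 + secrets.randbelow(m - 2)
eA, eB = pow(g, dA, p), pow(g, dB, p)

kinv = pow(k, -1, m)                 # k^(-1) (mod m)
deltaA = (kinv * dA) % m             # compatibility with the original DH scheme
deltaB = (kinv * dB) % m
sA = pow(eB, k * deltaA, p)          # (eB)^(k*deltaA) = g^(dA*dB), since k*deltaA = dA (mod m)
sB = pow(eA, k * deltaB, p)
assert sA == sB == pow(g, (dA * dB) % m, p) and sA != 1
```

Raising the received value to the multiple-of-k exponent k·δ kills any small-order component an attacker may have mixed in, which is the point of the modification.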

Now, when Alice wants to send a secret message m to Bob, she encrypts m by sAC and transmits the ciphertext c. Carol intercepts c, decrypts it by sAC to retrieve m, encrypts m by sBC and sends the new ciphertext c′ to Bob. Bob retrieves m by decrypting c′ with his key sBC. The process raises hardly any suspicion in Alice or Bob about the existence of the mediating third party.

In order to avoid this attack, Alice and Bob should each validate the authenticity of the public key of the other party. Public-key certificates can be used to this effect. Unfortunately, using certificates alone may fail to eliminate unknown key-share attacks, as Algorithm 5.30 shows. At the end of this protocol Alice and Bob share a secret s, but Bob believes that he shares it with (the intruder) Carol. Here Carol herself cannot compute the shared secret s (provided that computing discrete logs in G is infeasible). Still there may be situations where this attack can be exploited (see Law et al. [161] for a hypothetical example).

This attack has two potential problems. Under the assumption of the intractability of the DLP in G, Carol cannot compute the private key corresponding to the public key eC, and so her obtaining the certificate CertC from a knowledge of eC alone may be questioned. Furthermore, replacing (eB, CertB) by ((eB)^d, CertB) may render the certificate invalid. If we assume that a certificate authenticates only the entity and not the public key, then these objections can be overruled. In practice, however, a public-key certificate should bind the public key to an entity (who can prove the knowledge of the corresponding private key), and so the above attack cannot be easily mounted. Nonetheless, the attack highlights the need for stronger authenticated key-exchange protocols.

Algorithm 5.30. An unknown key-share attack

Alice generates a random dA ∈ {2, . . . , m − 1} and computes eA := g^dA.

Alice gets the certificate CertA on eA from the certifying authority.

Alice transmits (eA, CertA) to Bob.

Carol intercepts (eA, CertA).

Carol chooses a random d ∈ {2, . . . , m − 1}.

Carol gets the certificate CertC on eC := (eA)^d from the certifying authority.

Carol sends (eC, CertC) to Bob.

Bob generates a random dB ∈ {2, . . . , m − 1} and computes eB := g^dB.

Bob gets the certificate CertB on eB from the certifying authority.

Bob sends (eB, CertB) to Carol.

Carol transmits ((eB)^d, CertB) to Alice.

Alice computes s = ((eB)^d)^dA = g^(d·dA·dB).

Bob computes s = (eC)^dB = ((eA)^d)^dB = g^(d·dA·dB).

The Menezes–Qu–Vanstone key-exchange protocol

The Menezes–Qu–Vanstone (MQV) key-exchange protocol is an extension of the basic DH scheme that incorporates public-key authentication. Though the MQV protocol does not seem to provably achieve its desired security goals, heuristic arguments suggest its effectiveness against active adversaries.

Once again, let Alice and Bob be the two parties who plan to agree on a secret element s ∈ ⟨g⟩, where the domain parameters G, g, n and m are chosen as in the basic DH scheme. In the MQV scheme, each entity uses two key pairs, one of which ((EA, DA) for Alice and (EB, DB) for Bob) is called the static or the long-term key pair, whereas the other ((eA, dA) for Alice and (eB, dB) for Bob) is called the ephemeral or the short-term key pair. The static key is bound to an entity for a certain period of time and is used in every invocation of the MQV protocol during that period. On the other hand, each entity generates and uses a new ephemeral key pair during each invocation of the protocol. The static key of an entity is assumed to be authentic, say, certified by a trusted authority. The ephemeral key, on the other hand, is validated using the static private key.

Assume that there is a (publicly known) function F : ⟨g⟩ → ℤ assigning an integer representative to each element of ⟨g⟩. Let l := ⌊lg m⌋ + 1 denote the bit length of m = ord g. For e ∈ ⟨g⟩, let ē denote the integer (F(e) rem 2^⌈l/2⌉) + 2^⌈l/2⌉. The bit size of ē is about half of that of m. In particular, ē ≢ 0 (mod m) for all e ∈ ⟨g⟩.

In the MQV protocol, Alice and Bob each computes the shared secret s = g^(σA·σB), where σA := dA + ēA·DA (mod m) and σB := dB + ēB·DB (mod m). Here the exponents σA and σB bear the implicit signatures of Alice and Bob, impressed by their respective static private keys. Alice can compute g^σB = eB·(EB)^ēB, since she knows the static public key EB and the ephemeral public key eB of Bob. Similarly, Bob can compute g^σA = eA·(EA)^ēA from a knowledge of the public keys EA and eA of Alice. We summarize the steps in Algorithm 5.31.

Algorithm 5.31. MQV key exchange

Input: G, g, n and m as defined above.

Output: A secret element to be shared by Alice and Bob.

Steps:

Alice obtains Bob’s static public key EB.

Bob obtains Alice’s static public key EA.

Alice generates a random integer dA, 2 ≤ dA ≤ m − 1, and computes eA := g^dA.

Alice sends eA to Bob.

Bob generates a random integer dB, 2 ≤ dB ≤ m − 1, and computes eB := g^dB.

Bob sends eB to Alice.

Alice computes σA := dA + ēA·DA (mod m).

Alice computes s := (eB·(EB)^ēB)^σA = g^(σA·σB).

Bob computes σB := dB + ēB·DB (mod m).

Bob computes s := (eA·(EA)^ēA)^σB = g^(σA·σB).

if (s = 1) { Return “failure”. }

Each participating entity using the MQV protocol performs three exponentiations in G. Alice computes g^dA, (EB)^ēB and (eB·(EB)^ēB)^σA, of which the first and the last ones have exponents of size O(m). On the other hand, the bit size of ēB is about half of that of m, so the middle exponentiation is about twice as fast as a full exponentiation. This performance benefit justifies the use of ēA and ēB instead of eA and eB themselves. It appears that using these half-sized exponents does not affect security. Also note that ēA ≢ 0 (mod m), which implies a non-zero contribution of the static key DA to the expression σA. Similarly for σB.
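A toy sketch of Algorithm 5.31 in ℤ_p* follows; the parameters are illustrative, and F is taken to be the identity on the integer representative of a group element (an assumed choice):

```python
import secrets

p, g, m = 2579, 4, 1289          # g of prime order m in Z_p^*
l = m.bit_length()               # l = floor(lg m) + 1
half = 1 << ((l + 1) // 2)       # 2^(ceil(l/2))

def bar(e):
    # e-bar = (F(e) rem 2^(ceil(l/2))) + 2^(ceil(l/2)), about half the size of m
    return (e % half) + half

def keypair():
    d = 2 + secrets.randbelow(m - 2)
    return d, pow(g, d, p)

DA, EA = keypair()               # Alice's static key pair
DB, EB = keypair()               # Bob's static key pair
dA, eA = keypair()               # ephemeral pairs, fresh for this run
dB, eB = keypair()

sigmaA = (dA + bar(eA) * DA) % m
sigmaB = (dB + bar(eB) * DB) % m
sA = pow(eB * pow(EB, bar(eB), p) % p, sigmaA, p)   # Alice: (g^sigmaB)^sigmaA
sB = pow(eA * pow(EA, bar(eA), p) % p, sigmaB, p)   # Bob:   (g^sigmaA)^sigmaB
assert sA == sB
```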

In order to guard against small-subgroup attacks, the MQV algorithm can incorporate the cofactor k := n/m, that is, assuming gcd(k, m) = 1, the shared secret would now be g^(σA·σB) or g^(k·σA·σB), depending on whether compatibility with the original MQV method is desired or not.

The MQV algorithm can be used in a situation when only one party, say, Alice, is capable of initiating a transmission to the other party (Bob). In that case, Bob’s static key pair is used also as his ephemeral key pair, that is, eB = EB and dB = DB in Algorithm 5.31.

See Raymond and Stiglic [250] for more about the security issues of the DH key-agreement protocol and its variants.

Exercise Set 5.3

5.12 Let G be a multiplicative Abelian group of order n and with identity 1, H the subgroup of G generated by an element g of order m, k := n/m and gcd(k, m) = 1. Further, let a be a non-identity element of G.
  1. Prove that if a^k = 1, then a ∉ H. (The converse of this statement is not true in general, even when G is cyclic. However, if a is an element of small order dividing k, we obviously have a^k = 1.)

  2. Explain how the modified Diffie–Hellman protocol (Algorithm 5.29) prevents an active attack by Bob described in connection with small-subgroup attacks.

5.13 Write the MQV key-exchange protocol with cofactor exponentiation.
5.14 Provide the details of the Diffie–Hellman key-exchange algorithm based on the XTR representation (Section 5.2.7).

5.4. Digital Signatures

Suppose an entity (Alice) is required to be bound to some electronic data (like messages or documents or keys). This binding is achieved by Alice digitally signing the data in such a way that no party other than Alice would be able to generate the signature. The signature should also be such that any entity can easily verify that it was Alice who generated the signature. Digital signatures can be realized using public-key techniques. The entity (Alice) generating a digital signature is called the signer, whereas anybody who wants to verify a signature is called a verifier.

We have seen in Section 5.2 how the encryption and decryption transforms fe, fd achieve confidentiality of sensitive data. If the set of all possible plaintext messages is the same as the set of all ciphertext messages and if fe and fd are bijective maps on that set, then the sequence of encryption and decryption can be reversed in order to realize a digital signature scheme. In order to sign m, Alice uses her private key d and the transform fd to generate s = fd(m, d). Any party who knows the corresponding public key e can recover m as m = fe(s, e). This is broadly how a signature scheme works. Depending on how the representative m is generated from the message M that Alice wants to sign, signature schemes can be classified in two categories.

Signature scheme with message recovery

In this case, one takes m = M. Verification involves getting back the message M. If M is assumed to be (the encoded version of) some human-readable text, then the recovered M = fe(s, e) will also be human-readable. If s is forged, that is, if a private key d′ ≠ d has been used to generate s′ = fd(m, d′), then verification using Alice’s public key yields m′ = fe(s′, e), and typically m′ ≠ m, since d′ and e are not matching keys. The resulting message m′ will, in general, make little or no sense to a human reader. If m is not a human-readable text, one adds some redundancy to it before signing. A forged signature yields m′ during verification, which, with high probability, is expected not to have this redundancy.

Attractive as this scheme looks, it is not suitable when M is a long message. In that case, it is customary to break M into smaller pieces and sign each piece separately. Since public-key operations are slow, signature generation (and also verification) will be time-consuming if there are too many pieces to sign (and verify). This difficulty is overcome using the second scheme described now.

Signature scheme with appendix

In this scheme, a short representative m = H(M) of M is first computed.[2] The function H is usually chosen to be a hash function, that is, one which converts bit strings of arbitrary length to bit strings of a fixed length. H is assumed to be public knowledge, that is, anybody who knows M can compute m. We also assume that H(M) can be computed fast for messages M of practical sizes. Alice uses the decryption transform on m to generate s = fd(m, d). The signature now becomes the pair (M, s). A verifier obtains Alice’s public key e and checks whether H(M) = fe(s, e). The signature is taken to be valid if and only if equality holds. If a forger uses a private key d′ ≠ d to generate a signature (M, s′), s′ = fd(m, d′), on M, then with high probability H(M) ≠ fe(s′, e) and verification fails.

[2] If M is already a short message, one may take m = M. In order to promote uniform treatment, we assume that the function H is always applied for the generation of m. Use of H is also desirable from the point of view of security (Exercise 5.15).

A kind of forgery is possible on signature schemes with appendix. Assume that Alice creates a valid signature (M, s), s = fd(H(M), d), on a message M. The function H is certainly not injective, since its input space is much bigger (infinite) than its output space (finite). Suppose that Carol finds a message M′ ≠ M with H(M′) = H(M). In that case, the pair (M′, s) is a valid signature of Alice on the message M′, though it is not Alice who has generated it. (Indeed it has been generated without the knowledge of the private key d of Alice.) In order to foil such attacks, the function H should have second pre-image resistance. The first pre-image resistance and collision resistance properties of a hash function also turn out to be important in the context of digital signatures. See Sections 1.2.6 and A.4 to know about hash functions.

We now describe some specific algorithms for (generating and verifying) digital signatures. Key pairs used for these algorithms are usually identical to those used for encryption algorithms of Section 5.2 and, therefore, we refrain from a duplicate description of the key-generation procedures. We focus our discussion only on signature schemes with appendix.

5.4.1. The RSA Digital Signature Algorithm

As in the RSA encryption scheme of Section 5.2.1, each entity generates an RSA modulus n = pq, which is the product of two distinct large primes p and q. A key pair consists of an encryption exponent e (the public key) and a decryption exponent d (the private key) satisfying ed ≡ 1 (mod φ(n)).

RSA signature generation involves a modular exponentiation in the ring ℤ_n.

Algorithm 5.32. RSA signature generation

Input: A message M to be signed and the signer’s private key (n, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).   /* m ∈ ℤ_n is the short representative of M */
s := m^d (mod n).

Signature generation can be sped up if the parameters p, q, d1 := d rem (p − 1), d2 := d rem (q − 1) and h := q^(−1) (mod p) are stored (secretly) in the private key. Now, one can use Algorithm 5.4 for signature generation.

The verification routine also involves a modular exponentiation in ℤ_n.

Algorithm 5.33. RSA signature verification

Input: A signature (M, s) and the signer’s public key (n, e).

Output: Verification status of the signature.

Steps:

m := H(M).   /* m ∈ ℤ_n is the short representative of M */
m′ := s^e (mod n).
if (m′ = m) { Return “Signature verified”. }
else { Return “Signature not verified”. }

Small values of e speed up RSA signature verification and are not known to expose the scheme to any special attacks. So values of e like 3, 257 and 65,537 are commonly recommended.
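Algorithms 5.32 and 5.33 can be sketched together as follows. The primes are toy values, and SHA-256 reduced modulo n stands in for the hash function H (an illustrative choice; real implementations also apply standardized padding):

```python
import hashlib

p, q = 2579, 2591          # toy primes; n = p*q is far too small for real use
n = p * q
e = 3                      # public exponent, coprime to phi(n)
phi = (p - 1) * (q - 1)
d = pow(e, -1, phi)        # private exponent, e*d = 1 (mod phi(n))

def H(M: bytes) -> int:
    # Short representative of M in Z_n (an assumed instantiation of H).
    return int.from_bytes(hashlib.sha256(M).digest(), "big") % n

def sign(M: bytes) -> int:
    return pow(H(M), d, n)           # s := m^d (mod n)

def verify(M: bytes, s: int) -> bool:
    return pow(s, e, n) == H(M)      # accept iff s^e = m (mod n)

s = sign(b"attack at dawn")
assert verify(b"attack at dawn", s)
# a signature on a different message would, with overwhelming probability, fail
```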

5.4.2. The Rabin Digital Signature Algorithm

As in the Rabin encryption algorithm, we choose two distinct large primes p and q of nearly equal sizes and take n = pq. The public key is n, whereas the private key is the pair (p, q). The Rabin signature scheme is based on the intractability of computing square roots modulo n in absence of the knowledge of the prime factors p and q of n.

Rabin signature generation involves finding a quadratic residue m modulo n as a representative of the message M and computing a square root of m modulo n.

Algorithm 5.34. Rabin signature generation

Input: A message M to be signed and the signer’s private key (p, q).

Output: The signature (M, s) on M.

Steps:

m := H(M).   /* m ∈ ℤ_n is assumed to be a quadratic residue modulo n */

Compute a square root s1 of m modulo p.   /* Algorithm 3.17 */
Compute a square root s2 of m modulo q.   /* Algorithm 3.17 */
Compute s ∈ ℤ_n satisfying s ≡ s1 (mod p) and s ≡ s2 (mod q).   /* CRT */

Verification (Algorithm 5.35) involves a squaring operation in ℤ_n.

Algorithm 5.35. Rabin signature verification

Input: A signature (M, s) and the signer’s public key n.

Output: Verification status of the signature.

Steps:

m := H(M).   /* m is a quadratic residue modulo n */

m′ := s^2 (mod n).

if (m′ = m) { Return “Signature verified”. }

else { Return “Signature not verified”. }
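A sketch of Algorithms 5.34 and 5.35 follows, for toy primes p, q ≡ 3 (mod 4), where a square root of a residue m modulo a prime r is simply m^((r+1)/4) (mod r). Since H(M) need not be a quadratic residue, the representative is re-hashed with a counter until it is one; this counter-based padding is an assumed illustrative choice, not the book's construction:

```python
import hashlib

p, q = 2579, 2591          # toy primes, both = 3 (mod 4)
n = p * q

def is_qr(m: int, r: int) -> bool:
    # Euler's criterion (0 is treated as a residue).
    return pow(m, (r - 1) // 2, r) in (0, 1)

def H(M: bytes, ctr: int) -> int:
    return int.from_bytes(hashlib.sha256(M + bytes([ctr])).digest(), "big") % n

def sign(M: bytes):
    ctr = 0
    while True:            # find a representative that is a QR mod p and mod q
        m = H(M, ctr)
        if is_qr(m, p) and is_qr(m, q):
            break
        ctr += 1
    s1 = pow(m, (p + 1) // 4, p)       # square root of m modulo p
    s2 = pow(m, (q + 1) // 4, q)       # square root of m modulo q
    # CRT: s = s1 (mod p) and s = s2 (mod q)
    s = (s1 * q * pow(q, -1, p) + s2 * p * pow(p, -1, q)) % n
    return s, ctr

def verify(M: bytes, s: int, ctr: int) -> bool:
    return pow(s, 2, n) == H(M, ctr)   # accept iff s^2 = m (mod n)

s, ctr = sign(b"hello")
assert verify(b"hello", s, ctr)
```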

5.4.3. The ElGamal Digital Signature Algorithm

The ElGamal signature algorithm is based on the intractability of computing discrete logarithms in certain groups G. For a general description, we consider an arbitrary (finite Abelian multiplicative) group G of order n. We assume that G is cyclic and that a generator g of G is provided. A key pair is obtained by selecting a random integer (the private key) d, 2 ≤ d ≤ n − 1, and then computing g^d (the public key). The hash function H is assumed to convert arbitrary bit strings to elements of ℤ_n. We further assume that the elements of G can be identified as bit strings (on which the hash function H can be directly applied). G (together with its representation), g and n are considered to be public knowledge and are not input to the signature generation and verification routines.

ElGamal signatures are generated as in Algorithm 5.36. The appendix consists of a pair (s, t) with s ∈ G and t ∈ ℤ_n.

Algorithm 5.36. ElGamal signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ n − 1, with gcd(d′, n) = 1.

s := g^(d′).

t := (d′)^(−1)·(H(M) − d·H(s)) (mod n).

The costliest step in the ElGamal signature generation algorithm is the exponentiation g^(d′). Here, G is assumed to be cyclic and the exponent d′ to be O(n). We will shortly see modifications of the ElGamal scheme in which the exponent can be chosen to be much smaller, namely O(r), where r is a suitably large (prime) divisor of n.

In order to forge a signature, Carol can generate a random session key pair (d′, g^(d′)) and obtain s. For the computation of t, she requires the private key d of the signer. Conversely, if t (and d′) are available to Carol, she can easily compute the private key d. Thus, forging an ElGamal signature is equivalent to solving the DLP in G.

Each invocation of the ElGamal signature generation algorithm must use a new session key (d′, g^(d′)). If the same session key (d′, g^(d′)) is used to generate the signatures (M1, s1, t1) and (M2, s2, t2) on two different messages M1 and M2, then we have (t1 − t2)·d′ ≡ H(M1) − H(M2) (mod n), whence d′ can be computed, provided that gcd(t1 − t2, n) = 1. If d′ is known, the private key d can be easily computed (see Exercise 5.6 for a similar situation).
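The congruence above can be checked with toy numbers. The sketch below (illustrative parameters; SHA-256 stands in for H) reuses one session key for many messages and solves for d′ from the first message pair with gcd(t1 − t2, n) = 1:

```python
import hashlib, secrets
from math import gcd

p, g = 2579, 2            # toy group Z_p^* of order n = p - 1; 2 generates it
n = p - 1

def H(x) -> int:
    data = x if isinstance(x, bytes) else x.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

d = 2 + secrets.randbelow(n - 2)             # signer's private key
while True:                                  # session key, invertible modulo n
    dp = 2 + secrets.randbelow(n - 2)
    if gcd(dp, n) == 1:
        break
s = pow(g, dp, p)                            # the SAME s appears in every signature

def t_of(M: bytes) -> int:
    return pow(dp, -1, n) * (H(M) - d * H(s)) % n

recovered = None
msgs = [b"msg-%d" % i for i in range(64)]
for i in range(len(msgs)):
    for j in range(i + 1, len(msgs)):
        t1, t2 = t_of(msgs[i]), t_of(msgs[j])
        if gcd(t1 - t2, n) == 1:             # (t1 - t2) d' = H(M1) - H(M2) (mod n)
            recovered = (H(msgs[i]) - H(msgs[j])) * pow(t1 - t2, -1, n) % n
            break
    if recovered is not None:
        break
assert recovered == dp                       # the session key is exposed
```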

ElGamal signature verification is described in Algorithm 5.37. This is based on the observation that for a (valid) ElGamal signature (M, s, t) on a message M we have g^(H(M)) = (g^d)^(H(s))·s^t. This verification calls for three exponentiations in G to full-size exponents. Working in a suitable (cyclic) subgroup of G makes the algorithm more efficient.

Algorithm 5.37. ElGamal signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

a1 := g^(H(M)).

a2 := (g^d)^(H(s))·s^t.

if (a1 = a2) { Return “Signature verified”. }

else { Return “Signature not verified”. }
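Algorithms 5.36 and 5.37 can be sketched together in the toy group G = ℤ_p* of order n = p − 1 (illustrative parameters; SHA-256 modulo n stands in for H, applied to both messages and group elements):

```python
import hashlib, secrets
from math import gcd

p = 2579
n = p - 1                 # group order
g = 2                     # 2 generates Z_2579^*

def H(x) -> int:
    # Hash messages (bytes) and group elements (ints) into Z_n.
    data = x if isinstance(x, bytes) else x.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

d = 2 + secrets.randbelow(n - 2)   # private key
y = pow(g, d, p)                   # public key g^d

def sign(M: bytes):
    while True:                    # fresh session key, invertible modulo n
        dp = 2 + secrets.randbelow(n - 2)
        if gcd(dp, n) == 1:
            break
    s = pow(g, dp, p)
    t = pow(dp, -1, n) * (H(M) - d * H(s)) % n
    return s, t

def verify(M: bytes, s: int, t: int) -> bool:
    # accept iff g^H(M) = (g^d)^H(s) * s^t in G
    return pow(g, H(M), p) == (pow(y, H(s), p) * pow(s, t, p)) % p

s, t = sign(b"message")
assert verify(b"message", s, t)
```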

ElGamal signatures use a congruence of the form A ≡ d·B + d′·C (mod n), and verification is done by checking the equality g^A = (g^d)^B·s^C. Our choice for A, B and C was A = H(M), B = H(s) and C = t. Indeed, any permutation of H(M), H(s) and t is acceptable as A, B, C. These give rise to several variants of the ElGamal scheme. It is also allowed to take as A, B, C any permutation of H(M)H(s), t, 1 or H(M)H(s), H(M)t, 1 or H(M)H(s), H(s)t, 1 or H(M)t, H(s)t, 1. Permutations of H(M)H(t), H(s), 1 or H(M), H(s)t, 1, on the other hand, are known to have security bugs. For any allowed combination of A, B, C, the choices ±A, ±B, ±C are also valid. For some other variants, see Horster et al. [132].

5.4.4. The Schnorr Digital Signature Algorithm

The Schnorr signature scheme is a modification of the ElGamal scheme and is faster than the ElGamal scheme, since it works in the subgroup of G generated by an element g of relatively small order. We assume that r := ord g is a prime (though it suffices to have ord g possessing a suitably large prime divisor). We suppose further that the elements of G are represented as bit strings and that we have a hash function H that maps bit strings to elements of ℤ_r. A key pair now consists of an integer d (the private key), 2 ≤ d ≤ r − 1, and the element g^d (the public key).

Schnorr signature generation is described in Algorithm 5.38.

Algorithm 5.38. Schnorr signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, g^(d′)), 2 ≤ d′ ≤ r − 1.

s := H(M ‖ g^(d′)).   /* Here ‖ denotes string concatenation */
t := d′ − d·s (mod r).

Similar to the ElGamal scheme, the most time-consuming step in this routine is the computation of the session public key g^(d′). But now d′ < r and, therefore, Algorithm 5.38 runs faster than Algorithm 5.36. One can easily check that forging a signature of Alice is computationally equivalent to determining Alice’s private key d from her public key g^d. The importance of using a new session key pair in each run of Algorithm 5.38 is exactly the same as in the case of ElGamal signatures.

The verification of Schnorr signatures (Algorithm 5.39) is based upon the fact that g^(d′) = g^t·(g^d)^s. Thus, the knowledge of g, s, t and g^d allows one to compute g^(d′) and subsequently H(M ‖ g^(d′)). The algorithm involves two exponentiations with both the exponents (t and s) being < r. Thus, signature verification is also faster in the Schnorr scheme than in the ElGamal scheme.

Algorithm 5.39. Schnorr signature verification

Input: A signature (M, s, t) and the signer’s public key gd.

Output: Verification status of the signature.

Steps:

u := g^t·(g^d)^s.

s′ := H(M ‖ u).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }
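Algorithms 5.38 and 5.39 can be sketched as follows in the order-r subgroup of ℤ_p* (toy parameters; SHA-256 modulo r plays the role of H, and group elements are concatenated as fixed-width byte strings):

```python
import hashlib, secrets

p = 2579
r = 1289                 # prime order of g
g = 4                    # 4 = 2^2 has order r modulo p

def H(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % r

d = 2 + secrets.randbelow(r - 2)   # private key
y = pow(g, d, p)                   # public key g^d

def sign(M: bytes):
    dp = 2 + secrets.randbelow(r - 2)             # fresh session key d'
    s = H(M + pow(g, dp, p).to_bytes(4, "big"))   # s := H(M || g^d')
    t = (dp - d * s) % r                          # t := d' - d*s (mod r)
    return s, t

def verify(M: bytes, s: int, t: int) -> bool:
    u = (pow(g, t, p) * pow(y, s, p)) % p         # u = g^t * (g^d)^s = g^d'
    return H(M + u.to_bytes(4, "big")) == s

s, t = sign(b"message")
assert verify(b"message", s, t)
```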

5.4.5. The Nyberg–Rueppel Digital Signature Algorithm

The Nyberg–Rueppel (NR) signature algorithm is another adaptation of the ElGamal signature scheme and is based on the intractability of solving the DLP in a group G. We assume that ord G = n has a large prime divisor r and that an element g ∈ G of order r is available. Here, a key pair is of the form (d, g^d), where the private key d is an integer between 2 and r − 1 (both inclusive) and where the public key g^d is an element of 〈g〉. The hash function H converts bit strings to elements of ℤ_r. We also assume the existence of a (publicly known) function F : 〈g〉 → ℤ_r.

NR signature generation can be performed as in Algorithm 5.40.

Algorithm 5.40. Nyberg–Rueppel signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, g^(d′)), 2 ≤ d′ ≤ r − 1.

s := H(M) + F(g^(d′)) (mod r).

t := d′ − d·s (mod r).

The only difference between NR signature generation and Schnorr signature generation is the way s is computed. Therefore, our remarks on the security and the efficiency of the Schnorr scheme apply equally well to the NR scheme. Signature verification is also analogous, as Algorithm 5.41 shows.

Algorithm 5.41. Nyberg–Rueppel signature verification

Input: A signature (M, s, t) and the signer’s public key g^d.

Output: Verification status of the signature.

Steps:

u := g^t(g^d)^s.

s′ := H(M) + F(u) (mod r).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }
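A sketch of NR signing and verification, reusing the toy subgroup from before, makes the one-line difference from the Schnorr scheme visible. The map F(x) = x mod r is an assumed instance of the public function F : 〈g〉 → ℤ_r, and the small parameters are illustrative only.

```python
import hashlib
import secrets

# Toy parameters (illustrative): p = 2r + 1, g of order r in Z_p^*.
p, r, g = 2039, 1019, 4

def H(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha1(msg).digest(), 'big') % r

def F(x: int) -> int:
    # An assumed concrete instance of the public map F : <g> -> Z_r.
    return x % r

def nr_sign(M: bytes, d: int):
    d1 = 2 + secrets.randbelow(r - 2)        # session key d'
    s = (H(M) + F(pow(g, d1, p))) % r        # s := H(M) + F(g^d') (mod r)
    t = (d1 - d * s) % r                     # t := d' - d s (mod r)
    return s, t

def nr_verify(M: bytes, s: int, t: int, y: int) -> bool:
    u = (pow(g, t, p) * pow(y, s, p)) % p    # u = g^t (g^d)^s = g^d'
    return (H(M) + F(u)) % r == s

d = 2 + secrets.randbelow(r - 2)
y = pow(g, d, p)
s, t = nr_sign(b"message", d)
ok = nr_verify(b"message", s, t, y)
```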

5.4.6. The Digital Signature Algorithm (DSA)

The digital signature algorithm (DSA) has been proposed as a standard by the US National Institute of Standards and Technology (NIST) and later accepted as a Federal Information Processing Standard (FIPS) by the US government. This standard is also known as the digital signature standard (DSS). See the NIST document [220] for a complete description of this standard.

Algorithm 5.42. Generation of DSA primes

Input: An integer λ, 0 ≤ λ ≤ 8.

Output: A prime p of bit length l := 512+64λ such that p – 1 has a prime divisor r of length 160 bits.

Steps:

Let l – 1 = 160n + b, 0 ≤ b < 160.     /* n = (l–1) quot 160, b = (l–1) rem 160. */
while (1) {
   do {
       Choose a random seed σ which is a bit string of length k ≥ 160.
       Compute the bit string u := H(σ) ⊕ H((σ + 1) rem 2^k).
       r := u OR 2^159 OR 1.    /* Set the most and the least significant bits of u */
   } while (r is not a prime).
   i := 0, f := 2.
   while (i < 4096) {
       for j = 0, 1, . . . , n { v_j := H((σ + f + j) rem 2^k). }
       v := v_0 + v_1 2^160 + · · · + v_{n–1} 2^{160(n–1)} + (v_n rem 2^b) 2^{160n} + 2^{l–1}.
                                                     /* v is an integer of bit length exactly l */
       p := v – (v rem 2r) + 1.   /* p – 1 is a multiple of 2r */
       if (p is prime) { Return (p, r). }
       i++, f := f + n + 1.
   }
}
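Algorithm 5.42 can be exercised directly in Python: hashlib supplies SHA-1, and a Miller–Rabin routine (our assumption; the NIST document prescribes its own primality checks) stands in for the “is a prime” tests. All helper names are ours, and the sketch is unoptimized.

```python
import hashlib
import random

def sha1_int(x: int, k: int) -> int:
    """SHA-1 of the k-bit big-endian encoding of x, as a 160-bit integer."""
    return int.from_bytes(hashlib.sha1(x.to_bytes(k // 8, 'big')).digest(), 'big')

def is_prime(m: int, rounds: int = 30) -> bool:
    """Miller-Rabin probabilistic primality test (stand-in for the NIST test)."""
    if m < 2:
        return False
    for sp in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31):
        if m % sp == 0:
            return m == sp
    d, s = m - 1, 0
    while d % 2 == 0:
        d, s = d // 2, s + 1
    for _ in range(rounds):
        a = random.randrange(2, m - 1)
        x = pow(a, d, m)
        if x in (1, m - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, m)
            if x == m - 1:
                break
        else:
            return False
    return True

def dsa_primes(lam: int = 0, k: int = 160):
    l = 512 + 64 * lam
    n, b = divmod(l - 1, 160)
    while True:
        # Generate the 160-bit prime r from a random seed sigma.
        while True:
            sigma = random.getrandbits(k)
            u = sha1_int(sigma, k) ^ sha1_int((sigma + 1) % 2**k, k)
            r = u | 2**159 | 1               # force the top and bottom bits
            if is_prime(r):
                break
        f = 2
        for _ in range(4096):
            v = [sha1_int((sigma + f + j) % 2**k, k) for j in range(n + 1)]
            w = sum(v[j] << (160 * j) for j in range(n)) \
                + ((v[n] % 2**b) << (160 * n)) + 2**(l - 1)
            p = w - (w % (2 * r)) + 1        # make p - 1 a multiple of 2r
            if is_prime(p):
                return p, r
            f += n + 1

p, r = dsa_primes(0)                         # lambda = 0: a 512-bit p
```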

DSA is based on the intractability of the DLP in the finite field F_p, where p is a prime of bit length 512 + 64λ with 0 ≤ λ ≤ 8. The cardinality p – 1 of F_p* is required to have a prime divisor r of length (exactly) 160 bits. The NIST document [220] specifies a standard method for obtaining such a field F_p, which we describe in Algorithm 5.42. We denote by H the SHA-1 hash function that converts bit strings of arbitrary length to bit strings of length 160. We will identify (often without explicit mention) the bit string a_1a_2 . . . a_k of length k with the integer a_1 2^{k–1} + a_2 2^{k–2} + · · · + a_{k–1} 2 + a_k.

The DSA prime generation procedure (Algorithm 5.42) starts by selecting the prime divisor r and then tries to find a prime p such that r|(p–1). The outputs of H are utilized as pseudorandomly generated bit strings of length 160.

Once the DSA parameters p and r are available, an element g ∈ F_p* of multiplicative order r can be computed by Algorithm 3.26. Henceforth we assume that p, r and g are public knowledge and need not be supplied as inputs to the signature generation and verification routines. A DSA key pair consists of an integer (the private key) d, 2 ≤ d ≤ r – 1, and the element g^d (the public key) of F_p*.

The DSA signature-generation procedure is given as Algorithm 5.43. One may additionally include a check whether s = 0 or t = 0, and, if so, repeat signature generation with another session key. But this, being an extremely rare phenomenon, can be ignored for all practical purposes. Both s and t are elements of ℤ_r and hence are represented as integers between 0 and r – 1.

Algorithm 5.43. DSA signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key d′, 2 ≤ d′ ≤ r – 1.

s := (g^d′ (mod p)) (mod r).

t := d′^–1(H(M) + ds) (mod r).

DSA signature verification is described in Algorithm 5.44. For a valid signature (M, s, t) on a message M, the algorithm computes w ≡ d′(H(M) + ds)^–1 (mod r), w1 ≡ H(M)w (mod r) and w2 ≡ sw (mod r). Therefore, g^w1 (g^d)^w2 ≡ g^(w1 + d w2) ≡ g^(w(H(M)+ds)) ≡ g^(d′(H(M)+ds)^–1 (H(M)+ds)) ≡ g^d′ (mod p). Reduction modulo r now gives (g^w1 (g^d)^w2 (mod p)) (mod r) = (g^d′ (mod p)) (mod r) = s.

Algorithm 5.44. DSA signature verification

Input: A signature (M, s, t) and the signer’s public key g^d.

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , r – 1} or t ∉ {1, 2, . . . , r – 1}) { Return “Signature not verified”. }

w := t^–1 (mod r).

w1 := H(M)w (mod r).

w2 := sw (mod r).

if ((g^w1 (g^d)^w2 (mod p)) (mod r) = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }

DSA signature generation performs a single exponentiation and DSA verification does two exponentiations modulo p. All the exponents are positive and ≤ r. Thus, DSA is essentially as fast as the Schnorr scheme or the NR scheme.
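The generation and verification routines fit in a few lines of Python. The toy parameters below (p = 2039, r = 1019 with r | p – 1, and g = 4 of order r) are illustrative assumptions, far below the sizes mandated by the standard; modular inverses via pow(x, -1, m) need Python 3.8 or later.

```python
import hashlib
import secrets

# Toy parameters (illustrative only; real DSA uses a 512- to 1024-bit p
# and a 160-bit r produced by Algorithm 5.42).
p, r, g = 2039, 1019, 4

def H(M: bytes) -> int:
    return int.from_bytes(hashlib.sha1(M).digest(), 'big') % r

def dsa_sign(M: bytes, d: int):
    while True:
        d1 = 2 + secrets.randbelow(r - 2)            # session key d'
        s = pow(g, d1, p) % r                        # s := (g^d' mod p) mod r
        t = (pow(d1, -1, r) * (H(M) + d * s)) % r    # t := d'^-1 (H(M) + d s)
        if s != 0 and t != 0:                        # the rare-case check
            return s, t

def dsa_verify(M: bytes, s: int, t: int, y: int) -> bool:
    if not (1 <= s <= r - 1 and 1 <= t <= r - 1):
        return False
    w = pow(t, -1, r)
    w1, w2 = (H(M) * w) % r, (s * w) % r
    u = (pow(g, w1, p) * pow(y, w2, p)) % p          # g^w1 (g^d)^w2 = g^d'
    return u % r == s

d = 2 + secrets.randbelow(r - 2)                     # private key
y = pow(g, d, p)                                     # public key g^d
s, t = dsa_sign(b"sample", d)
ok = dsa_verify(b"sample", s, t, y)
```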

*5.4.7. The Elliptic Curve Digital Signature Algorithm (ECDSA)

The ECDSA is the elliptic curve analogue of the DSA. Algorithm 5.45 describes the generation of the domain parameters necessary to set up an ECDSA system. One first selects a suitable finite field F_q and takes a random elliptic curve E over F_q. E must be such that the cardinality n of the group E(F_q) has a suitably large prime divisor r. One generates a random point P ∈ E(F_q) of order r and works in the subgroup 〈P〉 of E(F_q) generated by P. It is assumed that q is either a prime p or a power 2^m of 2.

Algorithm 5.45. Generation of ECDSA parameters

Input: A finite field F_q, where q is a prime p or a power 2^m of 2.

Output: A set of parameters E, n, r, P for the ECDSA.

Steps:

while (1) {
  Choose a, b ∈ F_q randomly.
  Consider the elliptic curve E over F_q defined by a and b.
  Compute n := ord E(F_q).
  if (n has a prime divisor r > max(2^160, 4√q)) {
     if (n ∤ (q^k – 1) for k = 1, . . . , 20) and (n ≠ q) {
        do {
          Select P′ ∈ E(F_q) randomly.
          P := (n/r)P′.
        } while (P = O).
        Return (E, n, r, P).
     }
  }
}

The order n = ord E(F_q) can be computed using the SEA algorithm (for q = p) or the Satoh–FGH algorithm (for q = 2^m) described in Section 3.6. The integer n should be factored to check if it has a prime divisor r > max(2^160, 4√q). The condition n ∤ (q^k – 1) for small values of k is necessary to avoid the MOV attack, whereas the condition n ≠ q ensures that the SmartASS attack cannot be mounted. E(F_q) is not necessarily a cyclic group. But, r being a prime, a point P := (n/r)P′ different from the point at infinity O must be one of order r.

An ECDSA key pair consists of a private key d (an integer in the range 2 ≤ d ≤ r – 1) and the corresponding public key dP ∈ 〈P〉. H denotes the hash function SHA-1 that converts bit strings of arbitrary length to bit strings of length 160. As discussed in connection with DSA, we identify bit strings with integers. We also associate elements of F_q with integers in the set {0, 1, . . . , q – 1}. ECDSA signatures can be generated as in Algorithm 5.46. It is necessary to check the conditions s ≠ 0 and t ≠ 0. If either condition fails, one should re-run the procedure with a new session key pair.

Algorithm 5.46. ECDSA signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M.

Steps:

Generate a random session key pair (d′, d′P), 2 ≤ d′ ≤ r – 1.

/* Let us denote d′P = (h, k) with h, k ∈ F_q */

s := h (mod r).

t := d′^–1 (H(M) + ds) (mod r).

ECDSA signature verification is explained in Algorithm 5.47. The correctness of this algorithm can be proved like that of Algorithm 5.44.

Algorithm 5.47. ECDSA signature verification

Input: A signature (M, s, t) and the signer’s public key dP.

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , r – 1} or t ∉ {1, 2, . . . , r – 1}) { Return “Signature not verified”. }

w := t^–1 (mod r).

w1 := H(M)w (mod r).

w2 := sw (mod r).

Q := w1 P + w2(dP).

if (Q = O) { Return “Signature not verified”. }

/* Otherwise denote Q = (h, k) */

s′ := h (mod r).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }
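For a field small enough that points can be counted by brute force, the whole ECDSA pipeline can be sketched end to end. The field q = 233, the threshold r > 20 and the curve coefficients are toy assumptions standing in for the real requirements (SEA/Satoh–FGH point counting and r > max(2^160, 4√q)); the curve equation Y^2 = X^3 + aX + b assumes q is prime.

```python
import hashlib
import random

q, a = 233, 1                              # toy prime field and coefficient a
O = None                                   # the point at infinity

def add(P, Q):
    """Affine point addition on Y^2 = X^3 + aX + b over F_q."""
    if P is O: return Q
    if Q is O: return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2 and (y1 + y2) % q == 0:
        return O
    if P == Q:
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, q) % q
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, q) % q
    x3 = (lam * lam - x1 - x2) % q
    return (x3, (lam * (x1 - x3) - y1) % q)

def mul(k, P):
    """Double-and-add scalar multiplication."""
    R = O
    while k:
        if k & 1: R = add(R, P)
        P, k = add(P, P), k >> 1
    return R

def curve_order(b):
    """Brute-force point count; feasible only for tiny q."""
    n = 1
    for x in range(q):
        z = (x * x * x + a * x + b) % q
        n += 1 if z == 0 else (2 if pow(z, (q - 1) // 2, q) == 1 else 0)
    return n

def largest_prime_factor(n):
    f, m, last = 2, n, 1
    while f * f <= m:
        while m % f == 0:
            last, m = f, m // f
        f += 1
    return m if m > 1 else last

# Parameter generation in the spirit of Algorithm 5.45.
while True:
    b = random.randrange(q)
    if (4 * a**3 + 27 * b**2) % q == 0:    # skip singular curves
        continue
    n = curve_order(b)
    r = largest_prime_factor(n)
    if r <= 20 or r == q:                  # toy analogue of the size checks
        continue
    while True:                            # random point P' of the curve
        x = random.randrange(q)
        z = (x**3 + a * x + b) % q
        if z == 0 or pow(z, (q - 1) // 2, q) != 1:
            continue
        y = next(v for v in range(q) if v * v % q == z)
        P = mul(n // r, (x, y))            # P := (n/r) P'
        if P is not O:
            break
    break

def H(M): return int.from_bytes(hashlib.sha1(M).digest(), 'big') % r

def sign(M, d):
    while True:
        d1 = 2 + random.randrange(r - 2)
        s = mul(d1, P)[0] % r              # x-coordinate of d'P, mod r
        t = pow(d1, -1, r) * (H(M) + d * s) % r
        if s and t:
            return s, t

def verify(M, s, t, Y):
    w = pow(t, -1, r)
    Q = add(mul(H(M) * w % r, P), mul(s * w % r, Y))
    return Q is not O and Q[0] % r == s

d = 2 + random.randrange(r - 2)
Y = mul(d, P)                              # public key dP
s, t = sign(b"ec message", d)
ok = verify(b"ec message", s, t, Y)
```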

*5.4.8. The XTR Signature Algorithm

As discussed in Section 5.2.7, the XTR family of algorithms is an adaptation of other conventional algorithms over finite fields. XTR achieves a speed-up by a factor of about three using a clever way of representing elements of certain finite fields. It is no surprise that the DLP-based signature algorithms described so far can be given efficient XTR renderings. We explain here XTR–DSA, the XTR version of the digital signature algorithm.

In order to set up an XTR system, we need a prime p ≡ 2 (mod 3). The XTR group G is a subgroup of the multiplicative group F_{p^6}* and has a prime order q dividing p^2 – p + 1. For compliance with the original version of DSA, one requires q to be of bit length 160. The trace map Tr : F_{p^6} → F_{p^2} taking x to x + x^{p^2} + x^{p^4} is used to represent an element x ∈ G by the element Tr(x) ∈ F_{p^2}. Under this representation, arithmetic in G translates to that in F_{p^2}. For example, we have seen how exponentiation in G can be efficiently implemented using F_{p^2} arithmetic (Algorithm 5.20). The trace Tr(g) of a generator g of G should also be made available for setting up the XTR domain parameters. In Section 5.2.7, we have discussed how a random set of XTR parameters (p, q, Tr(g)) can be computed.

An XTR key pair comprises a random integer d (the private key) and the trace Tr(g^d) (the public key). Algorithm 5.20 is used to compute Tr(g^d) from Tr(g) and d. This algorithm gives Tr(g^{d–1}) and Tr(g^{d+1}) as by-products. For an implementation of XTR–DSA, we require these two elements of F_{p^2}. So we assume that the public key consists of the three traces S_d(Tr(g)) = (Tr(g^{d–1}), Tr(g^d), Tr(g^{d+1})). As explained in Lenstra and Verheul [172], the values Tr(g^{d–1}) and Tr(g^{d+1}) can be computed easily from Tr(g^d) even when d is unknown, so it suffices to store only Tr(g^d) as the public key. But we avoid the details of this computation here and assume that all the three traces are available to the signature verifier.

Algorithm 5.20 provides an efficient way of computing exponentiations in G. For DSA-like signature verification (cf. Algorithm 5.44), one computes products of the form g^a(g^d)^b with d unknown. In the XTR world, this amounts to computing the trace Tr(g^a(g^d)^b) from the knowledge of a, b, Tr(g) and Tr(g^d) (or S_d(Tr(g))) but without the knowledge of d. The XTR exponentiation algorithm is as such not applicable in this situation. We should, therefore, prescribe a method to compute traces of products in G. Doing that requires some mathematics that we mention now without proofs. See Lenstra and Verheul [170] for the missing details.

Let e := ab^–1 (mod q). Then, a + bd ≡ b(e + d) (mod q), that is, Tr(g^a(g^d)^b) = Tr(g^{b(e+d)}), so that it is sufficient to compute Tr(g^{e+d}) from the knowledge of e, Tr(g) and Tr(g^d). We treat the 3-tuple S_k(Tr(g)) as a row vector (over F_{p^2}). For c ∈ F_{p^2}, let M_c denote the matrix

Equation 5.9


We take c := Tr(g). It can be shown that det M_{Tr(g)} ≠ 0, that is, the matrix M_{Tr(g)} is invertible, and we have:

Equation 5.10


Here the superscript t denotes the transpose of a matrix. With these observations, one can write the procedure for computing Tr(g^a(g^d)^b) as in Algorithm 5.48.

Algorithm 5.48. XTR multiplication

Input: a, b, Tr(g) and S_d(Tr(g)) for some unknown d.

Output: Tr(g^a(g^d)^b).

Steps:

Compute e := ab^–1 (mod q).
Compute S_e(Tr(g)) using Algorithm 5.20 with c := Tr(g) and n := e.
Use Equation (5.10) to compute Tr(g^{e+d}).
Use Algorithm 5.20 with c := Tr(g^{e+d}) and n := b to compute
    Tr(g^{b(e+d)}).
Return Tr(g^{b(e+d)}) = Tr(g^a(g^d)^b).

XTR–DSA signature generation (Algorithm 5.49) is an obvious adaptation of Algorithm 5.43.

Algorithm 5.49. XTR signature generation

Input: A message M to be signed and the signer’s private key d.

Output: The signature (M, s, t) on M with s, t ∈ ℤ_q.

Steps:

do {
  Generate a random d′ ∈ {2, . . . , q – 1}.
  Compute Tr(g^d′).          /* Use Algorithm 5.20 with c := Tr(g) and n := d′ */
  Let Tr(g^d′) = x1α + x2α^2.     /* α is defined in Section 5.2.7 to represent F_{p^2} */
  s := x1 + px2 (mod q).
} while (s = 0).
t := d′^–1(H(M) + ds) (mod q).         /* Here H is the hash function SHA-1 */

The bulk of the time taken by Algorithm 5.49 goes into the computation of Tr(g^d′). Since the trace representation of XTR makes this exponentiation three times as efficient as the corresponding DSA exponentiation, XTR–DSA signature generation runs nearly three times as fast as DSA signature generation.

XTR–DSA signature verification can be easily translated from Algorithm 5.44 and is shown in Algorithm 5.50. The most costly step in the XTR–DSA verification routine is the computation of Tr(g^w1 (g^d)^w2). One uses Algorithm 5.48 for this purpose. This algorithm, in turn, invokes the exponentiation Algorithm 5.20 twice. For the original DSA signature verification (Algorithm 5.44), the costliest step is the computation of g^w1 (g^d)^w2, which involves two exponentiations and a (cheap) multiplication. A careful analysis shows that XTR–DSA signature verification runs nearly 1.75 times as fast as DSA verification.

Algorithm 5.50. XTR signature verification

Input: An XTR–DSA signature (M, s, t) on a message M and the signer’s public key (Tr(g^{d–1}), Tr(g^d), Tr(g^{d+1})).

Output: Verification status of the signature.

Steps:

if (s ∉ {1, 2, . . . , q – 1} or t ∉ {1, 2, . . . , q – 1}) { Return “Signature not verified”. }

w := t^–1 (mod q).

w1 := H(M)w (mod q).

w2 := sw (mod q).

Compute Tr(g^w1 (g^d)^w2).     /* Use Algorithm 5.48 */
Write this trace value as x1α + x2α^2.     /* See Section 5.2.7 */

s′ := x1 + px2 (mod q).

if (s′ = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }

*5.4.9. The NTRUSign Algorithm

The NTRU Signature Scheme (NSS) (Hoffstein et al. [131]) is an adaptation of the NTRU encryption algorithm discussed in Section 5.2.8. Cryptanalytic studies (Gentry et al. [110]) show that the NSS has security flaws. A newer version of the NSS, referred to as NTRUSign and resistant to these attacks, has been proposed by Hoffstein et al. [128]. In this section, we provide a brief overview of NTRUSign.

In order to set up the domain parameters for NTRUSign, we start with a positive integer n and consider the ring R := ℤ[X]/(X^n – 1). Elements of R are polynomials with integer coefficients and of degree ≤ n – 1. The multiplication of R is denoted by ⊛, which is essentially the multiplication of two polynomials of ℤ[X] followed by setting X^n = 1. We also fix a positive integer β to be used as a modulus for the coefficients of the polynomials in R. Two subsets F_f and F_g of R, consisting of polynomials with small coefficients governed by suitably chosen parameters ν_f and ν_g, are of importance for the NTRUSign algorithm. The message space is assumed to consist of pairs of polynomials of R with coefficients reduced modulo β. We further assume that we have at our disposal a hash function H that maps messages (that is, binary strings) to elements (m1, m2) of this message space.

Let a = a_0 + a_1X + · · · + a_{n–1}X^{n–1} ∈ R. The average of the coefficients of a is denoted by ā, that is, ā := (1/n)(a_0 + a_1 + · · · + a_{n–1}). The centred norm ‖a‖ of a is defined by

‖a‖^2 := (a_0 – ā)^2 + (a_1 – ā)^2 + · · · + (a_{n–1} – ā)^2.

For two polynomials a, b ∈ R, one also defines

‖(a, b)‖^2 := ‖a‖^2 + ‖b‖^2.

The parameters ν_f and ν_g should be so chosen that any polynomial f ∈ F_f and any polynomial g ∈ F_g have (centred) norms of the order O(n). An upper bound B on the norms (of pairs of polynomials) should also be predetermined.
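The ring operations just introduced are easy to experiment with. The following minimal sketch implements the convolution product ⊛ and the centred norm for a toy degree n = 7 (real NTRUSign uses n = 251); the sample polynomials are arbitrary illustrations.

```python
# A minimal sketch of arithmetic in R = Z[X]/(X^n - 1).
n = 7

def star(u, v):
    """Convolution product u * v: polynomial product followed by X^n = 1."""
    w = [0] * n
    for i in range(n):
        for j in range(n):
            w[(i + j) % n] += u[i] * v[j]
    return w

def centred_norm_sq(a):
    """||a||^2 = sum_i (a_i - abar)^2, with abar the coefficient average."""
    abar = sum(a) / n
    return sum((c - abar) ** 2 for c in a)

f = [1, 0, 1, -1, 0, 1, 0]
g = [0, 1, 1, 0, -1, 0, 1]
h = star(f, g)
pair_norm_sq = centred_norm_sq(f) + centred_norm_sq(g)   # ||(f, g)||^2
```

Note that the constant polynomial has centred norm 0, which is why the norm is “centred”: it measures deviation from the coefficient average.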

Typical values for NTRUSign parameters are

(n, β, νf, νg, B) = (251, 128, 73, 71, 300).

It is estimated that these choices lead to a security level at least as high as in an RSA scheme with a 1024-bit modulus. For very long-term security, one may go for (n, β) = (503, 256).

In order to set up a key pair, the signer first chooses two random polynomials f ∈ F_f and g ∈ F_g. The polynomial f should be invertible modulo β, and the signer computes f_β ∈ R with the property that f ⊛ f_β ≡ 1 (mod β). The public key of the signer is the polynomial h ≡ f_β ⊛ g (mod β), whereas the private key is the tuple (f, g, F, G), where F and G are two polynomials in R satisfying

f ⊛ G – g ⊛ F = β    and    ‖F‖, ‖G‖ = O(n).

Hoffstein et al. [128] present an algorithm to compute such F and G from the polynomials f and g, with the norms ‖F‖ and ‖G‖ bounded in terms of a given constant c.

Algorithm 5.51. NTRU signature generation

Input: A message M to be signed and the signer’s private key (f, g, F, G).

Output: The signature (M, s) on M.

Steps:

Compute (m1, m2) := H(M).

Compute polynomials A, B, a, b ∈ R satisfying

G ⊛ m1 – F ⊛ m2 = A + βB,
–g ⊛ m1 + f ⊛ m2 = a + βb,

where a and A have coefficients in the range between –β/2 and +β/2.

Compute s ≡ f ⊛ B + F ⊛ b (mod β).

NTRUSign signature generation is described in Algorithm 5.51. It is apparent that the NTRUSign algorithm derives its security from the difficulty of computing a vector v in a certain lattice, close to the vector defined by the hashed message (m1, m2). For defining the lattice, we first note that a polynomial u = u_0 + u_1X + · · · + u_{n–1}X^{n–1} ∈ R can be identified with the vector (u_0, u_1, . . . , u_{n–1}) of dimension n defined by its coefficients. Similarly, two polynomials u, v ∈ R define a vector, denoted by (u, v), of dimension 2n. To the public key h we associate the 2n-dimensional lattice

L_h := {(u, v) ∈ R × R | v ≡ u ⊛ h (mod β)}.

It is clear from the definitions that both (f, g) and (F, G) are in L_h.

If h = h_0 + h_1X + · · · + h_{n–1}X^{n–1}, then for each i = 0, 1, . . . , n – 1 we have

X^i ⊛ h(X) ≡ (h_{n–i}, . . . , h_{n–1}, h_0, . . . , h_{n–i–1}) (mod β), and
0 ⊛ h(X) ≡ βX^i ≡ 0 (mod β),

so that the pairs (X^i, X^i ⊛ h) and (0, βX^i) all belong to L_h. It follows immediately that L_h is generated by the rows of the 2n × 2n matrix

( I_n    H   )
(  0    βI_n ),

where I_n is the n × n identity matrix and H is the n × n matrix whose i-th row is the coefficient vector of X^i ⊛ h(X) (mod β).
Now, consider the signature generation routine (Algorithm 5.51). The hash function H generates from the message M a random 2n-dimensional vector m := (m1, m2), not necessarily in L_h. We then look at the vector v := (s, t) defined as:

s ≡ f ⊛ B + F ⊛ b (mod β), and
t ≡ g ⊛ B + G ⊛ b (mod β).

The lattice L_h has the rotational invariance property, namely, if (u, v) ∈ L_h, then (X^i ⊛ u, X^i ⊛ v) is also in L_h for all i = 0, 1, . . . , n – 1. More generally, if (u, v) ∈ L_h, then (w ⊛ u, w ⊛ v) ∈ L_h for any polynomial w ∈ R. In particular, since v = (s, t) ≡ B ⊛ (f, g) + b ⊛ (F, G) (mod β) and since (f, g), (F, G) ∈ L_h, it follows that v ∈ L_h. Of these two polynomials, only s is produced during the generation of NTRUSign signatures. The other is needed during signature verification and can be computed easily from s using the formula t ≡ h ⊛ s (mod β), the validity of which is established from the definition of the lattice L_h.

The vector v = (s, t) ∈ L_h is close to the message vector m = (m1, m2) in the sense that the norm ‖(m1 – s, m2 – t)‖ is small, the bound depending on the constant c chosen earlier (see Hoffstein et al. [128] for a proof of this relation). The verification routine can, therefore, be designed as in Algorithm 5.52.

Algorithm 5.52. NTRU signature verification

Input: A signature (M, s) and the signer’s public key h.

Output: Verification status of the signature.

Steps:

Compute (m1, m2) := H(M).

Compute t ≡ h ⊛ s (mod β).

if (‖(m1 – s, m2 – t)‖ ≤ B) { Return “Signature verified”. }

else { Return “Signature not verified”. }

For the choice (n, β, c) = (251, 128, 0.45), we have ‖(m1 – s, m2 – t)‖ ≈ 216. Therefore, choosing the norm bound B slightly larger than this value (say, B = 300) allows the verification scheme to work correctly most of the time. The knowledge of the private key (f, g, F, G) allows the legitimate signer to compute the close vector (s, t) easily. On the other hand, for a forger (who lacks the private information), fast computation of a vector v′ = (s′, t′) with small norm ‖(m1 – s′, m2 – t′)‖ (say, ≤ 400 for the above parameter values) appears to be an intractable task. This is precisely why forging an NTRUSign signature is considered infeasible.

An exhaustive search can be mounted for generating a valid signature (s′, t′) on a message M with H(M) = (m1, m2). More precisely, a forger fixes half of the 2n coefficients of the polynomials s′ and t′ and then tries to solve t′ ≡ h ⊛ s′ (mod β) for the remaining half such that the norm ‖(m1 – s′, m2 – t′)‖ is small. It is estimated (see Hoffstein et al. [128] for the details) that the probability that a random guess for the unknown half succeeds is very low (≤ 2^–178.44 for the given parameter values).

Another attack on the NTRUSign scheme is to determine the polynomials f, g from a knowledge of h. Since (f, g) is a short non-zero vector in the lattice Lh, an algorithm that can find such vectors can determine (f, g) (or a rotated version of it). However, for a proper choice of the parameters such an algorithm is deemed infeasible. (Also see the NTRU encryption scheme in Section 5.2.8.)

Like the NTRU encryption scheme, the NTRUSign scheme is fast: both signature generation and verification can be carried out in time O(n^2). This is one of the main reasons why the NTRUSign scheme deserves popularity. Indeed, it may be adopted as an IEEE standard. Unfortunately, however, several attacks on NTRUSign are known. Gentry and Szydlo [111] indicate the possibility of extending the attacks of Gentry et al. [110]. Nguyen [217] proposes a more concrete attack on NTRUSign that is capable of recovering the private key from only 400 signatures. The future of NTRUSign and its modifications remains uncertain.

5.4.10. Blind Signature Schemes

Suppose that an entity (Alice), referred to as the sender or the user, wants to get a message M signed by a second entity (Bob), called the signer, without revealing M to Bob. This can be achieved as follows. First, Alice transforms the message M to M̃ := f(M) and sends M̃ to Bob. Bob generates the signature (M̃, σ) on M̃ and sends this pair back to Alice. Finally, Alice applies a second transform g to σ to generate the signature s of Bob on M. The transform f hides the actual message M from Bob and, thereby, disallows Bob from associating Alice with the signed message (M, s). Such a signature scheme is called a blind signature scheme.

Blind signatures are widely used in electronic payment systems in which Alice (a customer) wants the signature of Bob (the bank) on an electronic coin, but does not want the bank to be capable of associating Alice with the coin. In this way, Alice achieves anonymity while spending an electronic coin.

In a blind signature scheme, Bob does not know M, but his signature on M̃ is essential for Alice to reconstruct the signature on M. Furthermore, the blind signature on M should not allow Alice to compute a blind signature on another message M′. More generally, Alice should not be able to generate l + 1 (or more) blind signatures with only l (or fewer) interactions with Bob. A forgery of this kind is often called an (l, l + 1) forgery, or a one-more forgery (in case l is bounded above by a polynomial in the security parameter), or a strong one-more forgery (in case l is bounded above poly-logarithmically in the security parameter). An (l, l + 1) forgery is mountable on a scheme which is not existentially unforgeable (Exercises 5.15 and 5.19). Usually, existential forgery gives forged signatures on messages over which the forger has no (or little) control (that is, on messages which are likely to be meaningless).

Now, we describe some common blind signature schemes. We provide a brief overview of the algorithms. Detailed analysis of the security of these schemes can be found in the references cited at the end of this chapter.

Chaum’s RSA blind signature protocol

Chaum’s blind signature protocol is based on the intractability of the RSAP (or the IFP). The signer generates two (distinct) large random primes p and q and computes n := pq. He then chooses a random integer e with gcd(e, φ(n)) = 1 and computes an integer d such that ed ≡ 1 (mod φ(n)). The public key (of the signer) is the pair (n, e), whereas the private key is d. Chaum’s protocol works as in Algorithm 5.53.

Algorithm 5.53. Chaum’s RSA blind signature

Input: A message M generated by Alice.

Output: Bob’s blind RSA signature (M, s) on M.

Steps:

Alice hashes the message M to m := H(M) ∈ ℤ_n.

Alice chooses a random ρ ∈ ℤ_n* and computes m̃ := ρ^e m (mod n).

Alice sends m̃ to Bob.

Bob generates the signature σ := m̃^d (mod n) on m̃.

Bob sends σ to Alice.

Alice computes Bob’s (blind) signature s := ρ^–1 σ (mod n) on M.

Since σ ≡ (ρ^e m)^d ≡ ρ m^d (mod n), we have s ≡ ρ^–1 σ ≡ m^d (mod n), that is, s is indeed the RSA signature of Bob on M. Bob receives only m̃ ≡ ρ^e m (mod n) and gains no idea about m, since ρ is randomly and secretly chosen by Alice.
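The blinding and unblinding steps can be traced in a few lines of Python. The small RSA primes and the public exponent e = 17 below are illustrative assumptions only; modular inverses via pow(x, -1, m) require Python 3.8 or later.

```python
import hashlib
import math
import secrets

# Toy RSA parameters (illustrative only; real moduli are >= 1024 bits).
p_, q_ = 1009, 1013
n = p_ * q_
phi = (p_ - 1) * (q_ - 1)
e = 17                                    # gcd(e, phi) = 1 for these primes
d = pow(e, -1, phi)                       # signer's private exponent

def H(M: bytes) -> int:
    return int.from_bytes(hashlib.sha1(M).digest(), 'big') % n

M = b"one electronic coin"
m = H(M)

# Alice blinds: m~ := rho^e m (mod n), for a random rho in Z_n^*.
while True:
    rho = secrets.randbelow(n - 2) + 2
    if math.gcd(rho, n) == 1:
        break
m_blind = (pow(rho, e, n) * m) % n

# Bob signs the blinded value: sigma := (m~)^d (mod n).
sigma = pow(m_blind, d, n)

# Alice unblinds: s := rho^-1 sigma (mod n), which equals m^d (mod n).
s = (pow(rho, -1, n) * sigma) % n
```

Since (ρ^e)^d ≡ ρ (mod n), the factor ρ cancels exactly once, leaving the ordinary RSA signature m^d on the hashed message.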

The Schnorr blind signature protocol

Let G be a finite multiplicative Abelian group and let g ∈ G be of order r (a large prime). We assume that computing discrete logarithms in G is an infeasible task. The key pair of the signer is denoted by (d, g^d), where the integer d, 2 ≤ d ≤ r – 1, is the private key and g^d the public key. The Schnorr blind signature protocol is described in Algorithm 5.54.

Algorithm 5.54. Schnorr blind signature

Input: A message M generated by Alice.

Output: Bob’s blind Schnorr signature (M, s, t) on M.

Steps:

Alice asks Bob to initiate a communication.

Bob chooses a random d̃ ∈ {2, . . . , r – 1} and computes u := g^d̃.

Bob sends u to Alice.

Alice selects α, β ∈ ℤ_r randomly.

Alice computes w := u g^α (g^d)^β.

Alice computes s := H(M || w) (mod r) and s̃ := s – β (mod r).

Alice sends s̃ to Bob.

Bob computes t̃ := d̃ – d s̃ (mod r).

Bob sends t̃ to Alice.

Alice computes t := t̃ + α (mod r).

It is easy to check that the output (M, s, t) of Algorithm 5.54 is a valid Schnorr signature of Bob on the message M. The session key d′ (Algorithm 5.38) for this signature is d̃ + α + dβ. Since d and d̃ are secret values of Bob, Alice must depend on Bob for the computation of t̃. The message M is never sent to Bob. Also, its hash is masked by β. This is how the protocol achieves blindness.
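One consistent way to realize the blinding, sketched under this chapter's Schnorr convention, is the following Python trace of a single protocol run. The relations w := u g^α (g^d)^β, s̃ := s – β and t := t̃ + α are our reconstruction of the masking steps, and the toy group parameters are illustrative only.

```python
import hashlib
import secrets

# Toy parameters (illustrative): p = 2r + 1, g of order r in Z_p^*.
p, r, g = 2039, 1019, 4

def H(M: bytes, elt: int) -> int:
    data = M + elt.to_bytes(2, 'big')
    return int.from_bytes(hashlib.sha1(data).digest(), 'big') % r

d = 2 + secrets.randbelow(r - 2)          # Bob's private key
y = pow(g, d, p)                          # Bob's public key g^d
M = b"blind me"

# Bob: session value d~ and commitment u := g^d~.
d_tilde = 2 + secrets.randbelow(r - 2)
u = pow(g, d_tilde, p)

# Alice: blinding factors alpha, beta; blinded commitment w.
alpha, beta = secrets.randbelow(r), secrets.randbelow(r)
w = (u * pow(g, alpha, p) * pow(y, beta, p)) % p
s = H(M, w)                               # s := H(M || w) mod r
s_tilde = (s - beta) % r                  # masked challenge, sent to Bob

# Bob: t~ := d~ - d s~ (mod r), sent back to Alice.
t_tilde = (d_tilde - d * s_tilde) % r

# Alice: unblind t := t~ + alpha (mod r); (M, s, t) is Bob's signature.
t = (t_tilde + alpha) % r

# Ordinary Schnorr verification: g^t y^s must hash back to s.
u_check = (pow(g, t, p) * pow(y, s, p)) % p
ok = (H(M, u_check) == s)
```

The final check recovers exactly the blinded commitment w, confirming that the session key behind (M, s, t) is d̃ + α + dβ.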

The Okamoto–Schnorr blind signature protocol

Okamoto’s adaptation of the Schnorr scheme is proved to be resistant to an attack by a third entity (Pointcheval and Stern [237]). As in the Schnorr scheme, we fix a (finite multiplicative Abelian) group G (in which it is difficult to compute discrete logarithms). We then choose two elements g1, g2 ∈ G of (large prime) order r. The private key of the signer now comprises a pair (d1, d2) of integers in {2, . . . , r – 1}, whereas the public key y is the group element g1^d1 g2^d2. We assume that there is a hash function H whose outputs are in ℤ_r. We identify elements of G as bit strings. The Okamoto–Schnorr blind signature protocol is explained in Algorithm 5.55.

Algorithm 5.55. Okamoto–Schnorr blind signature

Input: A message M generated by Alice.

Output: Bob’s blind signature (M, s1, s2, s3) on M.

Steps:

Alice asks Bob to initiate a communication.

Bob chooses random k1, k2 ∈ {2, . . . , r – 1} and computes u := g1^k1 g2^k2.

Bob sends u to Alice.

Alice selects α, β, γ ∈ ℤ_r randomly.

Alice computes w := u g1^α g2^β y^γ.

Alice computes s1 := H(M || w) (mod r) and s̃1 := s1 – γ (mod r).

Alice sends s̃1 to Bob.

Bob computes s̃2 := k1 – d1 s̃1 (mod r) and s̃3 := k2 – d2 s̃1 (mod r).

Bob sends s̃2 and s̃3 to Alice.

Alice computes s2 := s̃2 + α (mod r) and s3 := s̃3 + β (mod r).

An Okamoto–Schnorr signature (M, s1, s2, s3) on a message M can be verified by checking the equality s1 = H(M || u′), where u′ := g1^s2 g2^s3 y^s1. Each invocation of the protocol uses a session private key (k1, k2). Alice must depend on Bob for generating s̃2 and s̃3, because she is unaware of the private values d1, d2, k1 and k2. Alice, in an attempt to forge Bob’s blind signature, may start with a random u of her choice. But she still needs the integers d1 and d2 in order to complete the protocol. The blindness of Algorithm 5.55 stems from the fact that the message M is never sent to Bob and its hash is masked by γ.
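A full run of the protocol, with the blinding relations reconstructed in the same spirit as for the blind Schnorr scheme, can be traced as below. The toy group (p = 2039, r = 1019, generators g1 = 4 and g2 = 9, both squares modulo p and hence of order r) is an illustrative assumption.

```python
import hashlib
import secrets

# Toy parameters (illustrative): p = 2r + 1; g1, g2 of order r in Z_p^*.
p, r = 2039, 1019
g1, g2 = 4, 9

def H(M: bytes, elt: int) -> int:
    data = M + elt.to_bytes(2, 'big')
    return int.from_bytes(hashlib.sha1(data).digest(), 'big') % r

d1 = 2 + secrets.randbelow(r - 2)          # Bob's private key (d1, d2)
d2 = 2 + secrets.randbelow(r - 2)
y = (pow(g1, d1, p) * pow(g2, d2, p)) % p  # public key y = g1^d1 g2^d2
M = b"anonymous coin"

# Bob: session pair (k1, k2) and commitment u := g1^k1 g2^k2.
k1 = 2 + secrets.randbelow(r - 2)
k2 = 2 + secrets.randbelow(r - 2)
u = (pow(g1, k1, p) * pow(g2, k2, p)) % p

# Alice: blinding exponents alpha, beta, gamma; blinded commitment w.
alpha, beta, gamma = (secrets.randbelow(r) for _ in range(3))
w = (u * pow(g1, alpha, p) * pow(g2, beta, p) * pow(y, gamma, p)) % p
s1 = H(M, w)
s1_tilde = (s1 - gamma) % r                # masked challenge, sent to Bob

# Bob: responses computed with his private values.
s2_tilde = (k1 - d1 * s1_tilde) % r
s3_tilde = (k2 - d2 * s1_tilde) % r

# Alice: unblind; (M, s1, s2, s3) is the signature.
s2 = (s2_tilde + alpha) % r
s3 = (s3_tilde + beta) % r

# Verification: s1 = H(M || g1^s2 g2^s3 y^s1).
u_check = (pow(g1, s2, p) * pow(g2, s3, p) * pow(y, s1, p)) % p
ok = (H(M, u_check) == s1)
```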

5.4.11. Undeniable Signature Schemes

So far we have seen signature schemes for which any entity with a knowledge of the signer’s public key can verify the authenticity of a signature. There are, however, situations where an active participation of the signer is necessary for the verification of a signature. Moreover, during a verification interaction a signer should not be allowed to deny a legitimate signature made by him. A signature meeting these requirements is called an undeniable signature.

Undeniable signatures are typically used for messages that are too confidential or private to be given unlimited verification facility. In case of a dispute, an entity should be capable of proving a forged signature to be so and at the same time must accept the binding to his own valid signatures. So in addition to the signature generation and verification protocols, an undeniable signature scheme comes with a denial or disavowal protocol to guard against a cheating signer that is unwilling to accept his valid signature either by not taking part in the verification interaction or by responding incorrectly or by claiming a valid signature to be forged.

There are applications where undeniable signatures are useful. For example, a software vendor can use undeniable signatures to prove the authenticity of its products only to its (paying) customers (and not to everybody).

Chaum and van Antwerpen gave the first concrete realization of an undeniable signature scheme [52, 51]. It is based on the intractability of computing discrete logarithms in the group F_p*, p a prime. Gennaro et al. [109] later adapted the algorithm to design an RSA-based undeniable signature scheme. We now describe these two schemes. Rigorous studies of these schemes can be found in the original papers. See also [53, 186, 187, 102, 202, 230].

The Chaum–Van Antwerpen undeniable signature scheme

For setting up the domain parameters for Chaum–Van Antwerpen (CvA) signatures, Bob chooses a (large) prime p of the form p = 2r + 1, where r is also a prime. (Such a prime p is called a safe prime (Definition 3.5).) Bob finds a random element g ∈ F_p* of multiplicative order r, selects a random integer d ∈ {2, . . . , r – 1} and computes y := g^d (mod p). Bob publishes (p, g, y) as his public key and keeps the integer d secret as his private key. The value d^–1 (mod r) is needed during verification and can be precomputed and stored (secretly) along with d. We assume that we have a hash function H that maps messages (that is, bit strings) to elements of the subgroup of order r in F_p*. In order to generate a CvA signature on a message M, Bob carries out the steps given in Algorithm 5.56. Verification of Bob’s CvA signature by Alice involves the interaction given in Algorithm 5.57.

Algorithm 5.56. Chaum–Van Antwerpen undeniable signature generation

Input: The message M to be signed and the signer’s private key (p, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).

s := m^d (mod p).

If (M, s) is a valid CvA signature, then

v ≡ (s^i y^j)^(d^–1 (mod r)) ≡ ((m^d)^i (g^d)^j)^(d^–1 (mod r)) ≡ m^i g^j ≡ v′ (mod p).

On the other hand, if s ≢ m^d (mod p), Bob can guess the element v′ with a probability of only 1/r, even under the assumption that Bob has unbounded computing resources. This means that unless the signature (M, s) is valid, it is extremely unlikely that Bob can make Alice accept the signature.

The denial protocol for the CvA scheme involves an interaction between the prover Bob and the verifier Alice, as given in Algorithm 5.58. In order to see how this denial protocol works, we note that Algorithm 5.58 essentially makes two calls of the verification protocol. First, assume that Bob executes the protocol honestly, that is, Bob follows the steps as indicated. If the signature (M, s) is a valid one, the check v1 ≡ m^i1 g^j1 (mod p) (as well as the check v2 ≡ m^i2 g^j2 (mod p)) should succeed, and Alice’s decision to accept the signature as valid is justified. On the other hand, if (M, s) is a forged signature, that is, if s ≢ m^d (mod p), then the probability that each of these checks succeeds is 1/r, as discussed before. Thus, it is extremely unlikely that a forged signature is accepted as valid by Alice. So Alice eventually computes both w1 and w2 equal to s^(i1 i2 d^–1 (mod r)) (mod p) and accepts the signature to be forged. Finally, suppose that Bob intends to deny the (purported) signature (M, s). If Bob does not fully take part in the interaction, his intention becomes clear. Otherwise, he sends v1 and/or v2 not computed according to the formulas specified. In that case, Bob succeeds in making Alice compute w1 = w2 with a probability of only 1/r. Thus, it is extremely unlikely that Bob, executing this protocol dishonestly, can successfully disavow a valid signature.

Algorithm 5.57. Chaum–Van Antwerpen undeniable signature verification

Input: A CvA signature (M, s) on a message M.

Output: Verification status of the signature.

Steps:

Alice computes m := H(M).

Alice chooses two secret random integers i, j ∈ ℤ_r.

Alice computes u := s^i y^j (mod p).

Alice sends u to Bob.

Bob computes v := u^(d^–1 (mod r)) (mod p).

Bob sends v to Alice.

Alice computes v′ := m^i g^j (mod p).

Alice accepts the signature (M, s) if and only if v = v′.
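A single round of this challenge-response interaction is easy to simulate. The toy safe prime p = 2039 (so r = 1019) and the hash-to-subgroup map (squaring a residue, an assumption of ours) are for illustration only.

```python
import hashlib
import secrets

# Toy parameters (illustrative): safe prime p = 2r + 1, g of order r.
p, r, g = 2039, 1019, 4

def H(M: bytes) -> int:
    # Hash into the order-r subgroup of Z_p^* by squaring a residue
    # (an assumed embedding; any map into the subgroup would do).
    h = int.from_bytes(hashlib.sha1(M).digest(), 'big') % p
    if h == 0:
        h = 1                              # avoid the degenerate residue
    return pow(h, 2, p)

d = 2 + secrets.randbelow(r - 2)           # Bob's private key
y = pow(g, d, p)
d_inv = pow(d, -1, r)                      # precomputed d^-1 (mod r)

# Signature generation (Algorithm 5.56).
M = b"confidential contract"
m = H(M)
s = pow(m, d, p)

# Verification interaction (Algorithm 5.57).
i, j = secrets.randbelow(r), secrets.randbelow(r)
u = (pow(s, i, p) * pow(y, j, p)) % p      # Alice's challenge
v = pow(u, d_inv, p)                       # Bob's response u^(d^-1 mod r)
v_check = (pow(m, i, p) * pow(g, j, p)) % p
ok = (v == v_check)
```

Because every factor of u lies in the subgroup of order r, raising to d^–1 (mod r) undoes the exponent d exactly, so Bob's response matches m^i g^j for a genuine signature.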

Algorithm 5.58. Chaum–Van Antwerpen undeniable signature: denial protocol

Input: A (purported) CvA signature (M, s) of Bob on a message M.

Output: One of the following decisions by Alice:

  1. The signature is valid.

  2. The signature is forged.

  3. Bob is trying to deny the signature.

Steps:

Alice computes m := H(M).

Alice chooses two secret random integers i1, .

Alice computes u1 := si1 yj1 (mod p) and sends u1 to Bob.

Bob computes (mod p) and sends v1 to Alice.

if (v1 ≡ mi1 gj1 (mod p)) {
   Alice accepts the signature (Msto be valid and quits the protocol.
}

Alice chooses two other secret random integers i2, .

Alice computes u2 := si2 yj2 (mod p) and sends u2 to Bob.

Bob computes and sends v2 to Alice.

if (v2 ≡ mi2 gj2 (mod p)) {
   Alice concludes the signature (Msto be valid and quits the protocol.
}

Alice computes w1 := (v1gj1)i2 (mod p) and w2 := (v2gj2)i1 (mod p).

if (w1 = w2) {
   Alice concludes that the signature is forged.
} else {
   Alice concludes that Bob is trying to deny the signature.
}
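The forged-signature branch of the denial protocol can likewise be checked on toy parameters. Below is a hedged Python sketch, assuming illustrative values only (r = 11, p = 23, g = 2); an honest Bob answers both challenges, both checks fail, and Alice finds w1 = w2.

```python
# Toy run of the CvA denial protocol (Algorithm 5.58) on a forged signature.
# All parameters are illustrative, not secure.
import random

p, r, g = 23, 11, 2
d = 7
y = pow(g, d, p)
d_inv = pow(d, -1, r)

m = pow(g, 3, p)               # stands for H(M)
s = pow(g, 5, p)               # forged: s != m^d (mod p)

def bob_respond(u):            # honest Bob: v = u^{d^{-1} mod r} (mod p)
    return pow(u, d_inv, p)

# First call of the verification protocol (i1 != 0, so the check must fail).
i1, j1 = random.randrange(1, r), random.randrange(1, r)
v1 = bob_respond((pow(s, i1, p) * pow(y, j1, p)) % p)
assert v1 != (pow(m, i1, p) * pow(g, j1, p)) % p

# Second call.
i2, j2 = random.randrange(1, r), random.randrange(1, r)
v2 = bob_respond((pow(s, i2, p) * pow(y, j2, p)) % p)
assert v2 != (pow(m, i2, p) * pow(g, j2, p)) % p

# Alice's final comparison: both equal s^{i1 i2 d^{-1} (mod r)} (mod p).
w1 = pow(v1 * pow(pow(g, j1, p), -1, p) % p, i2, p)
w2 = pow(v2 * pow(pow(g, j2, p), -1, p) % p, i1, p)
assert w1 == w2                # Alice concludes the signature is forged
```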

RSA-based undeniable signature scheme

Gennaro, Krawczyk and Rabin’s undeniable signature scheme (the GKR scheme) is based on the (intractability of the) RSA problem.

A GKR key pair differs from a usual RSA key pair. The signer chooses two (large) random primes p and q such that both p′ := (p − 1)/2 and q′ := (q − 1)/2 are also prime, and sets n := pq. Two integers e and d satisfying ed ≡ 1 (mod φ(n)) are then selected. Finally, one requires an element g ∈ Z_n^* with g ≠ 1, and y ≡ g^d (mod n). The public key of the signer is the tuple (n, g, y), whereas the private key is the pair (e, d). It can be shown that g need not be a random element of Z_n^*. Choosing a (fixed) small value of g (for example, g = 2) does not affect the security of the GKR protocol, but makes certain operations (computing powers of g) efficient.

Algorithm 5.59. GKR RSA undeniable signature generation

Input: The message M to be signed and the signer’s private key (e, d).

Output: The signature (M, s) on M.

Steps:

m := H(M).                  /* Hash the message M to an element m of Z_n^* */
s := m^d (mod n).

GKR signature generation (Algorithm 5.59) is the same as in RSA. The verification protocol described in Algorithm 5.60 accepts, in addition to a valid GKR signature (M, s), the signatures (M, αs), where α ∈ Z_n^* has multiplicative order 1 or 2 (there are four such values of α). In view of this, we define the subset

Sig M := {αH(M)^d (mod n) | α ∈ Z_n^*, ord_n α ≤ 2}

of Z_n^*. Any element s ∈ Sig M is considered to be a valid signature on M. Since Bob knows p and q, he can easily find out all the elements α of Z_n^* of order ≤ 2 and can choose to output (M, αH(M)^d) as the GKR signature for any such α. Taking α = 1 (as in Algorithm 5.59) is the canonical choice, but during the execution of the denial protocol Bob will not be allowed to disavow the other valid choices.

The interaction between the prover Bob and the verifier Alice during GKR signature verification is given in Algorithm 5.60. It is easy to see that if (M, s) is a valid GKR signature, then v = v′. On the other hand, if (M, s) is a forged signature, that is, if s ∉ Sig M, then the equality v = v′ occurs with a probability of at most 1/p′ (where p′ ≤ q′; see Exercise 5.23), even in the case that the forger has unbounded computational resources.

Algorithm 5.60. GKR RSA undeniable signature verification

Input: A GKR signature (M, s) on a message M.

Output: Verification status of the signature.

Steps:

Alice computes m := H(M).

Alice chooses random i, j ∈ Z_n.

Alice computes u := s^{2i} y^j (mod n).

Alice sends u to Bob.

Bob computes v := u^e (mod n).

Bob sends v to Alice.

Alice computes v′ := m^{2i} g^j (mod n).

Alice accepts the signature (M, s) if and only if v = v′.
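The GKR key set-up and verification can be exercised end to end on toy parameters. The sketch below assumes illustrative values only (p = 7 and q = 11, so p′ = 3 and q′ = 5); it also shows a forged signature being rejected.

```python
# Toy GKR key generation and verification (Algorithms 5.59 and 5.60).
# All parameters are illustrative, far too small for real use.
import random

p, q = 7, 11
n = p * q                       # n = 77
phi = (p - 1) * (q - 1)         # 60
e = 7
d = pow(e, -1, phi)             # d = 43, so ed = 1 (mod phi(n))
g = 2
y = pow(g, d, n)                # public key component y = g^d (mod n)

m = 3                           # stands for H(M)
s = pow(m, d, n)                # GKR signature s = m^d (mod n)

# i is kept in 1..14 so that the forged check below fails
# deterministically in this tiny group.
i, j = random.randrange(1, 15), random.randrange(n)
u = (pow(s, 2 * i, n) * pow(y, j, n)) % n   # Alice's challenge
v = pow(u, e, n)                            # honest Bob's response
v_prime = (pow(m, 2 * i, n) * pow(g, j, n)) % n
assert v == v_prime                         # valid signature accepted

s_forged = 2                                # not of the form alpha * m^d
u = (pow(s_forged, 2 * i, n) * pow(y, j, n)) % n
v = pow(u, e, n)
assert v != v_prime                         # forged signature rejected
```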

Algorithm 5.61. GKR RSA undeniable signature: denial protocol

Input: A (purported) GKR signature (M, s) of Bob on a message M.

Output: One of the following decisions by Alice:

  1. The signature is forged.

  2. Bob is trying to deny the signature.

Steps:

Alice computes m := H(M).

Alice chooses random i ∈ {1, . . . , k} and j ∈ Z_n.

Alice computes w1 := m^i g^j (mod n) and w2 := s^i y^j (mod n).

Alice sends (w1, w2) to Bob.

Bob computes m := H(M).

Bob determines i′ ∈ {1, . . . , k} such that the following congruence holds:

Equation 5.11

(m^{−1}s^e)^{i′} ≡ w1^{−1}w2^e (mod n)

if (no such i′ is found) {    /* This may happen if Alice has cheated */
   Bob aborts the protocol.
}
Bob sends i′ to Alice.
if (i = i′) {
   Alice concludes that the signature is forged.
} else {
   Alice concludes that Bob is trying to deny the signature.
}

The denial protocol for the GKR scheme is described in Algorithm 5.61. This protocol is executed after verification by Algorithm 5.60 fails. In that case, Alice wants to ascertain whether the signature is actually invalid or whether Bob has denied his valid signature by incorrectly executing the verification protocol. A small integer k is predetermined for the denial protocol. The prover needs a running time proportional to k, whereas the probability of a successful denial of a valid signature decreases with k. Taking k = O(lg n) gives optimal performance.

In order to see how this protocol prevents Bob from denying a valid signature, first consider the case that (M, s) is a valid GKR signature of Bob. In that case, s ≡ αm^d (mod n) for some α ∈ Z_n^* of order ≤ 2. On the other hand, s^e ≡ α^e m^{de} ≡ α^e m (mod n), so that m^{−1}s^e ≡ α^e (mod n) is again of order ≤ 2. Therefore, Congruence (5.11) holds for every i′ ∈ {1, . . . , k} with α^{ei′} ≡ α^{ei} (mod n), that is, for many values of i′. Thus, Bob can only guess the secret value of i chosen by Alice, and the guess is correct with a probability of 1/k. On the other hand, if (M, s) is a forged signature, Congruence (5.11) holds only for a single i′, that is, for i′ = i (Exercise 5.23). Sending this i′ will then convince Alice that the signature is really forged. In both these cases, Congruence (5.11) holds for at least one i′. Failure to detect such an i′ implies that the value(s) of w1 and/or w2 have not been correctly sent by Alice. The protocol should then be aborted.
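The forged-signature case of this argument can be traced in code. The sketch below assumes illustrative toy values (p = 7, q = 11, k = 4); in this particular toy group the element m^{−1}s^e happens to have order 30 > k, so Bob's search finds a unique i′ equal to Alice's i, and Alice correctly concludes forgery.

```python
# Toy run of the GKR denial protocol (Algorithm 5.61) on a forged signature.
# All parameters are illustrative, not secure.
import random

p, q, g, e = 7, 11, 2, 7
n = p * q                       # 77
d = pow(e, -1, (p - 1) * (q - 1))
y = pow(g, d, n)
k = 4                           # toy value; the text recommends k = O(lg n)

m = 3                           # stands for H(M)
s = 2                           # forged: s is not in Sig M

# Alice's side.
i = random.randrange(1, k + 1)
j = random.randrange(n)
w1 = (pow(m, i, n) * pow(g, j, n)) % n
w2 = (pow(s, i, n) * pow(y, j, n)) % n

# Bob's side: search for i' satisfying Congruence (5.11),
#     (m^{-1} s^e)^{i'} = w1^{-1} w2^e (mod n).
beta = (pow(m, -1, n) * pow(s, e, n)) % n
target = (pow(w1, -1, n) * pow(w2, e, n)) % n
matches = [ip for ip in range(1, k + 1) if pow(beta, ip, n) == target]

# Here ord_n(beta) = 30 > k, so the matching i' is unique and equals i.
assert matches == [i]
```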

In order to reduce the probability of successful cheating, it is convenient to repeat the protocol a few times instead of increasing k. If k = 1024, Bob can successfully cheat in eight executions of the denial protocol with a probability of only (1/1024)^8 = 2^{−80}.

5.4.12. Signcryption

The conventional way to ensure both authentication and confidentiality of a message is to sign the message first and then encrypt the signed message. Now that we have many signature and encryption algorithms in our bag, there is hardly any problem in achieving both the goals simultaneously. Zheng proposes signcryption schemes that combine these two operations together. A signcryption scheme is better than a sign-and-encrypt scheme in two aspects. First, the combined primitive takes less running time than the composite primitive comprising signature generation followed by encryption. Second, a signcrypted message is of smaller size than a signed-and-encrypted message. When communication overheads need to be minimized, signcryption proves to be useful.

Before describing the signcryption primitive, let us first review the composite sign-and-encrypt scheme. Let M be the message to be sent. Alice, the sender, generates the signature appendix s on M using one of the signature schemes described earlier. This step can be described as s = fs(M, da), where da is the private key of Alice. Next, a symmetric key k is generated by Alice. The message M is encrypted by a symmetric cipher (like DES) under the key k, that is, C := E(M, k). The key k is then encrypted using an asymmetric routine under the public key eb of Bob, the recipient, that is, c = fe(k, eb). The triple (C, c, s) is then transmitted to Bob.

Upon reception of (C, c, s) Bob first retrieves k using his private key db, that is, k = fd(c, db). The message M is then recovered by symmetric decryption: M = D(C, k). Finally, the authenticity of M is verified from the signature using the verification operation: fv(M, s, ea), where ea is the public key of Alice. Algorithm 5.62 describes the sign-and-encrypt operation and its inverse.

Algorithm 5.62. Sign-and-encrypt

s := fs(M, da).

Generate a random symmetric key k.

c := fe(k, eb).

C := E(M, k).

Send (C, c, s) to the recipient.

Decrypt-and-verify

k := fd(c, db).

M := D(C, k).

Verify the signature: fv(M, s, ea).
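The sign-and-encrypt/decrypt-and-verify pair above can be sketched concretely. The following Python sketch uses textbook RSA in the roles of fs/fv and fe/fd, SHA-256 as the hash, and a toy XOR stream standing in for the symmetric cipher E/D; all key sizes and helper names are illustrative assumptions, not a real implementation.

```python
# A minimal sketch of Algorithm 5.62 (sign-and-encrypt, decrypt-and-verify).
# Textbook RSA and an XOR "cipher" are stand-ins; nothing here is secure.
import hashlib
import random

def rsa_keys(p, q, e=65537):
    phi = (p - 1) * (q - 1)
    return p * q, e, pow(e, -1, phi)

na, ea, da = rsa_keys(104729, 104723)   # Alice (signer)
nb, eb, db = rsa_keys(100003, 100019)   # Bob (recipient)

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

M = b"meet me at noon"

# --- Sign-and-encrypt (Alice) ---
h = int.from_bytes(hashlib.sha256(M).digest(), "big") % na
s = pow(h, da, na)                      # s := fs(M, da)
k = random.randrange(2, nb)             # random symmetric key
c = pow(k, eb, nb)                      # c := fe(k, eb)
C = xor_cipher(M, k.to_bytes(5, "big")) # C := E(M, k)

# --- Decrypt-and-verify (Bob) ---
k2 = pow(c, db, nb)                     # k := fd(c, db)
M2 = xor_cipher(C, k2.to_bytes(5, "big"))
assert M2 == M                          # M := D(C, k)
h2 = int.from_bytes(hashlib.sha256(M2).digest(), "big") % na
assert pow(s, ea, na) == h2             # fv(M, s, ea) succeeds
```

Note that this composite scheme needs one private-key and one public-key operation on each side, plus the transmission of c; signcryption, described next in the text, merges these steps.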

Zheng’s signcryption scheme combines fs and fe to a single operation fse and also fd and fv to another single operation fdv. Each of these combined operations essentially takes the time of a single public- or private-key operation and hence leads to a performance enhancement by a factor of nearly two. Moreover, the encrypted key c need not be sent with the message, that is, C and s are sufficient for both authentication and confidentiality. This reduces communication overhead.

Signcryption is based on shortened digital signature schemes. Table 5.3 describes the shortened versions of DSA (Section 5.4.6). We use the notations of Algorithms 5.43 and 5.44. Also, ‖ denotes concatenation of strings, and H is a hash function (like SHA-1). The shortened schemes have two advantages over the original DSA. First, a DSA signature is of length 2|r|, whereas an SDSA1 or SDSA2 signature has length |r| + |H(·)|. For the current version of the standard, both r and H(·) are of size 160 bits. However, one may use a potentially bigger r, and in that case the shortened schemes give smaller signatures with equivalent security. Second, DSA requires computing a modular inverse during verification, whereas SDSA does not, so verification is more efficient in the shortened schemes.

Table 5.3. Shortened digital signature algorithms
Name    Signature generation                  Signature verification
SDSA1   s := H(g^{d′} (mod p) ‖ M).           w := (ea g^s)^t (mod p).
        t := d′(s + d)^{−1} (mod r).          Verify if s = H(w ‖ M).
SDSA2   s := H(g^{d′} (mod p) ‖ M).           w := (g ea^s)^t (mod p).
        t := d′(1 + ds)^{−1} (mod r).         Verify if s = H(w ‖ M).
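A round trip through SDSA1 can be sketched as follows. SHA-256 plays the role of H, the integer-to-bytes encoding of group elements is an assumption of this sketch, and the group parameters (r = 509 prime, p = 2r + 1 = 1019, g = 4 of order r) are illustrative toys.

```python
# A sketch of SDSA1 generation and verification from Table 5.3.
# All parameters are illustrative, not secure.
import hashlib
import random

p, r, g = 1019, 509, 4          # g generates the order-r subgroup of Z_p^*

def H(x: int, M: bytes) -> int:
    return int.from_bytes(hashlib.sha256(str(x).encode() + M).digest(), "big")

d = 123                         # signer's private key
ea = pow(g, d, p)               # signer's public key e_a = g^d (mod p)
M = b"hello"

# Generation: s := H(g^{d'} mod p || M), t := d'(s + d)^{-1} (mod r).
while True:
    d1 = random.randrange(1, r)              # per-message secret d'
    s = H(pow(g, d1, p), M)
    if (s + d) % r != 0:                     # (s + d) must be invertible mod r
        break
t = d1 * pow((s + d) % r, -1, r) % r

# Verification: w := (e_a g^s)^t (mod p), then check s = H(w || M).
w = pow(ea * pow(g, s, p) % p, t, p)
assert w == pow(g, d1, p)       # w recovers g^{d'}
assert s == H(w, M)             # signature verifies
```

The verification needs no modular inverse, as claimed in the text: (ea g^s)^t = g^{(d+s)t} = g^{d′} (mod p).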

Algorithms 5.63 and 5.64 provide the details of the signcryption algorithm and its inverse, called unsigncryption. The algorithms use a keyed hash function KH. One may implement KH(x, k1) as H(x ‖ k1) using an unkeyed hash function H.

Signcryption differs from the shortened scheme in that eb^{d′} (mod p) is used instead of g^{d′} (mod p) for the computation of s. The running time of the signcryption algorithm is dominated by this modular exponentiation. When signature and encryption are used separately, the encryption operation uses one (or more) additional exponentiations. So signcryption significantly improves upon the sign-and-encrypt scheme of Algorithm 5.62.

Algorithm 5.63. Signcryption

Input: Plaintext message M, the sender’s private key da, the recipient’s public key

eb = gdb (mod p).

Output: The signcrypted message (C, s, t).

Steps:

Select a random d′ ∈ Z_r.
k := H(eb^{d′} (mod p)).                /* Generate keys for both signing and encrypting. */
Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.
s := KH(M ‖ N, k1).
                 /* Here N is the public key or the public key certificate of the sender. */
t := d′(s + da)^{−1} (mod r).

C := E(M, k2).                                                          /* Symmetric encryption */

Algorithm 5.64. Unsigncryption

Input: The signcrypted message (C, s, t), the sender’s public key ea = gda (mod p) and the recipient’s private key db.

Output: The plaintext message M and the verification status of the signature.

Steps:

k := H((ea g^s)^{t db} (mod p)).        /* Key recovery: (ea g^s)^t ≡ g^{d′} (mod p) */

Write k := k1 ‖ k2 with |k2| equal to the length of a symmetric key.

M := D(C, k2).                          /* Symmetric decryption */

if (KH(M ‖ N, k1) = s) { Return “Signature verified”. }

else { Return “Signature not verified”. }

The most time-consuming part of unsigncryption is the computation of two modular exponentiations. DSA verification too has this property. However, an additional decryption in the decrypt-and-verify scheme of Algorithm 5.62 calls for one (or more) exponentiations, making it slower than unsigncryption.
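Algorithms 5.63 and 5.64 can be exercised end to end in a toy setting. The sketch below assumes SHA-256 for H, KH(x, k1) implemented as H(x ‖ k1), a toy XOR stream for E/D, and tiny illustrative group parameters; the encoding of group elements as bytes and the key split at 16 bytes are likewise assumptions of this sketch.

```python
# End-to-end sketch of signcryption/unsigncryption (Algorithms 5.63, 5.64),
# built on the SDSA1 construction.  Nothing here is secure.
import hashlib
import random

p, r, g = 1019, 509, 4          # subgroup of prime order r in Z_p^*

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

da = 123; ea = pow(g, da, p)    # Alice, the sender
db = 456; eb = pow(g, db, p)    # Bob, the recipient
N = b"alice-certificate"        # sender's public-key certificate (placeholder)
M = b"attack at dawn"

# --- Signcryption (Alice) ---
while True:
    d1 = random.randrange(1, r)                  # the random d'
    k = H(str(pow(eb, d1, p)).encode())          # k := H(e_b^{d'} mod p)
    k1, k2 = k[:16], k[16:]                      # signing key / encrypting key
    s = int.from_bytes(H(M + N + k1), "big")     # s := KH(M || N, k1)
    if (s + da) % r != 0:                        # (s + da) must be invertible mod r
        break
t = d1 * pow((s + da) % r, -1, r) % r            # t := d'(s + da)^{-1} (mod r)
C = xor_cipher(M, k2)                            # C := E(M, k2)

# --- Unsigncryption (Bob), on receiving (C, s, t) ---
u = pow(ea * pow(g, s, p) % p, t, p)             # (e_a g^s)^t = g^{d'} (mod p)
k_rec = H(str(pow(u, db, p)).encode())           # raise to db: recovers e_b^{d'}
k1r, k2r = k_rec[:16], k_rec[16:]
M_rec = xor_cipher(C, k2r)
assert M_rec == M                                # message recovered
assert int.from_bytes(H(M_rec + N + k1r), "big") == s   # signature verified
```

Note how Bob never sees d′ or an encrypted key c: the pair (s, t) alone lets him reconstruct g^{d′} and hence the shared key, which is the communication saving claimed for signcryption.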

Exercise Set 5.4

5.15
  1. Show how first pre-image resistance of the hash function H plays an important role for RSA signatures (with appendix) described in Section 5.4.1. More precisely, show that if it is easy to find a pre-image of any hash value, it is easy to generate a valid signature (M, s) from two valid signatures (M1, s1) and (M2, s2) with M ∉ {M1, M2}. This is often referred to as existential forgery of a signature. [H]

  2. Describe how existential forgery is possible for the Rabin signature scheme. [H]

  3. Describe how existential forgery is possible for the ElGamal signature scheme. [H]

5.16Assume that Bob uses the same RSA key pair ((n, e), d) for receiving encrypted messages and for signing. Suppose that Carol intercepts the ciphertext c ≡ m^e (mod n) sent by Alice. Also suppose that Bob is willing to sign any random message presented by Carol. Explain how Carol can choose a message to be signed by Bob in order to retrieve the secret m. [H]
5.17Let G be a finite cyclic group of order n, and g a generator of G. Suppose that Alice’s private and public keys are respectively d and gd.
  1. Consider a variant of the ElGamal signature scheme, in which s is computed as in Algorithm 5.36, but the roles of d and d′ are interchanged in the generation of t, that is, the modified signature (s, t′) on M is generated as:

    s := g^{d′},
    t′ := d^{−1}[H(M) − d′H(s)] (mod n).

    Write the verification routine for the modified scheme.

  2. Show that forging modified ElGamal signatures is as difficult as computing discrete logarithms in G. You may assume that a forger can arrange d′ of her choice.

  3. Explain why signature generation is (a bit) more efficient in the modified scheme. Suppose that because of this enhanced performance Alice decided to switch to the modified scheme, but for backward compatibility she maintained both the original signature (s, t) and the modified signature (s, t′) on a message M. What went wrong?

5.18Show that:
  1. There are two valid ECDSA signatures on each message.

  2. There are three valid XTR–DSA signatures on each message.

(Here we call a signature valid, if it passes the verification routine.)

5.19
  1. Write the versions with message recovery of the RSA, Rabin, Schnorr and Nyberg–Rueppel signature schemes.

  2. Describe the possibilities of existential forgery for these versions. (Since hash functions cannot be inverted, they are not used for signature schemes with message recovery, and so the problem of existential forgery is more acute in this case. To avoid such forgeries the signer should add some redundancy to each message block before signing the same. An existentially forged signature is likely to correspond to a message not containing the redundancy.)

5.20Design the XTR version of the Nyberg–Rueppel signature scheme with appendix (Section 5.4.5). What are the speed-ups achieved by the signature generation and verification routines of the XTR version over the original NR routines?
5.21Repeat Exercise 5.20 with the Schnorr digital signature scheme (Section 5.4.4).
5.22
  1. Deduce that the determinant of the matrix Mc of Equation (5.9) is

  2. Demonstrate that

5.23Let p, q, p′, q′ be distinct odd primes with p = 2p′ + 1 and q = 2q′ + 1, and let n := pq (as in the RSA-based undeniable signature scheme).
  1. Let . Show that . [H]

  2. Argue that there are exactly four elements in Z_n^* of order ≤ 2.

  3. Let α ≢ ±1 (mod n) and ord_n α < p′q′. Show that gcd(α − 1, n) or gcd(α + 1, n) is a non-trivial divisor of n. How many such elements α does Z_n^* contain?

  4. Let have order pq′ or 2pq′. Show that for every .

  5. Look at the denial protocol for the GKR RSA signature scheme (Algorithm 5.61) and assume that p′ < q′. Suppose that (M, s) is a forged signature (that is, s ∉ Sig M) on some message M with s ∈ Z_n^*. Show that s ≡ αm^d (mod n) for some α ∈ Z_n^* with ord_n α ≥ p′. Deduce that ord_n(m^{−1}s^e) ≥ p′. Conclude that if 4k < p′, then there exists a unique i′ ∈ {1, . . . , k} (namely, i′ = i) for which Congruence (5.11) holds.

5.24
  1. Write the shortened versions of ECDSA signature generation and verification.

  2. Write the signcryption and unsigncryption algorithms based on shortened ECDSA.

5.5. Entity Authentication

Entity authentication (also called identification) is a process by means of which an entity Alice, called the claimant, proves her identity to another entity Bob, called the verifier. Alice is assumed to possess some secret piece(s) of information that no intruder is expected to know. During the execution of the identification protocol, an interaction takes place between Alice and Bob. If the interaction allows Bob to conclude (deterministically or with high probability) that the claimant possesses the secret knowledge, he accepts the claimant as Alice. An intruder Carol lacking the secret information is expected (with high probability) to fail to convince Bob of her identity as Alice. This is how entity authentication schemes prevent impersonation attacks by intruders. Typically, identification schemes are used to protect access to some sensitive piece(s) of data, like a user’s (or a group’s) private files in a computer or an account in a bank. Both secret-key and public-key techniques are used for the realization of entity authentication protocols.

5.5.1. Passwords

A password is a small string to be remembered by an entity and produced verbatim to the verifier at the time of identification. The most common example is a computer password used to protect access to a user’s private working area in a file system. In this case, an alphanumeric string (or a string that can be input using a computer keyboard) of length between 4 and 20 characters is normally used as the secret information associated with an entity. Passwords are also used to prevent misuse of certain physical objects (like an ATM card for withdrawing cash from one’s bank account, a prepaid telephone card) by anybody other than the legitimate owners of the objects. In this case, a password usually consists of a sequence of four to ten digits and is also called a personal identification number or a PIN.

In order that Bob can recognize an entity from her password, a possibility for Bob is to store the (entity, password) pairs corresponding to all the entities that are expected to participate in identification interactions with Bob. When Alice enters her password, Bob checks if Alice’s input is the same as what he stores in the pair for Alice. The file(s) storing these private records should be preserved with high secrecy, and neither read nor write access should be granted to any user. But a privileged user (the superuser) is usually given the capability to inspect any file (even read-protected ones) and can, therefore, misuse the passwords.

This problem can be avoided by storing, instead of the passwords themselves, a one-way transform of the passwords.[3] When Alice enters a password P, Bob computes the transform f(P) and compares f(P) with the record stored for Alice. The identity of Alice is accepted if and only if a match occurs. The password file now need not be read-protected, since any intruder (even the superuser) knowing the value f(P) cannot easily compute P.

[3] Informally speaking, a one-way function is one which is computationally infeasible to invert.

Passwords should be chosen from a space large enough to preclude exhaustive search by an intruder in feasible time. Unfortunately, however, it is a common tendency for human users to choose passwords from limited subsets of the allowed space. For example, use of lower-case characters, dictionary words, popular names, birth dates and so on in passwords makes attacks on passwords much easier. A strategy to foil such dictionary-based attacks is to use a pseudorandom bit sequence S known as the salt and apply the one-way function f to a combination of the password P and the salt S. That is, a function f(P, S) is now stored against an entity Alice having a password P. The combination (P, S) is often referred to as a key for the password scheme. Since a password now corresponds to many possible keys, the search space for an intruder increases dramatically. For instance, if S is a pseudorandomly chosen bit string of length 64, the intruder has to compute f(P, S) up to 2^64 times in order to hit the correct candidate for S for each P under trial. It is also necessary that the same key is not chosen for two different entities. If the salt S is a 64-bit string, then by the birthday paradox a collision between two keys is expected to occur only after (at least) 2^32 keys are generated.

A second strategy to strengthen the protection of passwords is to increase the so-called iteration count n, that is, instead of storing f(P, S) for each password P, Bob now stores f^n(P, S), the n-fold composition of f. An n-fold application of the function f increases by a factor of n both the time for password verification and the time for exhaustive search by an intruder. For a legitimate user, this is not really a nuisance, since computing f^n(P, S) only once during identification is tolerable (and may even be unnoticeable), whereas to an intruder breaking a password simply becomes n times as difficult. In typical applications, values of n ≥ 1000 are recommended.
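The salt-and-iteration idea sketched above is exactly what the standard PBKDF2 construction provides; Python's `hashlib.pbkdf2_hmac` is one readily available realization. The record layout below (salt, count, transformed password) is an assumption of this sketch.

```python
# Salted, iterated password verification: the verifier stores the salt S,
# the iteration count n and f^n(P, S), never the password P itself.
import hashlib
import hmac
import os

def enroll(P: bytes, n: int = 10_000):
    S = os.urandom(8)                          # 64-bit pseudorandom salt
    fnPS = hashlib.pbkdf2_hmac("sha256", P, S, n)
    return (S, n, fnPS)                        # record stored by the verifier

def verify(P: bytes, record) -> bool:
    S, n, fnPS = record
    candidate = hashlib.pbkdf2_hmac("sha256", P, S, n)
    return hmac.compare_digest(candidate, fnPS)   # constant-time comparison

record = enroll(b"correct horse")
assert verify(b"correct horse", record)
assert not verify(b"Tr0ub4dor&3", record)
```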

In some situations, it is advisable to lock access to a password-protected area after a predetermined number of (say, three) wrong passwords have been input in succession. This is typically the case with PINs for which the search space is rather small. For unlocking the access (to the legitimate user Alice), a second longer key (again known only to Alice) is used or human intervention is called for.

As a case study, let us briefly describe the password scheme used by the UNIX operating system. During the creation of a password, a user supplies a string P of eight 7-bit ASCII characters as the password. (Longer strings are truncated to the first eight characters.) A 56-bit DES[4] key K is constructed from P. A 12-bit random salt S is obtained from the system clock at the time of the creation of the password. The zero message (that is, a block of 64 zero bits) is then iteratively encrypted n = 25 times using K as the key. The encryption algorithm is a variant of DES that depends on the salt S. The output ciphertext and the salt (which account for a total of 64 + 12 = 76 bits) are then packed into eleven 7-bit ASCII characters and stored in the password file (usually /etc/passwd). When UNIX was designed (in 1970), this algorithm, often referred to as the UNIX crypt password algorithm, was considered to be reasonably safe under the assumption of the difficulty of finding a DES key from a plaintext–ciphertext pair. With today’s hardware and software speed, a motivated attacker can break UNIX passwords in very little time.

[4] The data encryption standard (DES) is a well-known symmetric-key cipher (Section A.2.1).

Password-based authentication schemes suffer from the disadvantage that the user has to disclose her secret P to the verifier. The verifier may misuse the knowledge of P by storing it secretly and deploying it afterwards. During the computation of f^n(P, S), the string P resides in the machine’s memory. An eavesdropper capable of monitoring the temporary storage holding the string P easily gets its value. In view of these shortcomings, password schemes are referred to as weak authentication schemes.

5.5.2. Challenge–Response Algorithms

In a strong authentication scheme, the claimant proves the possession of a secret knowledge to a verifier without disclosing the secret to the verifier. One of the communicating entities generates a random bit string c known as the challenge and sends c (or a function of c) to the other. The latter then reacts to the challenge appropriately, for example, by sending a response string r to the former. Strong authentication schemes are, therefore, also called challenge–response authentication schemes. The communication between the entities depends both on the random challenge and on the secret knowledge of the claimant. An intruder lacking the secret knowledge of a valid claimant cannot take part properly in the interaction. Furthermore, since a random challenge is used during each invocation of the identification protocol, an eavesdropper cannot use the intercepted transcripts of a particular session for a future invocation of the protocol.

Public-key protocols can be used to realize challenge–response schemes. We assume that Alice is the claimant and Bob is the verifier. Without committing to specific algorithms, we denote the public and private keys of Alice by e and d, and the encryption and decryption transforms by fe and fd respectively. Alice proves her identity by demonstrating her knowledge of d (but without revealing d) to Bob. Bob uses the transform fe and Alice the transform fd under the respective keys e and d. If a key d′ other than d is used by Carol in conjunction with e, some step of the interaction detects this and the protocol rejects Carol’s claim to be Alice. We describe two challenge–response schemes that differ in the sequence of applying the transforms fe and fd.

A challenge–response scheme based on encryption–decryption

In this scheme, Bob (the verifier) first generates a random string r, encrypts the same by the public key of Alice (the claimant) and sends the ciphertext c (the challenge) to Alice. Alice uses her private key to decrypt c to the message r′ and sends r′ (the response) back to Bob. Identification of Alice succeeds if and only if r = r′. Algorithm 5.65 illustrates the details of this scheme. It employs a one-way function H (like a hash function) for a reason explained later. This scheme checks whether the claimant can recover the random string r correctly. A knowledge of the decryption key d is needed for that.

Algorithm 5.65. Challenge–response authentication based on encryption

Bob generates a random bit string r and computes w := H(r).

Bob reads Alice’s (authentic) public key e and computes c := fe(r, e).

Bob sends (w, c) to Alice.

Alice computes r′ := fd(c, d).

if (H(r′) ≠ w) { Alice quits the protocol. }

Alice sends r′ to Bob.

Bob identifies Alice if and only if r′ = r.
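A toy run of this protocol can be sketched with textbook RSA as (fe, fd) and SHA-256 as the one-way function H; the parameters are illustrative assumptions, not a secure instantiation.

```python
# Toy run of challenge-response authentication based on encryption
# (Algorithm 5.65).  Textbook RSA is a stand-in; nothing here is secure.
import hashlib
import random

p, q, e = 104729, 104723, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))       # Alice's private key

def H(x: int) -> bytes:
    return hashlib.sha256(str(x).encode()).digest()

# Bob's side: random string, witness and challenge.
r = random.randrange(2, n)
w = H(r)                                # witness
c = pow(r, e, n)                        # challenge c := fe(r, e)

# Alice's side: decrypt, check the witness, then respond.
r1 = pow(c, d, n)                       # r' := fd(c, d)
assert H(r1) == w                       # otherwise Alice quits the protocol
# Alice sends r' to Bob.

assert r1 == r                          # Bob identifies Alice
```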

The string H(r) = w is called the witness. By sending w to Alice, Bob convinces her of his knowledge about the secret r without disclosing r itself. If Bob (or a third party pretending to be Bob) tries to cheat, Alice has the option to abort the protocol prematurely. In other words, Alice does not have to decrypt an arbitrary ciphertext presented by Bob without confirming that Bob knows the corresponding plaintext.

A challenge–response scheme based on digital signatures

In the scheme explained in Algorithm 5.66, Alice (the claimant) first does the private-key operation, that is, Alice sends her digital signature on a message to Bob (the verifier). Bob then verifies the signature of Alice by employing the encryption transform with Alice’s public key.

Algorithm 5.66. Challenge–response authentication based on signature

Bob selects a random string rB.

Bob sends rB to Alice.

Alice selects a random string rA.

Alice generates the signature s := fd(rA ‖ rB, d).

Alice sends (rA, s) to Bob.

Bob reads Alice’s (authentic) public key e.

Bob retrieves the strings r′A and r′B satisfying r′A ‖ r′B = fe(s, e).

Bob identifies Alice if and only if r′A = rA and r′B = rB.

This authentication scheme is based on the assumption that only a person knowing Alice’s private key d can generate a signature s that leads to the equalities r′A = rA and r′B = rB. Using only rA and the signature s = fd(rA, d) would demonstrate to Bob that Alice possesses the requisite knowledge of d. The random string rB is used to prevent the so-called replay attack. If rB were not used, an eavesdropper Carol intercepting the transcripts of a session could later claim her identity as Alice by simply supplying rA and Alice’s signature on rA to Bob. Using a new rB in every session (and incorporating it in the signature) guarantees that the signature varies in different sessions, even when rA remains the same.
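The signature-based protocol can also be traced on toy parameters. The sketch below assumes a textbook RSA signature with message recovery and packs rA ‖ rB as a 32-bit integer; both choices are illustrative assumptions of this sketch.

```python
# Toy run of challenge-response authentication based on a signature
# (Algorithm 5.66): s := (rA || rB)^d (mod n), recovered as s^e (mod n).
# Nothing here is secure.
import random

p, q, e = 104729, 104723, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))       # Alice's private key

rB = random.randrange(2**15)            # Bob's random challenge
rA = random.randrange(2**15)            # Alice's random string
msg = (rA << 16) | rB                   # rA || rB packed into one integer
s = pow(msg, d, n)                      # Alice's signature; she sends (rA, s)

# Bob's side: recover rA' || rB' from the signature.
rec = pow(s, e, n)
rA1, rB1 = rec >> 16, rec & 0xFFFF
assert rA1 == rA and rB1 == rB          # Bob identifies Alice
```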

There is an alternative strategy by which the use of the random string rB can be avoided. All we have to ensure is that a value of rA used once cannot be reused in a subsequent session. This can be achieved by using a timestamp, which is a string reflecting the time when a certain event occurs (in our case, when Alice generates the signature). Thus, if Alice gets the local time tA, computes the signature s := fd(tA, d) and sends (tA, s) to Bob, it is sufficient for Bob to check that the timestamp tA is valid. A possible criterion for the validity of Alice’s timestamp tA is that the difference between tA and the time when Bob is verifying the signature is within an allowed bound (predetermined, based on the approximate time for the communication). But it may be possible for an adversary to provide to Bob the timestamp tA and Alice’s signature on tA, before tA expires. Therefore, Bob should additionally ensure that timestamps from Alice come in a strictly ascending order. Maintaining the timestamp for the last interaction with Alice takes care of this requirement. Algorithm 5.67 describes the modified version of Algorithm 5.66, based on timestamps. A problem with timestamps is that (local) clocks across a network have to be properly synchronized.

Algorithm 5.67. Using timestamp in challenge–response authentication

Alice reads the local time tA.

Alice generates the signature s := fd(tA, d).

Alice sends (tA, s) to Bob.

Bob reads Alice’s (authentic) public key e.

Bob retrieves the timestamp t′A := fe(s, e).

Bob identifies Alice if and only if t′A = tA and this timestamp is valid.

Mutual authentication

So far, we have described identification schemes that are unidirectional or unilateral in the sense that only Alice tries to prove her identity to Bob. For mutual authentication between Alice and Bob, the above schemes can be used a second time by reversing the roles of Alice and Bob. Algorithm 5.68 describes an alternative strategy that achieves mutual authentication with reduced communication overhead (compared to two invocations of the unidirectional scheme). Now, the key pairs (eA, dA) and (eB, dB) and the transforms fe,A, fd,A and fe,B, fd,B of both Alice and Bob should be used.

5.5.3. Zero-Knowledge Protocols

The challenge–response schemes described above ensure that the claimant’s secret is not made available to the verifier (or a listener to the communication between the verifier and the claimant). But the claimant uses her private key for generating the response and, therefore, it continues to remain possible that a verifier extracts some partial information on the secret by choosing challenges strategically.

Algorithm 5.68. Mutual authentication

Bob selects a random string rB.

Bob sends rB to Alice.

Alice selects a random string rA.

Alice generates the signature sA := fd,A(rA ‖ rB, dA).

Alice sends (rA, sA) to Bob.

Bob reads Alice’s (authentic) public key eA.

Bob retrieves the strings r′A and r′B satisfying r′A ‖ r′B = fe,A(sA, eA).

Bob identifies Alice if and only if r′A = rA and r′B = rB.

Bob generates the signature sB := fd,B(rB ‖ rA, dB).

Bob sends sB to Alice.

Alice reads Bob’s (authentic) public key eB.

Alice retrieves the strings r″B and r″A satisfying r″B ‖ r″A = fe,B(sB, eB).

Alice identifies Bob if and only if r″B = rB and r″A = rA.

Using a zero-knowledge (ZK) protocol overcomes this difficulty in the sense that (absolutely) no information on the claimant’s secret is leaked out during the conversation between the claimant and the verifier. The verifier (or a listener) remains as ignorant of the secret as he was before the invocation of the protocol. In other words, the verifier (or a listener) does not learn anything from the conversation that he could not learn by himself in the absence of the claimant. The only thing the verifier gains is confidence about whether the claimant actually knows the secret. This is intuitively the defining feature of a ZK protocol.

Similar to other public-key techniques, the security of the ZK protocols is based on the intractability of some difficult computational problems. A repeated use of a public-key scheme with a given set of parameters may degrade the security of the scheme under those parameters. For example, each encryption of a message (or each generation of a signature) makes available a plaintext–ciphertext pair which may eventually help a cryptanalyst. A ZK protocol, on the other hand, does not lead to such a degradation of the security of the protocol, irrespective of how many times it is invoked.

We stick to the usual scenario: Alice is the claimant, Bob is the verifier and Carol is an eavesdropper trying to impersonate Alice. In the jargon of ZK protocols, Alice (and not Bob) is called the prover. In order to avoid confusion, we continue to use the terms claimant and verifier. A ZK protocol is usually a three-pass interactive protocol. To start with, Alice chooses a random commitment and sends a witness of the commitment to Bob. A new commitment should be selected by Alice during each invocation of the protocol in order to guard against an adversarial verifier. Upon receiving the witness, Bob chooses and sends a random challenge to Alice. Finally, Alice replies by sending a response to the challenge. If Alice knows the secret (and performs the protocol steps correctly), her response can be easily proved by Bob to be valid. Carol, in an attempt to impersonate Alice without knowing the secret, can produce a valid response only with a probability P bounded away from 1. If P happens not to be negligibly small, then the protocol can be repeated a sufficient number of times, so that Carol’s probability of giving the correct response on all occasions becomes extremely low.

The parameters and the secrets for a ZK protocol can be set privately by each claimant. Another alternative is that a trusted third party (TTP) generates a set of parameters and makes these parameters available for use by every claimant over a network. A second duty of the TTP is to register a secret against each entity. The secret may be generated either by the TTP or by the respective entity. The knowledge of this (registered) secret by an entity is equivalent to her identity in the network. Finally, the authenticity of the public key of an entity is ensured by the digital signature of the TTP on the public key. For simplicity, however, we will not bother about the existence of the TTP and the way in which the secret (the possession of which by Alice is to be proved) has been created and/or handed over to Alice. We will also assume that each entity’s public key is authentic.

The Feige–Fiat–Shamir (FFS) protocol

The FFS protocol (Algorithm 5.69) is based on the intractability of computing square roots modulo a composite integer n. We take n = pq with two distinct primes p and q each congruent to 3 modulo 4.

Algorithm 5.69. Feige–Fiat–Shamir zero-knowledge protocol

Selection of domain parameters:

Select two large distinct primes p and q each congruent to 3 modulo 4.

n := pq.

Select a small integer t.           /* The probability of a successful cheat is 2^–t */

Selection of Alice’s secret:

Alice selects t random integers x1, . . . , xt ∈ Z_n^*.

Alice selects t random bits b1, . . . , bt ∈ {0, 1}.

Alice computes yi := (–1)^bi (xi^2)^–1 (mod n) for i = 1, . . . , t.

Alice makes (y1, . . . , yt) public and keeps (x1, . . . , xt) secret.

The protocol:

Alice randomly chooses c ∈ Z_n^* and γ ∈ {0, 1}.           /* Commitment */
Alice computes and sends to Bob w := (–1)^γ c^2 (mod n).           /* Witness */
Bob randomly chooses and sends to Alice (∊1, . . . , ∊t) ∈ {0, 1}^t.           /* Challenge */
Alice computes and sends to Bob r := c x1^∊1 x2^∊2 · · · xt^∊t (mod n).           /* Response */

Bob computes w′ := r^2 y1^∊1 y2^∊2 · · · yt^∊t (mod n).

Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ ≡ ±w (mod n).

It is clear from Algorithm 5.69 that knowing the secret (x1, . . . , xt) allows Alice to let Bob accept her identity (as Alice). The check w′ ≠ 0 in the last line is necessary to preclude the commitment c = 0, which would make any claimant succeed irrespective of whether she knows the secret.
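As an illustration, one round of Algorithm 5.69 can be sketched in Python. The parameters are artificially small, and the public values are formed as yi := (–1)^bi (xi^2)^–1 (mod n), following the standard FFS construction:

```python
import math
import random

# Toy parameters for illustration only; a real modulus needs primes of
# several hundred bits, each congruent to 3 modulo 4.
p, q = 683, 811          # both prime, both ≡ 3 (mod 4)
n = p * q
t = 8                    # per-run cheating probability is 2^-t

def random_unit(n):
    """A random element of Z_n^* (invertible modulo n)."""
    while True:
        a = random.randrange(2, n)
        if math.gcd(a, n) == 1:
            return a

# Selection of Alice's secret: x_i in Z_n^*, bits b_i, and the public
# values y_i := (-1)^{b_i} (x_i^2)^{-1} (mod n).
x = [random_unit(n) for _ in range(t)]
b = [random.randrange(2) for _ in range(t)]
y = [(-1) ** b[i] * pow(x[i], -2, n) % n for i in range(t)]

# One round of the protocol.
c, gamma = random_unit(n), random.randrange(2)
w = (-1) ** gamma * c * c % n                  # witness sent to Bob
eps = [random.randrange(2) for _ in range(t)]  # Bob's challenge bits
r = c                                          # response: c times the x_i with eps_i = 1
for i in range(t):
    if eps[i]:
        r = r * x[i] % n

# Bob's verification: w' := r^2 times the y_i with eps_i = 1.
w1 = r * r % n
for i in range(t):
    if eps[i]:
        w1 = w1 * y[i] % n
print(w1 != 0 and (w1 == w or w1 == (-w) % n))  # True: identity accepted
```

For an honest Alice the check always succeeds, since r^2 y1^∊1 · · · yt^∊t collapses to ±c^2 modulo n.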

Now, let us see how an opponent (Carol), without knowing the secret, can attempt to impersonate Alice by taking part in this protocol. To start with, we consider the simple case t = 1 (which corresponds to Fiat and Shamir’s original scheme). Carol can start the process by generating a random c and γ and computing w = (–1)^γ c^2 (mod n). Now, Carol should send the response c or cx1 depending on whether Bob sends ∊1 = 0 or 1. Her ability to answer both challenges correctly is equivalent to her knowledge of x1. If Bob sends ∊1 = 0, then she can provide the correct response c. Otherwise, Carol can at best select a random response from Z_n^*, and the probability that this is correct is overwhelmingly low. On the other hand, let Carol choose a random c and γ and send the (improper) witness w := (–1)^γ c^2 y1 (mod n). In that case, Carol can give the valid response r = c, if Bob’s challenge is ∊1 = 1. Sending the correct response to the challenge ∊1 = 0 now requires knowledge of x1. Therefore, if ∊1 is randomly chosen by Bob (without the prior knowledge of Carol), Carol can successfully respond with probability (very close to) 1/2. For t ≥ 1, this probability of a cheat by Carol can be easily shown to be (very close to) 1/2^t, which is negligibly small for t ≥ 80.

In practice, however, t is chosen to be O(ln ln n). It is, therefore, necessary to repeat the protocol t′ times, so that the probability of a successful cheat becomes (nearly) 1/2^(tt′). Taking t′ = Θ(ln n) is recommended. It can be shown that these choices of t and t′ give the FFS protocol the desired ZK property. Without going into a proof of this assertion, let us informally explain the ZK property of the FFS protocol. Neither Bob nor a listener to the conversation between Alice and Bob can get any idea of the secret (x1, . . . , xt). Bob gets as a response the product of c and those xi’s for which ∊i = 1. Since c is randomly chosen by Alice and is not available to Bob, a challenge cannot be chosen strategically. However, if Bob could compute a square root of w (or –w), then the interaction might give away partial information on the secret. For example, if Bob chooses the challenge (∊1, ∊2, . . . , ∊t) = (1, 0, . . . , 0), then Alice’s response would be cx1, from which x1 can be computed by Bob, if he knows c. Thus, the security and the ZK property of the FFS protocol are based on the assumption that computing square roots modulo n is an infeasible computational problem.

The Guillou–Quisquater (GQ) protocol

The GQ identification protocol is based on the intractability of the RSA problem. The correctness of Algorithm 5.70 (for a legitimate claimant) is easy to establish. The check w′ ≠ 0 is necessary to avoid the commitment c = 0, which makes a claimant succeed always.

A TTP typically selects the domain parameters p, q, n, e and d. It also selects m and gives s to Alice without revealing d. The execution of the protocol does not require the use of the decryption exponent d. In fact, d is a global secret, whereas s is Alice’s personal secret. Alice tries to prove the knowledge of s (and not of d).

In the GQ algorithm, the power s^∊ is blinded by multiplying it with the random commitment c. As a witness for c, Alice presents its encrypted version w. Under the assumption that RSA decryption without the knowledge of the decryption exponent d is infeasible, Bob (or an eavesdropper) cannot compute c and hence cannot separate out the value of s. Thus, no partial information on s is leaked. Furthermore, each invocation requires a fresh random ∊. In order to compute a strategic witness, Carol can at best guess ∊. The guess is correct with a probability of 1/e. If e is reasonably large, the probability of a successful cheat is low. However, larger values of e lead to more expensive generation of the witness from the commitment (and also of the response). So small values of e (say, 2^16 + 1 = 65,537) are usually recommended. In that case, repeating the protocol a suitable number of times makes Carol’s chance of cheating as small as one desires. Taking t′e (where t′ is the number of iterations of the protocol) of the order of (log n)^α for some constant α gives the GQ protocol the desired zero-knowledge property.

Algorithm 5.70. Guillou–Quisquater zero-knowledge protocol

Selection of domain parameters:

Select two distinct large primes p and q and set the modulus n := pq.

Select an exponent e with gcd(e, φ(n)) = 1 and compute d := e^–1 (mod φ(n)).

The pair (n, e) is made public and d is kept secret.

Selection of Alice’s secret:

Alice selects a random m ∈ Z_n^* and computes s := m^d (mod n).

Alice makes m public and keeps s secret.

The protocol:

Alice selects a random c ∈ Z_n^*.           /* Commitment */
Alice computes and sends to Bob w := c^e (mod n).           /* Witness */
Bob selects and sends to Alice a random ∊ ∈ {1, 2, . . . , e}.           /* Challenge */
Alice computes and sends to Bob r := c s^∊ (mod n).           /* Response */

Bob computes w′ := m^–∊ r^e (mod n).

Bob accepts Alice’s identity if and only if w′ ≠ 0 and w′ = w.
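A minimal Python sketch of one round of Algorithm 5.70, with toy parameters (the modulus size and the identity value m are illustrative only; the protocol uses a personal secret s := m^d mod n and the verification w′ := m^–∊ r^e mod n):

```python
import math
import random

# Toy parameters for illustration; a real modulus needs primes of
# several hundred bits.
p, q = 683, 811
n = p * q
phi = (p - 1) * (q - 1)
e = 65537                    # public exponent with gcd(e, phi) = 1
d = pow(e, -1, phi)          # decryption exponent: the TTP's global secret

def random_unit(n):
    """A random element of Z_n^*."""
    while True:
        a = random.randrange(2, n)
        if math.gcd(a, n) == 1:
            return a

m = random_unit(n)           # Alice's public identity value
s = pow(m, d, n)             # Alice's personal secret s := m^d (mod n)

# One round of the protocol.
c = random_unit(n)           # commitment
w = pow(c, e, n)             # witness w := c^e (mod n)
eps = random.randrange(1, e) # Bob's challenge
r = c * pow(s, eps, n) % n   # response r := c s^eps (mod n)

# Bob's verification: w' := m^{-eps} r^e (mod n) must equal w,
# since r^e = c^e m^{d*e*eps} = c^e m^{eps} (mod n).
w1 = pow(m, -eps, n) * pow(r, e, n) % n
print(w1 != 0 and w1 == w)   # True
```

Note that neither party uses d during the protocol itself: Alice proves knowledge of s, not of d.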

The Schnorr protocol

The Schnorr protocol is based on the intractability of computing discrete logarithms in a large prime field F_p. We assume that a suitably large prime divisor q of p – 1 and an element g ∈ F_p^* of multiplicative order q are known. The algorithm works in the subgroup of F_p^* generated by g. In order to make the known algorithms for solving the DLP in this setting infeasible, one should have q > 2^160.

Algorithm 5.71. Schnorr zero-knowledge protocol

Selection of domain parameters:

Select a large prime p such that p – 1 has a large prime divisor q.

Select an element g having multiplicative order q modulo p.

Publish (p, q, g).


Select a small integer t < lg q.           /* The probability of a successful cheat is 2^–t */

Selection of Alice’s secret:

Alice chooses a random secret integer d ∈ {1, 2, . . . , q – 1}.

Alice computes and makes public the integer y := g^d (mod p).

The protocol:

Alice chooses a random c ∈ {1, 2, . . . , q – 1}.           /* Commitment */
Alice computes and sends to Bob w := g^c (mod p).           /* Witness */
Bob selects and sends to Alice a random ∊ ∈ {0, 1, . . . , 2^t – 1}.           /* Challenge */
Alice computes and sends to Bob r := d∊ + c (mod q).           /* Response */

Bob computes w′ := g^r y^–∊ (mod p).

Bob accepts Alice’s identity if and only if w′ = w.

We leave the analysis of correctness and security of this protocol to the reader. The secret d is masked from Bob and other eavesdroppers by introducing the random additive bias c modulo q. The probability of a successful cheat by an adversary is 2^–t, since ∊ is chosen randomly from a set of cardinality 2^t. Usually the Schnorr protocol is not used iteratively. Therefore, t ≥ 40 is recommended for making the probability of cheating negligible. On the other hand, if t is too large, then the protocol can be shown to lose the ZK property. For the generation of the witness from the commitment, Alice computes a modular exponentiation with an exponent that is O(q). Generating the response, on the other hand, involves a single multiplication (and a single addition) modulo q and is hence very fast.
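As a starting point for that analysis, here is a toy Python run of Algorithm 5.71 (with p = 607, q = 101; real parameters need q > 2^160), assuming the verification w′ := g^r y^–∊ (mod p):

```python
import random

# Toy parameters; in practice p should have >= 1024 bits and q > 2^160.
p, q = 607, 101              # q divides p - 1 = 606
g = pow(2, (p - 1) // q, p)  # an element of multiplicative order q
t = 6                        # challenge length, t < lg q

d = random.randrange(1, q)   # Alice's secret
y = pow(g, d, p)             # public key y := g^d (mod p)

# One run of the protocol.
c = random.randrange(1, q)   # commitment
w = pow(g, c, p)             # witness w := g^c (mod p)
eps = random.randrange(2 ** t)  # Bob's challenge, 0 <= eps < 2^t
r = (d * eps + c) % q        # response r := d*eps + c (mod q)

# Bob's verification: since g has order q,
# g^r y^{-eps} = g^{d*eps + c} g^{-d*eps} = g^c = w (mod p).
w1 = pow(g, r, p) * pow(y, -eps, p) % p
print(w1 == w)   # True
```

The response computation is a single multiplication and addition modulo q, which is why Schnorr identification is attractive for constrained provers.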

Exercise Set 5.5

5.25
  1. Describe how a zero-knowledge witness–challenge–response identification scheme can be converted to a signature scheme. [H]

  2. Write the Feige–Fiat–Shamir, Guillou–Quisquater and Schnorr signature schemes based on the corresponding identification schemes.

5.26 Let n := pq with distinct primes p and q, each congruent to 3 modulo 4.
  1. Show that –1 is a quadratic non-residue modulo p and modulo q.

  2. If a ∈ Z_n^* is a quadratic residue modulo n, prove that a has exactly four square roots modulo n, of which exactly one is a quadratic residue modulo n.

  3. Consider the following identification protocol in which Alice wants to prove to Bob her knowledge of the factorization of n = pq. Assume that p and q are sufficiently large so that computing square roots modulo n is infeasible without the knowledge of the factorization of n. Argue that Alice can prove her identity to Bob if and only if she knows the factorization of n.

    A bad zero-knowledge protocol

    Bob chooses a random x ∈ Z_n^* and computes a := x^4 (mod n).

    Bob sends a to Alice.

    Alice computes four square roots of a modulo n and picks up the unique
           square root b which is a quadratic residue modulo n.

    Alice sends b to Bob.

    Bob accepts Alice’s claim if and only if b ≡ x^2 (mod n).

  4. Conclude that this is not a good zero-knowledge protocol, by demonstrating that Bob can maliciously send a bad a to Alice so that during the execution of the protocol he gathers enough information to factor n. [H]

Chapter Summary

All the material studied in earlier chapters culminates in this relatively short chapter which describes some popular cryptographic algorithms. We address most of the problems relevant in cryptography, namely, encryption, key agreement, digital signatures and entity authentication. Against each algorithm we mention the (provable or alleged) source of security of the algorithm.

Encryption algorithms are treated first. We start with the seemingly most popular RSA algorithm. This algorithm derives its security from the RSA key inversion problem and the RSA problem. The key inversion problem is probabilistic polynomial-time equivalent to the integer factorization problem. The intractability of the RSA problem remains unproven: at present, no algorithm other than factoring the RSA modulus is known for solving it. We subsequently describe Rabin encryption (based on the square-root problem), Goldwasser–Micali encryption (based on the quadratic residuosity problem), Blum–Goldwasser encryption (based on the square-root problem), ElGamal encryption (based on the Diffie–Hellman problem) and Chor–Rivest encryption (based on a variant of the subset sum problem). The XTR encryption algorithm is essentially an efficient implementation of ElGamal encryption and is based on a tricky representation of elements in certain finite fields. The last encryption algorithm we discuss is the NTRU algorithm. It derives its security from a mixing system that uses the algebra of the ring ℤ[X]/〈X^N – 1〉. Attacks on NTRU based on the shortest vector problem are also known.

The basic key-agreement scheme is the Diffie–Hellman scheme. In order to prevent small-subgroup attacks on this scheme, one employs a technique known as cofactor expansion. We then explain unknown key-share attacks against key-agreement schemes. These attacks necessitate the use of authenticated key agreement schemes. The MQV algorithm is presented as an example of an authenticated key-agreement scheme.

Next come digital signature algorithms. Digital signatures may be classified in two broad categories: signature schemes with appendix and signature schemes with message recovery. In this book, we study only the signature schemes with appendix. As specific examples of signature schemes, we first explain RSA and Rabin signatures. Then, we present several variants of discrete-log-based signature schemes: ElGamal signatures, Schnorr signatures, Nyberg–Rueppel signatures, the digital signature algorithm (DSA) and its elliptic curve variant ECDSA. All the discrete-log (over finite fields)-based signature schemes have efficient XTR implementations. The NTRUSign algorithm is the last general-purpose signature scheme discussed in this section.

We then present a treatment of some special signature schemes. Blind signatures are created on messages unknown to the signer. Three blind signature schemes are described: Chaum, Schnorr and Okamoto–Schnorr schemes. An undeniable signature, on the other hand, requires an active participation of the signer at the time of verification and comes with a denial protocol that prevents a signer from denying a valid signature at a later time. The Chaum–Van Antwerpen undeniable signature scheme is based on the discrete-log problem, whereas the GKR scheme is based on the RSA problem.

A way to guarantee both authentication and confidentiality of a message is to sign the message and then encrypt the signed message. This involves two basic operations (signature generation and encryption). Zheng’s signcryption scheme combines these two primitives with a view to reducing both running time and message expansion.

The final topic we discuss in this chapter is entity authentication, a mechanism by means of which an entity can prove its identity to another. Here identity of an entity is considered synonymous with the possession of some secret information by the entity. Passwords are called weak authentication schemes, since the claimant has to disclose the secret straightaway to the verifier. A strong authentication scheme (also called a challenge–response scheme) does not reveal the secret to the verifier. We describe two strong authentication schemes; the first is based on encryption and the second on digital signatures. A way to establish mutual authentication between two entities is also presented. Challenge–response algorithms may be vulnerable to some attacks mounted by the verifier. A zero-knowledge protocol comes with a proof that during the authentication conversation no information is leaked to the verifier. Three zero-knowledge protocols are discussed: the Feige–Fiat–Shamir protocol, the Guillou–Quisquater protocol, and the Schnorr protocol.

Suggestions for Further Reading

Public-key cryptography was born from the seminal works of Diffie and Hellman [78] and Rivest, Shamir and Adleman [252]. Though still young, this area has generated much research in the last three decades. In this chapter, we have made an attempt to summarize some important cryptographic algorithms proposed in the literature. The original papers where these techniques have been introduced are listed below. We do not plan to be exhaustive, but mention only the most relevant resources.

Algorithm                                      Reference(s)
RSA encryption                                 [252]
Rabin encryption                               [246]
Goldwasser–Micali encryption                   [117]
Blum–Goldwasser encryption                     [27]
ElGamal encryption                             [84]
Chor–Rivest encryption                         [54]
XTR encryption                                 [170, 172, 171, 173, 289, 297]
NTRU encryption                                [130]
Identity-based encryption                      [267, 34, 35]
Diffie–Hellman key exchange                    [78]
Menezes–Qu–Vanstone key exchange               [161]
RSA signature                                  [252]
Rabin signature                                [246]
ElGamal signature                              [84]
Schnorr signature                              [263]
Nyberg–Rueppel signature                       [223, 224]
DSA                                            [220]
ECDSA                                          [141]
XTR signature                                  [170, 172, 171, 173, 289, 297]
NTRUSign                                       [110, 111, 128, 129, 131, 217]
Chaum blind signature                          [48, 49, 50]
Schnorr blind signature                        [263, 202]
Okamoto–Schnorr blind signature                [227, 236]
Chaum–Van Antwerpen undeniable signature       [51, 52, 53]
RSA undeniable signature                       [109, 187, 102, 186]
Signcryption                                   [310, 311, 312]
Signcryption based on elliptic curves          [313, 314]
Identity-based signcryption                    [178, 185]
Feige–Fiat–Shamir ZK protocol                  [90, 91]
Guillou–Quisquater ZK protocol                 [122]
Schnorr ZK protocol                            [263]

The Handbook of Applied Cryptography [194] is a single resource where most of the above algorithms are discussed in good detail. See Chapter 8 of that book for encryption algorithms, Chapter 11 for digital signatures and Chapter 10 for identification schemes.

There are several other (allegedly) intractable mathematical problems based on which cryptographic protocols can be built. Some of the promising candidates that we left out in the text are summarized below:

Algorithm                                      Intractable problem
LUC [284, 285, 286]                            RSA and ElGamal-like problems based on Lucas sequences
Goldreich–Goldwasser–Halevi [115]              lattice-basis reduction
Patarin’s hidden field equations (HFE) [232]   solving multivariate polynomial equations
EPOC/ESIGN [97, 228]                           factorization of integers of the form p^2 q
McEliece encryption [190]                      decoding of error-correcting codes
Number field cryptography [38, 39]             discrete-log problem in class groups of quadratic fields
KLCHKP (braid group cryptosystem) [148]        braid conjugacy problem

The Internet site http://www.tcs.hut.fi/~helger/crypto/link/public/index.html is a good place to start, for more information on these (and some other) cryptosystems. Also visit http://www.kisa.or.kr/technology/sub1/index-PKC.htm.

The obvious question that crops up now is, given so many different cryptographic schemes, which one should a user go for?[5] There is no clear-cut answer to this question. One has to study the relative merits and demerits of the systems. If computational efficiency is what matters most, we advise users to go for the NTRU schemes. Having said that, we must also add that the NTRU scheme is relatively new and has not yet withstood sufficient cryptanalytic attack. Various attacks on NSS and NTRUSign cast doubt on the practical safety of deploying such young schemes in serious applications.

[5] It is worthwhile to issue a warning to the readers. Many cryptographic algorithms (and also the idea of public-key cryptography) are/were patented. In order to implement these algorithms (in particular, for commercial purposes), one should take care of the relevant legal issues. We summarize here some of the important patents in this area. The list is far from exhaustive.

Patent No.                  Covers                          Patent holder                   Date of issue
US 4,200,770                Diffie–Hellman key exchange     Stanford University             Apr 29, 1980
                            (includes ElGamal encryption)
US 4,218,582                Public-key cryptography         Stanford University             Aug 19, 1980
US 4,405,829                RSA                             MIT                             Sep 20, 1983
US 5,231,668                DSA                             USA, Secretary of Commerce      Jul 27, 1993
US 5,351,298                LUC                             P. J. Smith                     Sep 27, 1994
US 5,790,675                HFE                             CP8 Transac (France)            Aug 4, 1998
EP 0963635A1 /              XTR                             Citibank (North America)        Dec 15, 1999 /
WO 09836526                                                                                 Aug 20, 1998
US 6,081,597                NTRU                            NTRU Cryptosystems, Inc.        Jun 27, 2000
                            EPOC/ESIGN                      Nippon Telegraph and            Apr 17, 2001
                                                            Telephone Corporation


Our mathematical trapdoors are not provably secure, and this is where the problems begin. We have to rely on historical evidence, which should not be collected too hastily. Slow as it is, RSA has stood the test of time and has successfully survived more than twenty years of cryptanalytic attacks [29]. The risk that an unforeseen attack will break the system tomorrow appears much smaller with RSA than with newer schemes that have received only a little cryptanalytic study. The hidden monomial system proposed by Imai and Matsumoto [188] was broken by Patarin [231]. As a by-product, Patarin came up with the idea of cryptosystems based on hidden field equations (HFE) [232]. No serious attacks on HFE are known to date, but, as we mentioned earlier, only time will show whether HFE is going to survive.

Bruce Schneier asserts in his Crypto-Gram newsletter (15 March 1999, http://www.counterpane.com/crypto-gram.html): “No one can duplicate the confidence that RSA offers after 20 years of cryptanalytic review. A standard security review, even by competent cryptographers, can only prove insecurity; it can never prove security. By following the pack you can leverage the cryptanalytic expertise of the worldwide community, not just a handful of hours of a consultant’s time.”

Twenty-odd years is definitely not a wide span of time in the history of evolution of our knowledge, but public-key cryptography is only as old as RSA is!

6. Standards

6.1Introduction
6.2IEEE Standards
6.3RSA Standards
 Chapter Summary
 Suggestions for Further Reading

In theory, there is no difference between theory and practice. But, in practice, there is.

—Jan L. A. van de Snepscheut

ECC curves are divided into three groups, weak curves, inefficient curves, and curves patented by Certicom.

—Peter Gutmann

Acceptance of prevailing standards often means we have no standards of our own.

—Jean Toomer (1894 – 1967)

6.1. Introduction

Public-key cryptographic protocols deal with sets like the ring of integers modulo n, the multiplicative group of units in a finite field or the group of points on an elliptic curve over a finite field. Messages that need to be encrypted or signed are, on the other hand, usually human-readable text or numbers or keys of secret-key cryptographic protocols, which are typically represented in computers in the form of sequences of bits (or bytes). It is necessary to convert such bit strings (or byte strings) to mathematical elements before the cryptographic algorithms are applied. This conversion is referred to as encoding. The reverse transition, that is, converting mathematical entities back to bit strings, is called decoding.

If Alice and Bob were the only two parties involved in deploying public-key protocols, they could have agreed upon a set of private (not necessarily secret) encoding and decoding rules. In practice, however, when many entities interact over a public network, it is impractical, if not impossible, to have an individual encoding scheme for every pair of communicating parties. This is also unnecessary, because the security of the protocols comes from the encryption process and not from encoding. On the contrary, poorly designed encoding schemes may endanger the security of the underlying protocols.

We, therefore, need a set of standard ways of converting data between various logical formats. This promotes interoperability, removes ambiguities, facilitates simplicity in handling cryptographic data and thereby enhances the applicability and acceptability of public-key algorithms. IEEE (The Institute of Electrical and Electronics Engineers, Inc., pronounced eye-triple-e) and the RSA laboratories have published extensive documents standardizing data conversion and encoding for many popular public-key cryptosystems. Here we summarize the contents of some of these documents. This exposition is meant mostly for software engineers intending to develop cryptographic tool-kits that conform to the accepted standards.

6.2. IEEE Standards

In this section, we outline the first three of the drafts from IEEE, shown in Table 6.1. At the time of writing this book, these are the latest versions of the drafts available from IEEE. In future, these may be superseded by newer documents. We urge the reader to visit the web-site http://grouper.ieee.org/groups/1363/ for more up-to-date information. Also see the standard IEEE 1363–2000: Standards Specifications for Public-key Cryptography [134].

Table 6.1. IEEE drafts on public-key cryptography
Draft         Date               Description
P1363/D13     12 November 1999   Traditional public-key cryptography based on IFP, DLP and ECDLP
P1363a/D12    16 July 2003       Additional techniques on traditional public-key cryptography
P1363.1/D4    7 March 2002       Lattice-based cryptography
P1363.2/D15   25 May 2004        Password-based authentication
P1363.3/D1    May 2008           Identity-based public-key cryptography

6.2.1. The Data Types

Public-key protocols operate on data of various types. The IEEE drafts specify only the logical descriptions of these data types. The realizations of these data types should be taken care of by individual implementations and are left unspecified.

Bit strings

A bit string is a finite ordered sequence a0a1 . . . al–1 of bits, where each bit ai can assume the value 0 or 1. The length of the bit string a0a1 . . . al–1 is l. The bit a0 in the bit string a0a1 . . . al–1 is called the leftmost or the first or the leading or the most significant bit, whereas the bit al–1 is called the rightmost or the last or the trailing or the least significant bit.

The order of appearance of the bits in a bit string is important, rather than the way the bits are indexed or named. That is to say, the most and least significant bits in a given bit string are uniquely determined by their positions of occurrences in the string, and not by the way the individual bits in the string are numbered. Thus, for example, if we call the bit string 01101 as a0a1a2a3a4, then the leading and trailing bits are a0 and a4 respectively. If we index the bits in the same bit string as a2a3a5a7a11, the first bit is a2 and the last bit is a11. Finally, for the indexing a5a4a3a2a1, the leftmost and rightmost bits are a5 and a1 respectively.

Octet strings

Though bits are the basic building blocks of computer memory, programs typically access memory in groups of 8 bits, known as octets. Thus, an octet is a bit string of length 8 and can have one of the 256 values 0000 0000 through 1111 1111. It is convenient to write an octet as a concatenation of two hexadecimal digits, the first (resp. second) digit corresponding to the first (resp. last) 4 bits of the octet read as an integer in base 2. For example, the octet 0010 1011 is represented by 2b. It is also often customary to treat an octet a0a1 . . . a7 as the integer (between 0 and 255, both inclusive) whose binary representation is a0a1 . . . a7.
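The hexadecimal convention can be checked directly in Python:

```python
# The octet 0010 1011 written as two hexadecimal digits, and the same
# octet read as an integer between 0 and 255.
octet = 0b00101011
assert format(octet, "02x") == "2b"   # first digit 0010 -> 2, second digit 1011 -> b
assert octet == 43                    # the octet treated as an integer
```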

An octet string is a finite ordered sequence of octets. The length of an octet string is the number of octets in the string. The leftmost (or first or leading or most significant) and the rightmost (or last or trailing or least significant) octets in an octet string are defined analogously as in the case of bit strings. These octets are dependent solely on their positions in the octet string and are independent of how the individual octets in the octet string are numbered.

Integers

Integers are the whole numbers 0, ±1, ±2, . . . . For cryptographic applications, one typically considers only non-negative integers. Integers used in cryptography may have binary representations requiring as many as several thousand bits.

Prime finite fields

Let p be a prime (typically odd). The elements of F_p are represented as the integers 0, 1, . . . , p – 1 under the standard way of associating the integer a with the congruence class [a]_p in Z_p. Arithmetic operations in F_p are the corresponding integer operations modulo the prime p.

Finite fields of characteristic 2

The elements of the field F_{2^m} are represented as bit strings of length m. In order to provide the mathematical interpretation of these bit strings, we recall that F_{2^m} is an m-dimensional vector space over F_2. Let β0, . . . , βm–1 be an ordered basis of F_{2^m} over F_2. The bit string a0 . . . am–1 is to be identified with the element a0β0 + · · · + am–1βm–1, where the bit ai represents the element [ai]_2 of F_2. Selection of the basis β0, . . . , βm–1 renders a complete meaning to this representation and determines how arithmetic operations on these elements are to be performed. The following two cases are recommended.

For the polynomial-basis representation, one chooses an irreducible polynomial f(X) ∈ F_2[X] of degree m and represents F_{2^m} as F_2[X]/〈f(X)〉. Letting x denote the canonical image of X in F_2[X]/〈f(X)〉, one chooses the ordered basis β0 = x^{m–1}, β1 = x^{m–2}, . . . , βm–1 = 1. Arithmetic operations in F_{2^m} under this representation are those of F_2[X] followed by reduction modulo the defining polynomial f(X). The choice of the irreducible polynomial f(X) is left unspecified in the IEEE drafts.
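As a sketch of polynomial-basis arithmetic, the following Python function multiplies two elements of F_{2^m} stored as m-bit integers (bit i holding the coefficient of x^i, which matches reading the bit string a0 . . . am–1 as an integer). The irreducible polynomial used here, X^8 + X^4 + X^3 + X + 1, is only an example; the drafts leave the choice of f(X) open:

```python
def gf2m_mul(a: int, b: int, f: int = 0b1_0001_1011, m: int = 8) -> int:
    """Multiply two elements of F_{2^m} in polynomial-basis representation
    (bit i of an operand is the coefficient of x^i)."""
    # Carry-less polynomial multiplication over F_2 ...
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        a <<= 1
        b >>= 1
    # ... followed by reduction modulo the defining polynomial f(X).
    for bit in range(2 * m - 2, m - 1, -1):
        if (prod >> bit) & 1:
            prod ^= f << (bit - m)
    return prod

assert gf2m_mul(0x02, 0x03) == 0x06   # x * (x + 1) = x^2 + x
assert gf2m_mul(0x53, 0xCA) == 0x01   # a well-known inverse pair in this field
```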

For the normal-basis representation, one selects an element θ ∈ F_{2^m} which is normal over F_2 (see Definition 2.60, p 86), and takes the ordered basis β0 = θ = θ^(2^0), β1 = θ^(2^1), β2 = θ^(2^2), . . . , βm–1 = θ^(2^{m–1}). Arithmetic in F_{2^m} is carried out as explained in Section 2.9.3.

The IEEE draft P1363a also specifies a composite-basis representation of the elements of F_{2^m}, provided that m is composite. Let m = ds with 1 < d < m. One chooses an (ordered) polynomial or normal basis γ0, γ1, . . . , γs–1 of F_{2^m} over F_{2^d}. An element of F_{2^m} is of the form a0γ0 + a1γ1 + · · · + as–1γs–1 and is represented by a0a1 . . . as–1, where each ai, being an element of F_{2^d}, is represented by a bit string of length d. The interpretation of the representation of ai depends on how F_{2^d} is represented. One can use a polynomial- or normal-basis representation of F_{2^d} (over F_2), or even a composite-basis representation of F_{2^d} over F_{2^{d′}}, if d happens to be composite with a non-trivial divisor d′.

Extension fields of odd characteristic

A non-prime finite field of odd characteristic is one of cardinality p^m for some odd prime p and some integer m > 1. The field F_{p^m} is represented as F_p[X]/〈f(X)〉, where f(X) ∈ F_p[X] is an irreducible polynomial of degree m. An element of F_{p^m} is then of the form α = am–1 x^{m–1} + · · · + a1x + a0, where x := X + 〈f(X)〉 and where each ai is an element of F_p, that is, an integer in the range 0, 1, . . . , p – 1. The element α is represented as an integer by substituting p for x, that is, as the integer am–1 p^{m–1} + · · · + a1 p + a0 (see the packed representation of Exercise 3.39). In order to interpret an integer between 0 and p^m – 1 as an element of F_{p^m}, one expands the integer in base p.
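The packed representation amounts to reading the coefficient vector as a base-p integer. A small Python sketch (the field size F_{7^3} is illustrative):

```python
def pack(coeffs, p):
    """Pack coefficients [a_0, a_1, ..., a_{m-1}] (each in 0..p-1)
    into the integer a_{m-1} p^{m-1} + ... + a_1 p + a_0."""
    value = 0
    for a in reversed(coeffs):
        value = value * p + a
    return value

def unpack(value, p, m):
    """Expand an integer in 0..p^m - 1 back into its m base-p digits."""
    coeffs = []
    for _ in range(m):
        value, a = divmod(value, p)
        coeffs.append(a)
    return coeffs

# Example over F_{7^3}: 2x^2 + 5x + 3  <->  2*49 + 5*7 + 3 = 136
assert pack([3, 5, 2], 7) == 136
assert unpack(136, 7, 3) == [3, 5, 2]
```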

* Elliptic curves

An elliptic curve defined over a finite field F_q is specified by two elements a, b ∈ F_q. Depending on the characteristic of F_q, this pair defines the following curves.

If char F_q ≠ 2, 3, then 4a^3 + 27b^2 must be non-zero in F_q and the equation of the elliptic curve is taken to be Y^2 = X^3 + aX + b.

For char F_q = 2, we must have b ≠ 0 in F_q and we use the non-supersingular curve Y^2 + XY = X^3 + aX^2 + b. Because of the MOV attack (Section 4.5.1), supersingular curves are not recommended for cryptographic applications.

Finally, if F_q has characteristic 3, then both a and b must be non-zero in F_q and the elliptic curve Y^2 = X^3 + aX^2 + b is specified by (a, b).

* Elliptic curve points

A point P on an elliptic curve defined over F_q can be represented either in compressed or in uncompressed form. In the uncompressed form, one represents P as the pair (h, k) of elements of F_q. The compressed form can be either lossy or lossless. In the lossy compressed form, P is represented by its X-coordinate h only. Such a representation is not unique in the sense that there can be two points on the elliptic curve with the same X-coordinate h. In applications where the Y-coordinates of elliptic curve points are not utilized, such a representation can be used. In the lossless compressed form, one represents P as the pair (h, ỹ), where ỹ is a single bit. There are two solutions (perhaps repeated) for Y for a given value h of X. The bit ỹ specifies which of these two values is represented. Depending on how the bit ỹ is computed, we have two different lossless compressed forms.

The LSB compressed form is applicable for odd prime fields or fields of even characteristic. For F_p (p an odd prime), the bit ỹ is taken to be the least significant (that is, rightmost) bit of k (treated as an integer). For F_{2^m}, we have ỹ = 0 if h = 0, whereas if h ≠ 0, then ỹ is the least significant bit of the element kh^–1 treated as an integer via the FE2I conversion primitive described in Section 6.2.2.
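A sketch of LSB compression over a prime field with p ≡ 3 (mod 4), where a square root of a quadratic residue w can be computed as w^((p+1)/4) (mod p). The curve coefficients below are arbitrary toy values, not taken from any standard:

```python
# LSB point compression over F_p (p an odd prime): a point (h, k) is stored
# as h together with the least significant bit of k. Decompression recovers
# k as a square root of h^3 + a*h + b, choosing the root of matching parity
# (the two roots k0 and p - k0 always have opposite parity for odd p).
p, a, b = 10007, 2, 3          # toy curve; p is prime and p ≡ 3 (mod 4)

def compress(h, k):
    return h, k & 1

def decompress(h, ybit):
    w = (pow(h, 3, p) + a * h + b) % p
    k = pow(w, (p + 1) // 4, p)   # one of the two square roots of w
    if k & 1 != ybit:             # pick the root with the stored parity bit
        k = p - k
    return h, k

# Find some point on the curve and check that compression round-trips.
for h in range(p):
    w = (pow(h, 3, p) + a * h + b) % p
    k = pow(w, (p + 1) // 4, p)
    if k * k % p == w:            # h^3 + a*h + b is a quadratic residue
        break
assert decompress(*compress(h, k)) == (h, k)
```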

The SORT compressed form is used for q = p^m, m > 1. Let P′ = (h, k′) be the opposite of P = (h, k), that is, P′ = –P. One converts k and k′ to integers κ and κ′ using the FE2I primitive and sets the bit ỹ according to the position of κ in the sorted order of the two integers κ, κ′.

One may also go for a hybrid representation of the elliptic curve point P = (h, k), in which information for both the compressed and the uncompressed representations of P is stored, that is, P is stored as the triple (h, ỹ, k) with ỹ computed by one of the methods (LSB or SORT) described above.

* Convolution polynomial rings

For NTRU public-key cryptosystems, we work in the ring R := ℤ[X]/〈X^n – 1〉. We denote x := X + 〈X^n – 1〉 as usual. An element of R is a polynomial a(x) = a0 + a1x + a2x^2 + · · · + a_{n–1}x^{n–1} with each ai ∈ ℤ, and is represented by the ordered n-tuple of integers (a0, a1, . . . , a_{n–1}). Addition (resp. subtraction) in R is simply component-wise addition (resp. subtraction), whereas multiplication of a(x) = a0 + a1x + · · · + a_{n–1}x^{n–1} and b(x) = b0 + b1x + · · · + b_{n–1}x^{n–1} gives c(x) = c0 + c1x + · · · + c_{n–1}x^{n–1}, where ci = Σ_{j+k ≡ i (mod n)} aj bk (see Section 5.2.8). The IEEE draft P1363.1 designates elements of R as ring elements.
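The convolution product can be sketched in a few lines of Python; the function name is ours, and coefficients are kept as arbitrary integers, exactly as in the definition above.

```python
def conv_mult(a, b):
    """Multiply two elements of Z[x]/<x^n - 1>, each given as a list of n
    coefficients (a[i] is the coefficient of x^i).  Indices wrap around
    because x^n = 1 in this ring."""
    n = len(a)
    assert len(b) == n
    c = [0] * n
    for j in range(n):
        for k in range(n):
            c[(j + k) % n] += a[j] * b[k]   # x^j * x^k = x^((j+k) mod n)
    return c
```

For n = 3, for example, (1 + x) · x = x + x^2, while x^2 · x^2 = x^4 = x.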

It is customary to deal with polynomials in R with small coefficients. If all the coefficients of a(x) ∈ R are known to be from {0, 1}, it is convenient to represent a(x) as the bit string a0a1 . . . a_{n–1} instead of as an n-tuple of integers. In this case, a(x) is called a binary ring element or simply a binary element.

6.2.2. Conversion Among Data Types

The IEEE drafts P1363 and P1363.1 specify algorithms for converting data among the formats discussed above. The standardized data conversion primitives are summarized in Figure 6.1. Though these drafts support elliptic curve cryptography, it is not specified how data representing elliptic curves can be converted to data of other types (like octet strings and bit strings).

Figure 6.1. IEEE P1363 data types and conversions


We now provide a brief description of the data conversion primitives at a logical level. The implementation details depend on the representations of the data types and are left out here.

Converting bit strings to octet strings (BS2OS)

A bit string a0a1 . . . a_{l–1} can be broken up into groups of eight bits and packed into octets. But we run into difficulty if the length of the input bit string is not an integral multiple of 8. We then have to add extra bits in order to make the length of the augmented bit string an integral multiple of 8. This can be done in several ways, and in this context a standard convention needs to be adopted. The IEEE drafts prescribe the following rules:

  1. Every extra bit added must be the zero bit.

  2. Add the minimal number of extra bits.

  3. Add the extra bits, if any, to the left.[1]

    [1] At the time of writing this book there is a serious conflict between the latest drafts of P1363 and P1363.1 from IEEE. The former asks to add extra bits to the left, the latter to the right. One of the authors of this book raised this issue in the discussion group stds-p1363-discuss maintained by IEEE and was notified that in the next version of the P1363.1 document this conflict would be resolved in favour of P1363.

In order to see what these rules mean, let a0a1 . . . a_{l–1} be a bit string of length l to be converted to the octet string A0A1 . . . A_{d–1}. The length of the output octet string must be d = ⌈l/8⌉. 8d – l zero bits should be added to the left of the input bit string in order to create the augmented bit string 0 . . . 0a0a1 . . . a_{l–1} whose length is 8d. Now, we start from the left and pack blocks of eight consecutive bits in A0, A1, . . . , A_{d–1}. Thus, we have A0 = 0 . . . 0a0 . . . a_{k–1}, A1 = ak . . . a_{k+7}, . . . , A_{d–1} = a_{k+8(d–2)} . . . a_{k+8(d–2)+7}, where k = 8 – (8d – l). Note that if l is already a multiple of 8, then 8d – l = 0, that is, no extra bits need to be added.

As an example, consider the input bit string 01110 01101011 of length 13. The output octet string should be of length ⌈13/8⌉ = 2. Padding gives the augmented bit string 00001110 01101011. The first octet in the output octet string will then be 00001110, that is, 0e; and the second octet will be 01101011, that is, 6b.
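The padding and packing rules translate directly into Python. In the following minimal sketch (names and data modelling are ours, not mandated by the drafts), bit strings are modelled as str objects of '0'/'1' characters and octet strings as bytes.

```python
def bs2os(bits):
    """BS2OS: pack a bit string into d = ceil(l/8) octets, adding the
    minimal number of zero bits to the LEFT of the input."""
    d = (len(bits) + 7) // 8
    padded = bits.zfill(8 * d)          # augmented bit string of length 8d
    return bytes(int(padded[8 * i : 8 * i + 8], 2) for i in range(d))
```

With this sketch, bs2os('0111001101011') returns the two octets 0e 6b, matching the example above.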

Converting octet strings to bit strings (OS2BS)

The OS2BS primitive is designed to ensure that if we convert an octet string generated by BS2OS, we get back the original bit string (that is, the input to BS2OS) with which we started. Suppose that we want to convert an octet string A0A1 . . . A_{d–1}. Let us write the bits of Ai as ai,0ai,1 . . . ai,7. The desired length l of the output bit string also has to be specified. If d ≠ ⌈l/8⌉, the procedure OS2BS reports error and stops. If d = ⌈l/8⌉, we consider the bit string

a0,0 a0,1 . . . a0,7 a1,0 a1,1 . . . a1,7 . . . a_{d–1,0} a_{d–1,1} . . . a_{d–1,7}

of length 8d. If the leftmost 8d – l bits of this flattened bit string are not all zero, OS2BS should quit after reporting error. Otherwise, the trailing l bits of the flattened bit string are returned.

The reader can check that when 0e 6b and l = 13 are input to OS2BS, it returns the bit string 01110 01101011. (See the example in connection with BS2OS.) Notice also that for this input octet string, OS2BS reports error if and only if a value l ≥ 17 or l ≤ 11 is supplied as the desired length of the output bit string.
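A matching Python sketch of OS2BS, again modelling bit strings as str objects of '0'/'1' and octet strings as bytes (names ours):

```python
def os2bs(octets, l):
    """OS2BS: unpack an octet string into a bit string of the specified
    length l; report error if the length or the padding is inconsistent."""
    d = len(octets)
    if d != (l + 7) // 8:
        raise ValueError("error: octet string incompatible with output length")
    flat = "".join(format(A, "08b") for A in octets)   # flattened string of 8d bits
    if "1" in flat[: 8 * d - l]:                       # leftmost 8d - l bits must be zero
        raise ValueError("error: non-zero padding bits")
    return flat[8 * d - l :]
```

Note that os2bs(bytes([0x0E, 0x6B]), 11) fails not because of the length check (⌈11/8⌉ = 2) but because a padding bit would be 1.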

Converting integers to bit strings (I2BS)

Let a non-negative integer n be given. The I2BS primitive outputs a bit string of length l representing n. If n ≥ 2^l, this conversion cannot be done and the primitive reports error and quits. If n < 2^l, we write the binary representation of n as

n = a_{l–1}2^{l–1} + a_{l–2}2^{l–2} + · · · + a1·2 + a0 with each ai ∈ {0, 1}.

Treating each ai as a bit[2], I2BS returns the bit string a_{l–1}a_{l–2} . . . a1a0. One or more leading bits of the binary representation of n may be zero. There is no limit on how many leading zero bits are allowed during the conversion. In particular, the integer 0 gets converted to a sequence of l zero bits for any value of l supplied.

[2] Each ai is logically an integer which happens to assume one of two possible values: 0 and 1. A bit, on the other hand, is a quantity that can also assume only two possible values. Traditionally, the values of a bit are also denoted by 0 and 1. But one has the liberty to call these values off and on, or false and true, or black and white, or even armadillo and platypus. To many people, bit is an abbreviation for binary digit which our ais logically are. To others, binit is a safer and more individualistic acronym for binary digit. For I2BS, we identify the two concepts.

A request to I2BS to convert n = 2357 = 2^11 + 2^8 + 2^5 + 2^4 + 2^2 + 2^0 with l = 12 returns 1001 00110101, one with l = 18 returns 00 00001001 00110101, and finally one with l ≤ 11 reports failure. Note that for a neater look we write bit strings in groups of eight, and grouping starts from the right. This convention reflects the relationship between bit strings and octet strings, as mentioned above.

Converting bit strings to integers (BS2I)

The primitive BS2I converts the bit string a0a1 . . . a_{l–1} to the integer a0·2^{l–1} + a1·2^{l–2} + · · · + a_{l–2}·2 + a_{l–1}, where we again identify a bit with an integer (or a binary digit). As an illustrative example, the bit string 1001 00110101 (or 00 00001001 00110101) gets converted to the integer 2^11 + 2^8 + 2^5 + 2^4 + 2^2 + 2^0 = 2357. The null bit string (that is, the one of zero length) is converted to the integer 0.
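Both primitives become one-liners in Python once the MSB-first convention is fixed; the function names are ours.

```python
def i2bs(n, l):
    """I2BS: non-negative integer -> bit string of length l, most
    significant bit first; error if n does not fit in l bits."""
    if n >= 2 ** l:
        raise ValueError("error: integer too large to fit in l bits")
    return "".join(str((n >> (l - 1 - i)) & 1) for i in range(l))

def bs2i(bits):
    """BS2I: bit string -> integer; the null bit string maps to 0."""
    return int(bits, 2) if bits else 0
```

By construction, bs2i(i2bs(n, l)) = n whenever n < 2^l.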

Converting integers to octet strings (I2OS)

In order to convert a non-negative integer n to an octet string of length d, we write the base-256 expansion of n as

n = A_{d–1}·256^{d–1} + A_{d–2}·256^{d–2} + · · · + A1·256 + A0,

where each Ai ∈ {0, 1, . . . , 255} can be naturally identified with an octet. I2OS returns the octet string A_{d–1}A_{d–2} . . . A1A0. Note that the above representation of n to the base 256 is possible if and only if n < 256^d. If n ≥ 256^d, I2OS should return failure. As with bit strings, an arbitrary number of leading zero octets is allowed.

Consider the integer 2357 = 9 × 256 + 53. The two-digit hexadecimal representations of 9 and 53 are 09 and 35 respectively. Thus, a call of I2OS on this n with d = 3 (resp. d = 2, resp. d = 1) returns 00 09 35 (resp. 09 35, resp. failure).

Converting octet strings to integers (OS2I)

Let an octet string A0A1 . . . A_{d–1} be given. Each Ai can be identified with a 256-ary digit. OS2I returns the integer A0·256^{d–1} + A1·256^{d–2} + · · · + A_{d–2}·256 + A_{d–1}. If d = 0, the integer 0 should be output.
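In Python, the big-endian base-256 conventions of I2OS and OS2I coincide with the built-in int/bytes conversions, so a sketch is immediate (names ours):

```python
def i2os(n, d):
    """I2OS: non-negative integer -> octet string of length d
    (base-256 digits, most significant octet first)."""
    if n >= 256 ** d:
        raise ValueError("error: integer too large to fit in d octets")
    return n.to_bytes(d, "big")

def os2i(octets):
    """OS2I: octet string -> integer; the empty octet string maps to 0."""
    return int.from_bytes(octets, "big")
```

For 2357 = 9 · 256 + 53 this reproduces the example above: i2os(2357, 2) is the octet string 09 35.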

Converting field elements to octet strings (FE2OS)

In the IEEE P1363 jargon, a field element is an element of the finite field F_q, where q is a prime or an integral power of a prime. We want to convert an element β ∈ F_q to an octet string. Depending on the value of q, we have two cases:

If char F_q is odd, β is represented as an integer in {0, 1, . . . , q – 1}. FE2OS converts β to an octet string of length ⌈log_256 q⌉ by calling the primitive I2OS.

If q = 2^m, β is represented as a bit string of length m. The primitive BS2OS is called to convert β to an octet string.

Converting octet strings to field elements (OS2FE)

Assume that an octet string is to be converted to an element of the finite field F_q. Again we have two possibilities depending on q.

If F_q is of odd characteristic, the primitive OS2I is called to convert the given octet string to an integer. This integer is returned as the field element.

If q = 2^m, one calls the primitive OS2BS with the given octet string and with the length m supplied as inputs. The resulting bit string is returned by OS2FE. If OS2BS reports error, so should OS2FE.

Converting field elements to integers (FE2I)

Let β ∈ F_q, and suppose that the integer equivalent of β is sought. If q is odd, then β is already represented as an integer (in {0, 1, . . . , q – 1}) and is itself output. If q = 2^m, one first converts β to an octet string by FE2OS and subsequently converts this octet string to an integer by calling the primitive OS2I.

* Converting elliptic curve points to octet strings (EC2OS)

The point at infinity O (on an elliptic curve over F_q) is encoded by an octet string comprising a single zero octet only. So let P = (h, k) be a finite point. The EC2OS primitive produces an octet string PO = PC ‖ H ‖ K which is the concatenation of a single octet PC with octet strings H and K representing h and k respectively. The values of PC and K depend on the type of compression used. One has PC = 0000 SUCỹ (the last four bits being S, U, C and ỹ), where

S = 1 if and only if the SORT compression is used.

U = 1 if and only if uncompressed or hybrid form is used.

C = 1 if and only if compressed or hybrid form is used.

ỹ = the compression bit defined earlier, if compression is used; it is 0 otherwise.

The first four bits of PC are reserved for (possible) future use and should be set to 0000 for this version of the standard. H is the octet string of length ⌈log_256 q⌉ obtained by converting h using FE2OS. If the compressed form is used, K is the empty octet string, whereas if the uncompressed or hybrid form is used, we have K = FE2OS(k, ⌈log_256 q⌉). Finally, for the lossy compression we have PC = 0000 0001, H = FE2OS(h, ⌈log_256 q⌉) and K is empty. Table 6.2 summarizes all these possibilities. Here, l := ⌈log_256 q⌉, and p is an odd prime.

Table 6.2. The EC2OS primitive
Representation       PC          H             K             q
uncompressed         0000 0100   FE2OS(h, l)   FE2OS(k, l)   All
LSB compressed       0000 001ỹ   FE2OS(h, l)   Empty         p, 2^m
LSB hybrid           0000 011ỹ   FE2OS(h, l)   FE2OS(k, l)   p, 2^m
SORT compressed      0000 101ỹ   FE2OS(h, l)   Empty         2^m, p^m
SORT hybrid          0000 111ỹ   FE2OS(h, l)   FE2OS(k, l)   2^m, p^m
lossy compression    0000 0001   FE2OS(h, l)   Empty         All
point at infinity    0000 0000   Empty         Empty         All

* Converting octet strings to elliptic curve points (OS2EC)

The OS2EC data conversion primitive takes as input an octet string PO, the length l = ⌈log_256 q⌉ and the method of compression. If PO contains only one octet and that octet is zero, the point at infinity O is output. Otherwise, the elliptic curve point P = (h, k) is computed as follows. OS2EC decomposes PO = PC ‖ H ‖ K, with PC the first octet and with H an octet string of length l. If PC does not match the method of compression, OS2EC returns error. Otherwise, it uses OS2FE to compute the field element h. If the uncompressed or hybrid form is used, the Y-coordinate k is also computed by applying OS2FE to K. If (h, k) is not a point on the elliptic curve, error is reported. For the LSB or SORT compression, the Y-coordinate is computed using h and the bit ỹ. If the hybrid scheme is used and the bit ỹ is inconsistent with the recovered k, OS2EC halts after reporting error. If all computations are successful till now, the point (h, k) is output.

Note that the checks for (h, k) lying on the curve and for the consistency of ỹ are optional and may be omitted. For the lossy compression scheme, the Y-coordinate k is not uniquely determined by the input octet string PO. In that case, either of the two possibilities is output.

* Converting ring elements to octet strings (RE2OS)

Ring elements are elements of the convolution polynomial ring R = ℤ[X]/〈X^n – 1〉 and can be identified with polynomials with integer coefficients and of degrees < n. The element a(x) = a0 + a1x + · · · + a_{n–1}x^{n–1} (where each ai ∈ ℤ) is represented by the n-tuple of integers (a0, a1, . . . , a_{n–1}). The IEEE draft P1363.1 assumes that the coefficients ai are available modulo a positive integer β ≤ 256. But then each ai is an integer in {0, 1, . . . , β – 1} and can be naturally encoded by a single octet. RE2OS, upon receiving a(x) as input, outputs the octet string a0a1 . . . a_{n–1} of length n.

An example: Let n = 7 and β = 128. The ring element a(x) = 2 + 11x + 101x^3 + 127x^4 + 71x^5 = (2, 11, 0, 101, 127, 71, 0) is converted to the octet string 02 0b 00 65 7f 47 00.

* Converting octet strings to ring elements (OS2RE)

Let an octet string a0a1 . . . a_{n–1} of length n be given, which we want to convert to an element of R = ℤ[X]/〈X^n – 1〉. Once again a modulus β ≤ 256 is assumed, so that each octet ai can be viewed as an integer reduced modulo β. Making the natural identification of ai with an integer, the polynomial a(x) = a0 + a1x + · · · + a_{n–1}x^{n–1} is output. Thus, for example, the octet string 02 0b 00 65 7f 47 00 gets converted to the ring element 2 + 11x + 101x^3 + 127x^4 + 71x^5.
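Both RE2OS and OS2RE are trivial in Python if ring elements are modelled as coefficient tuples and octet strings as bytes (names and modelling ours):

```python
def re2os(coeffs, beta=256):
    """RE2OS: ring element, given as an n-tuple of coefficients modulo
    beta <= 256, -> octet string of length n (one octet per coefficient)."""
    assert beta <= 256 and all(0 <= a < beta for a in coeffs)
    return bytes(coeffs)

def os2re(octets, beta=256):
    """OS2RE: octet string of length n -> ring element, with each octet
    reduced modulo beta."""
    return tuple(A % beta for A in octets)
```

The two primitives are mutually inverse whenever all coefficients are already reduced modulo β.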

* Converting ring elements to bit strings (RE2BS)

The RE2BS primitive assumes that the modulus β is a power of 2, that is, β = 2^t for some positive integer t ≤ 8. Let a ring element a(x) = a0 + a1x + · · · + a_{n–1}x^{n–1} be given, where each ai ∈ {0, 1, . . . , 2^t – 1}. One applies the I2BS primitive on each ai to generate the bit string ai,0ai,1 . . . ai,t–1 of length t. The concatenated bit string

a0,0a0,1 . . . a0,t–1 a1,0a1,1 . . . a1,t–1 . . . an–1,0an–1,1 . . . an–1,t–1

of length nt is then returned by RE2BS.

As before, take the example of n = 7, β = 128 = 2^7 (so that t = 7) and a(x) = 2 + 11x + 101x^3 + 127x^4 + 71x^5 = (2, 11, 0, 101, 127, 71, 0). The coefficients 2, 11, 0, . . . should first be converted to bit strings of length 7 each, that is, 2 gives 0000010, 11 gives 0001011 and so on. Thus, the bit string output by RE2BS will be 0000010 0001011 0000000 1100101 1111111 1000111 0000000. Note that here we have shown the bits in groups of 7 in order to highlight the intermediate steps (the outputs from I2BS). With the otherwise standard grouping in blocks of 8, the output bit string looks like 0 00001000 01011000 00001100 10111111 11100011 10000000 and hence transforms to the octet string 00 08 58 0c bf d3 80 by an invocation of BS2OS. This example illustrates that RE2BS followed by BS2OS does not necessarily give the same output as the direct conversion RE2OS, even when every underlying parameter (like β) remains unchanged.
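The per-coefficient I2BS calls and the final concatenation can be sketched as follows (function name ours):

```python
def re2bs(coeffs, t):
    """RE2BS: ring element with coefficients modulo beta = 2^t (t <= 8)
    -> bit string of length n*t, each coefficient contributing t bits,
    most significant bit first."""
    return "".join(format(a, "0{}b".format(t)) for a in coeffs)
```

Running it on the example above reproduces the 49-bit string shown, grouped here in blocks of 7.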

* Converting bit strings to ring elements (BS2RE)

Once again we require the modulus β to be a power 2^t of 2. Let a bit string a0a1 . . . a_{l–1} of length l be given, from which we want to compute the equivalent ring element a(x). If l is not an integral multiple of t, the algorithm should quit after reporting error. Otherwise we let l = nt for some positive integer n, and repeatedly call the BS2I primitive on the bit strings a0a1 . . . a_{t–1}, a_t a_{t+1} . . . a_{2t–1}, . . . , a_{nt–t}a_{nt–t+1} . . . a_{nt–1} to get the integers α0, α1, . . . , α_{n–1} respectively. The polynomial a(x) = α0 + α1x + · · · + α_{n–1}x^{n–1} is then output.

We urge the reader to verify that BS2RE with β = 128 and the bit string

0000010 0001011 0000000 1100101 1111111 1000111 0000000

as input produces the ring element 2 + 11x + 101x^3 + 127x^4 + 71x^5.
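The verification can be done mechanically with a small Python sketch of BS2RE (name ours):

```python
def bs2re(bits, t):
    """BS2RE: bit string of length n*t -> ring element with coefficients
    modulo 2^t; error if the length is not a multiple of t."""
    if len(bits) % t != 0:
        raise ValueError("error: bit-string length is not a multiple of t")
    return tuple(int(bits[i : i + t], 2) for i in range(0, len(bits), t))
```

Applied to the 49-bit string above with t = 7, it yields (2, 11, 0, 101, 127, 71, 0), that is, 2 + 11x + 101x^3 + 127x^4 + 71x^5.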

* Converting binary elements to octet strings (BE2OS)

A binary (ring) element is an element a(x) = a0 + a1x + · · · + a_{n–1}x^{n–1} ∈ R with each ai ∈ {0, 1}. One can convert a(x) to an octet string A0A1 . . . A_{l–1} of any desired length l as follows. We denote the bits in the octet Ai as Ai,7Ai,6 . . . Ai,0. Here, the index of the bits increases from right to left.

First we rewrite the polynomial a(x) as one of degree 8l – 1, that is, as a(x) = a0 + a1x + · · · + a_{8l–1}x^{8l–1}. If n ≤ 8l, this can be done by setting a_n = a_{n+1} = · · · = a_{8l–1} = 0. On the other hand, if n > 8l and one or more of the coefficients a_{8l}, a_{8l+1}, . . . , a_{n–1} are non-zero (that is, 1), the above rewriting of a(x) cannot be done and BE2OS terminates after reporting failure.

When the above rewriting of a(x) is successful, one sets the bits of the output octets as A0,0 := a0, A0,1 := a1, . . . , A0,7 := a7, A1,0 := a_8, A1,1 := a_9, . . . , A1,7 := a_15, A2,0 := a_16, A2,1 := a_17, . . . , A2,7 := a_23, . . . , A_{l–1,0} := a_{8l–8}, A_{l–1,1} := a_{8l–7}, . . . , A_{l–1,7} := a_{8l–1}.

As an example, take n = 20 and consider the binary element a(x) = 1 + x + x^2 + x^10 + x^12. First let l = 1. Rewriting a(x) as a polynomial of degree 7 is not possible, since the coefficients of x^10 and x^12 are 1; so BE2OS outputs error in this case. If l = 2, then the output octet string will be 00000111 00010100, that is, 07 14. For l ≥ 3, the first two octets will be 07 and 14 as before, whereas the 3rd through l-th octets will be 00.

The BE2OS primitive can be quite effective for reducing storage requirements. For example, the polynomial a(x) of degree 12 of the previous paragraph, viewed as an element of ℤ[X]/〈X^200 – 1〉 (that is, with n = 200), can be encoded in just two octets. Of course, by specifying l ≥ 3 one may add l – 2 trailing zero octets, if one desires. On the other hand, RE2OS requires exactly 200 octets, whereas RE2BS with β = 128 followed by BS2OS requires exactly ⌈(200 × 7)/8⌉ = 175 octets for storing the same a(x).

* Converting octet strings to binary elements (OS2BE)

Assume that an octet string A0A1 . . . A_{l–1} of length l is given and the equivalent binary element in R = ℤ[X]/〈X^n – 1〉 is to be determined. As in the case of BE2OS, we index the bits in the octet Ai as Ai = Ai,7Ai,6 . . . Ai,0. Now, consider the polynomial a(x) = a0 + a1x + a2x^2 + · · · + a_{8l–1}x^{8l–1}, where a_{8i+j} = Ai,j. If n ≥ 8l, we set a_{8l} = a_{8l+1} = · · · = a_{n–1} = 0 and output the binary element a(x) ∈ R. On the other hand, if n < 8l and a_n = a_{n+1} = · · · = a_{8l–1} = 0, then the polynomial a(x), viewed as an element of R, is returned. Finally, if n < 8l and any of the coefficients a_n, a_{n+1}, . . . , a_{8l–1} is non-zero, then OS2BE returns error.[3]

[3] In this case, it still makes full algebraic sense to treat a(x) as an element of R, though not in the canonical representation.

For example, assume that the octet string 07 14 is given as input to OS2BE. If n ≤ 12, the algorithm outputs error, because the polynomial a(x) in this case has degree 12. For any n ≥ 13, the binary element 1 + x + x^2 + x^10 + x^12 is returned.
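The bit-packing for binary elements (note the reversed, least-significant-first bit order inside each octet) can be sketched as follows; the names are ours.

```python
def be2os(coeffs, l):
    """BE2OS: binary element (tuple of 0/1 coefficients of Z[x]/<x^n - 1>)
    -> octet string of length l; bit j of octet i carries the coefficient
    of x^(8i+j), bit 0 being the least significant."""
    if any(coeffs[8 * l :]):
        raise ValueError("error: non-zero coefficient of degree >= 8l")
    out = []
    for i in range(l):
        A = 0
        for j in range(8):
            if 8 * i + j < len(coeffs) and coeffs[8 * i + j]:
                A |= 1 << j                     # A_{i,j} := a_{8i+j}
        out.append(A)
    return bytes(out)

def os2be(octets, n):
    """OS2BE: octet string -> binary element of Z[x]/<x^n - 1>; error if
    some set bit would land at a position >= n."""
    bits = [(A >> j) & 1 for A in octets for j in range(8)]
    if any(bits[n:]):
        raise ValueError("error: set bit at position >= n")
    return tuple(bits[:n]) + (0,) * (n - len(bits))
```

For the example above, be2os applied to 1 + x + x^2 + x^10 + x^12 with l = 2 gives the octets 07 14, and os2be inverts this for any n ≥ 13.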

6.3. RSA Standards

The public-key cryptography standards (PKCS) [254] refer to a set of standard specifications proposed by the RSA Laboratories. A one-line description of each of these documents is given in Table 6.3. In the rest of this section, we concentrate only on the documents PKCS #1 and #3.

Table 6.3. Public-key cryptography standards from the RSA Laboratories
DocumentDescription
PKCS #1RSA encryption and signature
PKCS #2Merged with PKCS #1
PKCS #3Diffie–Hellman key exchange
PKCS #4Merged with PKCS #1
PKCS #5Password-based cryptography
PKCS #6Extension of X.509 public-key certificates
PKCS #7Syntax of cryptographic messages
PKCS #8Syntax and encryption of private keys
PKCS #9Attribute types for use in PKCS #6, #7, #8 and #10
PKCS #10Syntax for certification requests
PKCS #11Cryptoki, an application programming interface (API)
PKCS #12Syntax of transferring personal information (private keys, certificates and so on)
PKCS #13Elliptic curve cryptography (under preparation)
PKCS #15Syntax for cryptographic token (like integrated circuit card) information

6.3.1. PKCS #1

PKCS #1 describes RSA encryption and RSA signatures. In this section, we summarize Version 2.1 (dated 14 June 2002) of the standard. This version specifies cryptographically stronger encoding procedures than the older versions. More specifically, the optimal asymmetric encryption procedure (OAEP [18]) for RSA encryption was incorporated in Version 2.0 of PKCS #1, whereas the new probabilistic signature scheme (PSS [19]) is introduced in Version 2.1. This latest draft also includes encryption and signature schemes compatible with older versions (1.5 and 2.0). However, adoption of the new algorithms is strongly recommended for enhanced security.

RSA keys

PKCS #1 Version 2.1 introduces the concept of multi-prime RSA, in which the RSA modulus n may have more than two prime divisors. For RSA encryption and decryption to work properly, we only need n to be square-free (Exercise 4.1). Using u > 2 prime divisors of n increases efficiency and does not degrade the security of the resulting system much, as long as u is not very large. More specifically, if T is the time for the RSA private-key operation without CRT, then the cost of this operation with CRT is approximately T/u^2 (neglecting the cost of CRT combination).

So an RSA modulus is of the form n = r1r2 . . . ru with u ≥ 2 and with pairwise distinct primes r1, . . . , ru. For the sake of conformity with the older versions of the standard, the first two primes are given the alternate special names p := r1 and q := r2. PKCS #1 does not mention any specific way of choosing the prime divisors ri of n, but encourages use of primes that make factorization of n difficult.

An RSA public exponent is an integer e, 3 ≤ e ≤ n – 1, with gcd(e, λ(n)) = 1, where λ(n) := lcm(r1 – 1, r2 – 1, . . . , ru – 1). An RSA public key is a pair (n, e) with n and e chosen as above.

The RSA private key corresponding to (n, e) can be stored in one of two formats. In the first format, one maintains the pair (n, d) with the private exponent d chosen so as to make ed ≡ 1 (mod λ(n)). In the second format, one stores the five quantities (p, q, dP, dQ, qInv) and, if u > 2, the triples (ri, di, ti) for each i = 3, . . . , u. The meanings of these quantities are as follows:

p = r1
q = r2
dP ≡ e^{–1} (mod p – 1)
dQ ≡ e^{–1} (mod q – 1)
qInv ≡ q^{–1} (mod p)
di ≡ e^{–1} (mod ri – 1) for i = 3, . . . , u
ti ≡ (r1r2 · · · r_{i–1})^{–1} (mod ri) for i = 3, . . . , u

For the sake of consistency, one should store the CRT coefficient t2 ≡ (r1)^{–1} (mod r2), that is, p^{–1} (mod q). In order to ensure compatibility with older versions of PKCS, q^{–1} (mod p) is stored instead.

RSA key operations

The RSA public-key operation is used to encrypt a message or to verify a signature. The PKCS draft calls these primitives RSAEP (encryption primitive) and RSAVP1 (verification primitive). Both are implemented in a straightforward manner as in Algorithm 6.1.

Algorithm 6.1. RSA encryption/signature verification primitive

Input: RSA public key (n, e) and message/signature representative x.

Output: The ciphertext/message representative y.

Steps:

if (x < 0) or (x ≥ n) { Return “Error: representative out of range”. }

y := x^e (mod n).

The RSA decryption or signature-generation primitive is called RSADP or RSASP1 and is given in Algorithm 6.2. The operation depends on the format in which the private key K is stored. The correctness of the primitive is left to the reader as an easy exercise.

Algorithm 6.2. RSA decryption/signature generation primitive

Input: RSA private key K and the ciphertext/message representative y.

Output: The message/signature representative x.

Steps:

if (y < 0) or (y ≥ n) { Return “Error: representative out of range”. }
if (K is stored in the first format) {
   x := y^d (mod n).
}
else {  /* K is stored in the second format */
   x1 := y^{dP} (mod p).
   x2 := y^{dQ} (mod q).
   h := (x1 – x2)qInv (mod p).
   x := x2 + qh.
   if (u > 2) {
      R := r1.
      for i = 3, . . . , u {
         xi := y^{di} (mod ri).
         R := R × r_{i–1}.
         h := (xi – x)ti (mod ri).
         x := x + Rh.
      }
   }
}
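The second-format private-key operation (Garner's incremental CRT combination) can be sketched in Python. For brevity the exponents d_i and coefficients t_i are derived from e on the fly rather than stored, and the tiny primes in the note below are toys; none of this is meant as a production implementation.

```python
def rsadp_crt(y, primes, e):
    """RSA decryption following the multi-prime CRT of Algorithm 6.2.
    primes = [r1, r2, ..., ru] must be pairwise distinct primes with
    gcd(e, lcm(ri - 1)) = 1.  Requires Python >= 3.8 for pow(a, -1, m)."""
    r = primes
    d = [pow(e, -1, ri - 1) for ri in r]        # d_i = e^(-1) mod (r_i - 1)
    x1 = pow(y, d[0], r[0])                     # x1 = y^dP mod p
    x2 = pow(y, d[1], r[1])                     # x2 = y^dQ mod q
    h = (x1 - x2) * pow(r[1], -1, r[0]) % r[0]  # qInv = q^(-1) mod p
    x = x2 + r[1] * h
    R = r[0]
    for i in range(2, len(r)):                  # i = 3, ..., u in the text
        xi = pow(y, d[i], r[i])
        R *= r[i - 1]                           # R = r1 r2 ... r_{i-1}
        ti = pow(R, -1, r[i])                   # t_i = (r1 ... r_{i-1})^(-1) mod r_i
        h = (xi - x) * ti % r[i]
        x += R * h
    return x
```

With the toy modulus n = 5 · 11 · 17 = 935 and e = 3, encrypting m as m^3 mod 935 and decrypting with rsadp_crt recovers m.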

RSAES–OAEP encryption scheme

The encryption scheme RSAES–OAEP is based on the optimal asymmetric encryption procedure (OAEP) proposed by Bellare and Rogaway [18, 98]. In this procedure, a string of length slightly less than the size of the modulus n is probabilistically encoded using a hash function, and the encoded message is subsequently encrypted. The probabilistic encoding makes the encryption procedure semantically secure and (provably) provides resistance against chosen-ciphertext attacks. Under this scheme, an adversary can produce a valid ciphertext only if she knows the corresponding plaintext. Such an encryption scheme is called plaintext-aware. Given an ideal hash function, Bellare and Rogaway’s OAEP is plaintext-aware.

RSAES–OAEP uses a label L which is hashed by a hash function H. One may take L as the empty string. Other possibilities are not specified in the PKCS draft. SHA-1 (or SHA-256 or SHA-384 or SHA-512) is the recommended hash function. The hash values (in hex) of the empty string under these hash functions are given in Table 6.4.

Table 6.4. Hash values of the empty string
FunctionHash of the empty string
SHA-1da39a3ee 5e6b4b0d 3255bfef 95601890 afd80709
SHA-256e3b0c442 98fc1c14 9afbf4c8 996fb924 27ae41e4 649b934c a495991b 7852b855
SHA-38438b060a7 51ac9638 4cd9327e b1b1e36a 21fdb711 14be0743 4c0cc7bf 63f6e1da 274edebf e76f65fb d51ad2f1 4898b95b
SHA-512cf83e135 7eefb8bd f1542850 d66d8007 d620e405 0b5715dc 83f4a921 d36ce9ce 47d0d13c 5d85f2b0 ff8318d2 877eec2f 63b931bd 47417a81 a538327a f927da3e

The length of the hash output (in octets) is denoted by hLen. For SHA-1, hLen = 20. The RSA modulus n is assumed to be of octet length k. The octet length mLen of the input message M must satisfy mLen ≤ k – 2hLen – 2. RSAES–OAEP uses a mask-generation function designated as MGF (see Algorithm 6.11 for a recommended realization).

Algorithm 6.3 describes the RSA–OAEP encryption scheme which employs the EME–OAEP encoding scheme described in Algorithm 6.4. The use of a random seed makes the encryption probabilistic. We use the notation ‖ to denote string concatenation and ⊕ to denote bit-wise XOR.

Algorithm 6.3. RSA–OAEP encryption scheme

Input: The recipient’s public key (n, e), the message M (an octet string of length mLen) and an optional label L whose default value is the empty string.

Output: The ciphertext C of octet length k.

Steps:

/* Check lengths */

if (L is longer than what H can handle) { Return “Error: label too long”. }

/* For example, for SHA-1 the input must be of length ≤ 2^61 – 1 octets. */

if (mLen > k – 2hLen – 2) { Return “Error: message too long”. }

/* Encode M to EM (EME–OAEP encoding scheme) */

EM := EME-OAEP-encode(M, L)./* Algorithm 6.4 */
/* RSA encryption */ 
m := OS2I(EM)./* Convert octet string to integer */
c := RSAEP((n, e), m)./* RSA encryption primitive */
C := I2OS(c, k)./* Convert integer back to octet string */

The matching decryption operation is shown in Algorithm 6.5 which calls the EME–OAEP decoding procedure of Algorithm 6.6. The only error message that the decryption and decoding algorithms issue is decryption error. This is to ensure that an adversary cannot distinguish between different kinds of errors, because such an ability of the adversary may lead her to guess partial information about the decryption process and thereby mount a chosen-ciphertext attack.

Algorithm 6.4. RSA–OAEP encoding scheme

Input: The message M of octet length mLen, the label L.

Output: The EME–OAEP encoded message EM.

Steps:

lHash := H(L).

Generate the padding string PS with k – mLen – 2hLen – 2 zero octets.

Generate the data block DB := lHash ‖ PS ‖ 01 ‖ M.

Let seed := a random string of length hLen octets.

Generate the data-block mask dbMask := MGF(seed, k – hLen – 1).

Generate the masked data-block maskedDB := DB ⊕ dbMask.

Generate the mask for the seed seedMask := MGF(maskedDB, hLen).

Generate the masked seed maskedSeed := seed ⊕ seedMask.

Generate the encoded message EM := 00 ‖ maskedSeed ‖ maskedDB.
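The encoding steps above can be exercised with a short Python sketch. MGF1 (the mask-generation function recommended by the draft, Algorithm 6.11) is included inline so that the sketch is self-contained, SHA-1 is used as H, and a caller-supplied seed replaces the random one so that runs are reproducible. Function names are ours.

```python
import hashlib
import os

def mgf1(seed, mask_len, hash_fn=hashlib.sha1):
    """MGF1: concatenate H(seed || C) for 4-octet counters C = 0, 1, ...
    and truncate the result to mask_len octets."""
    out = b""
    counter = 0
    while len(out) < mask_len:
        out += hash_fn(seed + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:mask_len]

def eme_oaep_encode(M, k, L=b"", hash_fn=hashlib.sha1, seed=None):
    """EME-OAEP encoding (Algorithm 6.4); k is the octet length of the
    RSA modulus.  Pass seed=None for the standard random seed."""
    hLen = hash_fn().digest_size
    if len(M) > k - 2 * hLen - 2:
        raise ValueError("error: message too long")
    lHash = hash_fn(L).digest()
    PS = b"\x00" * (k - len(M) - 2 * hLen - 2)
    DB = lHash + PS + b"\x01" + M                    # data block
    if seed is None:
        seed = os.urandom(hLen)                      # random seed of hLen octets
    maskedDB = bytes(x ^ y for x, y in zip(DB, mgf1(seed, k - hLen - 1, hash_fn)))
    maskedSeed = bytes(x ^ y for x, y in zip(seed, mgf1(maskedDB, hLen, hash_fn)))
    return b"\x00" + maskedSeed + maskedDB
```

Decoding (Algorithm 6.6) simply recomputes the two masks in the opposite order.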

Algorithm 6.5. RSA–OAEP decryption scheme

Input: The recipient’s private key K, the ciphertext C to be decrypted and an optional label L (the default value of which is the null string).

Output: The decrypted message M.

Steps:

if (the length of L is more than the limitation of H) or (the length of C is not k octets)
        or (k < 2hLen + 2) { Return “Decryption error”. }

c := OS2I(C)./* Convert octet string to integer */
m := RSADP(K, c)./* RSA decryption primitive */
EM := I2OS(m, k)./* Convert integer back to octet string */
M := EME-OAEP-decode(EM, L)./* Algorithm 6.6 */

Algorithm 6.6. RSA–OAEP decoding scheme

Input: The encoded message EM and the label L.

Output: The EME–OAEP decoded message M.

Steps:

lHash := H(L).
Write EM = Y ‖ maskedSeed ‖ maskedDB, where Y is a single octet,
       maskedSeed is a string of length hLen octets and
       maskedDB is a string of length k – hLen – 1 octets.
seedMask := MGF(maskedDB, hLen).
seed := maskedSeed ⊕ seedMask.
dbMask := MGF(seed, k – hLen – 1).
DB := maskedDB ⊕ dbMask.
Try to decompose DB = lHash′ ‖ PS ‖ 01 ‖ M, where lHash′ is of length hLen
       and PS is a (possibly empty) padding string comprising octets 00 only.
if (DB cannot be decomposed as above) or (lHash′ ≠ lHash) or
       (Y ≠ 00) { Return “Decryption error”. }

RSASSA–PSS signature scheme with appendix

RSASSA–PSS employs the probabilistic signature scheme proposed by Bellare and Rogaway [19]. Under suitable assumptions about the hash function and the mask-generation function, the RSASSA–PSS scheme produces secure signatures which are also tight in the sense that forging RSASSA–PSS signatures is computationally equivalent to inverting RSA.

Algorithm 6.7. RSASSA–PSS signature generation

Input: The message M (an octet string) to be signed, the private key K of the signer.

Output: The signature S (an octet string of length k).

Steps:

EM := EMSA–PSS–encode(M, modBits – 1)./* Encode by Algorithm 6.8 */
m := OS2I(EM)./* Convert octet string to integer */
s := RSASP1(K, m)./* RSA signature generation primitive */
S := I2OS(s, k)./* Convert integer back to octet string */

Algorithm 6.8. RSASSA–PSS encoding

Input: The message M to be encoded (an octet string), the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9.

Output: The encoded message EM, an octet string of length emLen := ⌈emBits/8⌉.

Steps:

if (M is longer than what H can handle) { Return “Error: message too long”. }

Generate the hashed message mHash := H(M).

if (emLen < hLen + sLen + 2) { Return “Encoding error”. }

Let salt := a random string of length sLen octets.

Generate the salted message M′ := 00 00 00 00 00 00 00 00 ‖ mHash ‖ salt.

Generate the hashed salted message mHash′ := H(M′).

Generate the padding string PS with emLen – sLen – hLen – 2 zero octets.

Generate the data block DB := PS ‖ 01 ‖ salt.

Generate the data block mask dbMask := MGF(mHash′, emLen – hLen – 1).

Generate the masked data block maskedDB := DB ⊕ dbMask.

Set to 0 the leftmost 8·emLen – emBits bits of the leftmost octet of maskedDB.

Compute EM := maskedDB ‖ mHash′ ‖ bc.

RSASSA–PSS signature generation (Algorithm 6.7) uses the EMSA–PSS encoding method (Algorithm 6.8). Verification (Algorithm 6.9) uses the EMSA–PSS decoding method (Algorithm 6.10). We assume that k is the octet length of the RSA modulus n. Let modBits denote the bit length of n. The encoded message is of length emLen = ⌈(modBits – 1)/8⌉ octets. The probabilistic behaviour of the encoding scheme is incorporated by the use of a random salt, the octet length of which is sLen. A hash function H that produces hash values of octet length hLen is employed.

Algorithm 6.9. RSASSA–PSS signature verification

Input: The message M, the signature S to be verified and the signer’s public key (n, e).

Output: Verification status of the signature.

Steps:

if (the length of S is not k octets) { Return “Signature not verified”. }

s := OS2I(S). /* Convert octet string to integer */
m := RSAVP1((n, e), s). /* RSA signature verification primitive */
EM := I2OS(m, emLen). /* Convert integer back to octet string */
status := EMSA–PSS–decode(M, EM, modBits – 1). /* Algorithm 6.10 */

if (status is “consistent”) { Return “Signature verified”. }

else { Return “Signature not verified”. }

Algorithm 6.10. RSASSA–PSS decoding

Input: The message M (an octet string), the encoded message EM (an octet string of length emLen = ⌈emBits/8⌉) and the maximum bit length emBits of OS2I(EM). One should have emBits ≥ 8hLen + 8sLen + 9.

Output: Decoding status: “consistent” or “inconsistent”.

Steps:

if (M is longer than what H can handle) { Return “inconsistent”. }
Generate the hashed message mHash := H(M).
if (emLen < hLen + sLen + 2) { Return “inconsistent”. }
Try to decompose EM = maskedDB ‖ mHash′ ‖ Y, where
       maskedDB is an octet string of length emLen – hLen – 1,
       mHash′ is an octet string of length hLen, and Y is a single octet.
if (Y ≠ bc) or (the leftmost 8emLen – emBits bits of the leftmost octet of
       maskedDB are not all 0) { Return “inconsistent”. }
dbMask := MGF(mHash′, emLen – hLen – 1).
DB := maskedDB ⊕ dbMask.
Set to 0 the leftmost 8emLen – emBits bits of the leftmost octet of DB.
Try to decompose DB = PS ‖ 01 ‖ salt, where PS is a string with
       emLen – sLen – hLen – 2 zero octets, and salt is of length sLen octets.
if (the above decomposition is unsuccessful) { Return “inconsistent”. }
Set M′ := 00 00 00 00 00 00 00 00 ‖ mHash ‖ salt.
if (H(M′) = mHash′) { Return “consistent”. } else { Return “inconsistent”. }
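The encoding and consistency check of Algorithms 6.8 and 6.10 can be sketched compactly in Python. This is only an illustration, not a standards-compliant implementation: we assume SHA-256 as H (so hLen = 32), fix sLen at 32 octets, and inline the MGF1 construction of Algorithm 6.11.

```python
import hashlib, os

hLen = 32  # octet length of the SHA-256 output

def mgf1(seed, maskLen):
    # Mask-generation function in the style of Algorithm 6.11, based on SHA-256.
    T = b""
    for i in range((maskLen + hLen - 1) // hLen):
        T += hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
    return T[:maskLen]

def emsa_pss_encode(M, emBits, sLen=32):
    # EMSA-PSS encoding along the lines of Algorithm 6.8.
    emLen = (emBits + 7) // 8
    mHash = hashlib.sha256(M).digest()
    assert emLen >= hLen + sLen + 2, "encoding error"
    salt = os.urandom(sLen)
    mHash2 = hashlib.sha256(b"\x00" * 8 + mHash + salt).digest()  # H(M')
    PS = b"\x00" * (emLen - sLen - hLen - 2)
    DB = PS + b"\x01" + salt
    dbMask = mgf1(mHash2, emLen - hLen - 1)
    maskedDB = bytes(a ^ b for a, b in zip(DB, dbMask))
    # clear the leftmost 8*emLen - emBits bits of the leftmost octet
    maskedDB = bytes([maskedDB[0] & (0xFF >> (8 * emLen - emBits))]) + maskedDB[1:]
    return maskedDB + mHash2 + b"\xbc"

def emsa_pss_verify(M, EM, emBits, sLen=32):
    # EMSA-PSS decoding along the lines of Algorithm 6.10; True iff "consistent".
    emLen = (emBits + 7) // 8
    mHash = hashlib.sha256(M).digest()
    if emLen < hLen + sLen + 2 or EM[-1] != 0xBC:
        return False
    maskedDB, mHash2 = EM[:emLen - hLen - 1], EM[emLen - hLen - 1:-1]
    if maskedDB[0] & ~(0xFF >> (8 * emLen - emBits)) & 0xFF:
        return False
    dbMask = mgf1(mHash2, emLen - hLen - 1)
    DB = bytes(a ^ b for a, b in zip(maskedDB, dbMask))
    DB = bytes([DB[0] & (0xFF >> (8 * emLen - emBits))]) + DB[1:]
    PS, sep, salt = DB[:-sLen - 1], DB[-sLen - 1], DB[-sLen:]
    if PS != b"\x00" * len(PS) or sep != 0x01:
        return False
    return hashlib.sha256(b"\x00" * 8 + mHash + salt).digest() == mHash2
```

With emBits = modBits – 1 = 1023, encoding a message and decoding the result returns “consistent”, while any other message is rejected, which exhibits the probabilistic round trip that the signature scheme relies on.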

A mask-generation function

A mask-generation function (MGF1) is specified in the PKCS #1 draft. It is based on a hash function H. The mask-generation function is deterministic in the sense that its output is completely determined by its input. However, the (provable) security of the OAEP and PSS schemes is based on the pseudorandom nature of the output of the mask-generation function. This means that any part of the output should be statistically independent of the other parts. MGF1 derives this pseudorandomness from that of the underlying hash function H.

Algorithm 6.11. Mask-generation function MGF1

Input: The seed mgfSeed (an octet string) and the desired octet length maskLen of the output mask. One requires maskLen ≤ 2^32 · hLen, where hLen is the octet length of the hash function output.

Output: An octet string mask of length maskLen.

Steps:

if (maskLen > 2^32 · hLen) { Return “Error: mask too long”. }
Initialize T to the empty octet string.
for i = 0, 1, . . . , ⌈maskLen/hLen⌉ – 1 {
    I := I2OS(i, 4).
    T := T ‖ H(mgfSeed ‖ I).
}
mask := the leftmost maskLen octets of T.

The RSA encryption scheme of PKCS #1, Version 1.5

The older encryption scheme RSAES–PKCS1–v1_5 is no longer recommended, since this scheme is not plaintext-aware, that is, an adversary can, with high probability, generate valid ciphertexts without knowing the corresponding plaintexts. This allows the adversary to mount chosen-ciphertext attacks. The new drafts of PKCS #1 include this old scheme for backward compatibility. Encryption and decryption for RSAES–PKCS1–v1_5 are given in Algorithms 6.12 and 6.13. Here, k is the octet length of the modulus.

Algorithm 6.12. RSA–PKCS1 encryption scheme

Input: The recipient’s public key (n, e) and the message M (an octet string).

Output: The ciphertext C which is an octet string of length k.

Steps:

Let mLen denote the octet length of M. if (mLen > k – 11) { Return “Error: message too long”. }
Generate a padding string PS of length k – mLen – 3 ≥ 8 octets consisting of
       random non-zero octets.
Generate the encoded message EM := 00 ‖ 02 ‖ PS ‖ 00 ‖ M.

m := OS2I(EM). /* Convert octet string to integer */
c := RSAEP((n, e), m). /* RSA encryption primitive */
C := I2OS(c, k). /* Convert integer back to octet string */

Algorithm 6.13. RSA–PKCS1 decryption scheme

Input: The recipient’s private key K and the ciphertext C (an octet string).

Output: The plaintext message M (an octet string of length ≤ k – 11).

Steps:

if (the length of the ciphertext is not k octets) { Return “decryption error”. }

c := OS2I(C). /* Convert octet string to integer */
m := RSADP(K, c). /* RSA decryption primitive */
EM := I2OS(m, k). /* Convert integer back to octet string */

Try to decompose EM = 00 ‖ 02 ‖ PS ‖ 00 ‖ M, where PS is an octet string of length ≥ 8 and containing only non-zero octets.

if (the above decomposition is unsuccessful) { Return “decryption error”. }
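The padding and unpadding steps of Algorithms 6.12 and 6.13 can be sketched in Python as follows; the RSA primitives RSAEP/RSADP are deliberately omitted, and the function names are ours, not part of the standard.

```python
import os

def pkcs1_v15_pad(M, k):
    # Encoding step of Algorithm 6.12: EM = 00 || 02 || PS || 00 || M,
    # where PS consists of at least 8 random non-zero octets.
    if len(M) > k - 11:
        raise ValueError("message too long")
    PS = b""
    while len(PS) < k - len(M) - 3:
        b = os.urandom(1)
        if b != b"\x00":      # PS must contain no zero octets
            PS += b
    return b"\x00\x02" + PS + b"\x00" + M

def pkcs1_v15_unpad(EM, k):
    # Decoding step of Algorithm 6.13: recover M or report a decryption error.
    if len(EM) != k or not EM.startswith(b"\x00\x02"):
        raise ValueError("decryption error")
    sep = EM.find(b"\x00", 2)  # end of the non-zero padding string PS
    if sep < 10:               # PS must be at least 8 octets long
        raise ValueError("decryption error")
    return EM[sep + 1:]
```

Note that a careless implementation of the error path here is exactly what Bleichenbacher-style chosen-ciphertext attacks exploit, which is why the scheme is retained only for backward compatibility.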

The RSA signature scheme of PKCS #1, Version 1.5

The older RSA signature scheme RSASSA–PKCS1–v1_5 is not known to have security loopholes. (Nevertheless, the provably secure PSS scheme is recommended for future applications.) RSASSA–PKCS1–v1_5 uses the EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16). The signature generation and verification procedures are given in Algorithms 6.14 and 6.15. Here, k denotes the octet length of the modulus n.

The EMSA–PKCS1–v1_5 message encoding procedure (Algorithm 6.16) uses a hash function H. Although a member of the SHA family is recommended for future applications, MD2 and MD5 are also supported for compliance with older applications. An octet string hashAlgo is used whose value depends on the underlying hash algorithm and is given in Table 6.5.

Table 6.5. The string hashAlgo used by EMSA–PKCS1–v1_5

Function   The string hashAlgo
MD2        30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 02 05 00 04 10
MD5        30 20 30 0c 06 08 2a 86 48 86 f7 0d 02 05 05 00 04 10
SHA-1      30 21 30 09 06 05 2b 0e 03 02 1a 05 00 04 14
SHA-256    30 31 30 0d 06 09 60 86 48 01 65 03 04 02 01 05 00 04 20
SHA-384    30 41 30 0d 06 09 60 86 48 01 65 03 04 02 02 05 00 04 30
SHA-512    30 51 30 0d 06 09 60 86 48 01 65 03 04 02 03 05 00 04 40

Algorithm 6.14. RSA–PKCS1 signature generation

Input: The signer’s private key K and the message M to be signed (an octet string).

Output: The signature S (an octet string of length k).

Steps:

Encode M to EM := EMSA–PKCS1–v1_5(M, k). /* Algorithm 6.16 */
m := OS2I(EM). /* Convert octet string to integer */
s := RSASP1(K, m). /* RSA signature generation primitive */
S := I2OS(s, k). /* Convert integer back to octet string */

Algorithm 6.15. RSA–PKCS1 signature verification

Input: The signer’s public key (n, e), the message M (an octet string) and the signature S to be verified (an octet string of length k).

Output: Verification status of the signature.

Steps:

if (the length of S is not k octets) { Return “Signature not verified”. }

s := OS2I(S). /* Convert octet string to integer */
m := RSAVP1((n, e), s). /* RSA signature verification primitive */
EM′ := I2OS(m, k). /* Convert integer back to octet string */
Encode M to EM := EMSA–PKCS1–v1_5(M, k). /* Algorithm 6.16 */

if (EM′ = EM) { Return “Signature verified”. }

else { Return “Signature not verified”. }

Algorithm 6.16. EMSA–PKCS1 encoding

Input: The message M (an octet string), the intended length emLen of the encoded message. One requires emLen ≥ tLen + 11, where tLen is the octet length of hashAlgo plus the octet length of the hash output.

Output: The encoded message EM (an octet string of length emLen).

Steps:

if (M is longer than what H can handle) { Return “Error: message too long”. }
Compute the hash value mHash := H(M).
Let T := hashAlgo ‖ mHash.
/* Let tLen be the octet length of T. */
if (emLen < tLen + 11) { Return “Error: encoded message length too short”. }
Generate a padding string PS of length emLen – tLen – 3 ≥ 8 octets each
      having the hexadecimal value ff.
Set EM := 00 ‖ 01 ‖ PS ‖ 00 ‖ T.
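The encoding of Algorithm 6.16 is easy to sketch in Python. Here H is assumed to be SHA-256, with the corresponding hashAlgo string taken from Table 6.5; the function name is ours.

```python
import hashlib

# The hashAlgo (DER DigestInfo) prefix for SHA-256 from Table 6.5
HASH_ALGO_SHA256 = bytes.fromhex("3031300d060960864801650304020105000420")

def emsa_pkcs1_v15_encode(M, emLen):
    # EMSA-PKCS1 encoding (Algorithm 6.16) with H = SHA-256.
    T = HASH_ALGO_SHA256 + hashlib.sha256(M).digest()
    if emLen < len(T) + 11:
        raise ValueError("encoded message length too short")
    PS = b"\xff" * (emLen - len(T) - 3)    # padding of ff octets
    return b"\x00\x01" + PS + b"\x00" + T
```

Unlike EMSA–PSS, this encoding is deterministic: the same message and modulus length always yield the same EM, which is why the signature verification of Algorithm 6.15 can simply re-encode M and compare.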

6.3.2. PKCS #3

PKCS #3 describes the Diffie–Hellman key-exchange algorithm. The draft assumes the existence of a central authority which generates the domain parameters that include a prime p of octet length k, an integer g satisfying 0 < g < p and optionally a positive integer l. The integer g need not be a generator of the multiplicative group of integers modulo p, but is expected to be of sufficiently large multiplicative order modulo p. The integer l denotes the bit length of the private Diffie–Hellman key of an entity. Values of l ≪ 8k can be chosen for efficiency. However, for maintaining a desired level of security l should not be too small. Since the central authority determines p, g (and l), individual users need not bother about the generation of these parameters.

During a Diffie–Hellman key-exchange interaction of Alice with Bob, Alice performs the steps described in Algorithm 6.17. Bob performs an identical operation which is omitted here.

Algorithm 6.17. PKCS3 Diffie–Hellman key-exchange scheme

Input: p, g and optionally l.

Output: The shared secret SK (an octet string of length k).

Steps:

Alice generates a random private value x with 0 < x < p – 1.

/* If l is specified, one should have 2^(l–1) ≤ x < 2^l. */

Alice computes y := g^x (mod p).

Alice converts y to an octet string PV := I2OS(y, k).

Alice sends the public value PV to Bob.

Alice receives Bob’s public value PV′.

Alice converts PV′ to the integer y′ := OS2I(PV′).

Alice computes z := (y′)^x (mod p) (with 0 < z < p).

Alice transforms z to the shared secret SK := I2OS(z, k).
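As a toy illustration of Algorithm 6.17, the following Python sketch runs both sides of the exchange over deliberately small domain parameters (a real deployment would use a large prime p supplied by the central authority; the function names and the parameter values are ours):

```python
import secrets

def dh_keypair(p, g, l=None):
    # Alice's first steps of Algorithm 6.17: pick a private x, publish y = g^x mod p.
    if l is not None:
        x = secrets.randbelow(2 ** (l - 1)) + 2 ** (l - 1)   # 2^(l-1) <= x < 2^l
    else:
        x = secrets.randbelow(p - 2) + 1                     # 0 < x < p - 1
    return x, pow(g, x, p)

def dh_shared(x, peer_y, p, k):
    # Remaining steps: z = (y')^x mod p, converted to a k-octet string via I2OS.
    z = pow(peer_y, x, p)
    return z.to_bytes(k, "big")

# toy domain parameters (far too small for real use)
p, g, k = 2579, 2, 2
xa, ya = dh_keypair(p, g)   # Alice
xb, yb = dh_keypair(p, g)   # Bob
assert dh_shared(xa, yb, p, k) == dh_shared(xb, ya, p, k)
```

Both parties arrive at the same k-octet shared secret SK because (g^xb)^xa = (g^xa)^xb (mod p), which is exactly the symmetry Bob's omitted steps rely on.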

Chapter Summary

In this chapter, we describe some standards for representation of cryptographic data in various formats and for conversion of data among different formats. We also present some standard encoding and decoding schemes that are applied before encryption and after decryption. These standards promote easy and unambiguous interfaces with the cryptographic primitives described in the previous chapter.

The IEEE P1363 range of standards defines several data types: bit strings, octet strings, integers, prime finite fields, finite fields of characteristic 2, extension fields of odd characteristic, elliptic curves, elliptic-curve points and polynomial rings. The IEEE drafts also prescribe standard ways of converting data among these formats. For example, the primitive BS2OS converts a bit string to an octet string, and the primitive FE2I converts a finite-field element to an integer.

We subsequently mention some of the public-key cryptography standards (PKCS) propounded by RSA Laboratories. Draft PKCS #1 deals with RSA encryption and signature. In addition to the standard RSA moduli of the form pq, it also suggests the possibility of using multi-prime RSA, that is, moduli which are products of more than two (distinct) primes. The draft recommends use of the optimal asymmetric encryption procedure (OAEP). This probabilistic encryption scheme provides provable security against chosen-ciphertext attacks. A probabilistic signature scheme is also advocated for use. These probabilistic schemes call for using a mask-generation function (MGF). A concrete realization of an MGF is also provided. Draft PKCS #3 standardizes the Diffie–Hellman key-exchange algorithm.

Suggestions for Further Reading

The P1363 class of preliminary drafts [134] published by IEEE and the PKCS standards [254] from RSA Security Inc. are available for free download from Internet sites. However, IEEE’s published standard 1363-2000 is to be purchased against a fee. In addition to the data types and data conversion primitives described in this chapter, the IEEE drafts (P1363, P1363a, P1363.1 and P1363.2) provide encryption/decryption and signature generation/verification primitives and also several encryption and signature schemes based on these primitives. These schemes are very similar to the algorithms that we described in Chapter 5, so we have avoided repeating the same descriptions here. Elaborate encoding procedures are described in the PKCS drafts, but only for RSA- and Diffie–Hellman-based systems. We have reproduced the details in this chapter. The remaining PKCS drafts cover topics that this book does not directly address. A notable exception is PKCS #13, which deals with elliptic-curve cryptography. This draft is not ready yet; when it is, it may be consulted to learn about the RSA Laboratories’ standards on elliptic-curve cryptography.

At present, the different families of standards do not seem to have mutually conflicting specifications. The IEEE has a (free) mailing list for promoting the development and improvement of the IEEE P1363 standards, via e-mail discussions.

Other Internet standards include the Federal Information Processing Standards or FIPS [221] from NIST, and RFCs (Request for Comments) from the Internet Engineering Task Force (IETF) [135].

7. Cryptanalysis in Practice

7.1 Introduction
7.2 Side Channel Attacks
7.3 Backdoor Attacks
 Chapter Summary
 Suggestions for Further Reading

A man cannot be too careful in the choice of his enemies.

—Oscar Wilde (1854–1900), The Picture of Dorian Gray, 1891

If you reveal your secrets to the wind you should not blame the wind for revealing them to the trees.

—Kahlil Gibran (1883–1931)

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.

—Charles Antony Richard Hoare

7.1. Introduction

The security of public-key cryptographic protocols is based on the apparent intractability of solving some computational problems. If one can factor large integers efficiently, one breaks RSA. In that sense, seeking good algorithms to solve these problems (like factoring integers) is part of cryptanalysis. Proving that no poly-time algorithm can break RSA enhances the status of the security of the protocol from assumed to provable. On the other hand, developing a poly-time algorithm for breaking RSA (or for factoring integers) makes RSA (and many other protocols) unusable. Though a temporary set-back to our existing cryptographic tools, such a discovery enriches our understanding of the computational problems. In short, breaking the trapdoors of public-key cryptosystems is of both theoretical and practical significance.

But research along these mathematical lines is open-ended. A desperate cryptanalyst may not wait indefinitely for a theoretical resolution. She tries to find loopholes in the systems that she can effectively exploit to gain secret information.

A cryptographic protocol must be implemented (in software or hardware) before it can be used. Careless implementations often supply the loopholes that cryptanalysts wait for. For example, a software implementation of a public-key system may allow the private key to be read only from a secure device (a removable medium, like CDROM), but may make copies of the key in the memory of the machine where the decryption routine is executed. If the decryption routine does not lock and eventually flush the memory holding the key, a second user having access to the machine can simply read off the secrets.

Software and hardware implementations often tend to leak secrets at a level much more subtle than the example just mentioned. A public-key algorithm is a known algorithm and involves a sequence of well-defined steps dictated by the private key. Each step takes its own share of execution time and power consumption. Watching the decrypting device carefully during a private-key operation may reveal information about the exact sequence of basic steps in the algorithm. Random hardware faults during a private-key operation may also compromise security. Such attacks are commonly dubbed side-channel attacks.

Let us now look at another line of attack. Not every user of cryptography is expected to implement all the routines she uses. On the contrary, most users run precompiled programs available from third parties. How will a user assess the soundness of the products she is using, that is, who will guarantee that there are no (intentional or unintentional) security snags in the products? The key-generation software available from a malicious software designer may initiate a clandestine e-mail every time a key pair is generated. It is also possible that a private key supplied by such a program is generated from a small predefined set known to the designer. Even when private keys look random, they need not come with the unpredictability necessary for cryptographic usage. Such attacks during key generation are called backdoor attacks.

In short, public-key cryptanalysis at present encompasses trapdoors, backdoors and side channels. The trapdoor methods have already been discussed in Chapter 4. In this chapter, we concentrate on the other attacks on public-key systems.

7.2. Side-Channel Attacks

Side-channel attacks refer to a class of cryptanalytic tools for determining a private key by measuring signals (like timing, power fluctuation, electromagnetic radiation) from or by inducing faults in the device performing operations involving the private key. In this section, we describe three methods of side-channel cryptanalysis: timing attack, power attack and fault attack.

7.2.1. Timing Attack

Paul C. Kocher introduced the concept of side-channel cryptanalysis in his seminal paper [155] on timing attacks. Though not unreasonable, timing attacks are somewhat difficult to mount in practice.

Details of the attack

The private-key operation in many cryptographic systems (like RSA or discrete-log-based systems) is usually a modular exponentiation of the form

y := x^d (mod n),

where d is the private key. The private-key procedure may involve other overheads (like message decoding), but the running time of the routine is usually dominated by, and so can be approximated by, the time of the modular exponentiation.

Assume that this exponentiation is carried out by a square-and-multiply algorithm known to Carol, the attacker. For example, suppose that Algorithm 3.9 is used. Each iteration of the for loop involves a modular squaring followed conditionally by a modular multiplication. The multiplication is done in an iteration if and only if the corresponding bit ei in the exponent is 1. Thus, an iteration runs slower if ei = 1 than if ei = 0. If Carol could measure the timing of each individual iteration of the for loop, she would correctly guess most (if not all) of the bits in the exponent. But it is unreasonable to assume that an attacker can collect such detailed timing data. Moreover, if Algorithm 3.10 is used, these detailed data do not help much, because in this case the timing of an individual iteration of the for loop can at best differentiate between the two cases ei = 0 and ei ≠ 0. There are 2^t – 1 non-zero values for each ei.
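The data-dependent branch is easy to see in a sketch of the naive left-to-right square-and-multiply method (modelled on the book's Algorithm 3.9; the Python rendition and variable names are ours):

```python
def modexp_binary(x, d, n):
    # Left-to-right square-and-multiply: every iteration squares, and
    # multiplies only when the current exponent bit is 1 -- the
    # data-dependent step that the timing attack exploits.
    y = 1
    for bit in bin(d)[2:]:       # exponent bits, most significant first
        y = (y * y) % n          # always executed
        if bit == "1":
            y = (y * x) % n      # executed iff the bit is 1 (leaks timing)
    return y
```

An iteration with a 1 bit performs two modular operations instead of one, so its running time, power draw and duration all differ from those of a 0-bit iteration.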

However, it is not difficult to think of a situation where the attacker can measure, to a reasonable accuracy, the total time of the exponentiation. In order to guess d, Carol requires the times of the modular exponentiations for several different values of x, say x1, . . . , xk, all known to her. (Note that xi may be messages to be signed or intercepted ciphertexts.) The same exponent d is used for all these exponentiations. Let Ti be the time for computing xi^d (mod n), as measured by Carol. We may assume that all these k exponentiations are carried out on the same machine using the same routine.

Kocher considers the attack on the exponentiation routine of RSAREF, a cryptography toolkit available from the RSA Laboratories. This routine implements Algorithm 3.10 with t = 2. For the sake of convenience, the algorithm is reproduced below. We may assume that the exponent has an even number of bits—if not, pad a leading zero.

Algorithm 7.1. RSAREF’s exponentiation routine

Input: The modulus n, an integer x with 0 ≤ x < n, and d = (d2l–1d2l–2 · · · d1d0)2.

Output: y := x^d (mod n).

Steps:

 (1)  z1 := x.
 (2)  z2 := z1 · x (mod n).
 (3)  z3 := z2 · x (mod n).
 (4)  y := 1.
 (5)  for j = l – 1, . . . , 0 {
 (6)     y := y^2 (mod n).
 (7)     y := y^2 (mod n).
 (8)     if ((d2j+1d2j)2 ≠ 0) {
 (9)         y := y · zb (mod n), where b := (d2j+1d2j)2.
(10)     }
(11)  }
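In Python, the routine may be sketched as follows (an illustration of the algorithm only, not RSAREF's actual code):

```python
def rsaref_modexp(x, d, n):
    # Exponentiation processing the exponent two bits at a time (Algorithm 7.1).
    z = [1, x % n, (x * x) % n, (x * x * x) % n]   # z[i] = x^i mod n
    bits = bin(d)[2:]
    if len(bits) % 2:                              # pad to an even bit length
        bits = "0" + bits
    y = 1
    for j in range(0, len(bits), 2):               # most significant pair first
        y = (y * y) % n
        y = (y * y) % n
        pair = int(bits[j:j + 2], 2)               # the pair (d_{2j+1} d_{2j})_2
        if pair:                                   # multiply only for non-zero pairs
            y = (y * z[pair]) % n
    return y
```

Each iteration performs two unconditional squarings, and the data-dependent multiplication by z[pair] is exactly the step whose timing Kocher's attack models.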

Every step of the above algorithm runs in a time dependent on the operands. For example, the modular multiplication in Step (9) takes time dependent on the operands y and zb, where b = (d2j+1d2j)2. The variation in the timing depends on the implementation of the modular arithmetic routines and also on the machine’s architecture. However, we make the assumption that for fixed operands each step requires a constant time on a given machine (or on identical machines). This is actually a loss of generality, since the running time of a complex step (like modular multiplication or squaring) for fixed operands may vary for various reasons like process scheduling, availability of cache, page faults and so on. It may be difficult, perhaps impossible, for an attacker to arrange for herself a verbatim emulation of the victim’s machine at the time when the latter performed the private-key operations. Let us still proceed with our assumption, say by conceiving of a not-so-unreasonable situation where the effects of these other factors are not sizable enough.

We use the subscript i to denote the i-th private-key operation for 1 ≤ ik. The entire routine takes time Ti for the i-th exponentiation, that is, for the input xi. This measurement may involve some (unknown) error which we denote by ei. The first four steps are executed only once during each call and take a total time of pi (precomputation time). The for loop is executed l times. We ignore the time needed to maintain the loop (like decrementing j) and also the time taken by the if statement in Step (8). Let si,j and ti,j be the times taken respectively by Steps (6) and (7), when the loop variable (j) assumes the value j. If Step (9) is executed, we denote by mi,j the time taken by this step, else we set mi,j := 0. It follows that

Equation 7.1

Ti = ei + pi + Σj (si,j + ti,j + mi,j),

where the index in the sum decreases from l – 1 to 0 in steps of 1. Carol does not know this break-up (that is, the explicit values of ei, si,j, ti,j and mi,j), but she can make an inductive guess in the following way.

Carol manages a machine and a copy of the exponentiation software, both identical to those of the victim. She then successively guesses the secret bit pairs d2l–1d2l–2, d2l–3d2l–4, d2l–5d2l–6 and so on. Assume that at some stage Carol has correctly determined the exponent bits d2j+1d2j for j = l – 1, l – 2, . . . , j′ + 1. Initially j′ = l – 1. Using this information Carol computes d2j′+1d2j′ as follows. Carol’s knowledge at this stage allows her to measure pi and si,j, ti,j, mi,j for j = l – 1, . . . , j′ + 1 — she simply runs Algorithm 7.1 on xi. Carol then enters the loop with j = j′. The squaring operations are unconditional, and Carol has the same operands as the victim for the squaring steps. So Carol also measures si,j′ and ti,j′.

The bit pair d2j′+1d2j′ (considered as a binary integer) can take any one of the four values g = 0, 1, 2, 3. Carol measures the time mi,j′(g) of Step (9) for each of the four choices of g (with mi,j′(g) = 0 for g = 0) and adds this time to the time taken by the algorithm so far, in order to obtain:

Equation 7.2

Ti(g) = pi + si,j′ + ti,j′ + mi,j′(g) + Σj (si,j + ti,j + mi,j),

where the sum runs over j = l – 1, . . . , j′ + 1.

Kocher observed that the distribution of Ti, i = 1, . . . , k, is statistically related to that of Ti(g) only for the correct guess g. In order to see how, we subtract Equation (7.2) from Equation (7.1) to get:

Equation 7.3

Ti – Ti(g) = ei + (mi,j′ – mi,j′(g)) + Σj (si,j + ti,j + mi,j),

where the sum now runs over j = j′ – 1, . . . , 0.

Let us assume that the error term ei is distributed like a random variable E. Similarly suppose that each multiplication (resp. squaring) has the distribution of a random variable M (resp. S). Taking the variance of Equation (7.3) over the values i = 1, 2, . . . , k and assuming that the sample size k is so large that the sample variances are very close to the variances of the respective random variables, we obtain:

Equation 7.4

Var(Ti – Ti(g)) = Var(E) + Var(mi,j′ – mi,j′(g)) + 2j′ Var(S) + λ Var(M),

where λ denotes the number of times Step (9) is executed for j = j′ – 1, . . . , 0. Note that λ is dependent on the private key and not on the arguments to the exponentiation routine. For the correct guess g, we have mi,j′(g) = mi,j′ and so

Var(Ti – Ti(g)) = Var(E) + 2j′ Var(S) + λ Var(M).

On the other hand, for an incorrect guess g we have:

Var(Ti – Ti(g)) = Var(E) + 2j′ Var(S) + (λ + 1) Var(M)

if one of mi,j′ or mi,j′(g) is zero, or

Var(Ti – Ti(g)) = Var(E) + 2j′ Var(S) + (λ + 2) Var(M)

if both mi,j′ and mi,j′(g) are non-zero. (Recall that Var(αX + βY) = α^2 Var(X) + β^2 Var(Y) for independent X and Y and any real α, β.)

Calculation of the sample variances of Ti – Ti(g) for the four choices of g gives Carol a handle to determine (or guess) the correct choice. Carol simply takes the g for which the variance is minimum. This is the fundamental observation that makes the timing attack work.

Of course, statistical irregularities exist in practice, and the approximation of the actual variances by the sample variances introduces errors in Equation (7.4). These errors are of particular concern for large values of j′, that is, during the beginning of the attack. However, if an incorrect guess is made at a certain stage, this is detected soon with high probability, as Carol proceeds further. Suppose that an erroneous guess of d2j″+1d2j″ has been made for some j″ > j′. This means that the values of y are different from the actual values starting from the iteration of the loop with j = j″ – 1. (We may assume that most, if not all, xi ≠ 1.) We then do not have a cancellation of the timings for j = j″ – 1, . . . , j′. More precisely, if the guesses for j = l – 1, . . . , j″ + 1 are correct and the first error occurs at j = j″, then denoting the timings measured by Carol from this point on by ŝi,j, t̂i,j and m̂i,j, one gets

Equation 7.5

Ti – Ti(g) = ei + (mi,j″ – m̂i,j″) + Σj=j″–1,...,j′ (si,j + ti,j + mi,j – ŝi,j – t̂i,j – m̂i,j) + Σj=j′–1,...,0 (si,j + ti,j + mi,j).

Since each of the square and multiplication operations takes y as an operand, the original timings and the measured timings (the ones with a hat) behave like independent variables and, therefore, taking the variance of Equation (7.5) yields

Var(Ti – Ti(g)) = Var(E) + 2(2j″ – j′) Var(S) + λ′ Var(M)

for some λ′ depending on the private key and on the previous guesses, but independent of the current guess g. In other words, Carol loses a meaningful relation of Var(Ti – Ti(g)) with the correctness of the current guess. Once Carol notices this, she backtracks and changes older guesses until the expected behaviour is restored. Thus, the timing attack comes with an error detection and correction strategy.

An analysis done by Kocher (neglecting E and assuming normal distributions for S and M) shows that Carol needs k = O(l) timing samples for a good probability of success.
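A toy simulation illustrates the variance criterion. The model below abstracts away the actual exponentiation: each total timing is the sum of one decisive multiplication time, a fixed number of later squaring times and a measurement error, all normally distributed (the means, variances and sample size are our assumed parameters). Subtracting the true multiplication time (the correct guess) leaves a smaller sample variance than subtracting an independent one (a wrong guess):

```python
import random, statistics

random.seed(1)
k = 3000                              # number of timing samples
VAR_M, VAR_S, VAR_E = 4.0, 1.0, 1.0   # variances of M, S and E

# For each sample i: the decisive multiplication m_i, the rest of the loop
# (modelled here as 12 squaring-like draws) and the measurement error e_i.
m = [random.gauss(10, VAR_M ** 0.5) for _ in range(k)]
rest = [sum(random.gauss(5, VAR_S ** 0.5) for _ in range(12)) for _ in range(k)]
e = [random.gauss(0, VAR_E ** 0.5) for _ in range(k)]
T = [m[i] + rest[i] + e[i] for i in range(k)]

# Correct guess: Carol reproduces exactly the multiplication time m_i.
diff_correct = [T[i] - m[i] for i in range(k)]
# Incorrect guess: she subtracts an independent multiplication time instead.
diff_wrong = [T[i] - random.gauss(10, VAR_M ** 0.5) for i in range(k)]

v_correct = statistics.pvariance(diff_correct)
v_wrong = statistics.pvariance(diff_wrong)
assert v_correct < v_wrong    # the correct guess minimizes the variance
```

In this model v_correct is close to 12·Var(S) + Var(E) while v_wrong picks up an extra 2·Var(M), mirroring the (λ + 2) term in the analysis above.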

Countermeasures

There are several ways in which timing attacks can be prevented. First, the exponentiation routine can be made to run in time independent of the exponent bits, for example by performing the same sequence of operations in every iteration. Second, random delays can be added to the private-key operation so as to corrupt Carol’s timing measurements (at the cost of more samples being needed, not impossibility). Third, the pair (x, y) can be masked by a random pair (u, v) before the exponentiation (blinding), so that the operands, and hence the timings, of the individual steps are unknown to the attacker.

7.2.2. Power Analysis

In connection with timing attacks, we mentioned that if an adversary were able to measure the timing of each iteration of the square-and-multiply loop during an RSA (or discrete-log-based) private-key exponentiation, she could guess the bits in the key quite efficiently from only a few timing measurements. But it is questionable whether such detailed timing data can be made available.

Now, think of a situation where Carol can measure patterns of power consumption made by the decrypting (or signing) device during one or more private-key operations with Alice’s private key. If Alice carries out the private-key operations in her personal workstation, it is difficult for Carol to conduct such measurements. So assume that Alice is using a smart card with a reading device to which Carol has access. Carol inserts a small resistor in series with the line which drives Alice’s smart card. The power consumed by the smart-card circuit is roughly proportional to the current through the resistor. By measuring the voltage across the resistor (and multiplying by a suitable factor), Carol can observe the power consumed by Alice’s decryption device. Carol has to use a power-measuring device that takes readings at a high frequency (100 MHz to several GHz, depending on Carol’s budget). A set of power measurements obtained during a cryptographic operation is called a power trace. We now study how power traces can reveal Alice’s secrets.

Simple power analysis (SPA)

The individual steps in a private-key operation may be nakedly exposed in a power trace. This is, in particular, the case when different steps consume different amounts of power and/or take different times. Obtaining information about the operation of the decrypting device and/or the secrets by a direct interpretation of power traces is referred to as simple power analysis or SPA in short.

As an example of SPA, consider an implementation of RSA exponentiation using the naive square-and-multiply Algorithm 3.9. Here, the most power-consuming operations are modular squaring and modular multiplication. Modular multiplication typically runs slower than modular squaring. Also, modular multiplication requires two different operands to be fetched from memory, whereas modular squaring requires only one operand. Thus, a multiplication operation has more and longer power requirements than a squaring operation.

A hypothetical[1] SPA trace during a portion of an RSA private-key operation is shown in Figure 7.1. Each spike in the trace corresponds to either a square or a multiplication operation. Let us assume that the power consumption is measured with sufficient resolution, so that no spike is missed. Since multiplication runs longer (and requires more operands) than squaring, multiplication spikes are wider than squaring spikes.

[1] SPA traces from real-life experiments on smart cards, as reported in several references, look similar to this. We, however, generated the trace using a random number generator. Absolute conformity to reality is not always crucial for the purposes of illustration.

Figure 7.1. Simulated SPA trace for a portion of an RSA private-key operation


Let us denote a squaring operation by S and a multiplication operation by M. We observe that Alice’s smart card performs the sequence

SMSMSSMSSSSMSSSMSS

of operations during the measurement interval shown. Since multiplication in an iteration of the loop is skipped if and only if the corresponding bit in the exponent is zero, we can group the operations as

(SM)(SM)(S)(SM)(S)(S)(S)(SM)(S)(S)(SM)(S)(S . . .

This, in turn, reveals the bit string 110100010010 in Alice’s private key.
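The grouping step can be mechanized; the following sketch (our own, for illustration) recovers key bits from an observed operation sequence. The trailing squaring may belong to an incomplete iteration, so only a prefix of the output is reliable.

```python
def bits_from_spa_trace(ops):
    # Group an observed S/M operation sequence as in the text: each loop
    # iteration is either (SM), revealing key bit 1, or (S), revealing 0.
    bits, i = "", 0
    while i < len(ops):
        if ops[i:i + 2] == "SM":
            bits += "1"
            i += 2
        else:              # a squaring not followed by a multiplication
            bits += "0"
            i += 1
    return bits
```

Applied to the sequence read off the trace of Figure 7.1, the function returns a bit string beginning with 110100010010, the chunk of Alice's private key identified above.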

Effective as it appears, SPA, in practice, does not pose a huge threat to the security of conventional cryptographic systems. Using algorithms for which power traces do not bear direct relationships with the bits of the private key largely reduces the risk of fruitful SPA. The inefficient Algorithm 7.2 always performs a multiplication after each squaring and thereby eliminates chances of a successful SPA.

Algorithm 7.2. SPA-resistant exponentiation

Input: The modulus n, an integer x with 0 ≤ x < n, and the private key d = (dl–1 · · · d1d0)2.

Output: y := x^d (mod n).

Steps:

y := 1.
for (j = l – 1, . . . , 0) {
    t0 := y^2 (mod n).
    t1 := t0 · x (mod n).
    y := tdj. /* that is, y := t0 if dj = 0, and y := t1 if dj = 1 */
}
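In Python, the same idea reads as follows (a sketch of Algorithm 7.2 only; the selection of tdj is written as a plain conditional expression here, which a hardened constant-time implementation would also avoid):

```python
def spa_resistant_modexp(x, d, n):
    # Square-and-always-multiply (Algorithm 7.2): every iteration performs
    # the same operation sequence -- one squaring and one multiplication --
    # and the key bit only selects which result to keep.
    y = 1
    for bit in bin(d)[2:]:        # key bits, most significant first
        t0 = (y * y) % n          # the "square" result
        t1 = (t0 * x) % n         # the "square then multiply" result
        y = t1 if bit == "1" else t0
    return y
```

The price of SPA resistance is one multiplication per key bit regardless of its value, roughly a 50% slowdown over the naive method for a random exponent.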

Using the (more efficient) Algorithm 7.1 also frustrates SPA. Some chunks of two successive 0 bits are anyway revealed by power traces collected during the execution of this algorithm. But, for a decently large and random private key, this still leaves Carol with many unknown bits to be guessed. Note, however, that none of the three remedies suggested to thwart the timing attack on Algorithm 7.1 seems to be effective in the context of SPA. Delays normally do not consume much power (unless some power-intensive dummy computations fill up the delays). Also, the masking of (x, y) by (u, v) fails to produce any alteration in the power-consumption pattern during exponentiation.

If some private-key algorithm has unavoidable branchings due to individual bits in the private key, SPA can prove to be a notorious botheration.

Differential power analysis (DPA)

A carefully designed algorithm (like Algorithm 7.2) does not reveal key information from a simple observation of power traces. Moreover, the observed power traces may be corrupted by noise to an extent where SPA is not feasible. In such cases, differential power analysis (DPA) often helps the cryptanalyst reduce the effects of noise and exploit subtle correlation of power-consumption patterns with specific bits in the operands. DPA requires the availability of power traces from several private-key operations with the same key.

Consider the SPA-resistant Algorithm 7.2. Suppose that k power traces P1(t), . . . , Pk(t) for the computations of xi^d (mod n), i = 1, . . . , k, are available to Carol, that the ciphertexts x1, . . . , xk are known to Carol and that d = (dl–1 · · · d1d0)2. Carol successively guesses the bits dl–1, dl–2, dl–3, . . . of the exponent. Suppose that Carol has correctly guessed dj for j = l – 1, . . . , j′ + 1. She now uses DPA to guess dj′.

Let e := (dl–1dl–2 · · · dj′+1)2. At the beginning of the for loop with j = j′, the variable y holds the value x^e modulo n. The loop computes x^(2e) and x^(2e+1) and assigns y the appropriate value. If dj′ = 0, then in the next iteration the loop computes x^(4e) and x^(4e+1), whereas if dj′ = 1, then in the next iteration the loop computes x^(4e+2) and x^(4e+3). It follows that the algorithm handles the value x^(4e) if and only if dj′ = 0.

For each i = 1, . . . , k, Carol computes z_i := x_i^{4e} (mod n). Carol then chooses a particular bit position (say, the least significant bit) and considers the bit b_i of z_i at this position. We make the assumption that there is some subsequent step (or substep) in the implementation for which the average power consumption Π0 for b_i = 0 is different from the average power consumption Π1 for b_i = 1.[2]

[2] The exact step which exhibits differential bias toward an individual bit value depends on the implementation. If the implementation does not provide such a step, the attack cannot be mounted in this way. Initially, DPA was proposed for DES, a symmetric encryption algorithm, in which such a dependence is clearly available. With asymmetric-key encryption, such a strong dependence of the power consumed by a step on an individual bit value is not obvious. One may, however, use other dividing criteria, like low versus high Hamming weight (that is, number of one-bits) in the operand, which bear more direct relationships with power consumption.

Carol partitions {1, . . . , k} into two subsets:

I0 := {i | b_i = 0},
I1 := {i | b_i = 1}.

Carol computes the average power traces P̄0(t) := (1/|I0|) Σ_{i ∈ I0} P_i(t) and P̄1(t) := (1/|I1|) Σ_{i ∈ I1} P_i(t), and subsequently the differential power trace

Δ(t) := P̄1(t) – P̄0(t).

First, let d_{j′} = 0. In this case, the routine handles x_i^{4e}, and so the power consumption at some time instant τ is correlated to the bit b_i of z_i. At any other instant, the power consumption is uncorrelated to this particular bit value. Therefore, if the sample size is sufficiently large and if the measurement noise has mean zero, we have:

Δ(τ) ≈ Π1 – Π0 ≠ 0, and Δ(t) ≈ 0 for t ≠ τ.

On the other hand, if d_{j′} = 1, the value x_i^{4e} never appears in the execution of the algorithm, and so at every time t the power consumption is uncorrelated to the particular bit of z_i, and so we expect

Δ(t) ≈ 0 for all t.

Figure 7.2 illustrates the two cases.[3] If the differential power trace has a distinct spike, the guess d_{j′} = 0 is correct. So by observing the existence or otherwise of a spike, Carol determines whether d_{j′} = 0 or d_{j′} = 1.

[3] Once again, these are hypothetical traces obtained by random number generators.

Figure 7.2. Simulated DPA trace for a portion of an RSA private-key operation

(a) for the correct guess
(b) for an incorrect guess


The number k of samples required for a good probability of success depends on the bias Π1 – Π0 relative to the measurement noise. We assume that Π1 ≠ Π0. If the noise has a variance of σ^2, then by the central limit theorem the noise in each average power trace P̄0(t) or P̄1(t) has at each t an approximate variance 2σ^2/k, and so in the differential power trace Δ(t) the noise has an approximate variance 4σ^2/k. In order that the bias Π1 – Π0 stands out against the noise (of standard deviation 2σ/√k), we require |Π1 – Π0| to be several times larger, say |Π1 – Π0| ≥ 8σ/√k, that is, k ≥ 64σ^2/(Π1 – Π0)^2.
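The sample-size estimate can be exercised with a toy Monte Carlo experiment (all numbers below, including Π0, Π1, σ and the trace layout, are hypothetical, in the spirit of the simulated traces of Figure 7.2):

```python
import random

# Toy DPA experiment with hypothetical numbers: each of the k traces has T
# noisy samples; at the single instant tau the consumption is Pi1 or Pi0
# according to the target bit b_i, mimicking the assumption in the text.
random.seed(1)
T, tau = 50, 17
Pi0, Pi1, sigma = 1.0, 1.2, 0.4
k = round(64 * sigma**2 / (Pi1 - Pi0) ** 2)    # the text's sample-size bound

bits = [random.randrange(2) for _ in range(k)]
traces = [[(Pi1 if b else Pi0) if t == tau else 0.0
           for t in range(T)] for b in bits]
traces = [[v + random.gauss(0, sigma) for v in tr] for tr in traces]

I0 = [i for i in range(k) if bits[i] == 0]
I1 = [i for i in range(k) if bits[i] == 1]
avg = lambda I, t: sum(traces[i][t] for i in I) / len(I)
delta = [avg(I1, t) - avg(I0, t) for t in range(T)]    # differential trace

print(k, max(range(T), key=lambda t: delta[t]))        # spike expected near tau
```

With these numbers, k works out to 256, and the differential trace shows a pronounced value at t = τ while the remaining instants fluctuate around zero.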

Countermeasures

Several countermeasures can be adopted to prevent DPA, both at the software level and at the hardware level.

Paul Kocher asserts: “DPA highlights the need for people who design algorithms, protocols, software, and hardware to work closely together when producing security products.”

7.2.3. Fault Analysis

We finally come to the third genre of side-channel cryptanalysis. We investigate how hardware faults occurring during private-key operations can reveal the secret to an adversary. There are situations where a single fault suffices. Boneh et al. [30] classify hardware faults into three broad categories.

  1. Transient faults These are faults caused by random (unpredictable) hardware malfunctioning. These may be the outcomes of occasional flips of bit values in registers or of temporary erroneous outputs from logic or arithmetic circuits in the processor. These faults are called transient, because they are not repeated. It is rather difficult to detect such (silent) faults.

  2. Latent faults These are faults generated by some permanent malfunctioning and/or bugs inherent in the processor. For example, the floating-point bug in the early releases of the Pentium processor may lead to latent faults. Latent faults are permanent, that is, repeated, but may be difficult to locate in practice.

  3. Induced faults An induced fault is deliberately caused by an adversary. For example, a short surge of electromagnetic radiation may cause a smart card to malfunction temporarily. A malicious adversary can induce such temporary hardware faults to extract secret information from the smart card. It is, however, difficult to induce deliberate faults in a remote workstation.

Although induced faults appear to be the ones to guard against most seriously, the other two types of faults are also of relevance. Consider a certifying authority signing many messages. Transient and/or unknown latent faults may reveal the authority’s private key to a user who can later utilize this knowledge to produce false certificates.

Fault attack on RSA based on CRT

Consider the implementation of the RSA private-key operation based on the CRT combination of the values obtained by exponentiation modulo the prime divisors p and q of the modulus n (Algorithm 5.4). Suppose that m is a message to be signed and s := m^d (mod n) the corresponding signature, where d is the signer’s private key. The CRT-based implementation computes s1 := s (mod p) and s2 := s (mod q). Assume that due to hardware fault(s) exactly one of s1 and s2 is wrongly computed. Say, s1 is incorrectly computed as s̃1 ≢ s1 (mod p). The corresponding faulty signature is denoted by s̃. We assume that the CRT combination of s̃1 and s2 is correctly computed.

An adversary requires the faulty signature s̃ and the correct signature s on the same message m in order to obtain the factor q of n. To see how, note that s̃ ≡ s̃1 (mod p), s ≡ s1 (mod p) and s̃1 ≢ s1 (mod p), so that s̃ ≢ s (mod p), that is, p ∤ (s̃ – s). On the other hand, s̃ ≡ s ≡ s2 (mod q), that is, q | (s̃ – s). Therefore,

q = gcd(s̃ – s, n).

This is how the fault analysis of Boneh et al. [30] works.

Arjen K. Lenstra et al. [142] point out that the knowledge of the faulty signature s̃ alone reveals the secret divisor q, that is, one does not require the genuine signature s on m. The verification key e of the signer is publicly known. Since RSA exponentiation is bijective, s̃^e ≢ m (mod n). However, s̃^e ≡ s2^e ≡ m (mod q), and so s̃^e ≢ m (mod p). It follows that

q = gcd(s̃^e – m, n).
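Both variants of the CRT fault attack can be replayed with textbook-size numbers in Python (the tiny parameters p = 61, q = 53, e = 17 are illustrative only; a real fault would of course not be a clean +1, but any corruption of s1 works):

```python
from math import gcd

# Toy RSA signature with CRT; a fault corrupts the mod-p half s1.
p, q, e = 61, 53, 17
n, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)
m = 65

s1 = pow(m, d % (p - 1), p)                    # s mod p
s2 = pow(m, d % (q - 1), q)                    # s mod q
crt = lambda a, b: (a * q * pow(q, -1, p) + b * p * pow(p, -1, q)) % n
s = crt(s1, s2)                                # correct signature
s_bad = crt((s1 + 1) % p, s2)                  # faulty signature: s1 corrupted

print(gcd(s_bad - s, n))                       # → 53  (Boneh et al.: q leaks)
print(gcd(pow(s_bad, e, n) - m, n))            # → 53  (Lenstra: no need for s)
```

The first gcd needs both signatures; the second recovers q from the faulty signature and the public data (m, e, n) alone.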

Fault attack on RSA without CRT

Now, consider an implementation of RSA decryption based on a single exponentiation modulo n. For such an implementation, several models of fault attacks have been proposed. These attacks are less practical than the attack on CRT-based RSA just mentioned, because now one requires several faulty signatures in order to deduce the entire private key. Here, we present an attack due to Bao et al. [17].

As usual, the RSA modulus is n = pq and the signer’s key pair is (e, d). Consider a valid signature s on a message m. Let d = (d_{l–1} · · · d_1 d_0)_2 be the binary representation of the private key. Consider the powers:

s_i ≡ m^{2^i} (mod n) for i = 0, 1, . . . , l – 1.

The signature s can be written as:

s ≡ s_0^{d_0} s_1^{d_1} · · · s_{l–1}^{d_{l–1}} (mod n).
We assume that the attacker knows m and s and hence can compute s_i and s_i^{–1} modulo n for i = 0, . . . , l – 1. There is no harm in assuming that the message m is randomly chosen. (We may assume that randomly chosen integers are invertible modulo n, because encountering a non-invertible non-zero integer by chance is a stroke of unimaginable good luck and is tantamount to knowing the factors of n.)

In order to guess a bit of d, the attacker induces a fault in exactly one of the bits d_j, changing it from d_j to d̄_j := 1 – d_j. The position j is random, that is, not under the control of the attacker. Now, the algorithm outputs the faulty signature

s̃ ≡ s s_j (mod n) if d_j = 0, or s̃ ≡ s s_j^{–1} (mod n) if d_j = 1,

and so

s̃ s^{–1} (mod n) equals either s_j or s_j^{–1}.
A repetition in the values s_{l–1}, . . . , s_0, s_{l–1}^{–1}, . . . , s_0^{–1} modulo n is again an incident of minuscule probability. Hence the attacker can uniquely identify the bit position j and the bit value d_j in d by comparing s̃ s^{–1} (mod n) with these 2l values.

Statistical analysis implies that the attacker needs to repeat this procedure about l log l times (on the same or different (m, s) pairs) in order to ensure that the probability of identifying all the bits of d is at least 1/2.
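A sketch of Bao et al.’s bit-flip attack with the same toy RSA parameters (the fault position is fixed explicitly here only for the demonstration; in the attack the attacker neither controls nor knows it):

```python
# Bao et al.'s fault attack on RSA without CRT: one bit of d flips during
# signing; comparing s~ * s^(-1) with the 2l values s_i^(±1) exposes both
# the position and the value of the flipped bit (toy parameters).
p, q, e = 61, 53, 17
n, phi = p * q, (p - 1) * (q - 1)
d = pow(e, -1, phi)
m = 65
s = pow(m, d, n)                         # the known valid signature

j_secret = 5                             # fault position (unknown to attacker)
d_bad = d ^ (1 << j_secret)              # flip bit j of d
s_bad = pow(m, d_bad, n)                 # the faulty signature

ratio = s_bad * pow(s, -1, n) % n        # equals s_j or s_j^(-1) (mod n)
for i in range(d.bit_length()):
    s_i = pow(m, 1 << i, n)              # s_i = m^(2^i) (mod n)
    if ratio == s_i:
        print("bit", i, "of d is 0")     # flip was 0 -> 1
    elif ratio == pow(s_i, -1, n):
        print("bit", i, "of d is 1")     # flip was 1 -> 0
```

With these numbers d = 2753 = (101011000001)_2, whose bit 5 is 0, and the loop prints exactly one line identifying it.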

Fault attack on the Rabin digital signature algorithm

Recall from Algorithm 5.34 that the Rabin signature algorithm uses CRT to combine s1 (mod p) and s2 (mod q). Thus, the attack on CRT-based RSA, described earlier, is applicable mutatis mutandis to the Rabin signature scheme. The computation of the square roots s1 and s2 demands the major portion of the running time of the routine. Inducing a fault during the execution is, therefore, expected to affect exactly one of s1 and s2, as desired by the attacker.

Fault attack on DSA

Bao et al. [17] propose a fault attack on the digital signature algorithm (DSA). We work with the notations of Algorithm 5.43 and Algorithm 5.44, except that, for maintaining uniformity in this section, we use m (instead of M) to denote the message to be signed. The (public) parameters are a prime p, a prime divisor r of p – 1 of length 160 bits, and an element g ∈ Z_p^* of multiplicative order r. The signer’s DSA key pair is (d, y) with y ≡ g^d (mod p) and 1 < d < r.

Suppose that during the generation of a DSA signature, an attacker induces a fault in exactly one bit position of d, changing it to d̃. The routine generates the faulty signature (s, t̃), where

s ≡ (g^{d′} (mod p)) (mod r),
t̃ ≡ d′^{–1}(H(m) + d̃ s) (mod r),

(d′, g^{d′}) being the session key pair (not mutilated). As in the DSA signature-verification scheme, the attacker computes the following:

w ≡ t̃^{–1} (mod r),
u1 ≡ H(m) w (mod r),
u2 ≡ s w (mod r).

For each i = 0, . . . , l – 1 (where the bit length of d is l), the attacker also computes

g_i ≡ g^{u2·2^i} (mod p).

Assume that the j-th bit d_j of d is altered. If d_j = 0, then d̃ = d + 2^j, and so

(g^{u1} y^{u2} g_j (mod p)) (mod r) = s.

On the other hand, if d_j = 1, then d̃ = d – 2^j, and a similar calculation shows that

(g^{u1} y^{u2} g_j^{–1} (mod p)) (mod r) = s.

Thus, the attacker computes (g^{u1} y^{u2} g_j (mod p)) (mod r) and (g^{u1} y^{u2} g_j^{–1} (mod p)) (mod r) for all j = 0, . . . , l – 1 and notices a unique match (with s). This discloses the position j and the corresponding bit d_j.

Fault attack on the ElGamal signature scheme

A fault attack similar to that on DSA can be mounted on the ElGamal signature scheme. Here, we instead present an alternative method proposed by Zheng and Matsumoto [315]. The novelty in their approach is that it cryptanalyzes the ElGamal signature scheme by inducing a fault in the pseudorandom bit generator of the signer’s smart card.

Algorithms 5.36 and 5.37 describe the ElGamal signature scheme on a general cyclic group G. Here, we restrict our attention to the specific group Z_p^* (though the following exposition works perfectly well for a general G). The parameters are a prime modulus p and a generator g of Z_p^*. The signer’s key pair is (d, g^d (mod p)) for some d, 2 ≤ d ≤ p – 2.

In order to generate a signature (s, t) on a message m, a random session key d′ is generated and subsequently the following computations are carried out:

s ≡ g^{d′} (mod p),
t ≡ d′^{–1}(H(m) – d H(s)) (mod p – 1).

Zheng and Matsumoto attack the generation of the session key d′. They propose the possibility that an abnormal physical stress (like low voltage) forces a constant output d0 for d′ from the pseudorandom-bit generator (software or hardware) in the smart card. First, assume that this particular value d0 is known a priori to the attacker. She then lets a message m generate a signature (s, t) with the session secret d0. The private key d is then immediately available from the equation:

d ≡ H(s)^{–1}(H(m) – d0 t) (mod p – 1).

Here, we assume that H(s) is invertible modulo p – 1.

If d0 is not known a priori, the attacker generates two signatures (s1, t1) and (s2, t2) on messages m1 and m2 respectively. Since d′ is always d0, we have s1 = s2 = s0, say. One can then easily calculate

d0 ≡ (t1 – t2)^{–1}(H(m1) – H(m2)) (mod p – 1),

which, in turn, yields

d ≡ H(s0)^{–1}(H(m1) – d0 t1) (mod p – 1).
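The whole attack fits in a few lines of Python (toy prime p = 467, the identity map as a stand-in hash, and an arbitrary stuck value d0 = 157 are assumptions of this sketch):

```python
# Zheng-Matsumoto attack: physical stress freezes the session key at d0.
# Toy parameters; H(x) = x is an illustrative stand-in for the hash.
p, g, d = 467, 2, 99                     # safe prime, generator, Alice's key
d0 = 157                                 # the stuck "random" session key
H = lambda x: x

def sign(m):
    s = pow(g, d0, p)
    t = pow(d0, -1, p - 1) * (H(m) - d * H(s)) % (p - 1)
    return s, t

m1, m2 = 101, 222
(s1, t1), (s2, t2) = sign(m1), sign(m2)
assert s1 == s2                          # the constant d0 betrays itself
s0 = s1

d0_rec = pow(t1 - t2, -1, p - 1) * (H(m1) - H(m2)) % (p - 1)
d_rec = pow(H(s0), -1, p - 1) * (H(m1) - d0_rec * t1) % (p - 1)
print(d0_rec, d_rec)                     # → 157 99
```

Both the stuck session key d0 and the permanent private key d fall out of the two equations above.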

Fault attack on the Feige–Fiat–Shamir identification protocol

Let us conclude our repertoire of fault attack examples by explaining an attack on the FFS zero-knowledge identification protocol. This attack is again from Boneh et al. [30].

We use the notations of Algorithm 5.69. A modulus n = pq, with p and q primes, is first chosen (by Alice or by a trusted third party). Alice selects random x1, . . . , xt ∈ Z_n^* and random bits δ1, . . . , δt, computes yi := (–1)^{δ_i} xi^2 (mod n) for i = 1, . . . , t, publishes (y1, . . . , yt) and keeps (x1, . . . , xt) secret.

During an identification session with Bob, Alice generates a random commitment c and sends to Bob the witness w := c^2 (mod n). (For simplicity, we take γ of Algorithm 5.69 to be 0.) While Alice is waiting for a challenge from Bob, a fault occurs in her smart card, changing the commitment c to c + E. Assume that the fault is at exactly one bit position, that is, E = ±2^j for some j, 0 ≤ j ≤ l – 1, l being the bit length of c (or of n). This fault may be purposely induced by Bob with the malicious intention of guessing Alice’s secret (x1, . . . , xt).

Bob then generates a random challenge (∊1, . . . , ∊t) ∈ {0, 1}^t as usual. Upon reception of this challenge, Alice computes and sends to Bob the faulty response

r̃ := (c + E) x1^{∊_1} · · · xt^{∊_t} (mod n).
The knowledge of r̃ now aids Bob to obtain the product T := x1^{∊_1} · · · xt^{∊_t} (mod n) as follows. First, note that

r̃^2 ≡ (c + E)^2 x1^{2∊_1} · · · xt^{2∊_t} (mod n),

so that

(–1)^δ r̃^2 y1^{–∊_1} · · · yt^{–∊_t} ≡ (c + E)^2 ≡ w + 2cE + E^2 (mod n)

for some δ ∈ {0, 1}.

There are only 4l possible values of (E, δ). Bob tries all these possibilities one by one. To simplify matters, we assume that only one value of (E, δ) with E of the special form ±2^j and with δ ∈ {0, 1} satisfies the last congruence. In practice, the existence of two (or more) solutions for (E, δ) is an extremely improbable phenomenon. For a guess of (E, δ), the commitment c can be computed as

c ≡ (2E)^{–1}((–1)^δ r̃^2 y1^{–∊_1} · · · yt^{–∊_t} – w – E^2) (mod n).
The correctness of the guess (E, δ) can be verified from the relation w ≡ c^2 (mod n). Bob can now compute the desired product

T ≡ r̃ (c + E)^{–1} (mod n).
In order to strengthen the confidence about the correctness of T, Bob may repeat the protocol once more with the same values of ∊1, . . . , ∊t, but under normal conditions (that is, without faults). This time he obtains w′ ≡ (c′)^2 (mod n) and r′ ≡ c′T (mod n), which together give (r′)^2 ≡ w′T^2 (mod n), a relation that proves the correctness of T.

Bob repeats the above procedure t times in order to generate the system:

Equation 7.6

T_k ≡ x1^{∊_{k1}} x2^{∊_{k2}} · · · xt^{∊_{kt}} (mod n), k = 1, 2, . . . , t.
Here, ∊_{ki} and T_k are known to Bob. Moreover, the exponents ∊_{ki} can be so selected that the matrix (∊_{ki}) is invertible modulo 2. In order to determine x1, Bob tries to find bits u1, . . . , ut ∈ {0, 1} satisfying

T_1^{u_1} T_2^{u_2} · · · T_t^{u_t} ≡ x1 (x1^{v_1} x2^{v_2} · · · xt^{v_t})^2 (mod n)

for some integers v1, . . . , vt. Comparing the exponent of each x_i on the two sides gives the linear system

u1 ∊_{1i} + u2 ∊_{2i} + · · · + ut ∊_{ti} = δ_{1i} + 2v_i (i = 1, 2, . . . , t; δ_{1i} = 1 for i = 1, and 0 otherwise),
which, reduced modulo 2, can be solved for u1, . . . , ut, since the matrix (∊_{ki}) is invertible modulo 2. The solution gives v1, . . . , vt and hence

x1 ≡ ± T_1^{u_1} · · · T_t^{u_t} y1^{–v_1} · · · yt^{–v_t} (mod n).
Similarly, x2, . . . , xt can be determined up to sign. Plugging in these values of xi in System (7.6) and solving another linear system modulo 2 gives the exact signs of all xi.

Notice that Bob could have selected ∊_{ki} = δ_{ki} (where δ is the Kronecker delta). For this choice, System (7.6) immediately gives x1, . . . , xt. But, in practice, Alice may refuse to respond to such simplistic challenges. Moreover, Bob must not raise any suspicion about a possible malpractice. For a general choice, all Bob has to do additionally is a small amount of simple linear algebra. The parameter t is rather small (typically less than 20); so this extra effort is of little concern to Bob.

Countermeasures

Fault analysis could be a serious threat, especially to smart-card users and certification authorities. We mention here some precautions that guard against such attacks. Some of these work against fault attacks in general; the others are specific to the algorithms they intend to protect.

Exercise Set 7.2

7.1 Consider the notations of Section 7.2.1. Assume that m_{i,j} is constant for all i, j (and irrespective of d_{2j+1} d_{2j}), but the squaring times s_{i,j} and t_{i,j} vary according to their operands. Devise a timing attack on such a system.
7.2 Show that under reasonable assumptions the SPA-resistant Algorithm 7.2 can be cryptanalyzed by timing attacks.
7.3 Recall that SPA of Algorithm 7.1 may leak partial information on the private key (some 00 sequences in the key). Rewrite the algorithm to prevent this leakage.
7.4 Assume that in Bao et al.’s attack on RSA described in the text, the attacker can induce faults in exactly two bit positions of d. Suggest how the two bits of d at these positions can be revealed from the resulting faulty signature.
7.5 Consider a variant of Bao et al.’s attack on RSA described in the text, in which the valid signature s on m is unknown to the attacker. Explain how the position j of the erroneous bit and the bit dj at this position can still be identified. [H]
7.6 Bao et al. [17] propose an alternative fault analysis on RSA with square-and-multiply exponentiation. Use the notations (n, e, d, m, s, si) as in the text. Assume that the attacker knows an (m, s) pair and can induce a fault in exactly one of the values sj (and nowhere else) and generate the corresponding faulty signature. Suggest a strategy by which the position j and the bit dj can be recovered in this case.
7.7 Propose a fault attack on the ElGamal signature scheme (Algorithms 5.36 and 5.37), similar to the attack on DSA described in the text.

7.3. Backdoor Attacks

Backdoor attacks on a public-key cryptosystem refer to attacks embedded in the key generation procedure (hardware or software) by the designer of the procedure. A contaminated cryptosystem is one in which the key generation procedure comes with hidden backdoors. A good backdoor attack should meet the following criteria:

Young and Yung [307] have proposed using public-key cryptography itself for generating backdoors. In their schemes, the attacker (the designer) embeds the attacker’s encryption routine and encryption key in the key-generation procedure of the contaminated system. The decryption key of the attacker is not embedded in the contaminated system and is known only to the attacker. The attacker’s encryption system is assumed to be honest and unbreakable, and thereby gives the attacker the exclusive power to decrypt contaminated keys. Young and Yung call such a backdoor a secretly embedded trapdoor with universal protection (SETUP). They also coined the term kleptography to denote such use of cryptography against cryptography.

In the rest of this section, we denote the attacker’s encryption and decryption functions by fe and fd respectively. We often do not restrict these functions to public-key routines only. Since public-key routines are slow, symmetric-key routines can be employed in practice. Simple XOR-ing with a fixed bit string (known to the designer) may also suffice. However, for these faster alternatives of fe, fd, reverse engineering reveals the symmetric key or the XOR operand to the user who can subsequently mimic the attacker to steal keys generated elsewhere by the same contaminated system.

We use the following shorthand notations. Here, n stands for a positive integer that can be naturally identified with a unique bit string having the most significant (that is, leftmost) bit equal to 1.

|n|=the bit length of n.
lsbk(n)=the least significant k bits of n.
msbk(n)=the most significant k bits of n.
(a1 ‖ a2 ‖ · · · ‖ ar)=the concatenation of the bit strings a1, a2, . . . , ar.

7.3.1. Attacks on RSA

RSA, (seemingly) being the most popular public-key cryptosystem, has been the target of most cryptanalytic attacks, and backdoor attacks are no exception. The backdoor attacks on RSA work by cleverly hiding some secret information in the public key (n, e) of a user. As earlier, we denote the corresponding private exponent by d and the prime factors of n by p and q.

Hiding prime factor

The simplest attack is to use a fixed prime p known to the designer. The other prime q is generated randomly, and correspondingly n = pq and the key pair (e, d) are computed. Reverse engineering such a scheme is pretty simple, since two different moduli n1 = pq1 and n2 = pq2 reveal p = gcd(n1, n2) immediately.
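A two-line computation shows how quickly such a contaminated generator betrays itself (the small primes below are illustrative):

```python
from math import gcd

# Two moduli from a generator that reuses a fixed prime p (toy primes):
p, q1, q2 = 10007, 10009, 10037
n1, n2 = p * q1, p * q2
print(gcd(n1, n2))                       # → 10007, the shared prime
```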

A better approach is given in Algorithm 7.3. The function fe may be RSA encryption under the designer’s public key. In that case, the RSA modulus of the attacker should be so chosen that the condition e < n is satisfied with good probability. On the other hand, if this modulus is too small, then this scheme will generate values of e much smaller than n.

In order to determine the secret exponent from a public key generated using this scheme, the attacker runs Algorithm 7.4. If fe and fd are RSA functions under the attacker’s keys, nobody other than the attacker can apply fd to generate p from e. This provides the designer with the exclusive capability of stealing keys.

A problem with Algorithm 7.3 is that the attacker has little control over the length of the public exponent e. If the user demands a small exponent (like e = 3 or e = 257), this scheme fails to produce one. Algorithm 7.5 overcomes this difficulty by hiding p in the high-order bits of the modulus n (instead of in the exponent e). Young and Yung [307] proposed this algorithm under the name PAP (pretty awful privacy). The name contrasts with PGP (pretty good privacy), a popular and widely used RSA implementation.

Algorithm 7.3. A simple backdoor attack on RSA

Input:

Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).

Steps:

Generate a random k-bit prime q.
while (1) {
    Generate a random k-bit prime p.
    n := pq.
    e := fe(p).
    if ((e < n) and (gcd(e, φ(n)) = 1)) {
        Compute d with ed ≡ 1 (mod φ(n)).
        Return (n, e, d).
    }
}

Algorithm 7.4. Retrieving the secret exponent

Input: An RSA public key (n, e).

Output: The corresponding secret (p, q, d) or failure.

Steps:

p := fd(e).
if (p|n) {
    q := n/p.
    φ := (p – 1)(q – 1).
    d := e–1 (mod φ).
    Return (p, q, d).
} else {
    /* The key is not generated by Algorithm 7.3 */
    Return failure.
}

Algorithm 7.5 works as follows. Following Young and Yung [307], we assume that the attacker uses RSA to realize fe and fd. The RSA modulus of the attacker is denoted by N. The attack requires |N| = k, where |p| = |q| = k. To start with, a random prime p of the desired bit length k is generated. This prime is to be encrypted using fe, and so one requires p < N. Instead of encrypting p directly, the attacker first applies a permutation function π keyed by K + i for some fixed K and for i = 1, 2, . . . , B, where B is a small bound (typically B = 16). This permutation helps the attacker in two ways. First, one may now have p > N, so a suspicion regarding bounded values of p does not arise. Second, it is cheaper to apply the permutation than to generate fresh candidates for p. (In an (honest) RSA key-generation routine, the prime-generation part typically takes most of the running time.)

Algorithm 7.5. Backdoor attack on RSA: Young and Yung’s PAP scheme

Input: .

Output: An RSA modulus n = pq with |p| = |q| = k, and exponents (e, d).

Steps:

while (1) {
    /* Try to generate a suitable p */
    Generate a random k-bit prime p.
    i = 1.
    while (i ≤ B) {
        p′ := πK+i(p).    /* Use a keyed permutation πK+i*/
        if (p′ < N) { break } else { i++ }
    }

    /* Try to generate n and q */
    if (i ≤ B) {
        p″ := fe(p′).  /* Encrypt p′ by the designer’s public key */
        j := 1.
        while (j ≤ B′) {
            p‴ := π′K+j(p″).   /* π′ is a keyed permutation and |p‴| = k or k – 1. */
            Generate a pseudorandom bit string a of length k.
            X := (p‴ ‖ a).
            q := X quot p.
            if (|q| = k) and (q is prime) {
                n := pq.
                e := 17.
                while (gcd(e, φ(n)) ≠ 1) { e += 2. }
                d := e–1 (mod φ(n)).
                Return (n, e, d).
            } else { j ++ }
        }
    }
}

Once a suitable p and the corresponding p′ = πK+i(p) are generated, the encryption function fe is applied to generate p″ = fe(p′). Now, instead of embedding p″ directly in the modulus n, another keyed permutation π′ is applied on p″ to generate p‴ = π′K+j(p″). This permutation facilitates investigating several choices for q and so is a faster alternative to restarting the entire process afresh every time an unsuitable q is computed. A pseudorandom bit string a of length k is appended to p‴ to obtain an approximation X for n. If q := ⌊X/p⌋ happens to be a prime of bit length k, the exact n = pq is computed, else another j is tried. If all values of j = 1, 2, . . . , B′ (for some small bound B′) fail, the entire procedure is repeated with a new k-bit prime p.

For random choices of a, the quotients q = ⌊X/p⌋ behave like random integers, and so the probability that q is prime is almost the same as that for random integers of bit length k. Write X = qp + r with r = X rem p. Then n = pq = X – r. If r > a, then n has p‴ – 1 embedded in its higher bits, whereas if r ≤ a, then p‴ itself is embedded in the higher bits of n.
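The embedding property, namely that n = X – r preserves the top bits of X up to a borrow of 1, can be checked directly (toy sizes; the values playing p‴ and a are arbitrary stand-ins, and neither the primality nor the exact bit length of q is enforced here):

```python
# PAP's modulus n = X - (X rem p) keeps the top bits of X = (p''' || a),
# up to a borrow of 1. Toy sizes; p''' and a are arbitrary stand-ins, and
# neither the primality nor the bit length of q is enforced here.
k = 32
p = 2147483647                           # a prime (2^31 - 1) in the role of p
p3 = 0xDEADBEEF                          # stand-in for p''' (encrypted p)
a = 0x12345678                           # pseudorandom padding
X = (p3 << k) | a                        # X = (p''' || a)
q = X // p                               # candidate q
n = p * q                                # n = X - (X rem p)
print(hex(n >> k))                       # top k bits: p''' or p''' - 1
```

The designer later reads either p‴ or p‴ – 1 out of the top bits of the published modulus, which is exactly what Algorithm 7.6 exploits.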

Once suitable p and q are found, the PAP routine generates (like PGP) a small encryption exponent e relatively prime to φ(n) and its inverse d modulo φ(n). One can anyway opt for bigger values of e; in that case, instead of choosing e successively from the sequence 17, 19, 21, 23, . . . , one writes one’s customized steps for generating candidate values for e. Choosing a small e in Algorithm 7.5 merely highlights the resemblance with PGP and the flexibility of doing so.

The authors of PAP compare their implementation of Algorithm 7.5 with an implementation of the honest PGP key-generation procedure. The contaminated routine has been found to run on average only 20 per cent slower than the honest routine.

Algorithm 7.6 recovers the prime factor p of n from a public key (n, e) generated by PAP, using the RSA decryption function fd of the attacker. Reverse engineering may make available to the user the permutation functions π and π′, the fixed constants K, B, B′ and the designer’s public key. But this knowledge alone does not empower the user to steal PAP-generated keys.

Algorithm 7.6. Retrieving the prime divisor

Input: An RSA public key (n, e) with n = pq.

Output: The prime divisor p of n or failure.

Steps:

Write n = (U ‖ V) with |V| = k.
for p‴ ∈ {U, U + 1} {
    for j = 1, 2, . . . , B′ {
        p″ := (π′K+j)–1(p‴).
        p′ := fd(p″).
        for i = 1, 2, . . . , B {
            p := (πK+i)–1(p′).
            if (p|n) { Return p. }
        }
    }
}
/* (n, e) is not generated by Algorithm 7.5 */
Return failure.

Hiding small private exponent

Another possible backdoor is hiding an RSA key pair (∊, δ) with small δ inside a key pair (e, d). Crépeau and Slakmon [70] realize this backdoor using a result of Boneh and Durfee [32], which describes a polynomial-time (in |n|) algorithm for computing δ from the public key (n, ∊), provided that δ is less than n^0.292. This attack is explained in Algorithm 7.7. Here, the modulus n is a genuine random RSA modulus. The mischievous key ∊ is neatly hidden by the attacker’s encryption routine fe. The resulting output key pair (e, d) looks reasonably random. However, this scheme has a drawback similar to Algorithm 7.3; that is, it cannot easily generate small values of e.

Algorithm 7.7. Backdoor attack on RSA: small private exponent

Input: .

Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).

Steps:

Generate random primes p ≠ q of bit length ≈ k/2, such that n := pq has |n| = k.
do {
   Generate random δ with gcd(δ, φ(n)) = 1 and |δ| < 0.292|n|.
   ∊ := δ–1 (mod φ(n)).
   e := fe(∊).    /* Hide ∊ */
} while (gcd(e, φ(n)) ≠ 1).
d := e–1 (mod φ(n)).
Return (n, e, d).

Algorithm 7.8 retrieves d from a public key (n, e) generated by Algorithm 7.7.

Algorithm 7.8. Retrieving the secret exponent

Input: An RSA public key (n, e) generated by Algorithm 7.7.

Output: The corresponding private key d.

Steps:

∊ := fd(e).     /* Recover the hidden exponent */
Use Boneh and Durfee’s algorithm to recover δ ≡ ∊–1 (mod φ(n)).
Use ∊ and δ to compute φ(n).
Compute d ≡ e–1 (mod φ(n)).

The correctness of Algorithm 7.8 is evident. In order to see how the knowledge of ∊ and δ reveals φ(n), note that x := ∊δ – 1 is a multiple of φ(n); that is,

Equation 7.7

x = l φ(n) = ln – l(p + q – 1)
for some integer l. Since δ < n^0.292 and ∊ < n, we have x < n^1.292. But φ(n) ≈ n, and so l cannot be much larger than n^0.292. Since |p| ≈ k/2 ≈ |q|, we have l(p + q – 1) < n. Now, if we write

x = an + b = (a + 1)n – (n – b)

with a = x quot n and b = x rem n, comparison with Equation (7.7) reveals (since 0 < l(p + q – 1) < n) that l = a + 1. This gives φ(n) = x/l.
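The recovery of φ(n) from (∊, δ) can be verified with toy numbers (n = 61 · 53 and δ = 7, which indeed satisfies δ < n^0.292 ≈ 10.6):

```python
# Recovering phi(n) from a hidden key pair (eps, delta) with small delta:
# x = eps*delta - 1 equals l*phi(n) with l = (x quot n) + 1 (toy numbers).
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
delta = 7                                # small private exponent (7 < n^0.292)
eps = pow(delta, -1, phi)                # the matching public exponent
x = eps * delta - 1                      # a multiple of phi(n)
l = x // n + 1                           # l = a + 1 in the text's notation
print(x // l == phi)                     # → True
```

Here x = 12480 = 4 · 3120, so l = 4 and x/l gives back φ(n) = 3120 exactly.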

Although not needed explicitly here, the factorization of n can be easily obtained by solving the equations pq = n and p + q = n – φ(n) + 1. If ∊ and δ are not small, we may have l(p + q – 1) ≥ n, and φ(n) cannot be calculated as easily as above. A randomized polynomial-time algorithm can still factor n from the knowledge of ∊, δ and n. For the details, solve Exercise 7.9.

Hiding small public exponent

Crépeau and Slakmon propose another backdoor attack based on the following result due to Boneh et al. [33]. Let (∊, δ) be a key pair for an RSA modulus n = pq, and let 2^{t–1} ≤ ∊ < 2^t. There exists a polynomial-time algorithm that, given n, ∊, the t most significant bits of δ and the |n|/4 least significant bits of δ, recovers the full private exponent δ.

Algorithm 7.9. Backdoor attack on RSA: small public exponent

Input: and .

Output: An RSA modulus n = pq with |n| = k and a key pair (e, d).

Steps:

Generate random primes p ≠ q of bit length ≈ k/2, such that n := pq has |n| = k.
do {
   Generate random ∊ with gcd(∊, φ(n)) = 1 and |∊| = t.
   δ := ∊–1 (mod φ(n)).
   e := fe(∊ ‖ msbt(δ) ‖ lsbk/4(δ)).    /* Hide ∊ and partial knowledge of δ */
}
while (gcd(e, φ(n)) ≠ 1).
d := e–1 (mod φ(n)).
Return (n, e, d).

Algorithm 7.9 uses fe to hide in e a small ∊, the t most significant bits of δ and the |n|/4 least significant bits of δ. A string of bit length 2t + k/4 is encrypted by fe. Applying the decryption routine fd on e recovers these hidden values, from which ∊ and δ and hence φ(n) can be obtained. Algorithm 7.10 does this task. This scheme also fails, in general, to produce small public exponents e.

Algorithm 7.10. Retrieving the secret exponent

Input: An RSA public key (n, e) generated by Algorithm 7.9 and the matching .

Output: The corresponding private key d.

Steps:

Compute fd(e) and retrieve the following:
   (a) the hidden public exponent ∊,
   (b) the t most significant bits of the hidden private exponent δ and
   (c) the |n|/4 least significant bits of δ.
Apply the Boneh-Durfee-Frankel algorithm to recover δ completely.
Use ∊ and δ to compute φ(n).       /* See Exercise 7.9 */
Compute d ≡ e–1 (mod φ(n)).

7.3.2. An Attack on ElGamal Signatures

We now describe a backdoor attack on the ElGamal signature Algorithm 5.36. This attack does not tamper with the generation of the user’s permanent key pair. Instead, it manipulates the session-key generation in such a way that the user’s permanent private key is revealed to the attacker from two successive signatures.

Let p be a prime, g a generator of Z_p^*, and (d, g^d (mod p)) the permanent key pair of Alice. The attacker uses the same field and a key pair (D, g^D (mod p)), with g^D supplied to the signing device. Suppose that Alice signs two messages m1 and m2 to generate signatures (s1, t1) and (s2, t2) with session keys d1, d2 respectively, where

si ≡ g^{d_i} (mod p),
ti ≡ d_i^{–1}(H(mi) – d H(si)) (mod p – 1), for i = 1, 2.
The attack proceeds by letting d1 be arbitrary, but by taking

d2 ≡ (g^D)^{d1} (mod p).

Since d2 ≡ (g^{d1})^D ≡ s1^D (mod p), the attacker can recompute d2 from s1 using D. We then have

t2 ≡ d2^{–1}(H(m2) – d H(s2)) (mod p – 1),

that is,

d ≡ H(s2)^{–1}(H(m2) – d2 t2) (mod p – 1).
The private key D of the attacker (or d1) is required for computing d; so nobody other than the designer can retrieve Alice’s secret by observing the contaminated signatures (s1, t1) and (s2, t2).
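The following sketch runs the attack end to end (toy parameters; the hash is a stand-in forced odd so that the required inverses modulo p – 1 exist):

```python
# Contaminated ElGamal signing: d2 = (g^D)^d1 (mod p), so the designer
# recomputes d2 from s1 alone. Toy parameters; H(x) = x | 1 is a stand-in
# hash forced odd so that the inverses modulo p - 1 exist.
p, g, d = 467, 2, 99                     # public parameters, Alice's key d
D = 5                                    # designer's private key
H = lambda x: x | 1

def sign(m, dk):                         # ElGamal signature, session key dk
    s = pow(g, dk, p)
    t = pow(dk, -1, p - 1) * (H(m) - d * H(s)) % (p - 1)
    return s, t

d1 = 157                                 # first session key (arbitrary)
d2 = pow(pow(g, D, p), d1, p)            # contaminated second session key
(s1, t1), (s2, t2) = sign(1001, d1), sign(1002, d2)

d2_rec = pow(s1, D, p)                   # designer: d2 from s1 and D
d_rec = pow(H(s2), -1, p - 1) * (H(1002) - d2_rec * t2) % (p - 1)
print(d_rec)                             # → 99: Alice's permanent key
```

Note that the recovery uses only the two public signatures and the designer’s private key D.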

7.3.3. An Attack on ElGamal Encryption

For ElGamal encryption (Algorithm 5.15) and for Diffie–Hellman key exchange (Algorithm 5.27) over Z_p^*, a party (Alice) generates random session key pairs of the form (d′, g^{d′} (mod p)) and communicates the public session key g^{d′} to the other party. The following backdoor manipulates the session-key generation in such a way that two public session keys reveal the second private session key (but not the permanent private key). We assume that the attacker learns the public session keys by eavesdropping. The attacker’s key pair is (D, g^D (mod p)). The contaminated routine contains the public key g^D (mod p), but not the private key D.

Let (d1, r1) and (d2, r2) be two session keys used by Alice, where

r1 ≡ g^{d1} (mod p),
r2 ≡ g^{d2} (mod p).

The contaminated routine that generates the session keys uses a fixed odd integer u, a hash function H and a random bit b ∈ {0, 1} to generate d2 from d1 as follows:

z ≡ g^{ub} (g^D)^{d1} (mod p),
d2 ≡ H(z) (mod p – 1).

The attacker knows r1 and r2 by eavesdropping. She computes d2 by Algorithm 7.11, the correctness of which follows from the congruence (g^D)^{d1} ≡ r1^D (mod p).

Algorithm 7.11. Backdoor attack on ElGamal encryption

z0 := r1^D (mod p).                                                                     /* corresponding to b = 0 */
if (r2 ≡ g^{H(z0)} (mod p)) { Return H(z0). }
z1 := z0 g^u (mod p).                                                                   /* corresponding to b = 1 */
if (r2 ≡ g^{H(z1)} (mod p)) { Return H(z1). }
Return failure.              /* The attacker’s routine was not used for key generation. */

Algorithm 7.11 requires the attacker’s private key D (or d1) and can be performed only by the attacker. Now, d2 can be analogously used to generate the third session key d3 and so on, that is, the attacker can steal all the private session keys (except the first).
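The contaminated key generation and Algorithm 7.11 can be sketched as follows. The toy prime, the odd bias u = 5 and the use of SHA-256 for H are illustrative assumptions, not from the text.

```python
# Toy demonstration of Algorithm 7.11 (illustrative parameters and hash).
import hashlib, random

p, g = 467, 2                    # 2 generates Z_467^* (p - 1 = 2 * 233)
D = random.randrange(2, p - 1)   # attacker's private key; only g^D is embedded
u = 5                            # fixed odd bias

def H(z):                        # hash onto exponents modulo p - 1
    return int.from_bytes(hashlib.sha256(str(z).encode()).digest(), 'big') % (p - 1)

# Contaminated session-key generation in Alice's device (knows g^D, not D):
d1 = random.randrange(2, p - 1)
b = random.randrange(2)
z = pow(g, d1 + u*b, p) * pow(pow(g, D, p), d1, p) % p
d2 = H(z)
r1, r2 = pow(g, d1, p), pow(g, d2, p)    # public session keys (eavesdropped)

# Attacker's side (Algorithm 7.11): recover d2 from r1, r2 using D.
def backdoor(r1, r2):
    z0 = r1 * pow(r1, D, p) % p          # candidate for b = 0: z0 = r1^(D+1)
    if r2 == pow(g, H(z0), p):
        return H(z0)
    z1 = z0 * pow(g, u, p) % p           # candidate for b = 1
    if r2 == pow(g, H(z1), p):
        return H(z1)
    return None                          # contaminated routine was not used

assert backdoor(r1, r2) == d2
print("stolen session key:", backdoor(r1, r2))
```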

The odd integer u is used for additional safety. In order to see what might happen without it (that is, with b = 0 always), assume that H can be inverted. This gives z and hence y ≡ z r1^−1 ≡ (g^d1)^D (mod p). If D is even, y is always a quadratic residue modulo p. If D is odd, y is a quadratic residue or non-residue modulo p according as d1 is even or odd. The randomly added odd bias ub destroys this correlation of z with quadratic residues.

7.3.4. Countermeasures

Using trustworthy implementations (hardware or software) of cryptographic routines (in particular, of key generation routines) eliminates or reduces the risk of backdoor attacks. Preference should be given to software whose source code is available (rather than to more capable closed-source products). Random-number generators deserve specific attention. Cascading products from different independent sources also minimizes the possibility of hidden backdoors.

If the desired degree of trust is missing from the available products, the only safe alternative is to write the code oneself. Placing complete trust in cryptographic devices and packages and using them as black boxes, without bothering about their internals, is often called black-box cryptography. Users should learn to question black-box cryptography. The motto is: Be aware or bring peril.

Exercise Set 7.3

7.8 Argue that reverse engineering the PAP routine (Algorithm 7.5) can enable a user to distinguish in polynomial time between key pairs generated by PAP and those generated by honest procedures.
7.9 Let n = pq be an RSA modulus and (e, d) a key pair under this modulus. Write ed – 1 = 2^s t, where s = v2(ed – 1) (so that t is odd). Since ed – 1 is a multiple of φ(n) = (p – 1)(q – 1) with odd primes p, q, we have s ≥ 2.
  1. Show that for any a ∈ ℤn*, the multiplicative order ordn(a^t) divides 2^s. [H]

  2. Let a ∈ ℤn* be such that a^t has different orders modulo p and modulo q. Show that gcd(a^(2^σ t) – 1, n) is a non-trivial divisor of n for some σ ∈ {0, 1, . . . , s – 1}.

  3. Let g be a generator of ℤp*. Take a ≡ g^k (mod p) for some k ∈ {1, . . . , p – 1}, and let ordp(a^t) = 2^σ. Show that σ = v2(p – 1) if k is odd, and σ < v2(p – 1) if k is even. [H] An analogous result holds for the other prime q.

  4. Demonstrate that there are at least φ(n)/2 elements a in ℤn* with the property that a^t has different orders modulo p and q. [H]

  5. Suggest a randomized poly-time algorithm for factoring n from the knowledge of n, e and d.
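One way to approach part 5 is sketched below: a randomized factoring routine built on the non-trivial square roots of 1 located in parts 1–4. The toy primes 61 and 53 and the exponent e = 17 are assumed purely for illustration.

```python
# Sketch of the randomized factoring procedure outlined in Exercise 7.9
# (toy RSA parameters; real moduli would be far larger).
import math, random

p, q = 61, 53                       # toy RSA primes
n, phi = p*q, (p - 1)*(q - 1)
e = 17
d = pow(e, -1, phi)                 # matching private exponent

def factor(n, e, d):
    k = e*d - 1                     # a multiple of phi(n)
    t, s = k, 0
    while t % 2 == 0:               # write e*d - 1 = 2^s * t with t odd
        t //= 2
        s += 1
    while True:
        a = random.randrange(2, n - 1)
        if math.gcd(a, n) != 1:
            return math.gcd(a, n)   # lucky draw: a shares a factor with n
        x = pow(a, t, n)
        for _ in range(s):
            y = pow(x, 2, n)
            if y == 1 and x not in (1, n - 1):
                # x is a non-trivial square root of 1 modulo n
                return math.gcd(x - 1, n)
            x = y

f = factor(n, e, d)
assert f in (p, q)
print("factor found:", f)
```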

Chapter Summary

In this chapter, we discuss some indirect ways of attacking public-key cryptosystems. These attacks do not attempt to solve the underlying intractable problems, but watch the decryption device and/or use malicious key generation routines in order to gain information about private keys.

The timing attack is based on the availability of the total times of several private-key operations under the same private key. It guesses the bits of the private key one after another by performing some variance calculations.

The power attack requires the availability of the power consumption patterns (also called power traces) of the decrypting (or signing) device during one or more private-key operations. If the measurements are done with good accuracy and resolution, a single power trace may reveal the private key to the attacker; this is called simple power analysis. In practice, however, such power measurements are often contaminated with noise. Differential power analysis requires power traces from several decryption operations under the same private key. The different traces are combined using a technique that reduces the effect of noise.

A fault attack can be mounted by injecting one or more faults in the device performing private-key operations. Fault attacks are discussed in connection with several encryption (RSA), signature (ElGamal, DSA and so on) and authentication (FFS) schemes.

The above three kinds of attacks are collectively called side-channel attacks. Several general and algorithm-specific countermeasures against side-channel attacks are discussed.

Backdoor attacks, on the other hand, are mounted by malicious key generation routines. Young and Yung propose the concept of secretly embedded trapdoor with universal protection (SETUP). In a SETUP-contaminated system, the designer of the key generation routine possesses the exclusive right to steal keys from users. Several examples of backdoor attacks on RSA and ElGamal cryptosystems are described.

Suggestions for Further Reading

Kocher introduces the concept of side-channel attacks in his seminal paper [155]. This paper describes further details about the timing attack (like a derivation of the choice of the sample size k) and some experimental results.

Timing attacks in various forms are applicable to other systems. Kocher [155] himself suggests a chosen-message attack on an RSA implementation based on CRT (Algorithm 5.4). Carol, in an attempt to obtain Alice’s private key d, tries to guess the factor p (or q) of the modulus n using a timing attack. She starts by letting Alice sign a message y (c in Algorithm 5.4) close to an initial guess of p. The CRT-based algorithm first reduces y modulo p and modulo q before performing the modular exponentiations. If y < p already, then the initial reduction modulo p returns (almost) immediately, whereas if y ≥ p, the reduction involves at least one subtraction. This gives a variation in the timings based on the value of p. The attack exploits this fact to arrive at better and better approximations of p.

A known-message timing attack (in addition to the chosen message attack mentioned in the last paragraph) on the CRT-based RSA signature scheme is proposed by Kocher in the same paper [155]. Kocher also explains a timing attack on the signature algorithm DSA (Algorithm 5.43), based on the dependence of the modular reduction of H(M) + ds modulo r on the bits of the signer’s private key d.

Large scale implementations of timing attacks are reported in the technical reports [77, 259] from the Crypto group of Université catholique de Louvain. These implementations study Montgomery exponentiation.

Kocher [155] mentions the possibility of power attacks. However, a concrete description is first published in Kocher et al. [156], which explains both SPA and DPA. DES is the basic target of this paper, though possibilities for using these techniques against public-key systems are also mentioned.

Several variants of the basic DPA model described in the text have been proposed. Messerges et al. [200] describe attacks against smart-card implementations of exponentiation-based public-key systems. Also consult Aigner and Oswald’s tutorial [9] for a recent survey.

DPA seems to be the most threatening of all side-channel attacks. Many papers suggesting countermeasures against DPA have appeared. Chari et al. [45] propose a masking method. Messerges [199] applies this idea to a form suitable for AES.[4] Messerges’ countermeasure is broken in [63] using a multi-bit DPA. Some other useful papers on DPA include [10, 55, 201].

[4] AES is an abbreviation for advanced encryption standard which is a US-government standard that supersedes the older standard DES. AES uses the Rijndael cipher [219].

Boneh et al. [30, 31] from the Bellcore Lab. announce the first systematic study of fault attacks on asymmetric-key cryptosystems. They explain fault attacks on RSA (with and without CRT), the Rabin signature scheme, the Feige–Fiat–Shamir identification protocol and on the Schnorr identification protocol. These attacks are collectively known as Bellcore attacks.

Arjen K. Lenstra points out that the fault attack on CRT-based RSA does not require the valid signature. Joye and Quisquater propose some generalizations of the Bellcore–Lenstra attack. A form of this attack is applicable to elliptic-curve cryptosystems. The paper [142] talks about these developments.

Bao et al. [17] propose fault attacks on DSA, ElGamal and Schnorr signatures. They also describe variants of the fault analysis of RSA based on square-and-multiply algorithms. Zheng and Matsumoto [315] indicate the possibilities of attacking the random bit generator in a smart card.

Biham and Shamir [22] investigate fault analysis of symmetric-key ciphers and introduce the concept of differential fault analysis. Anderson and Kuhn [11] also study fault analysis of symmetric-key ciphers. Aumüller et al. [15] publish their practical experiences regarding physical realizations of faults in smart cards. They also suggest countermeasures against such attacks.

James A. Muir’s work [215] is a very readable and extensive survey on side-channel cryptanalysis. Also look at Boneh’s survey [29].

Because of small key sizes, elliptic-curve cryptosystems are very attractive for implementation in smart cards. It is, therefore, necessary to provide effective countermeasures against side-channel attacks (most importantly, against the DPA) for elliptic-curve cryptosystems. Many recent articles discuss this issue. Coron [62] suggests the use of random projective coordinates to avoid the costly (and power-consuming) field inversion operation needed for adding and doubling of points. Möller [206] proposes a non-conventional way of carrying out the double-and-add procedure. Izu and Takagi [138] describe a Montgomery-type point addition scheme resistant against side-channel attacks. An improved version of this algorithm, that works for a more general class of elliptic curves, is presented in Izu et al. [137].

Young and Yung introduce the concept of SETUP in [307]. The PAP SETUP on RSA and the ElGamal signature SETUP are from this paper, which also includes attacks on DSA and on the Kerberos authentication protocol. In a later paper [308], Young and Yung categorize SETUPs into three types: regular, weak and strong. Strong SETUPs are proposed for Diffie–Hellman key exchange and for RSA. The third reference [309] from the same authors extends the ideas of kleptography further and provides backdoor routines for several other cryptographic schemes.

Crépeau and Slakmon [70] adopt a more informal approach and discuss several backdoors for RSA key generation. In addition to the trapdoors with hidden small private and public exponents, described in the text, they propose a trapdoor that hides a small prime public exponent. They also present an improved version of the PAP routine. Unlike Young and Yung, they suggest symmetric techniques for designing fe, fd. Symmetric techniques endanger the universal protection of the attacker, but continue to make perfect sense in the context of black-box cryptography.

8. Quantum Computation and Cryptography

8.1 Introduction
8.2 Quantum Computation
8.3 Quantum Cryptography
8.4 Quantum Cryptanalysis
 Chapter Summary
 Suggestions for Further Reading

Our best theories are not only truer than common sense, they make far more sense than common sense does.

—David Deutsch [76]

One can be a masterful practitioner of computer science without having the foggiest notion of what a transistor is, not to mention how it works.

—N. David Mermin [197]

But suppose I could buy a truly powerful quantum computer off the shelf today — what would I do with it? I don’t know, but it appears that I will have plenty of time to think about it!

—John Preskill [243]

8.1. Introduction

So far, we have studied algorithms in the area of cryptology that can be implemented on classical computers (Turing machines or von Neumann’s stored-program computers). Now, we shift our attention to a different paradigm of computation, known as quantum computation. The working of a quantum computer is governed by the laws of quantum mechanics, a branch of physics developed in the twentieth century. However counterintuitive, contrived or artificial these laws sound at first, they have been accepted by the physics community as robust models of certain natural phenomena. A bit, modelled as a quantum mechanical system, appears to be a more powerful building block for a computing device than a classical bit.

This enhanced power of a computing device has many important ramifications in cryptology. On one hand, we have polynomial-time quantum algorithms to solve the integer factorization and the discrete-log problems. This implies that most of the cryptographic algorithms that we discussed earlier become (provably) insecure. On the other hand, there are proposals for a quantum key-exchange method that possesses unconditional (and provable) security.

Unfortunately, it is not clear how one can manufacture a quantum computer. The technological difficulties involved appear enormous, and a section of the community even questions the feasibility of building such a machine. However, no laws or proofs rule out the possibility of success in the (near or distant) future. Legend has it that Thomas Alva Edison, after several hundred futile attempts to manufacture an electric light bulb, asserted that he knew hundreds of ways how one cannot make an electric bulb. Edison succeeded eventually, and the dream turned into reality.

But we will not build quantum computers in this chapter. That is well beyond the scope of this book, or, for that matter, of computer science in general. It is thoroughly unimportant to understand the I-V curves of a transistor (or even to know what a transistor actually is), when one designs and analyses (classical) algorithms. In order to design and analyse quantum algorithms, it is equally unimportant to know how a quantum computer can be realized.

8.2. Quantum Computation

We start with a formal description of quantum computation. Quantum mechanical laws govern this paradigm. We will pay little attention to the physical interpretations of these laws. A mathematical formulation suffices for our purpose.

For defining a quantum mechanical system, we need to enrich our mathematical vocabulary. Let V be a vector space over ℂ (or ℝ). Using Dirac’s ket notation, we denote a vector ψ in V as |ψ〉.

Definition 8.1.

An inner product (also called a dot product or a scalar product) on V is a function 〈·|·〉 : V × V → ℂ satisfying the following properties:

  1. Positivity For any |ψ〉 ∈ V, the inner product 〈ψ|ψ〉 is real and non-negative. Moreover, 〈ψ|ψ〉 = 0 if and only if |ψ〉 = 0.

  2. Linearity For a1, a2 ∈ ℂ and |ψ〉, |φ1〉, |φ2〉 ∈ V, we have 〈ψ| (a1|φ1〉 + a2|φ2〉) = a1〈ψ|φ1〉 + a2〈ψ|φ2〉.

  3. Skew symmetry For any |ψ〉, |φ〉 ∈ V, we have 〈φ|ψ〉 = 〈ψ|φ〉*, where the star denotes complex conjugation.

A vector space V with an inner product is called an inner product space.

Example 8.1.

For n ∈ ℕ, the space ℂ^n is an inner product space with the inner product of |ψ〉 = (ψ1, . . . , ψn) and |φ〉 = (φ1, . . . , φn) defined as

〈ψ|φ〉 = ψ1*φ1 + ψ2*φ2 + · · · + ψn*φn,

where the star denotes complex conjugation.

Definition 8.2.

The inner product on a vector space V induces a norm (Definition 2.115) on V:

‖ψ‖ := √〈ψ|ψ〉.

An inner product space which is complete (Definition 2.119) under the norm induced by its inner product is called a Hilbert space. We will typically consider finite-dimensional Hilbert spaces (over ℂ) and for n ∈ ℕ denote the n-dimensional Hilbert space by ℋn.

Definition 8.3.

We define an equivalence relation ~ on a Hilbert space ℋn as |ψ〉 ~ |φ〉 if and only if |φ〉 = a|ψ〉 for some non-zero a ∈ ℂ. An equivalence class under this relation is called a ray in ℋn. One typically considers a vector |ψ〉 with 〈ψ|ψ〉 = 1 as a representative of its equivalence class. Such a representative is unique up to multiplication by complex numbers of the form e^iθ.

Definition 8.4.

An orthonormal basis of a Hilbert space ℋn is a subset B of ℋn with the following properties:

  1. B is a ℂ-basis of ℋn.

  2. 〈ψ|ψ〉 = 1 for every |ψ〉 ∈ B.

  3. 〈ψ|φ〉 = 0 for every pair of distinct vectors |ψ〉, |φ〉 ∈ B.

It is customary to denote the n vectors in an orthonormal basis of ℋn by the symbols |0〉, |1〉, . . . , |n – 1〉.

Example 8.2.

|0〉 := (1, 0, 0, . . . , 0), |1〉 := (0, 1, 0, . . . , 0), . . . , |n – 1〉 := (0, 0, . . . , 0, 1) form an orthonormal basis of ℂ^n under the inner product of Example 8.1.

8.2.1. System

The following axiom describes the model of a quantum mechanical system.

Axiom 8.1. First axiom of quantum mechanics

A system is a ray in a (finite-dimensional) Hilbert space (over ℂ).

Definition 8.5.

The simplest non-trivial quantum mechanical system is a ray in a 2-dimensional Hilbert space ℋ2. Such a system is assumed to be the basic building block of a quantum computer and is called a quantum bit or a qubit.

In order to distinguish a qubit from a classical bit, we call the latter a cbit.

ℋ2 has an orthonormal basis {|0〉, |1〉}. In the classical interpretation, a cbit can assume only the two values |0〉 and |1〉, whereas a qubit can assume any value of the form

a|0〉 + b|1〉    with    a, b ∈ ℂ, |a|² + |b|² = 1.

Such a state of the qubit is called a superposition of the classical states.

Though we don’t care much, at least for the moment, here are two promising candidates for realizing a qubit:

A conceptual example of a 2-state quantum system is the Schrödinger cat. The two independent states of a cat, as we classically know, are |alive〉 and |dead〉. However, if we think of the cat confined in a closed room and isolated from our observations, quantum mechanics models the state of the cat as a superposition (that is, a complex-linear combination) of these two states. But then, if the quantum model were true, opening the room might reveal the cat in a non-trivial state a|alive〉 + b|dead〉 for some complex numbers a, b with |a|² + |b|² = 1. It would indeed be an exciting experience. But alas, quantum mechanics precludes the possibility of such an observation. Read on to know what we would actually see, if we open the room.

8.2.2. Entanglement

A single qubit is too small to build a useful computer. We need to use several (albeit a finite number of) qubits and hence must have a way to describe the combined system in terms of the individual qubits. As the simplest and basic case, we first concentrate on combining two quantum systems into one.

Axiom 8.2. Second axiom of quantum mechanics

Let A and B be two quantum mechanical systems with respective Hilbert spaces ℋA and ℋB. Let {|i〉A | i = 0, . . . , m – 1} and {|j〉B | j = 0, . . . , n – 1} be orthonormal bases of these Hilbert spaces. The quantum mechanical system AB having A and B as its two parts is described by the tensor product

ℋAB = ℋA ⊗ ℋB,

where ℋAB is an mn-dimensional Hilbert space with an orthonormal basis

{|i〉A ⊗ |j〉B | i = 0, . . . , m – 1 and j = 0, . . . , n – 1}.

It is customary to abbreviate the normalized vector |i〉A ⊗ |j〉B as |i〉A|j〉B or even as |ij〉AB. A general state of AB is of the form

|ψ〉AB = Σi,j ai,j|ij〉AB    with    Σi,j |ai,j|² = 1.

We can generalize this construction to describe a system having components A1, . . . , Ak. If ℋi is the Hilbert space of Ai with an orthonormal basis {|j〉i | 0 ≤ j < ni}, the composite system A1 · · · Ak has the n1 · · · nk-dimensional Hilbert space ℋ1 ⊗ · · · ⊗ ℋk with an orthonormal basis comprising the vectors

|j1〉1 ⊗ |j2〉2 ⊗ · · · ⊗ |jk〉k = |j1〉1|j2〉2 · · · |jk〉k = |j1 j2 . . . jk〉

with 0 ≤ ji < ni for all i = 1, . . . , k.

Definition 8.6.

An n-bit quantum register is a system having exactly n qubits.

Let A1, . . . , An denote the individual bits in an n-bit quantum register A. Each Ai has the Hilbert space ℋ2 with orthonormal basis {|0〉, |1〉}. So A has the 2^n-dimensional Hilbert space ℋ2 ⊗ · · · ⊗ ℋ2 (n copies) with an orthonormal basis consisting of the vectors

|j1〉 ⊗ |j2〉 ⊗ · · · ⊗ |jn〉 = |j1〉|j2〉 · · · |jn〉 = |j1 j2 · · · jn〉

with each ji ∈ {0, 1}. Viewed as an integer in binary notation, j1 j2 . . . jn is an integral value between 0 and 2^n – 1. This gives us a canonical numbering |0〉, |1〉, . . . , |2^n – 1〉 of the basis vectors for the register A. These 2^n values are precisely the states that a classical n-bit register can have. The quantum register can, however, be in any state |ψ〉 which is a superposition of the classical states:

|ψ〉 = a0|0〉 + a1|1〉 + · · · + a(2^n – 1)|2^n – 1〉    with    Σj |aj|² = 1.

Let us once again look at the general composite system A = A1 · · · Ak. In the classical sense, each state of A is composed of the individual states of the subsystems Ai. For example, each of the 2n classical states of an n-bit register corresponds to a choice between |0〉 and |1〉 for each individual bit. That is, each individual component retains its own state in a classical composite system. This is, however, not the case with a quantum composite system. Just think of a 2-bit quantum register C := AB. A state

|ψ〉C = c0|0〉C + c1|1〉C + c2|2〉C + c3|3〉C

of C equals a tensor product

|ψ1〉A ⊗ |ψ2〉B = (a0|0〉A + a1|1〉A) ⊗ (b0|0〉B + b1|1〉B)
             = a0b0|0〉C + a0b1|1〉C + a1b0|2〉C + a1b1|3〉C,

if and only if c0c3 = c1c2.
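The product criterion c0c3 = c1c2 is easy to check numerically. The following sketch (helper names are illustrative, not from the text) verifies it on a product state and on the entangled state (1/√2)(|00〉 + |11〉):

```python
# Numeric check of the product criterion for a 2-qubit state (c0, c1, c2, c3):
# the state is a tensor product of 1-qubit states exactly when c0*c3 == c1*c2.
from math import sqrt

def is_product(c):
    return abs(c[0]*c[3] - c[1]*c[2]) < 1e-12

a = (0.6, 0.8)                  # |psi1> = 0.6|0> + 0.8|1>
b = (1/sqrt(2), 1/sqrt(2))      # |psi2> = (|0> + |1>)/sqrt(2)
product = (a[0]*b[0], a[0]*b[1], a[1]*b[0], a[1]*b[1])
assert is_product(product)      # built as a tensor product, so c0c3 = c1c2

bell = (1/sqrt(2), 0, 0, 1/sqrt(2))   # (|00> + |11>)/sqrt(2)
assert not is_product(bell)           # no tensor-product decomposition exists
print("product criterion verified")
```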

Definition 8.7.

The state |ψ〉 of a quantum register A = A1 · · · An is called entangled, if |ψ〉 cannot be written as a tensor product of the states of any two parts of A. In other words, |ψ〉 is entangled if and only if no set of fewer than n qubits of A possesses its own individual state.

Entanglement essentially implies correlation or interaction between the components. In a composite quantum system, we cannot treat the components individually. A quantum system, as we have defined (axiomatically) earlier, is a completely isolated system. In reality, interactions with the surroundings make a (non-isolated) system change its state and get entangled. This is one of the biggest problems in the realization of a quantum computer. Quantum error correction is an important topic in quantum computation. For our purpose, we stick to the abstract model of an isolated system (quantum register) immune from external disturbances.

8.2.3. Evolution

Quantum registers give us a way to store quantum information. A computation involves manipulating the information stored in the registers. In quantum mechanics, all such operations must be reversible, that is, it must be possible to invert every operation. The invertible operations on the classical states |0〉, |1〉, . . . , |2^n – 1〉 of an n-bit quantum register A are precisely the permutations of the classical states. Now that A can be in many more (quantum) states, there are other allowed operations on A. Any such operation must be reversible and of a particular type. This is the content of the third axiom of quantum mechanics, detailed shortly.

A classical n-bit register supports many non-invertible operations. For example, erasing the content of the register (that is, resetting all the bits to zero) is a non-invertible process, since the pre-erasure state of the register cannot be uniquely determined after the erase operation is carried out. Classical computation is based on (classical) gates (like NOT, AND, OR, XOR, NOR, NAND), most of which are non-invertible. XOR, as an example, requires two input bits and outputs a single bit. It is impossible to determine the inputs uniquely from the output only. All such non-reversible operations are disallowed in the quantum world. An invertible version of the XOR operation takes two bits x and y as input and outputs the two bits x and xy (where ⊕ denotes XOR of bits). Given the output (x, xy), the input can be uniquely determined as (x, y) = (x, x ⊕ (xy)), that is, by applying the reversible XOR operation once more.
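A tiny sketch of the reversible XOR just described; applying the operation twice recovers the original input:

```python
# The reversible XOR: (x, y) -> (x, x XOR y). Since the first output bit
# carries x unchanged, the operation is invertible -- indeed self-inverse.
def rxor(x, y):
    return x, x ^ y

for x in (0, 1):
    for y in (0, 1):
        assert rxor(*rxor(x, y)) == (x, y)   # applying it twice is the identity
print("reversible XOR is its own inverse")
```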

Like XOR, all bit operations that build up a classical computer can be realized using reversible operations only. This gives us the (informal) assurance that quantum computers are at least as powerful as classical computers.

Back to the business—the third axiom of quantum mechanics.

Definition 8.8.

Let U be a square matrix (that is, an m × m matrix for some m ∈ ℕ) with complex entries. The conjugate transpose of U is denoted by the symbol U†, that is, if U = (uij), then U† = (uji*). U is called unitary, if U†U = UU† = I, where I is the m × m identity matrix. Every unitary matrix U is invertible with U⁻¹ = U†, and preserves the inner product of ℂ^m, that is, 〈Uψ|Uφ〉 = 〈ψ|φ〉 for |ψ〉, |φ〉 ∈ ℂ^m.

Let A be a quantum system (like a quantum register) with Hilbert space ℋm. An m × m unitary matrix U defines a unitary linear transformation on ℋm taking a normalized vector |ψ〉 to a normalized vector U|ψ〉. Moreover, the transformation maps an orthonormal basis of ℋm to another orthonormal basis of ℋm (Exercise 8.4).

Axiom 8.3. Third axiom of quantum mechanics

A quantum system evolves unitarily, that is, any operation on a quantum mechanical system is a unitary transformation.

Example 8.3.

The Hadamard transform H on one qubit is defined as:

H|0〉 = (1/√2)(|0〉 + |1〉),
H|1〉 = (1/√2)(|0〉 – |1〉).

(Recall that a linear transformation is completely specified by the images of the elements of a basis.) If one takes |0〉 = (1, 0) and |1〉 = (0, 1), the Hadamard transform corresponds to the unitary matrix

H = (1/√2) [ 1    1 ]
           [ 1   –1 ].

By linearity, H transforms a general state |ψ〉 = a|0〉 + b|1〉 to the state

H|ψ〉 = ((a + b)/√2)|0〉 + ((a – b)/√2)|1〉.
Some other unitary operators are described in Exercises 8.5 and 8.6.

An important consequence of quantum mechanical dynamics is that cloning of a state of a system is not permissible. In other words, there does not exist an operator that copies an arbitrary state (content) of one quantum register to another.

Theorem 8.1. No-cloning theorem

For two n-bit registers A and B, there do not exist a unitary transform U of the composite system AB and a state |s〉 of B, such that U(|ψ〉|s〉) = |ψ〉|ψ〉 for every state |ψ〉 of A.

Proof

Assume that such a state |s〉 of B and a unitary transform U of AB exist. Take two states |ψ1〉 and |ψ2〉 of A. Then, U(|ψ1〉|s〉) = |ψ1〉|ψ1〉 and U(|ψ2〉|s〉) = |ψ2〉|ψ2〉. By linearity, we have U((a|ψ1〉 + b|ψ2〉)|s〉) = a|ψ1〉|ψ1〉 + b|ψ2〉|ψ2〉. Now, since U clones a|ψ1〉 + b|ψ2〉 also, U((a|ψ1〉 + b|ψ2〉)|s〉) = (a|ψ1〉 + b|ψ2〉)(a|ψ1〉 + b|ψ2〉) = a²|ψ1〉|ψ1〉 + ab|ψ1〉|ψ2〉 + ab|ψ2〉|ψ1〉 + b²|ψ2〉|ψ2〉. The two expressions for U((a|ψ1〉 + b|ψ2〉)|s〉) are different, unless a = 0, b = 1 or a = 1, b = 0.

8.2.4. Measurement

We have seen how to represent a quantum mechanical system and do operations on the system. Now comes the final part of the game, namely observing or measuring or reading the state of a quantum system. In classical computation, reading the value stored in a classical register is a trivial exercise—just read it! In quantum mechanics, this is not the case.

Axiom 8.4. Fourth axiom of quantum mechanics—the Born rule

Let A be a quantum mechanical system with an orthonormal basis {|0〉, |1〉, . . . , |m – 1〉}. Assume that A is in a state |ψ〉 = a0|0〉 + a1|1〉 + · · · + a(m–1)|m – 1〉 (so that Σi |ai|² = 1). A measurement of A at this state is a mechanism (or device) that outputs one of the integers 0, 1, . . . , m – 1, and i is output with probability |ai|². If i is output by the measurement, the system collapses from the state |ψ〉 to the state |i〉 after the measurement.

This means that whatever the state |ψ〉 of A was before the measurement, the process of measurement can reveal only one of m possible integer values. Moreover, the measurement causes a total loss of information about the pre-measurement amplitudes ai. Thus, it is impossible to measure A repeatedly at the state |ψ〉 to see a statistical pattern in the occurrences of different values of i so as to guess the probabilities |ai|2.
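The Born rule can be simulated classically by sampling. The following sketch (illustrative helper names, not from the text) draws an outcome i with probability |ai|² and returns the collapsed state:

```python
# Simulated measurement under the Born rule: sample outcome i with
# probability |a_i|^2, then collapse the state to the basis vector |i>.
import random
from math import sqrt

def measure(amplitudes):
    probs = [abs(a)**2 for a in amplitudes]
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            break
    collapsed = [1 if j == i else 0 for j in range(len(amplitudes))]
    return i, collapsed          # outcome and post-measurement state

# For the state (1/sqrt(2))(|0> + |1>), outcomes 0 and 1 each occur
# with probability 1/2:
counts = [0, 0]
for _ in range(10000):
    i, state = measure([1/sqrt(2), 1/sqrt(2)])
    counts[i] += 1
print(counts)
```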

If we open the room, we can see the Schrödinger cat in only one of the two possible states: |alive〉 or |dead〉. Well, then, what else can we expect? Quantum mechanics only models the cat in the isolated room as a system evolving according to the unitary dynamics.

At first glance, this is rather frustrating. We claim that the system went through a series of classically meaningless states, but the classical states are all we can see. What is the guarantee that the system really evolved in the quantum mechanical way? Well, there is no guarantee actually. The solace is that the axioms of quantum mechanics can explain certain natural phenomena. Also, it is perfectly consistent with the classical behaviour in that if the system A evolves classically and is measured at the state |i〉 (so that ai = 1 and aj = 0 for j ≠ i), measuring A reveals i with probability one and causes the system to collapse to the state |i〉, that is, to remain in the state |i〉 itself.

There is a positive side of the quantum mechanical axioms. A quantum mechanical system is inherently parallel. An n-bit classical register at any point of time can hold only one of the classical values |0〉, . . . , |2^n – 1〉. An n-bit quantum register, on the other hand, can simultaneously hold all these classical values, with respective probabilities. This inherent parallelism seems to impart a good deal of power to a computing device. Of course, as long as we cannot harness some physical objects to build a real quantum mechanical computing device, quantum computation continues to remain science fiction. But on an algorithmic level, the inherent parallelism of a (hypothetical) quantum computer can be exploited to do miracles, for example, to design a polynomial-time integer factorization algorithm. This is where we win—at least conceptually. Our failure to see a cat in the state (1/√2)(|alive〉 – |dead〉) should not bother us at all!

Measurement also gives us a way to initialize a quantum register A to a desired state |ψ〉. Suppose that we get the value i upon measuring A. We then apply to A any unitary transform that changes A from the post-measurement state |i〉 to the desired state |ψ〉.

The measurement described in Axiom 8.4 is called measurement in the classical basis. The system A has, in general, many orthonormal bases other than the classical one {|0〉, . . . , |m – 1〉}. If B is any such basis, we can conceive of measuring A in the basis B. All we need to perform is to rewrite the state of A in terms of the new basis B. This can be achieved by applying to A a unitary transformation (the change-of-basis transformation) before the measurement in the classical basis is carried out.

A generalization of the Born rule is also worth mentioning here. Suppose that we have an (m + n)-bit quantum register A and we want to measure not all but some of the bits of A. To be more specific, let us say that we want to measure the leftmost m bits of A, though the generalized Born rule works for any arbitrary choice of m bit positions in the register A. Denoting by |i〉m, i = 0, . . . , 2^m – 1, the canonical basis vectors for the left m bits and by |j〉n, j = 0, . . . , 2^n – 1, those for the right n bits, a general state of A can be written as

|ψ〉 = Σi,j ai,j|i, j〉m+n

with Σi,j |ai,j|² = 1 and with |i, j〉m+n identified as |i〉m|j〉n = |i〉m ⊗ |j〉n. A measurement of the left m bits of A yields an integer i, 0 ≤ i ≤ 2^m – 1, with probability pi := Σj |ai,j|². Also this measurement causes A to collapse to the state (1/√pi) Σj ai,j|i〉m|j〉n.

Now, if we immediately apply the generalized Born rule once again on the right n bits of A, we get an integer j, 0 ≤ j ≤ 2^n – 1, with probability |ai,j|²/pi, and the system collapses to the state |i〉m|j〉n. The probability of getting |i〉m|j〉n by this two-step process is then pi · |ai,j|²/pi = |ai,j|². This is consistent with a single application of the original Born rule.
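A sketch of the generalized Born rule for measuring the left m bits; the dictionary representation of the amplitudes ai,j is an implementation choice, not from the text:

```python
# Partial measurement of an (m+n)-bit register: outcome i on the left m
# bits occurs with probability p_i = sum_j |a_{i,j}|^2, and the state is
# renormalized to (1/sqrt(p_i)) sum_j a_{i,j} |i>|j>.
import random
from math import sqrt, isclose

def measure_left(amps, m, n):
    # amps[(i, j)] = a_{i,j}; returns (outcome i, p_i, collapsed state)
    probs = [sum(abs(amps.get((i, j), 0))**2 for j in range(2**n))
             for i in range(2**m)]
    i = random.choices(range(2**m), weights=probs)[0]
    p = probs[i]
    collapsed = {(i, j): amps.get((i, j), 0)/sqrt(p) for j in range(2**n)}
    return i, p, collapsed

# Example: the entangled 2-bit state (1/sqrt(2))(|00> + |11>), with m = n = 1.
amps = {(0, 0): 1/sqrt(2), (1, 1): 1/sqrt(2)}
i, p, post = measure_left(amps, 1, 1)
assert isclose(p, 0.5)                   # each left-bit outcome has p_i = 1/2
assert isclose(abs(post[(i, i)]), 1.0)   # the register collapses to |ii>
print("measured left bit:", i)
```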

8.2.5. The Deutsch Algorithm

We start with a general framework for doing computations using quantum registers. Suppose we want to compute a function f which takes an m-bit integer as input and outputs an n-bit integer. A general function f need not be invertible, but we cannot afford non-invertible operations on quantum registers. This is why we work with an (m + n)-bit quantum register A in which the left m bits represent the input and the right n bits the output. Computing f(x) for a given x is tantamount to designing a unitary transformation Uf that acts on A and converts its state from |x〉m|y〉n to |x〉m|f(x) ⊕ y〉n, where ⊕ is the bitwise XOR operation, and where the subscripts (m and n) indicate the number of bits in the input or output part of A. It is easy to verify that Uf is unitary. Moreover, the inverse of Uf is Uf itself. For y = 0, we have, in particular, Uf(|x〉m|0〉n) = |x〉m|f(x)〉n.
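For m = n = 1, Uf can be written out explicitly as a 4 × 4 permutation matrix over the basis states |x〉|y〉, which makes its unitarity and self-inverseness evident. The encoding of basis states as indices 2x + y is an assumption of this sketch:

```python
# U_f on basis state |x>|y> produces |x>|f(x) XOR y>. Being a permutation
# of basis states, U_f is unitary; applying it twice gives the identity.
def make_Uf(f):
    # 4x4 permutation matrix over the basis |x,y>, encoded as index 2*x + y
    U = [[0]*4 for _ in range(4)]
    for x in (0, 1):
        for y in (0, 1):
            U[2*x + (f(x) ^ y)][2*x + y] = 1
    return U

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

Uf = make_Uf(lambda x: 1 - x)        # f(0) = 1, f(1) = 0
I = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
assert matmul(Uf, Uf) == I           # U_f is its own inverse
# each column of Uf is a distinct standard basis vector, hence Uf is unitary
print("U_f is a self-inverse permutation matrix")
```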

It may still be unclear to the reader what one really gains by using this quantum model. The answer lies in the parallelism inherent in a quantum register. In order to see how this parallelism can be exploited, we describe David Deutsch’s algorithm which, being the first known quantum algorithm, has enough historical importance to be included here in spite of its apparent irrelevance in the context of cryptology.

Assume that f : {0, 1} → {0, 1} is a function that takes one bit as input and outputs one bit. There are four such functions: two of these are constant (f(0) = f(1)) and the remaining two non-constant (f(0) ≠ f(1)). We are given a black box Df representing f. We don’t know which of the four functions Df actually implements, but we can supply a bit to Df as input and read its output. Our task is to determine whether Df represents a constant function or not. Classically, we make two invocations of Df, on the inputs 0 and 1, and compare the output values f(0) and f(1); it is impossible to solve the problem classically using only one invocation of the black box. The Deutsch algorithm accomplishes the task with a single invocation using quantum computational techniques.

Following the general quantum computational model, we assume that Df is a unitary transformation on a 2-bit register A (with m = n = 1) that computes Df|x〉|y〉 = |x〉|f(x) ⊕ y〉, with the left (resp. the right) bit corresponding to the input (resp. the output) of f. Instead of supplying a classical input to Df, we initialize the register A to the state
(H|1〉)(H|1〉) = ½(|0〉 − |1〉)(|0〉 − |1〉).
Linearity shows that on this input, Df ends its execution leaving A in the state
½((−1)^{f(0)}|0〉 − (−1)^{f(1)}|1〉)(|0〉 − |1〉).
Here, the right bit remains in the state H|1〉 = (1/√2)(|0〉 − |1〉). We won’t measure A right now, but apply the Hadamard transform on the left bit. This transforms A to the state
±|1〉(H|1〉) if f is constant, and ±|0〉(H|1〉) if f is not constant.
Now, if we measure the input bit, we deterministically get 1 if f is constant and 0 if it is not. That’s it!
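The whole algorithm can be simulated on four complex amplitudes. The following stdlib-Python sketch (ours, purely illustrative) indexes the basis state |x〉|y〉 by the integer (x << 1) | y:

```python
import math

def deutsch(f):
    """One-oracle-call test of whether f: {0,1} -> {0,1} is constant."""
    s = 1 / math.sqrt(2)
    # initial state (H|1>)(H|1>): amplitude (1/2)(-1)^(x+y) at |x>|y>
    psi = [0.5 * (-1) ** ((i >> 1) + (i & 1)) for i in range(4)]
    # the black box D_f: |x>|y> -> |x>|f(x) XOR y>
    out = [0.0] * 4
    for i, a in enumerate(psi):
        x, y = i >> 1, i & 1
        out[(x << 1) | (f(x) ^ y)] += a
    # Hadamard on the left bit
    psi = [0.0] * 4
    for i, a in enumerate(out):
        x, y = i >> 1, i & 1
        for b in (0, 1):
            psi[(b << 1) | y] += s * (-1) ** (x * b) * a
    # probability that the left bit measures as 1
    p1 = sum(a * a for i, a in enumerate(psi) if i >> 1 == 1)
    return round(p1)                    # 1 => constant, 0 => non-constant

assert deutsch(lambda x: 0) == 1        # constant
assert deutsch(lambda x: 1) == 1        # constant
assert deutsch(lambda x: x) == 0        # non-constant
assert deutsch(lambda x: 1 - x) == 0    # non-constant
```

Note that the oracle is invoked exactly once per run, and the answer is deterministic.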

Deutsch’s algorithm solved a rather artificial problem, but it opened up the possibility of exploring a new paradigm of computation. To date, (good) quantum algorithms are known for many interesting computational problems. In the rest of this chapter, we concentrate on some of the quantum algorithms that have an impact on cryptology.

Exercise Set 8.2

8.1 Let S be a finite set and let l²(S) denote the set of all functions f : S → ℂ.
  1. Show that l²(S) is a Hilbert space under the inner product
〈f, g〉 := Σ_{x∈S} f(x)ḡ(x), where the bar denotes complex conjugation.
  2. Let B := {δx : x ∈ S}, where δx(y) is 1 if y = x, and is 0 otherwise. Show that B is an orthonormal basis of l²(S).

8.2Show that the vectors and form an orthonormal -basis of .
8.3Show that is an entangled state of a 2-bit quantum register.
8.4 Prove the following assertions.
  1. The matrix is unitary.

  2. A unitary matrix preserves inner products, that is, if U is an m × m unitary matrix and |ψ〉, |φ〉 ∈ ℂ^m, then 〈Uψ|Uφ〉 = 〈ψ|φ〉.

  3. The determinant of a unitary matrix has absolute value 1.

  4. Every eigenvalue of a unitary matrix has absolute value 1.

  5. An m × m matrix A is unitary if and only if the columns of A constitute an orthonormal basis of ℂ^m (over ℂ).

8.5
  1. Show that the following operators are unitary on a qubit. Also construct the corresponding transformation matrices.

    Identity operator: I|0〉 = |0〉, I|1〉 = |1〉.
    Exchange operator: X|0〉 = |1〉, X|1〉 = |0〉.
    Z operator: Z|0〉 = |0〉, Z|1〉 = −|1〉.
    Hadamard operator: H|0〉 = (1/√2)(|0〉 + |1〉), H|1〉 = (1/√2)(|0〉 − |1〉).

  2. Deduce the following identities:

  3. Let . Show that defines a unitary operator on a qubit and that , where the last X is the matrix of the exchange operator.

8.6 Let A be an n-bit quantum register. Number the bits of A as 1, . . . , n from left to right. One can apply operators like the X, Z, H of Exercise 8.5 on each individual bit of A. A qubit operation B applied on bit i of A will be denoted by Bi.
  1. Let Sij be the operator that swaps bit i with bit j. Show that

  2. Let C be the reversible XOR operation (also called the controlled-NOT operation) on a two-bit register A = (A1A2), that is, C|x〉|y〉 = |x〉|x ⊕ y〉. Show that C can be realized as

8.7 Suppose that whenever you switch on your quantum computer, every bit in its registers is initialized to the state |0〉. Describe how you can use the operators I, X, Z and H defined in Exercise 8.5, in order to change the state of a qubit from |0〉 to the following:
  1. |1〉

  2. –|1〉

8.8 Let A be an n-bit quantum register in the state |0〉n. Show that the application of the Hadamard transform individually to each bit of A transforms A to the state |ψ〉 = (1/2^{n/2}) Σ_{x=0}^{2^n−1} |x〉. This is precisely the state of A in which all of the 2^n possible outcomes in a measurement of A are equally likely. What happens if we apply H a second time individually to each bit of A, that is, what is H1H2 · · · Hn|ψ〉, where Hi denotes the Hadamard transform on the i-th bit of A?
8.9 We know that any arithmetic or Boolean operation can be implemented using AND and NOT gates. This exercise suggests a reversible way to implement these operations. The Toffoli gate is a function T : {0, 1}³ → {0, 1}³ that maps (x, y, z) ↦ (x, y, z ⊕ xy), where ⊕ means XOR, and xy means the AND of x and y. Thus, T flips the third bit if and only if the first two bits are both 1.
  1. Show that T is a unitary transformation on a 3-bit quantum register. What is the inverse of T?

  2. Use T to realize the Boolean AND and NOT operations.

8.3. Quantum Cryptography

We now describe the quantum key-exchange algorithm due to Bennett and Brassard. The original paper also describes a practical implementation of the algorithm using polarization of photons. For the moment, we do not highlight such specific implementation issues, but describe the algorithm in terms of the conceptual computational units called qubits.

The usual actors Alice and Bob want to agree upon a shared secret using communication over an insecure channel. A third party who gave her name as Carol plans to eavesdrop during the transmission. Alice and Bob repeat the following steps. Here, H stands for the Hadamard transform.

Algorithm 8.1. Quantum key-exchange algorithm

Alice generates a random classical bit i ∈ {0, 1}.

Alice makes a random choice x ∈ {0, 1}.

Alice computes the quantum bit A := H^x|i〉.

Alice sends A to Bob.

Bob makes a random choice y ∈ {0, 1}.

Bob computes B := H^y A.

Bob measures B to get the classical bit j ∈ {0, 1}.

Bob sends y to Alice.

Alice sends x to Bob.

if (x = y) { Alice and Bob retain the bit i = j }

The algorithm works as follows. Alice generates a random bit i and a random decision x whether she is going to use the Hadamard transform H. If x = 0, she sends the quantum bit |0〉 or |1〉 to Bob. If x = 1, she sends either H|0〉 or H|1〉 to Bob. At this point Bob does not know whether Alice applied H before the transmission. So Bob makes a random guess y and accordingly skips/applies the Hadamard transform on the qubit received. If x = y = 0, then Bob has the qubit B = H⁰H⁰|i〉 = |i〉 and a measurement of this qubit reveals i with probability 1. On the other hand, if x = y = 1, then B = H²|i〉 = |i〉, since H² is the identity transform (Exercise 8.5). In this case also, Bob retrieves Alice’s classical bit i with certainty by measuring B.

If xy, then B is generated from Alice’s initial choice |i〉 using a single application of H, that is, in this case. A measurement of this bit outputs 0 or 1, each with probability , that is, Bob gathers no idea about the initial choice of Alice. So after it is established that xy, they both discard the bit.

If we assume that x and y are uniformly chosen, Bob and Alice succeed in having x = y about half of the time. They eventually set up an n-bit secret after about 2n invocations of the above protocol. Table 8.1 illustrates a sample session between Alice and Bob. After 20 iterations of the above procedure, they agree upon the shared secret 0001110111.

Table 8.1. A sample session of the quantum key-exchange algorithm
Iteration  i  x  A      y  B      j  Common bit
    1      0  1  H|0〉  0  H|0〉  1
    2      0  0  |0〉   1  H|0〉  1
    3      0  1  H|0〉  1  |0〉   0  0
    4      0  1  H|0〉  0  H|0〉  0
    5      1  1  H|1〉  0  H|1〉  1
    6      0  0  |0〉   0  |0〉   0  0
    7      0  0  |0〉   0  |0〉   0  0
    8      1  0  |1〉   0  |1〉   1  1
    9      0  0  |0〉   1  H|0〉  0
   10      1  1  H|1〉  0  H|1〉  0
   11      0  1  H|0〉  0  H|0〉  1
   12      0  0  |0〉   1  H|0〉  0
   13      1  0  |1〉   1  H|1〉  1
   14      1  1  H|1〉  1  |1〉   1  1
   15      1  1  H|1〉  1  |1〉   1  1
   16      0  1  H|0〉  1  |0〉   0  0
   17      1  1  H|1〉  1  |1〉   1  1
   18      1  0  |1〉   0  |1〉   1  1
   19      0  1  H|0〉  0  H|0〉  0
   20      1  0  |1〉   0  |1〉   1  1
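A session like the one in Table 8.1 is easy to simulate with single-qubit state vectors. The following stdlib-Python sketch is ours; the seed and round count are arbitrary:

```python
import math, random

def hadamard(psi):
    """Apply H to a single-qubit state given as an amplitude pair."""
    s = 1 / math.sqrt(2)
    a, b = psi
    return (s * (a + b), s * (a - b))

def measure(psi, rng):
    """Measure in the classical basis {|0>, |1>}."""
    return 0 if rng.random() < abs(psi[0]) ** 2 else 1

def bb84_round(rng):
    i, x = rng.randrange(2), rng.randrange(2)
    psi = (1.0, 0.0) if i == 0 else (0.0, 1.0)
    if x:
        psi = hadamard(psi)            # Alice sends A = H^x |i>
    y = rng.randrange(2)
    if y:
        psi = hadamard(psi)            # Bob computes B = H^y A
    return i, x, y, measure(psi, rng)

rng = random.Random(84)                # arbitrary seed
key_a, key_b = [], []
for _ in range(200):
    i, x, y, j = bb84_round(rng)
    if x == y:                         # matching choices: keep the bit
        key_a.append(i)
        key_b.append(j)

assert key_a == key_b                  # agreeing choices never disagree on bits
assert 60 <= len(key_a) <= 140         # about half the rounds survive
```

The first assertion reflects the correctness argument above: whenever x = y, Bob's measurement reproduces i with certainty.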

What remains to be explained is how this protocol guards against eavesdropping by Carol. Let us model Carol as a passive adversary who intercepts the qubit A transmitted by Alice, investigates it to learn about Alice’s secret i, and subsequently transmits a qubit to Bob. In order to guess i, Carol mimics the role of Bob. At this point Carol does not know x, so she makes a guess z about x, accordingly skips/applies the Hadamard transform on the intercepted qubit in order to get a qubit C, measures C to get a bit value k and sends the measured qubit D to Bob. (Recall from Theorem 8.1 that it is impossible for Carol to make a copy of A, work on this copy and transmit the original qubit A to Bob.) Bob receives D, assumes that it is the qubit A transmitted by Alice and carries out his part of the work to generate the bit j. Bob and Alice later reveal x and y. If x ≠ y, they anyway reject the bits obtained from this iteration. Carol should also reject her bit k in this case. So let us concentrate only on the case that x = y. The introduction of Carol in the protocol changes A to D and hence Alice and Bob may eventually agree upon distinct bits. A sample session of the protocol in the presence of Carol is illustrated in Table 8.2. The three parties generate the secrets as:

Alice:  0110 0111 1000 1011
Bob:    0101 1101 1100 1011
Carol:  0100 0101 0100 1011

Table 8.2. Eavesdropping during a key-exchange session
Iteration  i  x  A      z  C = H^z A  k  D     y  B = H^y D  j
    1      0  1  H|0〉  1  |0〉       0  |0〉  1  H|0〉      0
    2      1  0  |1〉   0  |1〉       1  |1〉  0  |1〉       1
    3      1  0  |1〉   1  H|1〉      0  |0〉  0  |0〉       0
    4      0  1  H|0〉  0  H|0〉      0  |0〉  1  H|0〉      1
    5      0  1  H|0〉  1  |0〉       0  |0〉  1  H|0〉      1
    6      1  1  H|1〉  1  |1〉       1  |1〉  1  H|1〉      1
    7      1  1  H|1〉  0  H|1〉      0  |0〉  1  H|0〉      0
    8      1  0  |1〉   0  |1〉       1  |1〉  0  |1〉       1
    9      1  1  H|1〉  0  H|1〉      0  |0〉  1  H|0〉      1
   10      0  1  H|0〉  0  H|0〉      1  |1〉  1  H|1〉      1
   11      0  0  |0〉   1  H|0〉      0  |0〉  0  |0〉       0
   12      0  0  |0〉   0  |0〉       0  |0〉  0  |0〉       0
   13      1  1  H|1〉  1  |1〉       1  |1〉  1  H|1〉      1
   14      0  0  |0〉   0  |0〉       0  |0〉  0  |0〉       0
   15      1  0  |1〉   0  |1〉       1  |1〉  0  |1〉       1
   16      1  0  |1〉   1  H|1〉      1  |1〉  0  |1〉       1

In this example, Alice and Bob’s shared secrets differ in five bit positions. Carol’s intervention causes a shared bit to differ with probability 3/8 (Exercise 8.11). Thus, the more Carol eavesdrops, the more mismatched bits she introduces in the secret shared by Alice and Bob.
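The 3/8 figure can be checked empirically with the same single-qubit simulator; per the model in the text, Carol resends the collapsed qubit |k〉 itself. The seed and sample size below are arbitrary:

```python
import math, random

def hadamard(psi):
    s = 1 / math.sqrt(2)
    a, b = psi
    return (s * (a + b), s * (a - b))

def measure(psi, rng):
    return 0 if rng.random() < abs(psi[0]) ** 2 else 1

rng = random.Random(811)               # arbitrary seed
kept = bad = 0
for _ in range(20000):
    i, x, z, y = (rng.randrange(2) for _ in range(4))
    psi = (1.0, 0.0) if i == 0 else (0.0, 1.0)
    if x:
        psi = hadamard(psi)            # Alice's qubit A = H^x |i>
    c = hadamard(psi) if z else psi    # Carol's qubit C = H^z A
    k = measure(c, rng)                # Carol's guess at i
    d = (1.0, 0.0) if k == 0 else (0.0, 1.0)   # she resends the collapsed |k>
    b = hadamard(d) if y else d        # Bob's B = H^y D
    j = measure(b, rng)
    if x == y:                         # only rounds that would be retained
        kept += 1
        bad += (i != j)

assert abs(bad / kept - 3 / 8) < 0.03  # Exercise 8.11: mismatch rate 3/8
```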

Once Alice and Bob generate a shared secret of the desired bit length, they can check for the equality of their secret values without revealing them. For example, if the shared secret is a 64-bit DES key, Alice can send Bob one or more plaintext–ciphertext pairs generated by the DES algorithm using her shared key. Bob also generates the ciphertexts on Alice’s plaintexts using his secret key. If the ciphertexts generated by Bob differ from those generated by Alice, Bob becomes confident that their shared secrets are different and this happened because of the presence of some adversary (or because of communication errors). They then repeat the key-exchange protocol.

Another possible way in which Alice and Bob can gain confidence about the equality of their shared secrets is the use of parity checks. Suppose Alice breaks up her secret into blocks of eight bits, computes the parity bit of each block, and sends these bits to Bob. Bob generates the parity bits on the blocks of his secret and compares the two sets of parity bits. If the shared secrets of Alice and Bob differ, the difference is revealed by this parity check with high probability.
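The parity check is easy to sketch in Python; the two bit strings below are the Alice/Bob secrets of the eavesdropped session in Table 8.2 (with 4-bit blocks, since those secrets are only 16 bits long):

```python
def parities(bits, block=4):
    """Parity bit of each block of the secret."""
    return [sum(bits[i:i + block]) % 2 for i in range(0, len(bits), block)]

alice = [0,1,1,0, 0,1,1,1, 1,0,0,0, 1,0,1,1]
bob   = [0,1,0,1, 1,1,0,1, 1,1,0,0, 1,0,1,1]

assert parities(alice) == [0, 1, 1, 1]
assert parities(bob)   == [0, 1, 0, 1]
assert parities(alice) != parities(bob)   # the corruption is detected
```

A block whose two copies differ in an odd number of positions is always caught; an even number of errors in one block slips through, which is why the check only succeeds with high probability.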

A minor variant of the key-exchange algorithm just described comes with an implementation strategy. The polarization of a photon is measured by an angle θ, 0° ≤ θ < 180°.[1] A photon polarized at an angle θ passes through a φ-filter with probability cos²(φ − θ) and gets absorbed in the filter with probability sin²(φ − θ). Therefore, photons polarized at the angles 0°, 90°, 45°, 135° can be used to represent the quantum states |0〉, |1〉, H|0〉, H|1〉, respectively. Alice and Bob use 0°- and 45°-filters. Alice makes a random choice (x) between the two filters. If x = 0, she sends a photon polarized at an angle 0° or 90°. If x = 1, a photon polarized at an angle 45° or 135° is sent. When Bob receives the photon transmitted by Alice, he makes a random guess y. If y = 0, he uses the 0°-filter to detect its polarization, and if y = 1, he uses the 45°-filter. Then, Alice and Bob reveal their choices x and y, and if the two choices agree, they share a common secret bit. See Exercise 8.12 for a mathematical formulation of this strategy.

[1] Ask a physicist!

One of the most startling features of this Bennett–Brassard algorithm (often called the BB84 algorithm) is that there have been successful experimental implementations of the strategy. The first prototype was designed by the authors themselves at the T. J. Watson Research Center. They used a quantum channel of length 32 cm. Using longer channels requires many technological barriers to be overcome. For example, fiber-optic cables tend to weaken and may even destroy the polarization of photons. Using boosters to strengthen the signal is impossible in the quantum mechanical world, since doing so produces an effect similar to eavesdropping. Interference patterns (instead of polarization) have been proposed and utilized to build longer quantum channels for key exchange. At present, Stucki et al. [293] hold the world record of performing quantum key exchange over an (underwater) channel of length 67 km between Geneva and Lausanne.

Exercise Set 8.3

8.10 We have exploited the property that H² = I in order to prove the correctness of the quantum key-exchange algorithm. Exercise 8.5 lists some other operators (X and Z) which also satisfy the same property (X² = Z² = I). Can one use one of these transforms in place of H in the quantum key-exchange algorithm?
8.11 Assume that Carol eavesdrops (in the manner described in the text) during the execution of the quantum key-exchange protocol between Alice and Bob. Derive, for the different choices of i, x and z, the following probabilities Pixz of having i ≠ j in the case x = y.

i  x  z  Pixz
0  0  0  0
0  0  1  1/2
0  1  0  1/2
0  1  1  1/2
1  0  0  0
1  0  1  1/2
1  1  0  1/2
1  1  1  1/2

If all these choices of i, x, z are equally likely, show that the probability that Carol introduces a mismatch (that is, i ≠ j) in a shared bit during a random execution of the key-exchange protocol with x = y is 3/8.

(Note that if x = y = z = 0, that is, if the execution of the algorithm proceeds entirely in the classical sense, Carol goes unnoticed. It is the application of the classically meaningless Hadamard transform that introduces the desired security in the protocol.)

8.12 In the key-exchange algorithm described in the text, Bob (and also Carol) always measures qubits in the classical basis {|0〉, |1〉}. Now, consider the following variant of this algorithm. Alice sends, as before, one of the four qubits |0〉, |1〉, H|0〉, H|1〉, depending on her choice of i and x. Bob, upon receiving the qubit A, generates a random guess y ∈ {0, 1}. If y = 0, Bob measures A in the classical basis, whereas if y = 1, Bob measures A in the basis {H|0〉, H|1〉}. After this, they exchange x and y, and retain/discard the bits as in the original algorithm.
  1. Assume that there is no eavesdropping. Argue that this modified strategy works, that is, if x = y, we have i = j, whereas if x ≠ y, then i = j with probability 1/2.

  2. Explain the role of a passive adversary (Carol) in this modified strategy.

  3. Calculate for this variant the probability that Carol introduces an error in a shared bit (when x = y).

8.4. Quantum Cryptanalysis

Quantum parallelism has been effectively exploited to design fast (polynomial-time) algorithms for some of the intractable mathematical problems discussed in Chapter 4. With the availability of quantum computers, cryptographic systems that derive their security from the intractability of these problems will become unusable (completely insecure). Nobody, however, has a proof that these intractable problems cannot have fast classical algorithms. It is interesting to wait and see which (if either) is invented first: a quantum computer or a polynomial-time classical algorithm.

Let us set up some terminology for the rest of this chapter. Let P be a unitary operator on a qubit. One can apply P individually on the i-th bit of an n-bit register. In this case, we denote the operation by Pi. If Pi is operated for each i = 1, . . . , n (in succession or simultaneously), then we abbreviate P1 · · · Pn by the short-hand notation P(n). The parentheses distinguish this operation from P^n, which is the n-fold application of P on a single qubit.

If P and Q are unitary transforms on n1- and n2-bit quantum registers respectively, we let P ⊗ Q denote the unitary transform on an (n1 + n2)-bit register, with P operating on the left n1 bits and Q on the right n2 bits of the register.

8.4.1. Shor’s Algorithm for Computing Period

Let N := 2^n for some n ∈ ℕ. Let f be a periodic function defined on non-negative integers with (least) period r, that is, f(x + kr) = f(x) for every x, k ≥ 0. Suppose further that 1 ≪ r ≤ 2^{n/2} and also that f(0), f(1), . . . , f(r − 1) are pairwise distinct. Shor proposed an algorithm for an efficient computation of the period r in this case.

Let’s first look at the problem classically. If one evaluates f at randomly chosen points, then by the birthday paradox (Exercise 2.172) one requires about √r evaluations of f on an average in order to find two different integers x and y with f(x) = f(y). But then r | (x − y). If sufficiently many such pairs (x, y) are available, the period can be obtained by computing the gcd of the integers x − y. If r is large, say, r = O(2^{n/2}), this gives us an algorithm for computing r in expected time exponential in n. Shor’s quantum algorithm determines r in expected time polynomial in n.
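The classical baseline can be sketched in a few lines of Python; the hidden period and the stand-in function f below are made up for illustration:

```python
import math, random

rng = random.Random(7)                  # arbitrary seed
N = 1 << 40
r = 823_543                             # the hidden period
f = lambda x: x % r                     # stand-in: period r, distinct on [0, r)

seen, g = {}, 0
for _ in range(100_000):
    x = rng.randrange(N)
    v = f(x)
    if v in seen and seen[v] != x:
        g = math.gcd(g, abs(x - seen[v]))   # r divides every such difference
        if g == r:
            break
    seen[v] = x

assert g == r                           # gcd of a few collisions reveals r
```

The expected number of evaluations before the first collision is Θ(√r), which for r near 2^{n/2} is exponential in n.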

Let us assume that we have an oracle Uf which, on input the 2n-bit value |x〉n|y〉n, computes |x〉n|f(x) ⊕ y〉n. We prepare a 2n-bit register A in the state |0〉n|0〉n. Then, we apply the Hadamard transform H(n) on the left n bits. By Exercise 8.8, the state of A becomes
(1/√N) Σ_{x=0}^{N−1} |x〉n |0〉n.
Supplying this state as the input to the oracle Uf yields the state
(1/√N) Σ_{x=0}^{N−1} |x〉n |f(x)〉n.
We then measure the output register (the right n bits). By the generalized Born rule, we get a value f(x0) for some x0 ∈ {0, 1, . . . , r − 1}, and the state of the register A collapses to the uniform superposition of all those |x〉|f(x)〉 for which f(x) = f(x0). By the given periodicity properties of f, the post-measurement state of the input register (the left n bits) can be written as

Equation 8.1
(1/√M) Σ_{j=0}^{M−1} |x0 + jr〉

for some M determined by the relations:

x0 + (M − 1)r < N ≤ x0 + Mr.

This is an interesting state, for if we were allowed to make copies of this state and measure the different copies, we could collect some values x0 + j1r, . . . , x0 + jkr, which in turn would reveal r with high probability. But the no-cloning theorem disallows making copies of quantum states. Shor proposed a trick to work around this difficulty. He considered the following transform:

Equation 8.2

F|x〉 = (1/√N) Σ_{y=0}^{N−1} e^{2πixy/N} |y〉.
By Exercise 8.13, F is a unitary transform. F is known as the Fourier transform. Applying F to State (8.1) transforms the input register to the state
(1/√(MN)) Σ_{y=0}^{N−1} (Σ_{j=0}^{M−1} e^{2πi(x0+jr)y/N}) |y〉.
A measurement of this state gives an integer y ∈ ℤN with probability
p(y) = (1/(MN)) |Σ_{j=0}^{M−1} e^{2πijry/N}|².
Application of the Fourier transform to State (8.1) helps us to concentrate the probabilities of measurement outcomes in strategic states. More precisely, consider a value of y of the form y = kN/r + ∊k, where −1/2 ≤ ∊k < 1/2, that is, a value of y close to an integral multiple of N/r. In this case,
p(y) = (1/(MN)) |Σ_{j=0}^{M−1} e^{2πijr∊k/N}|².
The last summation is that of a geometric series and we have
p(y) = (1/(MN)) · sin²(πMr∊k/N) / sin²(πr∊k/N).
Now, we use the inequalities 2x/π ≤ sin x ≤ x for 0 ≤ x ≤ π/2 and the facts that rM ≈ N and |∊k| ≤ 1/2 to get
p(y) ≥ 4M²/(π²MN) = 4M/(π²N) ≈ 4/(π²r).
Since N/r has about r positive integral multiples less than N and each such multiple has a closest integer yk for some k, the probability that we obtain one such yk as the outcome of the measurement is at least 4/π² = 0.40528 . . . , that is, after O(1) iterations of the above procedure we get some yk. The Fourier transform increases the likelihood of getting some yk to a level bounded below by a positive constant.
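Both key facts (F is unitary, and it concentrates the probability mass on multiples of N/r) can be checked numerically for small sizes. In the following Python sketch the parameters are illustrative and chosen with r | N, so the peaks are exact:

```python
import cmath, math

def qft(N):
    """Matrix of the Fourier transform F on C^N."""
    s = 1 / math.sqrt(N)
    return [[s * cmath.exp(2j * math.pi * x * y / N) for y in range(N)]
            for x in range(N)]

N, r, x0 = 64, 8, 3
F = qft(N)

# F is unitary: F F^dagger = I (Exercise 8.13)
for i in range(N):
    for j in range(N):
        z = sum(F[i][k] * F[j][k].conjugate() for k in range(N))
        assert abs(z - (1 if i == j else 0)) < 1e-9

# Start from State (8.1): uniform superposition over x0, x0 + r, x0 + 2r, ...
M = N // r
psi = [0j] * N
for j in range(M):
    psi[x0 + j * r] = 1 / math.sqrt(M)
out = [sum(F[y][x] * psi[x] for x in range(N)) for y in range(N)]
for y in range(N):
    peak = 1 / r if y % (N // r) == 0 else 0.0
    assert abs(abs(out[y]) ** 2 - peak) < 1e-9   # mass sits on multiples of N/r
```

When r does not divide N, the peaks are no longer exact, which is why the inequalities above are needed to bound the probability from below.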

What remains is to show that r can be retrieved from such a useful observation yk. We have |yk/N − k/r| ≤ 1/(2N). If a/b and c/d are two distinct rationals with b, d ≤ 2^{n/2} and with |yk/N − a/b| ≤ 1/(2N) and |yk/N − c/d| ≤ 1/(2N), then by the triangle inequality we have |a/b − c/d| ≤ 1/N. On the other hand, since a/b ≠ c/d, we have |a/b − c/d| ≥ 1/(bd) > 1/N (for bd < N), a contradiction. Therefore, since r ≤ 2^{n/2}, there is a unique rational k/r satisfying |yk/N − k/r| ≤ 1/(2N), and this rational k/r can be determined by efficient classical algorithms, for example, using the continued fraction expansion[2] of yk/N.

[2] Consult Zuckerman et al. [316] to learn about continued fractions and their applications in approximating real numbers.

If gcd(k, r) = 1, we get r; we can verify this by checking whether f(x) = f(x + r). If gcd(k, r) > 1, we get a factor of r. Repeating the entire procedure gives another k′/r, from which we (hopefully) get another factor of r (if not r itself). After a few (O(1)) iterations, we obtain r as the lcm of the factors obtained.
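This classical post-processing step is conveniently available in Python's standard library: Fraction.limit_denominator performs exactly the continued-fraction approximation needed here. The period and the measurement indices below are made up for illustration:

```python
from fractions import Fraction
from math import gcd

n = 20
N = 1 << n
r = 501                                 # hidden period, r <= 2^(n/2)

r_found = 1
for k in (3, 7, 11):                    # indices behind a few observed y_k
    y = round(k * N / r)                # y_k: the integer closest to kN/r
    approx = Fraction(y, N).limit_denominator(1 << (n // 2))
    # approx equals k/r in lowest terms, so its denominator divides r
    d = approx.denominator
    r_found = r_found * d // gcd(r_found, d)    # lcm of the recovered divisors

assert r_found == r
```

Here k = 3 shares the factor 3 with r = 501, so that observation alone yields only the divisor 167; the lcm over a few observations restores r.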

Much of the quantum magic is obtained by the use of the Fourier transform F on a suitably prepared quantum register. The question is then how easy it is to implement F. We will not go into the details, but only mention that a circuit consisting of basic quantum gates and of size O(n²) can be used to realize the Fourier transform (cf. Exercise 8.14).

To sum up, we have a polynomial-time (in n) randomized quantum algorithm for computing the period r of f. This leads to efficient quantum algorithms for solving many classically intractable problems of cryptographic significance.

8.4.2. Breaking RSA

Let m = pq with distinct primes p and q. We have φ(m) = (p − 1)(q − 1). Choose an RSA key pair (e, d) with gcd(e, φ(m)) = 1 and ed ≡ 1 (mod φ(m)). Given a message a ∈ ℤm, the ciphertext is b ≡ a^e (mod m). The task of a cryptanalyst is to compute a from the knowledge of m, e and b. If gcd(b, m) > 1, then this gcd is a non-trivial factor of m. So assume that gcd(b, m) = 1. But then gcd(a, m) = 1 also. Since b ≡ a^e (mod m), b is in the subgroup of ℤm* generated by a. Similarly, a ≡ b^d (mod m), that is, a is in the subgroup of ℤm* generated by b. It follows that these two subgroups are equal and, in particular, that the multiplicative orders of a and b modulo m are the same. This order, call it r, divides φ(m) and hence is ≤ (p − 1)(q − 1) < m.

Choose n ∈ ℕ with N := 2^n ≥ m² > r². The function sending x ↦ b^x (mod m) is periodic of (least) period r. By Shor’s algorithm, one computes r efficiently. Since gcd(e, φ(m)) = 1 and r | φ(m), we have gcd(e, r) = 1, that is, using the extended gcd algorithm one obtains an integer d′ with d′e ≡ 1 (mod r). But then b^{d′} ≡ a^{d′e} ≡ a (mod m).

The private key d is the inverse of e modulo φ(m). It is not necessary to compute d for decrypting b. The inverse d′ of e modulo r = ordm(a) = ordm(b) suffices.
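Once an order-finding oracle is available, the rest of the attack is pure classical arithmetic. In the Python sketch below, the "quantum step" is brute-forced on tiny hypothetical parameters:

```python
from math import gcd

p, q = 101, 113                         # toy primes (illustrative only)
m = p * q
phi = (p - 1) * (q - 1)
e = 11
assert gcd(e, phi) == 1
a = 42                                  # the plaintext
b = pow(a, e, m)                        # the ciphertext

# The quantum step: r = ord_m(b), brute-forced classically here.
r = next(k for k in range(1, m) if pow(b, k, m) == 1)

d2 = pow(e, -1, r)                      # e is invertible mod r since r | phi(m)
assert pow(b, d2, m) == a               # b^{d'} = a, without ever using phi(m)
```

Note that d2 generally differs from the private key d, yet decrypts b correctly.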

8.4.3. Factoring Integers

Let m be a composite integer that we want to factor. Choose a non-zero integer a ∈ ℤm. If gcd(a, m) > 1, then we already know a non-trivial factor of m. So assume that gcd(a, m) = 1, that is, a ∈ ℤm*. Let r := ordm(a).

As in the case of breaking RSA, choose n ∈ ℕ with N := 2^n ≥ m² > r². The function x ↦ a^x (mod m) is periodic of least period r. Shor’s algorithm computes r. If r is even, we can write:

(a^{r/2} − 1)(a^{r/2} + 1) ≡ 0 (mod m).

Since ordm(a) = r, a^{r/2} − 1 ≢ 0 (mod m). If we also have a^{r/2} + 1 ≢ 0 (mod m), then gcd(a^{r/2} + 1, m) is a non-trivial factor of m. It can be shown that the probability of finding an even r with a^{r/2} + 1 ≢ 0 (mod m) is at least half (cf. Exercise 4.9). Thus, trying a few integers a, one can factor m.
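Again the reduction from order to factor is classical; the following sketch brute-forces the order on a toy modulus of our choosing:

```python
from math import gcd

m = 8051                                # toy composite (83 * 97), illustrative
a = 2
assert gcd(a, m) == 1
# The quantum step: r = ord_m(a), brute-forced classically here.
r = next(k for k in range(1, m) if pow(a, k, m) == 1)

assert r % 2 == 0                       # this a happens to give an even order
t = pow(a, r // 2, m)
assert t != m - 1                       # ... with a^{r/2} != -1 (mod m)
g = gcd(t + 1, m)
assert 1 < g < m and m % g == 0         # a non-trivial factor of m
```

When the chosen a yields an odd order, or a^{r/2} ≡ −1 (mod m), one simply retries with another a.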

8.4.4. Computing Discrete Logarithms

A variant of Shor’s algorithm in Section 8.4.1 can be used to compute discrete logarithms in the finite field 𝔽q, q = p^s, with p prime and s ≥ 1. For the sake of simplicity, let us concentrate only on prime fields (s = 1). Let g be a generator of 𝔽p*. Our task is to compute, for a given a ∈ 𝔽p*, an integer r with a ≡ g^r (mod p). We assume that p is a large prime; in particular, p is odd.

Choose n ∈ ℕ with N := 2^n satisfying p < N < 2p. We use a 3n-bit quantum register A in which the left 2n bits constitute the input part and the right n bits the output part. The input part is initialized to the uniform superposition of all pairs (x, y) with 0 ≤ x, y ≤ p − 2, that is, A has the initial state:
(1/(p − 1)) Σ_{x=0}^{p−2} Σ_{y=0}^{p−2} |x〉n |y〉n |0〉n
(see Exercise 8.15). Then, we use an oracle

Uf : |x〉n|y〉n|z〉n ↦ |x〉n|y〉n|f(x, y) ⊕ z〉n

to compute the function f(x, y) := g^x a^{−y} (mod p) in the output register. Applying Uf transforms A to the state
(1/(p − 1)) Σ_{x=0}^{p−2} Σ_{y=0}^{p−2} |x〉 |y〉 |g^x a^{−y} mod p〉.
Measurement of the output register now gives a value z ≡ g^k (mod p) for some k ∈ {0, 1, . . . , p − 2} and causes the input register to jump to the state
(1/√(p − 1)) Σ_{y=0}^{p−2} |(ry + k) rem (p − 1)〉 |y〉.
Note that g^x a^{−y} ≡ g^k (mod p) if and only if x ≡ ry + k (mod p − 1), that is, only those pairs (x, y) that satisfy this congruence contribute to the post-measurement state. For each value of y modulo p − 1, we get a unique x ≡ ry + k (mod p − 1), that is, there are exactly p − 1 such pairs (x, y).
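This congruence structure can be verified on toy parameters of our own choosing (p, g, r and k below are all illustrative):

```python
p = 23
g = 5                                   # a generator of the nonzero residues mod 23
r = 9                                   # the hidden discrete logarithm
a = pow(g, r, p)
k = 4                                   # some fixed k, as fixed by the measurement

# All pairs (x, y) with x = ry + k mod (p-1) map to the same value g^k.
pairs = [((r * y + k) % (p - 1), y) for y in range(p - 1)]
target = pow(g, k, p)
assert all(pow(g, x, p) * pow(a, -y, p) % p == target for x, y in pairs)

# Two pairs whose y-difference is invertible mod p-1 already reveal r.
(x1, y1), (x2, y2) = pairs[2], pairs[5]
r_rec = (x1 - x2) * pow(y1 - y2, -1, p - 1) % (p - 1)
assert r_rec == r
```

Of course, the quantum state offers only one such pair per run; the Fourier transform below is what substitutes for the forbidden second copy.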

If we were allowed to make copies of this state and observe two copies separately, we would get pairs (x1, y1) and (x2, y2) with x1 − ry1 ≡ x2 − ry2 ≡ k (mod p − 1). Now, if gcd(y1 − y2, p − 1) = 1, we would get r ≡ (y1 − y2)^{−1}(x1 − x2) (mod p − 1). But we are not allowed to copy quantum states. So Shor used his old trick, that is, applied the Fourier transforms
F ⊗ F (one copy of F acting on each n-bit half of the input register)
to obtain the state

A measurement of the input register at this state yields a pair (u, v) with probability:

Equation 8.3


As in Shor’s period-finding algorithm, we now need to identify a set of useful pairs (u, v) which are sufficiently many in number so as to make the probability of observing one of them bounded below by a positive constant. We also need to demonstrate how a useful pair can reveal the unknown discrete logarithm r of a. The jugglery with inequalities and approximations is much more involved in this case. Let us still make a patient attempt to see the end of the story.

First, we eliminate one of x, y from Equation (8.3). Since x ≡ ry + k (mod p − 1) and 0 ≤ x ≤ p − 2, we have x = (ry + k) rem (p − 1), and this value can be substituted for x in Equation (8.3). Let j be the integer closest to u(p − 1)/N, that is, u(p − 1) = jN + ∊ with −N/2 < ∊ ≤ N/2. This yields

Equation 8.4


where

Equation 8.5


Since is an integer, substituting Equation (8.4) in Equation (8.3) gives

Writing S = lN + σ with –N/2 < σ ≤ N/2 then gives

We now impose the usefulness conditions on u, v:

Equation 8.6


Equation 8.7


Involved calculations show that the probability pu,v for a (u, v) satisfying these two conditions is at least . Let us now see how many pairs (u, v) satisfy the conditions. From Equation (8.5), it follows that for each u there exists a unique v, such that Condition (8.6) is satisfied. Condition (8.7), on the other hand, involves only u. If w := v2(p – 1), then 2w must divide ∊. For each multiple of 2w not exceeding N/12 in absolute value, we get 2w distinct solutions for u modulo N. (We are solving for u the congruence u(p – 1) ≡ ∊ (mod 2n).) There is a total of at least N/12 of them. Therefore, the probability of making any one of the useful observations (u, v) is at least , since N < 2p.

We finally explain the extraction of r from a useful observation (u, v). Condition (8.6) and Equation (8.5) give . Dividing throughout by N and using the fact that u(p – 1) = jN + ∊, we get

that is, the fractional part of the left side must lie in an interval of width determined by the bound above. The measurement of the input gives us v and we know N. Approximating to the nearest admissible multiple, we get rj ≡ λ (mod p − 1) for a known integer λ. Now, j, being the integer closest to u(p − 1)/N, is also known to us. If gcd(j, p − 1) = 1, we have r ≡ j^{−1}λ (mod p − 1). We don’t go into the details of determining the likelihood of the invertibility of j modulo p − 1. A careful analysis shows that Shor’s quantum discrete-log algorithm runs in probabilistic polynomial time (in n).

Exercise Set 8.4

8.13 Let F be the Fourier Transform (8.2). For basis vectors |x〉 and |x′〉, show that
〈Fx | Fx′〉 = 1 if x = x′, and 0 otherwise.
Conclude that F is a unitary transform.

8.14 Let N = 2^n. Let x, y ∈ ℤN have binary expansions (xn−1 · · · x1x0)₂ and (yn−1 · · · y1y0)₂ respectively.
  1. Show that xy/N equals an integer plus the quantity

    yn–1 (.x0) + yn–2(.x1x0) + yn–3(.x2x1x0) + · · · + y0(.xn–1 xn–2 . . . x0),

    where (.xj xj−1 . . . x0) denotes the binary fraction xj/2 + xj−1/4 + · · · + x0/2^{j+1}.

  2. Deduce that the quantum Fourier Transform (8.2) can be written as

    where the i-th expression in parentheses applies to the i-th bit from the left.

8.15 Let n ∈ ℕ, N := 2^n and t ∈ {1, 2, . . . , N}. Consider an (n + 1)-bit quantum register with the input consisting of the left n bits and the output the rightmost bit. Suppose there is an oracle Uf that takes an n-bit input x and outputs the bit f(x), which is 1 if x < t, and 0 otherwise.

First prepare the register in the state (1/√N) Σ_{x=0}^{N−1} |x〉 |0〉. Then, apply Uf on this register and finally measure the output bit. Describe the state of the input register after this measurement depending on the outcome of the measurement.

8.16 Recall that the Fourier Transform (8.2) is defined for N equal to a power of 2. It turns out that for such values of N the quantum Fourier transform is easy to implement. For this exercise, assume hypothetically that one can efficiently implement F for other values of N too. In particular, take N = p − 1 in Shor’s quantum discrete-log algorithm. Show that in this case, the probability pu,v of Equation (8.3) becomes:
pu,v = 1/(p − 1) if ru + v ≡ 0 (mod p − 1), and pu,v = 0 otherwise.
Conclude that an outcome (u, v) of measuring the input register yields

r ≡ −u^{−1}v (mod p − 1),

provided gcd(u, p – 1) = 1.

Chapter Summary

This chapter is a gentle introduction to the recent applications of quantum computation in public-key cryptography. These developments have both good and bad impacts for cryptologists. It is still a big question whether a quantum computer can ever be manufactured, so at present a study of quantum cryptology is mostly theoretical in nature.

Quantum mechanics is governed by a set of four axioms that define a quantum mechanical system and prescribe its properties. A quantum bit (qubit) is a quantum mechanical system that has two orthogonal states |0〉 and |1〉. A quantum register is a collection of qubits of a fixed size.

As an example of what we can gain by using quantum algorithms, we first describe the Deutsch algorithm that determines whether a function f : {0, 1} → {0, 1} is constant by invoking f only once. A classical algorithm requires two invocations.

Next we present the BB84 algorithm for key exchange over a quantum mechanical channel. The algorithm guarantees that eavesdropping corrupts the shared secret and is therefore detectable. This algorithm has been implemented in hardware, and key agreement has been carried out over a channel of length 67 km.

Finally, we describe Shor’s polynomial-time quantum algorithms for factoring integers and for computing discrete logarithms in finite fields. These algorithms are based on a technique called quantum Fourier transform.

If quantum computers can ever be realized, RSA and most other popular cryptosystems described and not described in this book will forfeit all security guarantees. And what will happen to this book? If you don’t possess a copy of this wonderful book, just rush to your nearest book store now—they have not yet mastered the quantum technology!

Suggestions for Further Reading

There was a time when the newspapers said that only twelve men understood the theory of relativity. I do not believe there ever was such a time . . . On the other hand, I think I can safely say that nobody understands quantum mechanics.

—Richard Feynman, The Character of Physical Law, BBC, 1965

Quantum mechanics came into existence, when Werner Heisenberg, at the age of 25, proposed the uncertainty principle in 1927. It created an immediate stir in the physics community. Eventually Heisenberg and Niels Bohr came up with an interpretation of quantum mechanics, known as the Copenhagen interpretation. While many physicists (like Max Born, Wolfgang Pauli and John von Neumann) subscribed to this interpretation, many other eminent ones (including Albert Einstein, Erwin Schrödinger, Max Planck and Bertrand Russell) did not. Interested readers may consult textbooks by Sakurai [255] and Schiff [258] to study this fascinating area of fundamental science.[3]

[3] Well! We are not physicists. These books are followed in graduate and advanced undergraduate courses in many institutes and universities.

For a comprehensive treatment of quantum computation (including cryptographic and cryptanalytic quantum algorithms), we refer the reader to the book by Nielsen and Chuang [218]. Mermin’s paper [197] and course notes [198] are also good sources for learning quantum mechanics and computation, and are suitable for computer scientists. Preskill’s course notes [244] are also useful, though a bit more physics-oriented. The very readable article [243] by Preskill on the realizability of quantum computers is also worth mentioning in this context. The first known quantum algorithm is due to Deutsch [75].

Bennett and Brassard’s quantum key-exchange algorithm (BB84) appeared in [20]. The implementation due to Stucki et al. of this algorithm is reported in [293].

Shor’s polynomial-time quantum factorization and discrete-log algorithms are described in [271]. All the details missing in Section 8.4.4 can be found in this paper. No polynomial-time quantum algorithms are known to solve the elliptic curve discrete logarithm problem. Proos and Zalka [245] present an extension of Shor’s algorithm for a special class of elliptic curves. See [146] for an adaptation of this algorithm applicable to fields of characteristic 2.

Appendices

 


A. Symmetric Techniques

A.1Introduction
A.2Block Ciphers
A.3Stream Ciphers
A.4Hash Functions

Sour, sweet, bitter, pungent, all must be tasted.

—Chinese Proverb

Unless we change direction, we are likely to end up where we are going.

—Anonymous

Not everything that can be counted counts, and not everything that counts can be counted.

—Albert Einstein

A.1. Introduction

Cryptography, today, cannot bank solely on public-key (that is, asymmetric) algorithms. Secret-key (that is, symmetric) techniques also have important roles to play. This chapter is an attempt to introduce the reader to some rudimentary notions of symmetric cryptography. The sketchy account that follows lacks both the depth and the breadth of a comprehensive treatment. Given the focus of this book, Appendix A could have been omitted. Nonetheless, some attention to symmetric technology is never irrelevant for any book on cryptology.

It remains debatable whether hash functions can be treated under the banner of this chapter—a hash function need not even use a key. If the reader is willing to accept symmetric as an abbreviation for not asymmetric, some justifications can perhaps be given. How does it matter anyway?

A.2. Block Ciphers

Block ciphers encrypt plaintext messages in blocks of fixed lengths and are more ubiquitously used than public-key encryption routines. In a sense, public-key encryption is also block encryption. Since public-key routines are much slower than (secret-key) block ciphers, it is customary to use public-key algorithms only in specific situations, for example, for encrypting single blocks of data, like keys of symmetric ciphers.

In the rest of this chapter, we use the word bit in the conventional sense, that is, to denote a quantity that can take only two possible values, 0 and 1. It is convenient to use the symbol {0, 1} to refer to this set of two values. We also let {0, 1}^m stand for the set of all bit strings of length m. Whenever we plan to refer to the field (or group) structure of these sets, we will use the alternative notations F2 and F2^m.

Definition A.1.

A block cipher f of block-size n and of key-size r is a map

   f : {0, 1}^n × {0, 1}^r → {0, 1}^n

that encrypts a plaintext block m of bit length n to a ciphertext block c = f(m, K) of bit length n under a key K, a bit string of length r. To ensure unique decryption, the map

   fK : {0, 1}^n → {0, 1}^n,   fK(m) := f(m, K),

for a fixed key K has to be a permutation of (that is, a bijective function on) {0, 1}^n. In that case, the decryption of c to get back m is carried out as m = fK–1(c).

A good block cipher has the following desirable properties:

A block cipher provably possessing all these good characteristics (in particular, the randomness properties) is difficult to construct in practice. Practical block ciphers are designed for reasonably big n and r and come with the hope of representing reasonably unpredictable permutations. We dub a block cipher good or safe if it stands the test of time. Table A.1 lists some widely used block ciphers.

Table A.1. Some popular block ciphers
Name                                                                                                          n              r
DES (Data Encryption Standard)                                                                                64             56
FEAL (Fast Data Encipherment Algorithm)                                                                       64             64
SAFER (Secure And Fast Encryption Routine)                                                                    64             64
IDEA (International Data Encryption Algorithm)                                                                64             128
Blowfish                                                                                                      64             ≤ 448
Rijndael, accepted as AES (Advanced Encryption Standard) by NIST (National Institute of Standards
and Technology, a US government organization)                                                                 128/192/256    128/192/256

A.2.1. A Case Study: DES

The data encryption standard (DES) was proposed as a federal information processing standard (FIPS) in 1975. DES has been the most popular and the most widely used among all block ciphers ever designed. Although its relatively small key-size offers questionable security under today’s computing power, DES still enjoys large-scale deployment in not-so-serious cryptographic applications.

DES encryption requires a 64-bit plaintext block m and a 56-bit key K.[1] Let us plan to use the notations DESK and DESK–1 to stand respectively for the DES encryption and decryption functions under the key K.

[1] A DES key K = k1k2 . . . k64 is actually a 64-bit string. Only 56 bits of K are used for encryption. The remaining 8 bits are used as parity-check bits. Specifically, for each i = 1, . . . , 8 the bit k8i is adjusted so that the i-th byte (k8i–7 k8i–6 . . . k8i) has an odd number of one-bits.
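The parity-check convention of this footnote can be sketched in a few lines of Python; the function name adjust_des_parity is ours, not the book's:

```python
def adjust_des_parity(key8: bytes) -> bytes:
    """Force each byte of a DES key to have an odd number of one-bits.

    In the book's notation k1 k2 ... k64, the parity bit k_8i is the
    least significant bit of the i-th byte; the other 7 bits are key bits.
    """
    out = bytearray()
    for b in key8:
        key_bits = b & 0xFE                     # the 7 actual key bits
        ones = bin(key_bits).count("1")
        out.append(key_bits | (1 if ones % 2 == 0 else 0))  # set parity bit if needed
    return bytes(out)

# After adjustment, every byte has odd parity.
key = adjust_des_parity(bytes.fromhex("0022446688aaccee"))
assert all(bin(b).count("1") % 2 == 1 for b in key)
```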

DES key schedule

The DES algorithm first computes sixteen 48-bit keys K1, K2, . . . , K16 from K using a procedure known as the DES key schedule described in Algorithm A.1. These 16 keys are used in the 16 rounds of encryption. The key schedule uses two fixed permutations PC1 and PC2 described after Algorithm A.1 and to be read in the row-major order. Here, PC is an abbreviation for permuted choice.

Algorithm A.1. The DES key schedule

Input: A DES key K = k1k2 . . . k64 (containing the parity-check bits).

Output: Sixteen 48-bit round keys K1, K2, . . . , K16.

Steps:

Use PC1 to generate the 56-bit string U0 := PC1(K).
Write U0 = C0 ‖ D0 with C0 and D0 each of length 28.
for i = 1, 2, . . . , 16 {
   Take s := 1 if i ∈ {1, 2, 9, 16}, and s := 2 otherwise.
   Cyclically left shift Ci – 1 by s bits to get Ci.
   Cyclically left shift Di – 1 by s bits to get Di.
   Let Ui := Ci ‖ Di.
   Compute the i-th round key Ki := PC2(Ui) = u14 u17 u11 . . . u29 u32.
}

PC1
57 49 41 33 25 17  9
 1 58 50 42 34 26 18
10  2 59 51 43 35 27
19 11  3 60 52 44 36
63 55 47 39 31 23 15
 7 62 54 46 38 30 22
14  6 61 53 45 37 29
21 13  5 28 20 12  4

PC2
14 17 11 24  1  5
 3 28 15  6 21 10
23 19 12  4 26  8
16  7 27 20 13  2
41 52 31 37 47 55
30 40 51 45 33 48
44 49 39 56 34 53
46 42 50 36 29 32

DES encryption

DES encryption, as described in Algorithm A.2, proceeds in 16 rounds. The i-th round uses the key Ki (obtained from the key schedule) in tandem with the encryption primitive e. A fixed permutation IP and its inverse IP–1 are also used.[2]

[2] A block cipher that executes several encryption rounds with the i-th round computing the two halves as Li := Ri – 1 and Ri := Li – 1 ⊕ e(Ri – 1, Ki) for some round key Ki and for some encryption primitive e, is called a Feistel cipher. Most popular block ciphers mentioned earlier are of this type. Rijndael is an exception, and its acceptance as the new standard has been interpreted as an end of the Feistel dynasty.

To complete the description of DES encryption, it remains to specify the round encryption function e, which can be compactly depicted as:

e(X, J) := P(S(E(X) ⊕ J)),

Algorithm A.2. DES encryption

Input: Plaintext block m = m1m2 . . . m64 and the round keys K1, . . . , K16.

Output: The ciphertext block c = c1c2 . . . c64 = DESK(m).

Steps:

Apply the initial permutation on m to get
     V := IP(m).
Write V = L0 ‖ R0 with L0 and R0 each of length 32.
for i = 1, 2, . . . , 16 {
   /* The i-th encryption round */
   Li := Ri – 1.
   Ri := Li – 1 ⊕ e(Ri – 1, Ki).
}
Let W := R16 ‖ L16.
Apply the inverse of the initial permutation on W to get the ciphertext block
   c := IP–1(W).

IP
58 50 42 34 26 18 10  2
60 52 44 36 28 20 12  4
62 54 46 38 30 22 14  6
64 56 48 40 32 24 16  8
57 49 41 33 25 17  9  1
59 51 43 35 27 19 11  3
61 53 45 37 29 21 13  5
63 55 47 39 31 23 15  7

IP–1
40  8 48 16 56 24 64 32
39  7 47 15 55 23 63 31
38  6 46 14 54 22 62 30
37  5 45 13 53 21 61 29
36  4 44 12 52 20 60 28
35  3 43 11 51 19 59 27
34  2 42 10 50 18 58 26
33  1 41  9 49 17 57 25

where E : {0, 1}^32 → {0, 1}^48 is an expansion function, S : {0, 1}^48 → {0, 1}^32 is a contraction function and P is a fixed permutation of {0, 1}^32 (called the permutation function). S uses eight S-boxes (substitution boxes) S1, S2, . . . , S8. Each S-box Sj is a 4 × 16 matrix with each row a permutation of 0, 1, 2, . . . , 15 and is used to convert a 6-bit string y1y2y3y4y5y6 to a 4-bit string z1z2z3z4 as follows. Let μ denote the integer with binary representation y1y6 and ν the integer with binary representation y2y3y4y5. Then, z1z2z3z4 is the 4-bit binary representation of the (μ, ν)-th entry in the matrix Sj. (Here, the numbering of the rows and columns starts from 0.) In this case, we write Sj(y1y2y3y4y5y6) = z1z2z3z4. Algorithm A.3 provides the description of e.
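As a sketch (in Python; the function name and the bit-string interface are ours), the row/column extraction for one S-box looks like this, using S1 from the tables given later in this section:

```python
# DES S-box S1 (the first of the eight boxes tabulated in this section)
S1 = [
    [14,  4, 13,  1,  2, 15, 11,  8,  3, 10,  6, 12,  5,  9,  0,  7],
    [ 0, 15,  7,  4, 14,  2, 13,  1, 10,  6, 12, 11,  9,  5,  3,  8],
    [ 4,  1, 14,  8, 13,  6,  2, 11, 15, 12,  9,  7,  3, 10,  5,  0],
    [15, 12,  8,  2,  4,  9,  1,  7,  5, 11,  3, 14, 10,  0,  6, 13],
]

def sbox_lookup(sbox, y: str) -> str:
    """Apply a DES S-box to a 6-bit string y1...y6, returning 4 bits."""
    mu = int(y[0] + y[5], 2)   # row index from bits y1 and y6
    nu = int(y[1:5], 2)        # column index from bits y2 y3 y4 y5
    return format(sbox[mu][nu], "04b")

# y1 y6 = 01 selects row 1; y2 y3 y4 y5 = 1101 selects column 13; S1[1][13] = 5.
assert sbox_lookup(S1, "011011") == "0101"
```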

Algorithm A.3. The DES round encryption primitive e

Input: X ∈ {0, 1}^32 and J ∈ {0, 1}^48.

Output: e(X, J).

Steps:

Y := E(X) ⊕ J (where E(x1x2 . . . x32) = x32 x1 x2 . . . x32 x1).
Write Y = Y1 ‖ Y2 ‖ . . . ‖ Y8 with each Yj of length 6.
for j = 1, 2, . . . , 8 { Zj := Sj(Yj). }
e(X, J) := P(Z1 ‖ Z2 ‖ . . . ‖ Z8) (where P(z1z2 . . . z32) = z16 z7 z20 . . . z4 z25).

The tables for E and P are as follows.

E
32  1  2  3  4  5
 4  5  6  7  8  9
 8  9 10 11 12 13
12 13 14 15 16 17
16 17 18 19 20 21
20 21 22 23 24 25
24 25 26 27 28 29
28 29 30 31 32  1

P
16  7 20 21
29 12 28 17
 1 15 23 26
 5 18 31 10
 2  8 24 14
32 27  3  9
19 13 30  6
22 11  4 25

Finally, the eight S-boxes are presented:

S1
14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
 0 15  7  4 14  2 13  1 10  6 12 11  9  5  3  8
 4  1 14  8 13  6  2 11 15 12  9  7  3 10  5  0
15 12  8  2  4  9  1  7  5 11  3 14 10  0  6 13

S2
15  1  8 14  6 11  3  4  9  7  2 13 12  0  5 10
 3 13  4  7 15  2  8 14 12  0  1 10  6  9 11  5
 0 14  7 11 10  4 13  1  5  8 12  6  9  3  2 15
13  8 10  1  3 15  4  2 11  6  7 12  0  5 14  9

S3
10  0  9 14  6  3 15  5  1 13 12  7 11  4  2  8
13  7  0  9  3  4  6 10  2  8  5 14 12 11 15  1
13  6  4  9  8 15  3  0 11  1  2 12  5 10 14  7
 1 10 13  0  6  9  8  7  4 15 14  3 11  5  2 12

S4
 7 13 14  3  0  6  9 10  1  2  8  5 11 12  4 15
13  8 11  5  6 15  0  3  4  7  2 12  1 10 14  9
10  6  9  0 12 11  7 13 15  1  3 14  5  2  8  4
 3 15  0  6 10  1 13  8  9  4  5 11 12  7  2 14

S5
 2 12  4  1  7 10 11  6  8  5  3 15 13  0 14  9
14 11  2 12  4  7 13  1  5  0 15 10  3  9  8  6
 4  2  1 11 10 13  7  8 15  9 12  5  6  3  0 14
11  8 12  7  1 14  2 13  6 15  0  9 10  4  5  3

S6
12  1 10 15  9  2  6  8  0 13  3  4 14  7  5 11
10 15  4  2  7 12  9  5  6  1 13 14  0 11  3  8
 9 14 15  5  2  8 12  3  7  0  4 10  1 13 11  6
 4  3  2 12  9  5 15 10 11 14  1  7  6  0  8 13

S7
 4 11  2 14 15  0  8 13  3 12  9  7  5 10  6  1
13  0 11  7  4  9  1 10 14  3  5 12  2 15  8  6
 1  4 11 13 12  3  7 14 10 15  6  8  0  5  9  2
 6 11 13  8  1  4 10  7  9  5  0 15 14  2  3 12

S8
13  2  8  4  6 15 11  1 10  9  3 14  5  0 12  7
 1 15 13  8 10  3  7  4 12  5  6 11  0 14  9  2
 7 11  4  1  9 12 14  2  0  6 10 13 15  3  5  8
 2  1 14  7  4 10  8 13 15 12  9  0  3  5  6 11

DES decryption

DES decryption is analogous to DES encryption. To obtain m = DESK–1(c), one first computes the round keys K1, K2, . . . , K16 using Algorithm A.1. One then calls a minor variant of Algorithm A.2. First, the roles of m and c are interchanged. That is, one inputs c instead of m, and obtains m in place of c as output. Moreover, the right half Ri in the i-th round is computed as Ri := Li – 1 ⊕ e(Ri – 1, K17 – i). In other words, DES decryption is the same as DES encryption, only with the sequence of using the keys K1, K2, . . . , K16 reversed. Solve Exercise A.1 in order to establish the correctness of this decryption procedure.
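This key-reversal property holds for any Feistel network (Footnote 2), not just DES. The sketch below checks it on a toy 16-bit Feistel cipher; the round primitive toy_e and all constants are made up for illustration:

```python
def feistel(block, round_keys, e):
    """Generic Feistel network on a pair of 16-bit halves, with final swap."""
    L, R = block
    for k in round_keys:
        L, R = R, L ^ e(R, k)      # Li := Ri-1, Ri := Li-1 XOR e(Ri-1, Ki)
    return (R, L)                  # final swap, mirroring W := R16 || L16

def toy_e(x, k):
    # made-up round primitive; a Feistel round function need not be invertible
    return ((x * 0x9E37 + k) ^ (x >> 3)) & 0xFFFF

keys = [0x1234, 0x5678, 0x9ABC, 0xDEF0]
m = (0xCAFE, 0xBABE)
c = feistel(m, keys, toy_e)

# Decryption is the same procedure with the round keys in reverse order.
assert feistel(c, keys[::-1], toy_e) == m
```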

DES test vectors

Some test vectors for DES are given in Table A.2.

Table A.2. DES test vectors
Key                Plaintext block    Ciphertext block
0101010101010101   0000000000000000   8ca64de9c1b123a7
fefefefefefefefe   ffffffffffffffff   7359b2163e4edc58
3101010101010101   1000000000000001   958e6e627a05557B
1010101010101010   1111111111111111   f40379ab9e0ec533
0123456789abcdef   1111111111111111   17668dfc7292532d
1010101010101010   0123456789abcdef   8a5ae1f81ab8f2dd
fedcba9876543210   0123456789abcdef   ed39d950fa74bcc4

Cryptanalysis of DES

DES, being a popular block cipher, has gone through a good amount of cryptanalytic study. At present, linear cryptanalysis and differential cryptanalysis are the most sophisticated attacks on DES. But the biggest problem with DES is its relatively small key size (56 bits). An exhaustive key search for a given plaintext–ciphertext pair needs carrying out a maximum of 2^56 encryptions in order to obtain the correct key. But how big is this number 2^56 = 72,057,594,037,927,936 (nearly 72 quadrillion) in a cryptographic sense?

In order to review this question, RSA Security Inc. posed several challenges for obtaining the DES key from given plaintext–ciphertext pairs. The first challenge, posed in January 1997, was broken by Rocke Verser of Loveland, Colorado, with approximately 96 days of computing. DES Challenge II-1 was broken in February 1998 by distributed.net with 41 days of computing, and DES Challenge II-2 was cracked in July 1998 by the Electronic Frontier Foundation (EFF) in just 56 hours. Finally, DES Challenge III was broken in a record 22 hours and 15 minutes in January 1999. The computations were carried out on EFF’s supercomputer Deep Crack with collaborative efforts from nearly 10^5 PCs on the Internet guided by distributed.net. These figures demonstrate that DES offers hardly any security against a motivated adversary.

Another problem with DES is that its design criteria (most importantly, the objectives behind choosing the particular S-boxes) were never made public. Chances remain that there are hidden backdoors, though none has been discovered to date.

A.2.2. The Advanced Standard: AES

The advanced encryption standard (AES) [219] has superseded the older standard DES. The Rijndael cipher designed by Daemen and Rijmen has been accepted as the advanced standard. As mentioned in Footnote 2, Rijndael is not a Feistel cipher. Its working is based on the arithmetic in the finite field F256 and in the finite ring A = F256[Y]/〈Y^4 + 1〉.

Data representation

AES encrypts data in blocks of 128 bits. Let B = b0b1 . . . b127 be a block of data, where each bi is a bit. Keeping in view typical 32-bit processors, each such block B is represented as a sequence of four 32-bit words, that is, B = B0B1B2B3, where Bi represents the bit string b32ib32i+1 . . . b32i+31. Each word C = c0c1 . . . c31, in turn, is viewed as a sequence of four octets, that is, C = C0C1C2C3, where Ci stores the bit string c8ic8i+1 . . . c8i+7. Each octet is identified with an element of the field F256, whereas an entire 32-bit word is identified with an element of the ring A, as described next.

The field F256 is represented as F2[X]/〈f(X)〉, where f(X) is the irreducible polynomial X^8 + X^4 + X^3 + X + 1. Let x := X + 〈f(X)〉. The element d7x^7 + d6x^6 + · · · + d1x + d0 (each di ∈ {0, 1}) is identified with the octet d7d6 . . . d1d0. Thus, the i-th octet c8ic8i+1 . . . c8i+7 in a word is treated as the finite field element c8i x^7 + c8i+1 x^6 + · · · + c8i+6 x + c8i+7.

Now, let us explain the interpretation of a 32-bit word C = C0C1C2C3. The F256-algebra A = F256[Y]/〈Y^4 + 1〉 is not a field, since the polynomial Y^4 + 1 is reducible (over F2 and so over F256). However, each element β of A can be uniquely expressed as a polynomial β = α3y^3 + α2y^2 + α1y + α0, where y := Y + 〈Y^4 + 1〉 and where each αi is an element of F256. As described in the last paragraph, each αi is represented as an octet. We take Ci to be the octet representing α3 – i, that is, the 32-bit word α3α2α1α0 stands for the element α3y^3 + α2y^2 + α1y + α0.

F256 and A are rings and hence equipped with arithmetic operations (addition and multiplication). These operations are different from the usual addition and multiplication operations defined on octets and words. For example, the addition of two octets or words under the AES interpretation is the same as bit-wise XOR of octets or words. The AES multiplication of octets and words, on the other hand, involves polynomial arithmetic and reduction modulo the defining polynomials and so cannot be expressed so simply as addition. To resolve ambiguities, let us plan to denote the multiplication of F256 by ⊙ and that of A by ⊗, whereas regular multiplication symbols (·, × and juxtaposition) stand for the standard multiplication on octets or words. Exercises A.5, A.6 and A.7 discuss efficient implementations of the arithmetic in F256 and A.
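A minimal sketch of the octet product ⊙ (essentially Exercise A.5, with shift-and-XOR reduction modulo f(X)); the function name is ours, and the two worked products are standard textbook examples, not taken from this section:

```python
def gf256_mul(a: int, b: int) -> int:
    """Multiply two octets in F256 = F2[X]/<X^8 + X^4 + X^3 + X + 1>."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a            # add this power-of-x multiple of a
        carry = a & 0x80
        a = (a << 1) & 0xFF        # multiply a by x ...
        if carry:
            a ^= 0x1B              # ... reducing X^8 = X^4 + X^3 + X + 1
        b >>= 1
    return result

# Worked examples: {57} . {83} = {c1}, and {53} . {ca} = {01},
# so 53 and ca are multiplicative inverses of each other.
assert gf256_mul(0x57, 0x83) == 0xC1
assert gf256_mul(0x53, 0xCA) == 0x01
```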

Every non-zero element α of F256 is invertible; the inverse is denoted by α–1 and can be computed by the extended gcd algorithm on polynomials over F2. With an abuse of notation, we take 0–1 := 0. Not every non-zero element of A is invertible (under the multiplication of A). The AES algorithm uses the following invertible element β := 03010102 (in hex notation), that is, β = [03]y^3 + [01]y^2 + [01]y + [02]; its inverse is β–1 = 0b0d090e.

The AES algorithm uses an object called a state, comprising 16 octets arranged in a 4 × 4 array. Each message block also consists of 16 octets. Let M = μ0μ1 . . . μ15 be a message block (of 16 octets). This block is translated to a state as follows:

Equation A.1

   sr,c := μ4c+r for 0 ≤ r ≤ 3, 0 ≤ c ≤ 3.
Thus, each word in the block is relocated in a column of the state. At the end of the encryption procedure, AES makes the reverse translation of a state to a block:

Equation A.2

   γ4c+r := sr,c for 0 ≤ r ≤ 3, 0 ≤ c ≤ 3.
AES key schedule

A collection of round keys is generated from the given AES key K. The number of rounds of the AES encryption algorithm depends on the size of the key. Let us denote the number of words in the AES key by Nk and the corresponding number of rounds by Nr. We have:

   Nk = 4 (128-bit key), Nr = 10;
   Nk = 6 (192-bit key), Nr = 12;
   Nk = 8 (256-bit key), Nr = 14.

One first generates an initial 128-bit key K0K1K2K3. Subsequently, for the i-th round, 1 ≤ i ≤ Nr, a 128-bit key K4iK4i+1K4i+2K4i+3 is required. Here, each Kj is a 32-bit word. The key schedule (also called key expansion) generates a total of 4(Nr + 1) words K0, K1, . . . , K4Nr+3 from the given secret key K using a procedure described in Algorithm A.4. Here, (02)^(j – 1) stands for the octet that represents the element x^(j – 1) of F256. The following table summarizes these values for j = 1, 2, . . . , 15.

j           1   2   3   4   5   6   7   8   9   10  11  12  13  14  15
x^(j – 1)   01  02  04  08  10  20  40  80  1b  36  6c  d8  ab  4d  9a
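The octets in this table can be regenerated by repeated multiplication by x; a short sketch (xtime is a conventional name for this doubling, not one used in the text):

```python
def xtime(a: int) -> int:
    """Multiply an octet by x (the octet 02) in F256."""
    a <<= 1
    return (a ^ 0x1B) & 0xFF if a & 0x100 else a   # reduce modulo f(X) if needed

# The round-constant octets (02)^(j-1) for j = 1, ..., 15:
rcon, a = [], 0x01
for _ in range(15):
    rcon.append(a)
    a = xtime(a)

assert rcon == [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80,
                0x1B, 0x36, 0x6C, 0xD8, 0xAB, 0x4D, 0x9A]
```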

The transformation SubWord on a word T = τ0τ1τ2τ3 is the octet-wise application of AES S-box substitution SubOctet, that is,

SubWord(T) = SubOctet(τ0) ‖ SubOctet(τ1) ‖ SubOctet(τ2) ‖ SubOctet(τ3).

Algorithm A.4. AES key schedule

Input: (Nk and) the secret key K = κ0κ1 ... κ4Nk – 1, where each κi is an octet.

Output: The expanded keys K0, K1, . . . , K4Nr+3.

Steps:

/* Initially copy the bytes of K */
for i = 0, 1, . . . , Nk – 1 { Ki := κ4iκ4i+1κ4i+2κ4i+3. }

/* Recursively define the round keys */
for i = Nk, Nk + 1, . . . , 4Nr + 3 {
      T := Ki – 1;       /* T is a temporary word variable. */
      /* Let T = τ0τ1τ2τ3where each τi is an octet. */
      if (i rem Nk = 0) { T := SubWord(τ1τ2τ3τ0) ⊕ [(02)^((i/Nk) – 1) ‖ 000000]. }
      else if (Nk > 6) and (i rem Nk = 4) { T := SubWord(T). }
      Ki := KiNk ⊕ T.
}

The transformation SubOctet is also used in each encryption round and is now described. Let A = a0a1 . . . a7 be an octet, identified with an element of F256 as mentioned earlier. Let B = b0b1 . . . b7 denote the octet representing the inverse of this finite field element. (We take 0–1 = 0.) One then applies the following affine transformation on B to generate the final value C := SubOctet(A) := c0c1 . . . c7. Here, D = d0d1 . . . d7 is the constant octet 63 = 01100011.

Equation A.3

   ci := bi ⊕ b(i+1) mod 8 ⊕ b(i+2) mod 8 ⊕ b(i+3) mod 8 ⊕ b(i+4) mod 8 ⊕ di,   i = 0, 1, . . . , 7.
In order to speed up this octet substitution, one may use table lookup. Since the output octet C depends only on the input octet A, one can precompute a table of values of SubOctet(A) for the 256 possible values of A. This list is given in Table A.3. The table is to be read in the row-major fashion. In other words, if hi and lo respectively represent the most and the least significant four bits of A, then SubOctet(A) can be read off from the entry in the table having row number hi and column number lo. For example, SubOctet(a7) = 5c. In an actual implementation, a one-dimensional array is to be used. We use a two-dimensional format in Table A.3 for the sake of clarity of presentation.

Table A.3. AES S-box
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0   63 7c 77 7b f2 6b 6f c5 30 01 67 2b fe d7 ab 76
1   ca 82 c9 7d fa 59 47 f0 ad d4 a2 af 9c a4 72 c0
2   b7 fd 93 26 36 3f f7 cc 34 a5 e5 f1 71 d8 31 15
3   04 c7 23 c3 18 96 05 9a 07 12 80 e2 eb 27 b2 75
4   09 83 2c 1a 1b 6e 5a a0 52 3b d6 b3 29 e3 2f 84
5   53 d1 00 ed 20 fc b1 5b 6a cb be 39 4a 4c 58 cf
6   d0 ef aa fb 43 4d 33 85 45 f9 02 7f 50 3c 9f a8
7   51 a3 40 8f 92 9d 38 f5 bc b6 da 21 10 ff f3 d2
8   cd 0c 13 ec 5f 97 44 17 c4 a7 7e 3d 64 5d 19 73
9   60 81 4f dc 22 2a 90 88 46 ee b8 14 de 5e 0b db
a   e0 32 3a 0a 49 06 24 5c c2 d3 ac 62 91 95 e4 79
b   e7 c8 37 6d 8d d5 4e a9 6c 56 f4 ea 65 7a ae 08
c   ba 78 25 2e 1c a6 b4 c6 e8 dd 74 1f 4b bd 8b 8a
d   70 3e b5 66 48 03 f6 0e 61 35 57 b9 86 c1 1d 9e
e   e1 f8 98 11 69 d9 8e 94 9b 1e 87 e9 ce 55 28 df
f   8c a1 89 0d bf e6 42 68 41 99 2d 0f b0 54 bb 16
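The table can be cross-checked by computing SubOctet directly from its definition. The sketch below (function names are ours) computes the inverse in F256 as the 254-th power and then applies the affine map; note that the bits here are indexed from the least significant end, so the rotation offsets differ from the book's a0a1 . . . a7 ordering:

```python
def gf256_mul(a, b):
    """Octet product in F256, reducing modulo X^8 + X^4 + X^3 + X + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return r

def gf256_inv(a):
    """a^254 = a^(-1) in F256 (with 0^(-1) taken as 0)."""
    r = 1
    for _ in range(254):
        r = gf256_mul(r, a)
    return r

def sub_octet(a: int) -> int:
    b = gf256_inv(a)
    c = 0
    for i in range(8):
        # affine map: XOR of b with four of its bit rotations, plus the constant 63
        bit = ((b >> i) ^ (b >> ((i + 4) % 8)) ^ (b >> ((i + 5) % 8))
               ^ (b >> ((i + 6) % 8)) ^ (b >> ((i + 7) % 8)) ^ (0x63 >> i)) & 1
        c |= bit << i
    return c

assert sub_octet(0x00) == 0x63
assert sub_octet(0xA7) == 0x5C   # the example quoted in the text
```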

AES encryption

AES encryption is described in Algorithm A.5. The algorithm first converts the input plaintext message block to a state, applies a series of transformations on this state and finally converts the state back to a message (the ciphertext).

The individual state transformations are now explained. The transition SubState is an octet-by-octet application of the substitution function SubOctet, that is, SubState replaces each octet sr,c of the state by SubOctet(sr,c) for all r, c. The transform ShiftRows cyclically left rotates the r-th row by r byte positions, that is, it replaces sr,c by sr,(c+r) mod 4 for all r, c.

The AddKey operation uses four 32-bit round keys L0, L1, L2, L3. Name the octets of Li as λi0λi1λi2λi3. The i-th key Li is XORed with the i-th column of the state, that is, AddKey replaces sr,c by sr,c ⊕ λcr for all r, c.

Finally, the MixCols transform multiplies each column of the state, regarded as an element of A, by the element β = [03]y^3 + [01]y^2 + [01]y + [02], where the coefficients (expressions within square brackets) are octet values in hexadecimal that can be identified with elements of F256. For the c-th column, this transformation can be represented as:

   s0,c := (02 ⊙ s0,c) ⊕ (03 ⊙ s1,c) ⊕ s2,c ⊕ s3,c,
   s1,c := s0,c ⊕ (02 ⊙ s1,c) ⊕ (03 ⊙ s2,c) ⊕ s3,c,
   s2,c := s0,c ⊕ s1,c ⊕ (02 ⊙ s2,c) ⊕ (03 ⊙ s3,c),
   s3,c := (03 ⊙ s0,c) ⊕ s1,c ⊕ s2,c ⊕ (02 ⊙ s3,c),

with the four assignments made simultaneously (each right side uses the old values).
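The column transform just described can be sketched as follows; gf256_mul is an assumed helper for the octet product ⊙, and the second test column is a commonly quoted MixColumns example, not taken from the text:

```python
def gf256_mul(a, b):
    """Octet product in F256, reducing modulo X^8 + X^4 + X^3 + X + 1."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        hi = a & 0x80
        a = (a << 1) & 0xFF
        if hi:
            a ^= 0x1B
        b >>= 1
    return r

def mix_column(col):
    """Multiply one state column (s0, s1, s2, s3) by [03]y^3+[01]y^2+[01]y+[02]."""
    s0, s1, s2, s3 = col
    return [
        gf256_mul(0x02, s0) ^ gf256_mul(0x03, s1) ^ s2 ^ s3,
        s0 ^ gf256_mul(0x02, s1) ^ gf256_mul(0x03, s2) ^ s3,
        s0 ^ s1 ^ gf256_mul(0x02, s2) ^ gf256_mul(0x03, s3),
        gf256_mul(0x03, s0) ^ s1 ^ s2 ^ gf256_mul(0x02, s3),
    ]

# A column of equal octets is fixed, since 02 + 03 + 01 + 01 = 01 in F256.
assert mix_column([0x01, 0x01, 0x01, 0x01]) == [0x01, 0x01, 0x01, 0x01]
assert mix_column([0xDB, 0x13, 0x53, 0x45]) == [0x8E, 0x4D, 0xA1, 0xBC]
```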

Algorithm A.5. AES encryption

Input: The plaintext message M = μ0μ1 . . . μ15 and the round keys K0, K1, . . . , K4Nr+3.

Output: Ciphertext message C = γ0γ1 . . . γ15.

Steps:

Convert M to the state S.                                      /* Use Transform (A.1) */
S := AddKey(S, K0, K1, K2, K3).
for i = 1, 2, . . . , Nr {
      S := SubState(S).
      S := ShiftRows(S).
      if (i ≠ Nr) { S := MixCols(S). }
      S := AddKey(S, K4i, K4i+1, K4i+2, K4i+3).
}
Convert S to the message C.                                /* Use Transform (A.2) */

AES decryption

AES decryption involves inverting each state transformation performed during encryption. The key schedule needed for encryption is to be used during decryption too. The straightforward decryption routine is given in Algorithm A.6.

Algorithm A.6. AES decryption

Input: The ciphertext message C = γ0γ1 . . . γ15 and the round keys K0, K1, . . . , K4Nr+3.

Output: The recovered plaintext message M = μ0μ1 . . . μ15.

Steps:

Convert C to the state S.                                      /* Use Transform (A.1) */
S := AddKey(S, K4Nr, K4Nr+1, K4Nr+2, K4Nr+3).
for i = Nr – 1, Nr – 2, . . . , 1, 0 {
      S := ShiftRows–1(S).
      S := SubState–1(S).
      S := AddKey(S, K4i, K4i+1, K4i+2, K4i+3).
      if (i ≠ 0) { S := MixCols–1(S). }
}
Convert S to the message M.                                /* Use Transform (A.2) */

What remains is a description of the inverses of the basic state transformations. AddKey involves octet-by-octet XORing and so is its own inverse. Table A.4 summarizes the inverse of the substitution transition SubOctet (Exercise A.8). For computing SubState–1(S), one should apply SubOctet–1 on each octet of S. The inverse of ShiftRows is also straightforward: ShiftRows–1 replaces sr,c by sr,(c – r) mod 4, a cyclic right rotation of the r-th row by r byte positions.

Finally, MixCols–1 involves multiplication of each column by the inverse of the element [03]y^3 + [01]y^2 + [01]y + [02], that is, by the element [0b]y^3 + [0d]y^2 + [09]y + [0e]. So MixCols–1 transforms each column of the state as follows:

   s0,c := (0e ⊙ s0,c) ⊕ (0b ⊙ s1,c) ⊕ (0d ⊙ s2,c) ⊕ (09 ⊙ s3,c),
   s1,c := (09 ⊙ s0,c) ⊕ (0e ⊙ s1,c) ⊕ (0b ⊙ s2,c) ⊕ (0d ⊙ s3,c),
   s2,c := (0d ⊙ s0,c) ⊕ (09 ⊙ s1,c) ⊕ (0e ⊙ s2,c) ⊕ (0b ⊙ s3,c),
   s3,c := (0b ⊙ s0,c) ⊕ (0d ⊙ s1,c) ⊕ (09 ⊙ s2,c) ⊕ (0e ⊙ s3,c),

again with all four assignments made simultaneously.

Table A.4. Inverse of AES S-box
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
0   52 09 6a d5 30 36 a5 38 bf 40 a3 9e 81 f3 d7 fb
1   7c e3 39 82 9b 2f ff 87 34 8e 43 44 c4 de e9 cb
2   54 7b 94 32 a6 c2 23 3d ee 4c 95 0b 42 fa c3 4e
3   08 2e a1 66 28 d9 24 b2 76 5b a2 49 6d 8b d1 25
4   72 f8 f6 64 86 68 98 16 d4 a4 5c cc 5d 65 b6 92
5   6c 70 48 50 fd ed b9 da 5e 15 46 57 a7 8d 9d 84
6   90 d8 ab 00 8c bc d3 0a f7 e4 58 05 b8 b3 45 06
7   d0 2c 1e 8f ca 3f 0f 02 c1 af bd 03 01 13 8a 6b
8   3a 91 11 41 4f 67 dc ea 97 f2 cf ce f0 b4 e6 73
9   96 ac 74 22 e7 ad 35 85 e2 f9 37 e8 1c 75 df 6e
a   47 f1 1a 71 1d 29 c5 89 6f b7 62 0e aa 18 be 1b
b   fc 56 3e 4b c6 d2 79 20 9a db c0 fe 78 cd 5a f4
c   1f dd a8 33 88 07 c7 31 b1 12 10 59 27 80 ec 5f
d   60 51 7f a9 19 b5 4a 0d 2d e5 7a 9f 93 c9 9c ef
e   a0 e0 3b 4d ae 2a f5 b0 c8 eb bb 3c 83 53 99 61
f   17 2b 04 7e ba 77 d6 26 e1 69 14 63 55 21 0c 7d

AES decryption is as efficient as AES encryption, since each state transformation primitive has the same structure as its inverse. However, the sequence of application of these primitives in the loop (rounds) for decryption differs from that for encryption. For some implementations, mostly in hardware, this may be a problem. Compare this with DES for which the encryption and decryption algorithms are identical save the sequence of using the round keys (Exercise A.1). With little additional effort AES can also be furnished with this useful property of DES. All we have to do is to use a different key schedule for decryption. The necessary modifications are explored in Exercise A.9.

AES test vectors

Table A.5 provides the ciphertexts for the plaintext block

M = 00112233445566778899aabbccddeeff

under different keys.

Table A.5. AES test vectors
Cipher    Key                                                                Ciphertext block
AES-128   000102030405060708090a0b0c0d0e0f                                   69c4e0d86a7b0430d8cdb78070b4c55a
AES-192   000102030405060708090a0b0c0d0e0f1011121314151617                   dda97ca4864cdfe06eaf70a0ec0d7191
AES-256   000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f   8ea2b7ca516745bfeafc49904b496089

Cryptanalysis of AES

AES has been designed so that linear and differential attacks are infeasible. Another attack known as the square attack has been proposed by Lucks [184] and Ferguson et al. [93], but at present it can tackle fewer rounds than are used in Rijndael encryption. Also see Gilbert and Minier [112] for the collision attack.

The distinct algebraic structure of AES encryption invites special algebraic attacks. One such potential attack (the XSL attack) has been proposed by Courtois and Pieprzyk [68]. Although this attack has not yet been proved to be effective, a better understanding of the algebra may, in the foreseeable future, lead to disturbing consequences for the advanced standard.

For more information on AES, read the book [71] from the designers of the cipher. Also visit the following Internet sites:

http://www.esat.kuleuven.ac.be/~rijmen/rijndael/     (Rijndael home)
http://csrc.nist.gov/CryptoToolkit/aes/index1.html   (NIST site for AES)
http://www.cryptosystem.net/aes/                     (Algebraic attacks)

A.2.3. Multiple Encryption

Multiple encryption presents a way to achieve a desired level of security by using block ciphers of small key sizes. The idea is to cascade several stages of encryption and/or decryption, with different stages working under different keys. Figure A.1 illustrates double and triple encryption for a block cipher f. Each gi or hj represents either the encryption or the decryption function of f under the given key.

Figure A.1. Multiple encryption


For double encryption, we have K1 ≠ K2, and both g1 and g2 are usually the encryption function. Provided that fK2 ο fK1 is not the same as fK for any single key K and that the permutations of f are reasonably random, it appears at first glance that double encryption increases the effective key size by a factor of two. Unfortunately, this is not the case. The meet-in-the-middle attack on double encryption works as follows.

Suppose that an adversary knows a plaintext–ciphertext pair (m, c) under the unknown keys K1, K2. We assume as before that f has block-size n and key-size r. The adversary computes, for each possible key i, the encrypted message xi := fi(m). She also computes, for each possible key j, the decrypted message yj := fj–1(c). Now, (i, j) is a possible value of (K1, K2) if and only if xi = yj.

A given pair (m, c) usually gives many such candidates (i, j) for (K1, K2). More precisely, if each fi is assumed to be a random permutation of {0, 1}^n, for a given i we have the equality xi = yj for an expected number of 2^r/2^n values of j. Considering all possibilities for i gives an expected number of 2^r × 2^r/2^n = 2^(2r – n) candidate pairs (i, j). If f = DES, this number is 2^(2 × 56 – 64) = 2^48.

If a second pair (m′, c′) under (K1, K2) is also known to the adversary, then for a given i an expected number of 2^r/(2^n × 2^n) values of j are consistent with both (m, c) and (m′, c′). Thus, we get an expected number of (2^r × 2^r)/(2^n × 2^n) = 2^(2r – 2n) false candidates (i, j). For DES, this number is 2^(–16). This implies that it is very unlikely that a false candidate (i, j) satisfies both (m, c) and (m′, c′). Thus, with high probability the adversary uniquely identifies the double DES key (K1, K2) from two plaintext–ciphertext pairs.

This attack calls for O(2^r) encryptions and O(2^r) decryptions. With the assumption that each encryption takes roughly the same time as each decryption (as in the case of DES), the adversary spends the time for O(2^r) encryptions. Moreover, she can find all the matches in O(r 2^r) time. This implies that double encryption increases the effective key size (over single encryption) by a few bits only. On the other hand, both the actual key size and the encryption time get doubled. In view of these shortcomings, double encryption is rarely used in practice.
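The attack is easy to watch in miniature. The sketch below uses a toy (and wholly insecure) block cipher with n = r = 8; the permutation pi, the function names and all constants are made up for illustration:

```python
# Toy block cipher: f_k(m) = pi(m XOR k) for a fixed public permutation pi.
def pi(v):     return (167 * v + 13) % 256      # invertible since gcd(167, 256) = 1
def pi_inv(v): return (23 * (v - 13)) % 256     # 23 = 167^(-1) mod 256

def f(m, k):     return pi(m ^ k)
def f_inv(c, k): return pi_inv(c) ^ k

K1, K2 = 0x3A, 0xC5                              # the "unknown" double-encryption keys
pairs = [(m, f(f(m, K1), K2)) for m in (0x11, 0x77)]

def candidates(m, c):
    """All (i, j) with f_i(m) = f_j^(-1)(c), found by meeting in the middle."""
    forward = {}
    for i in range(256):
        forward.setdefault(f(m, i), []).append(i)
    return {(i, j) for j in range(256) for i in forward.get(f_inv(c, j), [])}

# Intersect the candidate sets from two known plaintext-ciphertext pairs;
# the true key pair must survive.
survivors = candidates(*pairs[0]) & candidates(*pairs[1])
assert (K1, K2) in survivors
```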

For the triple encryption scheme of Figure A.1, a meet-in-the-middle attack at x or y demands an effort equivalent to O(2^(2r)) encryptions, that is, the effective key size gets doubled. It is, therefore, customary to take K1 = K3 and K2 different from this common value. The actual key size also gets doubled (rather than tripled) with this choice—one doesn’t have to remember K3 separately. It is also a common practice to take h1 and h3 to be the encryption function (under K1 = K3) and h2 the decryption function (under K2). One often calls this particular triple encryption an E-D-E scheme.

A.2.4. Modes of Operation

In practice, the length of the message m to be encrypted need not equal the block length n of the block cipher f. One then has to break up m into blocks of some fixed length n′ ≤ n and encrypt each block using the block cipher. In order to make the length of m an integral multiple of n′, one may have to pad extra bits to m (say, zero bits at the end). It is often necessary to store the initial size of m in a separate block, say, after the last message block. In what follows, we shall assume that the input message m gives rise to l blocks m1, m2, . . . , ml each of size n′. The corresponding ciphertext blocks c1, c2, . . . , cl will also be of bit length n′ each. The reason for choosing the block size n′ ≤ n will be clear soon.

The ECB mode

The easiest way to encrypt multiple blocks m1, . . . , ml is to take n′ = n and encrypt each block mi as ci := fK(mi). Decryption is analogous: mi := fK–1(ci). This mode of operation of a block cipher is called the electronic code-book or the ECB mode. Algorithms A.7 and A.8 describe this mode.

Algorithm A.7. ECB encryption

Input: The plaintext blocks m1, . . . , ml and the key K.

Output: The ciphertext c = c1 . . . cl.

Steps:

for i = 1, . . . , l { ci := fK(mi) }

Algorithm A.8. ECB decryption

Input: The ciphertext blocks c1, . . . , cl and the key K.

Output: The plaintext m = m1 . . . ml.

Steps:

for i = 1, . . . , l { mi := fK–1(ci) }

In this mode, identical message blocks encrypt to identical ciphertext blocks (under the same key), that is, partial information about the plaintext may be leaked out. The following three modes overcome this problem.
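The leakage can be seen on a toy 8-bit "cipher" (an affine map, wholly insecure and purely illustrative), contrasted with the CBC chaining described in the next subsection:

```python
def f(block, key):
    """Toy 8-bit 'block cipher' (an affine map; insecure, for illustration only)."""
    return (167 * (block ^ key) + 13) % 256

def ecb_encrypt(blocks, key):
    return [f(b, key) for b in blocks]

def cbc_encrypt(blocks, key, iv):
    out, prev = [], iv
    for b in blocks:
        prev = f(b ^ prev, key)   # XOR with previous ciphertext block, then encrypt
        out.append(prev)
    return out

msg = [0x41, 0x41, 0x41, 0x42]    # three identical blocks, then a different one
ecb = ecb_encrypt(msg, key=0x5A)
cbc = cbc_encrypt(msg, key=0x5A, iv=0x99)

assert ecb[0] == ecb[1] == ecb[2]   # ECB leaks the repetition
assert len(set(cbc[:3])) > 1        # CBC hides it
```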

The CBC mode

In the cipher-block chaining or the CBC mode, one takes n′ = n and each plaintext block is first XOR-ed with the previous ciphertext block and then encrypted. In order to XOR the first plaintext block, one needs an n-bit initialization vector (IV). The IV need not be kept secret and may be sent along with the ciphertext blocks.

Algorithm A.9. CBC encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

c0 := IV.

for i = 1, . . . , l { ci := fK(mici – 1). }

Algorithm A.10. CBC decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

c0 := IV.

for i = 1, . . . , l { mi := fK–1(ci) ⊕ ci – 1. }

The CFB mode

In the cipher feedback or the CFB mode, one chooses any n′ ≤ n. In this mode, the plaintext blocks are not encrypted, but masked by XOR-ing with a stream of random keys generated from a (not necessarily secret) n-bit IV. In this sense, the CFB mode works like a stream cipher (see Section A.3).

Algorithm A.11. CFB encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

k0 := IV.   /* Initialize the key stream */
for i = 1, . . . , l {
   /* Mask the current key by block encryption and the message by XOR-ing */
   ci := mi ⊕ msbn′ (fK(ki – 1)).
   /* Generate the next key from the previous key and the current ciphertext block */
   ki := lsbn – n′(ki – 1) ‖ ci.
}

Algorithm A.11 explains CFB encryption. The notation msbk(z) (resp. lsbk(z)) stands for the most (resp. least) significant k bits of a bit string z. For CFB decryption (Algorithm A.12), the identical key stream k0, k1, . . . , kl is generated and used to mask off the message blocks from the ciphertext blocks.

Algorithm A.12. CFB decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

k0 := IV.
for i = 1, . . . , l {
   mi := ci ⊕ msbn′(fK(ki – 1)).
   ki := lsbn – n′(ki – 1) ‖ ci.
}

The OFB mode

The output feedback or the OFB mode also works like a stream cipher by masking the plaintext blocks using a stream of keys. The key stream in the OFB mode is generated by successively applying the block encryption function on an n-bit (not necessarily secret) IV. Here, one chooses any n′ ≤ n.

OFB encryption is explained in Algorithm A.13. OFB decryption (Algorithm A.14) is identical, with only the roles of m and c interchanged, and requires the generation of the same key stream k0, k1, . . . , kl used during encryption.

Algorithm A.13. OFB encryption

Input: The plaintext blocks m1, . . . , ml, the key K and the IV.

Output: The ciphertext c = c1 . . . cl.

Steps:

k0 := IV.      /* Initialize the key stream */
for i = 1, . . . , l {
    ki := fK(ki–1).     /* Generate the next key in the stream */
    ci := mi ⊕ msbn′(ki).    /* Mask the plaintext block */
}

Algorithm A.14. OFB decryption

Input: The ciphertext blocks c1, . . . , cl, the key K and the IV.

Output: The plaintext m = m1 . . . ml.

Steps:

k0 := IV.     /* Initialize the key stream */
for i = 1, . . . , l {
   ki := fK(ki–1).    /* Generate the next key in the stream */
   mi := ci ⊕ msbn′(ki).    /* Remove the mask from the ciphertext block */
}

Exercise Set A.2

A.1 Let us use the notations of Algorithm A.2. For a message m and round keys Ki, we have the values V, Li, Ri, W, c. For another message m′ and another set of round keys K′i, let us denote these values by V′, L′i, R′i, W′, c′. Show that if m′ = c and if K′i = K17 – i for i = 1, . . . , 16, then L′i = R16 – i and R′i = L16 – i for all i = 0, 1, . . . , 16. Deduce that in this case we have c′ = m. (This shows that DES decryption is the same as DES encryption with the key schedule reversed.)
A.2 For a bit string z, let z̄ denote the bit-wise complement of z. Deduce that DESK̄(m̄) is the bit-wise complement of DESK(m), that is, complementing both the plaintext message and the key complements the ciphertext message. [H]
A.3A DES key K is said to be weak, if the DES key schedule on K gives K1 = K2 = · · · = K16. Show that there are exactly four weak DES keys which in hexadecimal notation are:
0101 0101 0101 0101
FEFE FEFE FEFE FEFE
1F1F 1F1F 0E0E 0E0E
E0E0 E0E0 F1F1 F1F1

A.4A DES key K is said to be anti-palindromic, if the DES key schedule on K gives for all i = 1, . . . , 16. Show that the following four DES keys (in hexadecimal notation) are anti-palindromic:
01FE 01FE 01FE 01FE
FE01 FE01 FE01 FE01
1FE0 1FE0 0EF1 0EF1
E01F E01F F10E F10E

A.5Represent 𝔽₂₅₆ = 𝔽₂[X]/⟨f(X)⟩, where f(X) = X^8 + X^4 + X^3 + X + 1 (Section A.2.2).
  1. Show that multiplication by x (the octet 02) in 𝔽₂₅₆ can be computed by a left shift followed conditionally (derive the condition) by XORing with the octet 1b.

  2. Design an algorithm for multiplying two elements of 𝔽₂₅₆ using bit manipulations on octets only.
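As an illustration of the two parts (a sketch, not the asked-for derivation), multiplication by the octet 02 and general shift-and-add multiplication can be coded as follows; the condition in Part 1 is that the bit shifted out of position 7 was 1:

```python
def xtime(a: int) -> int:
    """Multiply an octet by x (the octet 02): left shift, then XOR
    with 1b exactly when the shifted-out bit was 1."""
    a <<= 1
    return (a ^ 0x1b) & 0xff if a & 0x100 else a

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication of two octets in the AES field."""
    p = 0
    while b:
        if b & 1:
            p ^= a          # add the current multiple of a
        a = xtime(a)        # a := a * x
        b >>= 1
    return p
```

The product 57 · 13 = fe checked below is the worked example from the AES specification (FIPS-197).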

A.6The multiplication in 𝔽₂₅₆ can be made table-driven. Since this field contains 256 elements, a 256 × 256 array suffices to store all the products. That requires a storage of 64 kbytes. We can considerably reduce the storage by using discrete logs.
  1. Show that the multiplicative order of x (in 𝔽₂₅₆*) is 51.

  2. Show that x + 1 is a generator of 𝔽₂₅₆*.

  3. Write a computer program to generate the table of discrete logarithms of elements of 𝔽₂₅₆* to the base x + 1 (Table A.6).

    Table A.6. Discrete-log table for AES
         0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
    0   -- 00 19 01 32 02 1a c6 4b c7 1b 68 33 ee df 03
    1   64 04 e0 0e 34 8d 81 ef 4c 71 08 c8 f8 69 1c c1
    2   7d c2 1d b5 f9 b9 27 6a 4d e4 a6 72 9a c9 09 78
    3   65 2f 8a 05 21 0f e1 24 12 f0 82 45 35 93 da 8e
    4   96 8f db bd 36 d0 ce 94 13 5c d2 f1 40 46 83 38
    5   66 dd fd 30 bf 06 8b 62 b3 25 e2 98 22 88 91 10
    6   7e 6e 48 c3 a3 b6 1e 42 3a 6b 28 54 fa 85 3d ba
    7   2b 79 0a 15 9b 9f 5e ca 4e d4 ac e5 f3 73 a7 57
    8   af 58 a8 50 f4 ea d6 74 4f ae e9 d5 e7 e6 ad e8
    9   2c d7 75 7a eb 16 0b f5 59 cb 5f b0 9c a9 51 a0
    a   7f 0c f6 6f 17 c4 49 ec d8 43 1f 2d a4 76 7b b7
    b   cc bb 3e 5a fb 60 b1 86 3b 52 a1 6c aa 55 29 9d
    c   97 b2 87 90 61 be dc fc bc 95 cf cd 37 3f 5b d1
    d   53 39 84 3c 41 a2 6d 47 14 2a 9e 5d 56 f2 d3 ab
    e   44 11 92 d9 23 20 2e 89 b4 7c b8 26 77 99 e3 a5
    f   67 4a ed de c5 31 fe 18 0d 63 8c 80 c0 f7 70 07

  4. Write a computer program to generate the table of powers of x + 1 (Table A.7).

    Table A.7. Power table for AES
         0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
    0   01 03 05 0f 11 33 55 ff 1a 2e 72 96 a1 f8 13 35
    1   5f e1 38 48 d8 73 95 a4 f7 02 06 0a 1e 22 66 aa
    2   e5 34 5c e4 37 59 eb 26 6a be d9 70 90 ab e6 31
    3   53 f5 04 0c 14 3c 44 cc 4f d1 68 b8 d3 6e b2 cd
    4   4c d4 67 a9 e0 3b 4d d7 62 a6 f1 08 18 28 78 88
    5   83 9e b9 d0 6b bd dc 7f 81 98 b3 ce 49 db 76 9a
    6   b5 c4 57 f9 10 30 50 f0 0b 1d 27 69 bb d6 61 a3
    7   fe 19 2b 7d 87 92 ad ec 2f 71 93 ae e9 20 60 a0
    8   fb 16 3a 4e d2 6d b7 c2 5d e7 32 56 fa 15 3f 41
    9   c3 5e e2 3d 47 c9 40 c0 5b ed 2c 74 9c bf da 75
    a   9f ba d5 64 ac ef 2a 7e 82 9d bc df 7a 8e 89 80
    b   9b b6 c1 58 e8 23 65 af ea 25 6f b1 c8 43 c5 54
    c   fc 1f 21 63 a5 f4 07 09 1b 2d 77 99 b0 cb 46 ca
    d   45 cf 4a de 79 8b 86 91 a8 e3 3e 42 c6 51 f3 0e
    e   12 36 5a ee 29 7b 8d 8c 8f 8a 85 94 a7 f2 0d 17
    f   39 4b dd 7c 84 97 a2 fd 1c 24 6c b4 c7 52 f6 01

  5. Design an algorithm for multiplying two elements of 𝔽₂₅₆ using table lookup.
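Tables A.6 and A.7 can be regenerated in a few lines, and Part 5 then reduces to two lookups and one antilog; a sketch (multiplication by the octet 03, that is, by x + 1, is xtime(a) ⊕ a):

```python
def xtime(a: int) -> int:
    """Multiply an octet by x, reducing modulo f(X) when needed."""
    a <<= 1
    return (a ^ 0x1b) & 0xff if a & 0x100 else a

POW = [0] * 255          # POW[i] = (x+1)^i              (Table A.7)
LOG = [0] * 256          # LOG[g] = dlog of g, base x+1  (Table A.6)
acc = 1
for i in range(255):
    POW[i] = acc
    LOG[acc] = i
    acc = xtime(acc) ^ acc      # multiply the accumulator by x + 1

def gf_mul_table(a: int, b: int) -> int:
    """Multiply two octets via the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return POW[(LOG[a] + LOG[b]) % 255]
```

The two 255-entry tables replace the naive 64-kbyte product array, which is exactly the storage saving the exercise is after.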

A.7Denote the multiplication of the ring A by ⊗ (Section A.2.2).
  1. Let α = a3y3 + a2y2 + a1y + a0 and β = b3y3 + b2y2 + b1y + b0 be elements of A and γ = c3y3 + c2y2 + c1y + c0 = α ⊗ β. Show that

         ( c0 )   ( a0 a3 a2 a1 ) ( b0 )
         ( c1 ) = ( a1 a0 a3 a2 ) ( b1 )
         ( c2 )   ( a2 a1 a0 a3 ) ( b2 )
         ( c3 )   ( a3 a2 a1 a0 ) ( b3 )

    where the matrix arithmetic on the right side follows the arithmetic of 𝔽₂₅₆.

  2. Verify that the inverse of the element of A represented by the word 03010102 (in hex) is 0b0d090e.

A.8
  1. Show that Transform (A.3) can be represented as

         ( b′0 )   ( 1 0 0 0 1 1 1 1 ) ( b0 )   ( 1 )
         ( b′1 )   ( 1 1 0 0 0 1 1 1 ) ( b1 )   ( 1 )
         ( b′2 )   ( 1 1 1 0 0 0 1 1 ) ( b2 )   ( 0 )
         ( b′3 ) = ( 1 1 1 1 0 0 0 1 ) ( b3 ) + ( 0 )
         ( b′4 )   ( 1 1 1 1 1 0 0 0 ) ( b4 )   ( 0 )
         ( b′5 )   ( 0 1 1 1 1 1 0 0 ) ( b5 )   ( 1 )
         ( b′6 )   ( 0 0 1 1 1 1 1 0 ) ( b6 )   ( 1 )
         ( b′7 )   ( 0 0 0 1 1 1 1 1 ) ( b7 )   ( 0 )

    where the matrix arithmetic on the right side is that of 𝔽₂.

  2. Let M denote the 8 × 8 matrix of Part (a). Prove that M is invertible over 𝔽₂ with

                ( 0 0 1 0 0 1 0 1 )
                ( 1 0 0 1 0 0 1 0 )
                ( 0 1 0 0 1 0 0 1 )
         M⁻¹ =  ( 1 0 1 0 0 1 0 0 )
                ( 0 1 0 1 0 0 1 0 )
                ( 0 0 1 0 1 0 0 1 )
                ( 1 0 0 1 0 1 0 0 )
                ( 0 1 0 0 1 0 1 0 )

  3. Conclude that the transformation A ↦ SubOctet(A) is invertible.

A.9
  1. Argue that the transforms SubState and ShiftRows commute with one another.

  2. Show that MixCols⁻¹(AddKey(S, L0, L1, L2, L3)) = AddKey(MixCols⁻¹(S), MixCols⁻¹(L0, L1, L2, L3)) for a suitable meaning of the application of MixCols⁻¹ on four 32-bit keys L0, L1, L2 and L3.

  3. Conclude that one can obtain a decryption key schedule in such a way that Algorithm A.15 correctly performs AES decryption. [H]

Algorithm A.15. Equivalent form of AES decryption

Input: The ciphertext message C = γ0γ1 . . . γ15 and the decryption key schedule.

Output: Plaintext message M = μ0μ1 . . . μ15.

Steps:

Convert C to the state S.                                 /* Use Transform (A.1) */

for i = Nr – 1, Nr – 2, . . . , 0 {
      S := SubState⁻¹(S).
      S := ShiftRows⁻¹(S).
      if (i ≠ 0) { S := MixCols⁻¹(S). }
      S := AddKey(S, the decryption round keys for round i).
}
Convert S to the message M.                            /* Use Transform (A.2) */

A.10Show that a multiple encryption scheme with exactly k stages provides an effective security of ⌈k/2⌉ keys against the meet-in-the-middle attack.
A.11Consider a message m broken into blocks m1, . . . , ml, encrypted to c1, . . . , cl and sent to an entity.
  1. Suppose that during the transmission exactly one ciphertext block gets corrupted. Show that for the different modes of encryption, the numbers ν of blocks that are incorrectly decrypted due to this transmission error are as listed in the following table.

    Mode    ν
    ECB     1
    CBC     ≤ 2
    CFB     ≤ 1 + ⌈n/n′⌉
    OFB     1

  2. For each of the four modes, discuss the effects on decryption caused by the insertion or deletion of a ciphertext block during transmission (say, by an active adversary).

A.3. Stream Ciphers

A block cipher encrypts large blocks of data using a fixed key. A stream cipher, on the other hand, encrypts small blocks of data (typically bits or bytes) using a different key for each block. The security of a stream cipher stems from the unpredictability of the key stream. Here, we deal with stream ciphers that encrypt bit-by-bit.

Definition A.2.

A stream cipher F encrypts a plaintext m = m1m2 . . . ml to a ciphertext c = c1c2 . . . cl using a key stream k = k1k2 . . . kl, where each mi, ci, ki ∈ {0, 1}. F uses a function f : {0, 1} × {0, 1} → {0, 1} that yields f(mi, ki) = ci. In order to effect unique decryption, the map fκ : {0, 1} → {0, 1}, μ ↦ f(μ, κ), must be a bijection for each κ ∈ {0, 1}. F encrypts and decrypts bit-by-bit using the formulas ci = fki(mi) and mi = fki⁻¹(ci).

Example A.1.

An obvious choice for fκ is fκ(μ) := μ ⊕ κ, so that fκ⁻¹ = fκ. Suppose that the bits k1, k2, . . . , kl in the key stream are generated randomly and uniformly, independent of the plaintext bits. Let us assume that for an index i the probability Pr(mi = 0) is p, so that Pr(mi = 1) = 1 – p. Since Pr(ki = 0) = Pr(ki = 1) = 1/2, and mi and ki are independent, we have:

Pr(ci = 0) = Pr(mi = 0, ki = 0) + Pr(mi = 1, ki = 1)
           = Pr(mi = 0) Pr(ki = 0) + Pr(mi = 1) Pr(ki = 1)
           = p × (1/2) + (1 – p) × (1/2) = 1/2.

So Pr(ci = 1) is 1/2 too, that is, the two values of ci are equally likely, irrespective of the probability p. This, in turn, implies that the ciphertext bit ci provides absolutely no information about the plaintext bit mi. In this sense, this stream cipher, called Vernam’s one-time pad, offers unconditional security.

Generating a truly random key stream of arbitrary length is a difficult problem. Moreover, the same key stream is used for decryption and has to be reproduced at the recipient’s end. In view of these difficulties, Vernam’s one-time pad is used only very rarely.

A practical solution is to use a pseudorandom key stream k1, k2, k3, . . . generated from a secret key J of fixed small length. The bits in the pseudorandom stream should be sufficiently unpredictable and the length of J adequately large, so as to preclude the possibility of mounting a successful attack in feasible time.

Depending on how the key stream is generated from J, stream ciphers can be broadly classified in two categories. In a synchronous stream cipher, each key in the key stream is generated independent of any plaintext or ciphertext bit, whereas in a self-synchronizing (or asynchronous) stream cipher each key in the stream is generated based only on J and a fixed number of previous ciphertext bits. Algorithms A.16 and A.17 explain the workings of these two classes of stream ciphers.

Algorithm A.16. Encryption in a synchronous stream cipher

Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state S of the key stream generator.

Output: The ciphertext c = c1c2 . . . cl.

Steps:

s0 := S.                             /* Initialize the state of the key stream generator */
for i = 1, . . . , l {
   ki := g(si–1, J).               /* Generate the key ki */
   si := δ(si–1, J).                /* Transition to the next state */
   ci := fki (mi).                  /* Encrypt the plaintext bit mi */
}

Algorithm A.17. Encryption in an asynchronous stream cipher

Input: The message m = m1m2 . . . ml, the secret key J and a (not necessarily secret) initial state (c–t+1, c–t+2, . . . , c0).

Output: The ciphertext c = c1c2 . . . cl.

Steps:

for i = 1, . . . , l {
   ki := g(ci–t, ci–t+1, . . . , ci–1, J).         /* Generate the key ki */
   ci := fki (mi).                                     /* Encrypt the plaintext bit mi */
}
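Both classes can be sketched with fk(m) = m ⊕ k and hypothetical g and δ built from a hash; these choices are for illustration only and are not proposed as secure constructions:

```python
import hashlib

def sync_encrypt(J: bytes, S: bytes, mbits):
    """Algorithm A.16: the key stream depends only on J and the state,
    so decryption is the same routine applied to the ciphertext bits."""
    s, out = S, []
    for m in mbits:
        d = hashlib.sha256(s + J).digest()
        k = d[0] & 1              # ki := g(s_{i-1}, J)
        s = d[1:]                 # si := delta(s_{i-1}, J)
        out.append(m ^ k)         # ci := f_{ki}(mi) = mi XOR ki
    return out

def async_process(J: bytes, init, bits, decrypt=False):
    """Algorithm A.17: ki depends on J and the t previous CIPHERTEXT
    bits, so the sliding window is always filled with ciphertext."""
    window, out = list(init), []
    for b in bits:
        d = hashlib.sha256(bytes(window) + J).digest()
        k = d[0] & 1              # ki := g(c_{i-t}, ..., c_{i-1}, J)
        o = b ^ k
        out.append(o)
        window = window[1:] + [b if decrypt else o]
    return out
```

The only asymmetry between the two classes is visible in the last line: the asynchronous window advances over ciphertext bits, which is the input when decrypting and the output when encrypting.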

A block cipher in the OFB mode works like a synchronous stream cipher, whereas a block cipher in the CFB mode works like an asynchronous stream cipher.

A.3.1. Linear Feedback Shift Registers

Linear feedback shift registers (LFSRs), being suitable for hardware implementation and possessing good cryptographic properties, are widely used as basic building blocks for many stream ciphers. Figure A.2 depicts an LFSR L with d stages or delay elements D0, D1, . . . , Dd–1, each capable of storing one bit. The state of the LFSR is described by the d-tuple s := (s0, s1, . . . , sd–1), where si is the bit stored in Di. It is often convenient to treat s as the column vector (s0 s1 . . . sd–1)t.

Figure A.2. A linear feedback shift register (LFSR) with d stages


There are d control bits a0, a1, . . . , ad–1. The working of the LFSR is governed by a clock. At every clock pulse the bits stored in the delay elements are bit-wise AND-ed with the respective control bits and the AND gate outputs are XOR-ed to obtain the bit sd. The bit s0 stored in D0 is delivered to the output. Finally, for each i ∈ {0, 1, . . . , d – 2} the delay element Di sets its stored bit to si+1, that is, the register experiences a right shift by one bit with the feedback bit sd filling up the leftmost delay element.

Thus, a clock pulse changes the state of the LFSR from s := (s0, s1, . . . , sd–1) to t := (t0, t1, . . . , td–1), where s and t are related as:

ti = si+1  for i = 0, 1, . . . , d – 2,  and
td–1 = a0s0 ⊕ a1s1 ⊕ · · · ⊕ ad–1sd–1.

If s and t are treated as column vectors, this can be compactly represented as

Equation A.4

t ≡ ΔLs (mod 2),

where the transition matrix ΔL is given by

Equation A.5

        ( 0    1    0    · · ·    0    )
        ( 0    0    1    · · ·    0    )
ΔL  =   ( .    .    .    · · ·    .    )
        ( 0    0    0    · · ·    1    )
        ( a0   a1   a2   · · ·   ad–1  )
When the LFSR L is initialized to a non-zero state, the bit stream output by it can be used as a pseudorandom bit sequence. For a given set of control bits a0, . . . , ad–1, the next state of L is uniquely determined by its previous state only. Since L has only finitely many (2^d – 1) non-zero states, the output bit sequence of L must be (eventually) periodic. For cryptographic use, the period of the bit sequence should be as large as possible. If the period is the maximum possible, namely 2^d – 1, L is called a maximum-length LFSR.

Many properties of the LFSR L can be explained in terms of its connection polynomial defined as:

Equation A.6

CL(X) = a0X^d + a1X^(d–1) + · · · + ad–1X + 1.

For example, assume that a0 = 1, so that deg CL(X) = d. Assume further that CL(X) is irreducible (over 𝔽₂). Consider the extension 𝔽_{2^d} of 𝔽₂, represented as 𝔽₂[X]/⟨CL(X)⟩, where x denotes the class of X. It turns out that if x is a generator of the cyclic group 𝔽_{2^d}*, then L is a maximum-length LFSR. In this case, the polynomial CL(X) is called a primitive polynomial of 𝔽_{2^d}.[3]

[3] A primitive polynomial defined in this way has nothing to do with a primitive polynomial over a UFD, defined in Exercise 2.54. Mathematicians often go for such multiple definitions of the same terms and phrases.
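The maximum-length property is easy to check exhaustively for small d. A sketch with d = 4 and the (illustrative) control bits (a0, a1, a2, a3) = (1, 1, 0, 0), whose output recurrence si+4 = si ⊕ si+1 has the primitive characteristic polynomial X^4 + X + 1:

```python
def lfsr_step(state, taps):
    """One clock pulse of Figure A.2: the feedback bit sd is the XOR of
    the stored bits AND-ed with the control bits; then shift."""
    fb = 0
    for a, s in zip(taps, state):
        fb ^= a & s
    return state[1:] + [fb]

def period(taps, init):
    """Number of clock pulses until the initial state recurs."""
    state, n = lfsr_step(init, taps), 1
    while state != init:
        state, n = lfsr_step(state, taps), n + 1
    return n
```

The choice (1, 1, 0, 0) attains the maximum period 2^4 – 1 = 15, while (1, 0, 1, 0), whose associated polynomial X^4 + X^2 + 1 = (X^2 + X + 1)^2 is not even irreducible, does not.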

A.3.2. Stream Ciphers Based on LFSRs

The bit sequence output by an LFSR L can be used as the key stream k1k2 . . . kl in order to encrypt a plaintext stream m1m2 . . . ml to the ciphertext stream c1c2 . . . cl with ci := mi ⊕ ki. The number d of stages in L should be chosen reasonably large and the control bits a0, . . . , ad–1 should be kept secret. The initial state of L may or may not be a secret. For suitable choices of a0, . . . , ad–1, the output sequences from L possess good statistical properties and hence L appears to be an efficient key stream generator.

Unfortunately, such a key stream generator is vulnerable to a known-plaintext attack as follows. Suppose that mi and ci are known for i = 1, 2, . . . , 2d. One can easily compute ki = mi ⊕ ci for all these i. Let si := (ki, ki+1, . . . , ki+d–1) denote the state of L while outputting ki. By Congruence (A.4), si+1 ≡ ΔLsi (mod 2) for i = 1, 2, . . . , d. Define the d × d matrices S := (s1 s2 . . . sd) and T := (s2 s3 . . . sd+1), where si are treated as column vectors as before. We then have T ≡ ΔLS (mod 2). If S is invertible modulo 2, then ΔL and hence the secret control bits can be easily computed. In order to avoid this known-plaintext attack, one should introduce some non-linearity in the LFSR outputs.
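The attack amounts to a few lines of linear algebra over 𝔽₂. The sketch below recovers the control bits of a d-stage LFSR from 2d keystream bits by inverting S modulo 2 with Gauss–Jordan elimination:

```python
def lfsr_bits(taps, init, n):
    """n output bits of the LFSR of Figure A.2."""
    state, out = list(init), []
    for _ in range(n):
        out.append(state[0])
        fb = 0
        for a, s in zip(taps, state):
            fb ^= a & s
        state = state[1:] + [fb]
    return out

def inverse_mod2(M):
    """Gauss-Jordan inversion of a 0/1 matrix modulo 2."""
    d = len(M)
    A = [row[:] + [int(i == j) for j in range(d)] for i, row in enumerate(M)]
    for col in range(d):
        piv = next(r for r in range(col, d) if A[r][col])   # pivot row
        A[col], A[piv] = A[piv], A[col]
        for r in range(d):
            if r != col and A[r][col]:
                A[r] = [x ^ y for x, y in zip(A[r], A[col])]
    return [row[d:] for row in A]

def recover_taps(keystream, d):
    """Solve T = Delta_L * S (mod 2); the last row of Delta_L is
    exactly the vector of secret control bits (a0, ..., a_{d-1})."""
    state = lambda i: [keystream[i + j] for j in range(d)]
    S = [[state(c)[r] for c in range(d)] for r in range(d)]
    T = [[state(c + 1)[r] for c in range(d)] for r in range(d)]
    Sinv = inverse_mod2(S)
    last = []
    for c in range(d):
        bit = 0
        for k in range(d):
            bit ^= T[d - 1][k] & Sinv[k][c]
        last.append(bit)
    return last
```

With only 2d known plaintext bits, the whole generator falls, which motivates the non-linear constructions that follow.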

A non-linear combination generator combines the output bits u1, u2, . . . , ur from r LFSRs by a non-linear function φ in order to generate the key k = φ(u1, u2, . . . , ur). The Geffe generator of Figure A.3 gives a well-known example. It uses the non-linear function φ(u1, u2, u3) = u1u2 ⊕ u2u3 ⊕ u3, that is, k ≡ u1u2 + u2u3 + u3 (mod 2).

Figure A.3. The Geffe generator


A non-linear filter generator generates the key as k = ψ(s0, s1, . . . , sd–1), where s0, . . . , sd–1 are the bits stored in the delay elements of a single LFSR and where ψ is a non-linear function.

Several other ad hoc schemes can destroy the linearity of an LFSR’s output. The shrinking generator, for example, uses two LFSRs L1 and L2. Both L1 and L2 are simultaneously clocked. If the output of L1 is 1, the output of L2 goes to the key stream, whereas if the output of L1 is 0, the output of L2 is discarded. The resulting key stream is an irregularly (and non-linearly) decimated subsequence of the output sequence of L2.
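Both constructions are one-liners; the truth-table check below also exhibits the well-known 3/4 correlation of the Geffe output with u1 (the leakage of Exercise A.17):

```python
def geffe(u1: int, u2: int, u3: int) -> int:
    """k = u1 u2 XOR u2 u3 XOR u3: selects u1 when u2 = 1, else u3."""
    return (u1 & u2) ^ (u2 & u3) ^ u3

def shrink(bits1, bits2):
    """Shrinking generator: keep the L2 bit exactly when the L1 bit is 1."""
    return [b2 for b1, b2 in zip(bits1, bits2) if b1]
```

The multiplexer reading of geffe makes the correlation obvious: whenever u2 = 1 (half the time) the output simply equals u1.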

The non-linear function (φ or ψ) eliminates the chance of mounting the straightforward known-plaintext attack described above. However, for polynomial non-linearities certain algebraic attacks are known, for example, see Courtois and Pieprzyk [67, 66].[4] Solving non-linear polynomial equations is usually more difficult than solving linear equations, but ample care should be taken to avoid accidental encounters with easily solvable systems. Complacency is a word ever excluded from a cryptologer’s world.

[4] Visit the Internet site http://www.cryptosystem.net/ for more papers in related areas.

Exercise Set A.3

A.12For each of the two classes of stream ciphers (Algorithms A.16, A.17) discuss the effects on decryption of
  1. alteration

  2. insertion or deletion

of a ciphertext bit during transmission.

A.13Suppose that the LFSR L of Figure A.4 is initialized to the state (1, 0, 0, 0). Derive the sequence of state transitions of the LFSR, and hence determine the output bit sequence of L. Argue that L is a maximum-length LFSR. Verify (according to the definition) that the connection polynomial CL(X) is primitive.

Figure A.4. An LFSR with four stages


A.14Let ΔL and CL(X) be as in Equations (A.5) and (A.6). Show that:
  1. ΔL is invertible modulo 2 if and only if a0 = 1.

  2. The characteristic polynomial of ΔL (a matrix over 𝔽₂) is X^d CL(1/X). [H]

A.15Let L be an LFSR with connection polynomial CL(X). Further let S(X) = s0 + s1X + s2X^2 + · · ·, si ∈ 𝔽₂, denote a power series[5] over 𝔽₂. Show that L generates the (infinite) bit sequence s0, s1, s2, . . . if and only if the product CL(X)S(X) modulo 2 is a polynomial of degree < d.

[5] A power series over a ring A is a (formal) expression of the form f = a0 + a1X + a2X^2 + · · · with each ai ∈ A. The set of all such power series is denoted by A[[X]]. For two power series f = a0 + a1X + a2X^2 + · · · and g = b0 + b1X + b2X^2 + · · · over A, the sum f + g is defined to be the power series (a0 + b0) + (a1 + b1)X + (a2 + b2)X^2 + · · ·, and the product fg is defined as the power series c0 + c1X + c2X^2 + · · ·, where ci = a0bi + a1bi–1 + · · · + aib0. Under these operations A[[X]] is a ring. A polynomial over A can be identified with an element of A[[X]], in which all, but finitely many, coefficients are zero.

A.16Let σ = s0s1 . . . sd–1 ≠ 00 . . . 0 be a bit string of length d ≥ 1. The linear complexity L(σ) of σ is defined to be the length of the shortest LFSR that generates σ as the leftmost part of its output (after it is initialized to a suitable state). Prove that:
  1. L(σ) ≤ d.

  2. L(σ) = d if and only if σ = 00 . . . 01. [H]

A.17Assume that the three LFSR outputs u1, u2, u3 in the Geffe generator are uniformly distributed. Show that Pr(k = u1) = 3/4 = Pr(k = u3). Thus, partial information about the internal details of the Geffe generator is leaked out in the key stream.

A.4. Hash Functions

A hash function maps bit strings of any length to bit strings of a fixed length n. For practical uses, hash functions should be easy to compute, that is, computing the hash of x should be doable in time polynomial in the size of x.

Since a hash function H maps an infinite set to a finite set, there must exist pairs (x1, x2) of distinct strings with H(x1) = H(x2). Such a pair is called a collision for H. For cryptographic applications (for example, for generating digital signatures), it should be computationally infeasible to find collisions for hash functions. To elaborate this topic further we mention the following two desirable properties of hash functions used in cryptography.

Definition A.3.

A hash function H is called second pre-image resistant, if it is computationally infeasible[6] to find, for a given bit string x1, a second bit string x2 ≠ x1 with H(x1) = H(x2).

[6] A problem P is said to be computationally infeasible if any known or possible algorithm (deterministic or randomized) to solve P runs in infeasible (like super-polynomial) time, except perhaps for a set of some input instances, the density of which in the input space is zero (or, more generally, negligibly small).

Definition A.4.

A hash function H is called collision resistant, if it is computationally infeasible to find any two distinct bit strings x1 and x2 with H(x1) = H(x2).

In order to prevent existential forgery (Exercise 5.15) of digital signatures, hash functions should also be difficult to invert.

Definition A.5.

An n-bit hash function H is called first pre-image resistant (or simply pre-image resistant), if it is computationally infeasible to find, for almost all bit strings y of length n, a bit string x (of any length) such that y = H(x). The qualification almost all in the last sentence was necessary, since one can compute and store the pairs (xi, H(xi)), i = 1, 2, . . . , k, for some small k and for some xi of one’s choice. If the given y turns out to be one of these hash values H(xi), a pre-image of y is easily available.

A hash function (provably or believably) satisfying all these three properties is called a cryptographic hash function. A hash function having first and second pre-image resistance is often called a one-way hash function. Some authors require both second pre-image resistance and collision resistance to define a collision-resistant hash function, but here we stick to Definitions A.3 and A.4. In what follows, an unqualified use of the phrase hash function indicates a cryptographic hash function.

Most of the properties of a cryptographic hash function are mutually independent. However, we have the following implication.

Proposition A.1.

A collision resistant hash function is second pre-image resistant.

Proof

Let H be a (non-cryptographic) hash function which is not second pre-image resistant. This means that there is an algorithm A that efficiently computes second pre-images, except perhaps for a vanishingly small fraction of inputs. Choose a random bit string x1. The probability that x1 is not a bad input to A is very high and, in that case, A outputs a second pre-image x2 quickly. This gives us an efficient randomized algorithm to compute collisions (x1, x2) for H.

The converse of Proposition A.1 is not true: A second pre-image resistant hash function need not be collision resistant (Exercise A.19). Also collision resistance (or second pre-image resistance) does not imply first pre-image resistance (Exercise A.20), and first pre-image resistance does not imply second pre-image resistance (Exercise A.21).

A hash function may or may not be used in conjunction with a secret key. An unkeyed hash function is typically used to check the integrity of a message and is often called a modification detection code (MDC). A keyed hash function, on the other hand, is usually employed to authenticate the origin of a message (in addition to verifying the integrity of the message) and so is often called a message authentication code (MAC).

A.4.1. Merkle’s Meta Method

Let us now describe a generic method of constructing hash functions. We start by defining the following basic building block.

Definition A.6.

Let m, n ∈ ℕ with m = n + r for some r ∈ ℕ. A function F : {0, 1}^m → {0, 1}^n that maps bit strings of length m to bit strings of length n is called a compression function. Henceforth, we will consider only those compression functions that can be computed easily, that is, in time polynomial in the input size.

Since m > n, collisions must exist for F. For cryptographic use, collisions should be difficult to locate. We can define first and second pre-image resistance and collision resistance of compression functions as before.

Algorithm A.18. Merkle’s meta method

Input: A compression function F : {0, 1}^m → {0, 1}^n with m = n + r and a bit string x of length < 2^r.

Output: The hash value H(x).

Steps:

Let λ be the bit length of x.
Set l := ⌈λ/r⌉.
If (λ is not a multiple of r) { Append rl – λ zero bits to the right of x. }
Break the padded x into blocks x1, . . . , xl each of length r.
Store in a new block xl+1 the r-bit representation of λ.
Initialize h0 := 0^n.
for i = 1, 2, . . . , l + 1 { hi := F (hi–1 ‖ xi) }
Set H(x) := hl+1.

Algorithm A.18 demonstrates how a compression function can be used to design an n-bit hash function H. The input message x is first broken into l ≥ 0 blocks each of bit length r, after padding zero bits, if necessary. The initial bit length λ of x is then stored in a new block. This implies that H cannot handle bit strings of length ≥ 2^r. For a reasonably big r, this is not a practical limitation. Storing λ is necessary for several reasons. First, it ensures that the for loop is executed at least once for any message. This prevents the trivial hash value 0^n (the bit string of length n containing zero bits only) for the null message. Moreover, if hi = 0^n for some i ∈ {1, . . . , l – 1}, then, without the length block, we would have H(x1 ‖ . . . ‖ xl) = H(xi+1 ‖ . . . ‖ xl), that is, a collision for H.
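Algorithm A.18 transcribes almost line by line. The sketch below works at byte rather than bit granularity and borrows SHA-256 as a stand-in compression function F (both are simplifying assumptions; the meta method works for any F : {0, 1}^m → {0, 1}^n):

```python
import hashlib

N_BYTES = R_BYTES = 32        # n = r = 256 bits, so m = 512 bits

def F(block: bytes) -> bytes:
    """Stand-in compression function {0,1}^(n+r) -> {0,1}^n."""
    assert len(block) == N_BYTES + R_BYTES
    return hashlib.sha256(block).digest()

def merkle_hash(x: bytes) -> bytes:
    lam = len(x)                                  # length of x (here in bytes)
    if lam % R_BYTES:
        x += b"\x00" * (R_BYTES - lam % R_BYTES)  # zero padding
    blocks = [x[i:i + R_BYTES] for i in range(0, len(x), R_BYTES)]
    blocks.append(lam.to_bytes(R_BYTES, "big"))   # the length block x_{l+1}
    h = b"\x00" * N_BYTES                         # h0 := 0^n
    for b in blocks:
        h = F(h + b)                              # hi := F(h_{i-1} || xi)
    return h
```

Because of the length block, inputs that pad to the same data blocks (such as b"a" and b"a\x00" here) still hash differently, and even the null message goes through the loop once.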

We now show that if F possesses the desired properties for use in cryptography, then so does H.

Proposition A.2.

If F is first pre-image resistant, then so is H.

Proof

Assume that H is not first pre-image resistant, that is, an efficient algorithm A exists to compute x with H(x) = y for most (if not all) y ∈ {0, 1}^n. Since y = hl+1 = F(hl ‖ xl+1), a pre-image (namely, hl ‖ xl+1) of y under F is easily computable.

Proposition A.3.

If F is collision resistant, then H is collision resistant (and hence also second pre-image resistant).

Proof

Given a collision (x, x′) for H, we can find a collision for F with little additional effort. We use the notations of Algorithm A.18 with primed variables for x′.

First consider the case l ≠ l′. Then, in particular, the length blocks xl+1 and x′l′+1 are different, and thus (hl ‖ xl+1, h′l′ ‖ x′l′+1) is a collision for F. So for the rest of the proof we take l = l′.

Now, suppose that hi ≠ h′i for some i ∈ {1, . . . , l}. Choose the largest such i and note that hi+1 and h′i+1 are defined and equal for this choice. This gives us the collision (hi ‖ xi+1, h′i ‖ x′i+1) for F.

The only case that remains to be treated is hi = h′i for all i ∈ {0, 1, . . . , l + 1}. Since x ≠ x′, there is at least one i ∈ {1, . . . , l} with xi ≠ x′i. For such an i, the equality hi = h′i implies that (hi–1 ‖ xi, h′i–1 ‖ x′i) is a collision for F.

In order to design cryptographic hash functions, it suffices to design cryptographic compression functions. Block ciphers can be used for that purpose. Let f be a block cipher with block size n and key size r. Take m := n + r and consider the map that sends x = L ‖ R, with L ∈ {0, 1}^n and R ∈ {0, 1}^r, to the encrypted bit string fR(L). If the fR are assumed to be random permutations of {0, 1}^n, the resulting compression function F possesses the desirable properties.

A.4.2. The Secure Hash Algorithm

Several custom-designed hash functions have been popularly used by the cryptography community. MD4 and MD5 are somewhat older 128-bit hash functions. Soon after its conception, MD4 was found to be vulnerable to several attacks. Also collisions for the compression function of MD5 are known. Therefore, these two hash functions have lost the desired level of confidence for cryptographic uses.

NIST has proposed a family of four hash algorithms. These algorithms are called secure hash algorithms and have the short names SHA-1, SHA-256, SHA-384 and SHA-512, which respectively produce 160-, 256-, 384- and 512-bit hash values. No collisions for SHA are known till date. In the rest of this section, we explain the SHA-1 algorithm. The workings of the other SHA algorithms are very similar and can be found in the FIPS document [222]. RIPEMD-160 is another popular 160-bit hash function.

SHA-1 (like other custom-designed hash functions mentioned above) is suitable for implementation in 32-bit processors. Suppose that we want to compute the hash SHA-1(M) of a message M of bit length λ. First, M is padded to get the bit string M′ := M ‖ 1 ‖ 0^k ‖ Λ, where Λ is the 64-bit representation of λ, and where k is the smallest non-negative integer for which the bit length of M′, that is, λ + 1 + k + 64, is a multiple of 512. M′ is broken into blocks M(1), M(2), . . . , M(l) each of length 512 bits. Each M(i) is represented as a collection of sixteen 32-bit words M(i)j, j = 0, 1, . . . , 15. SHA-1 uses big-endian packing, that is, M(i)0 stores the leftmost 32 bits of M(i), M(i)1 the next 32 bits, . . . , M(i)15 the rightmost 32 bits of M(i).

The SHA-1 computations are given in Algorithm A.19. One starts with a fixed initial 160-bit hash H(0). Successively for i = 1, 2, . . . , l the i-th message block M(i) is considered and the previous hash value H(i–1) is updated to H(i). At the end of the loop the 160-bit string H(l) is returned as SHA-1(M). Each H(i) is represented by five 32-bit words H(i)j, j = 0, 1, 2, 3, 4. Here also, big-endian notation is used, that is, H(i)0 stores the leftmost 32 bits of H(i), . . . , H(i)4 the rightmost 32 bits of H(i).

The updating procedure uses logical functions fj. Here, product (like xy) implies bit-wise AND, bar (as in x̄) denotes bit-wise complementation and ⊕ denotes bit-wise XOR, each on 32-bit operands. The notation LRk(z) (resp. RRk(z)) stands for a left (resp. right) rotation, that is, a cyclic left (resp. right) shift, of the bit string z of length 32 by k positions.

The bits of H(i) are well-defined transformations of the bits of H(i–1) under the guidance of the bits of M(i). The good amount of non-linearity, introduced by the functions fj and the modulo 2^32 sums, makes it difficult to invert the transformation H(i–1) ↦ H(i) and thereby makes SHA-1 an (apparently) secure hash function.

Algorithm A.19. The SHA-1 algorithm

Input: A message M.

Output: The hash SHA-1(M) of M.

Steps:

Generate the message blocks M(i), i = 1, 2, . . . , l.
/* Initialize the hash value */
H(0) := 0x67452301 efcdab89 98badcfe 10325476 c3d2e1f0.
for i = 1, 2, . . . , l {
   /* Compute the message schedule Wj, 0 ≤ j ≤ 79. */
   for j = 0, 1, . . . , 15 { Wj := M(i)j }
   for j = 16, 17, . . . , 79 { Wj := LR1(Wj–3 ⊕ Wj–8 ⊕ Wj–14 ⊕ Wj–16) }
   /* Store the previous hash words */
   for j = 0, 1, . . . , 4 { tj := H(i–1)j }
   /* Compute the updating values */
   for j = 0, 1, . . . , 79 {
      T := LR5(t0) + fj(t1, t2, t3) + t4 + Kj + Wj (mod 2^32), where
          fj(x, y, z) := xy ⊕ x̄z               for 0 ≤ j ≤ 19,
                         x ⊕ y ⊕ z              for 20 ≤ j ≤ 39 and 60 ≤ j ≤ 79,
                         xy ⊕ xz ⊕ yz           for 40 ≤ j ≤ 59,
          and
          Kj := 0x5a827999 for 0 ≤ j ≤ 19,      0x6ed9eba1 for 20 ≤ j ≤ 39,
                0x8f1bbcdc for 40 ≤ j ≤ 59,     0xca62c1d6 for 60 ≤ j ≤ 79.
      t4 := t3, t3 := t2, t2 := RR2(t1), t1 := t0, t0 := T.
   }
   /* Update the hash value */
   for j = 0, 1, . . . , 4 { H(i)j := tj + H(i–1)j (mod 2^32) }
}
Set SHA-1(M) := H(l).

A test vector for SHA-1 is the following (here 616263 is the string “abc”):

SHA-1(616263) = a9993e364706816aba3e25717850c26c9cd0d89d.
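Algorithm A.19 transcribes directly into code. In the following Python rendering the words t0, . . . , t4 appear as a, . . . , e, and the test vector above is reproduced:

```python
def rotl(x: int, k: int) -> int:
    """LRk: cyclic left shift of a 32-bit word by k positions."""
    return ((x << k) | (x >> (32 - k))) & 0xffffffff

def sha1(message: bytes) -> str:
    lam = 8 * len(message)
    m = message + b"\x80"                         # append the 1-bit
    m += b"\x00" * ((56 - len(m)) % 64)           # zero padding
    m += lam.to_bytes(8, "big")                   # the 64-bit length Lambda

    H = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0]
    for i in range(0, len(m), 64):                # one 512-bit block at a time
        W = [int.from_bytes(m[i + 4*j:i + 4*j + 4], "big") for j in range(16)]
        for j in range(16, 80):                   # message schedule
            W.append(rotl(W[j-3] ^ W[j-8] ^ W[j-14] ^ W[j-16], 1))
        a, b, c, d, e = H
        for j in range(80):
            if j < 20:
                f, K = (b & c) | (~b & d), 0x5a827999
            elif j < 40:
                f, K = b ^ c ^ d, 0x6ed9eba1
            elif j < 60:
                f, K = (b & c) | (b & d) | (c & d), 0x8f1bbcdc
            else:
                f, K = b ^ c ^ d, 0xca62c1d6
            T = (rotl(a, 5) + f + e + K + W[j]) & 0xffffffff
            a, b, c, d, e = T, a, rotl(b, 30), c, d   # shift the five words
        H = [(x + y) & 0xffffffff for x, y in zip(H, (a, b, c, d, e))]
    return "".join(x.to_bytes(4, "big").hex() for x in H)
```

Note that RR2 of the algorithm is written here as rotl(·, 30), a rotation by 32 − 2 positions.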

Exercise Set A.4

A.18Let x be a bit string. Break up x into blocks x1, . . . , xl each of bit size n (after padding, if necessary). Define H1(x) := x1 ⊕ . . . ⊕ xl. Show that H1 possesses none of the desirable properties of a cryptographic hash function.
A.19Let H be an n-bit cryptographic hash function and S a finite set of strings with #S ≥ 2. Define the function H2(x) := 0^{n+1} if x ∈ S, and H2(x) := 1 ‖ H(x) otherwise. Here, 0^{n+1} refers to a bit string of length n + 1 containing zero-bits only. Show that H2 is second pre-image resistant, but not collision resistant. [H]
A.20Let H be an n-bit cryptographic hash function. Show that the function H3 defined as H3(x) := 1 ‖ x if x is of length n, and H3(x) := 0 ‖ H(x) otherwise, is collision resistant (and hence second pre-image resistant), but not first pre-image resistant. [H]
A.21Let m be a product of two (unknown) big primes and let the binary representation of m (with leading one-bit) have n bits. Assume that it is computationally infeasible to compute square roots modulo m. We can identify bit strings with integers in a natural way. For a bit string x, take y := 1 ‖ x and let H4(x) denote the n-bit binary representation of y2 (mod m). Show that H4 is first pre-image resistant, but not second pre-image resistant (and hence not collision-resistant). [H]
A.22Let H be an n-bit cryptographic hash function. Assume that H produces random hash values on random input strings. Prove that O(2^{n/2}) hash values need to be computed to detect a collision for H with high probability. [H] Deduce also that nearly 2^{n–1} hash values need to be computed on an average to obtain a second pre-image x′ of H(x).
A.23Let F1 : {0, 1}^{2n} → {0, 1}^n be a collision resistant compression function.
  1. Define a compression function F2 : {0, 1}^{4n} → {0, 1}^n as follows. Let x be a bit string of length 4n. Write x = L ‖ R, where each of L and R is of length 2n bits. Define F2(x) := F1(F1(L) ‖ F1(R)). Show that F2 is also collision-resistant.

  2. Inductively define Fk : {0, 1}^{2^k n} → {0, 1}^n as Fk(x) := F1(Fk–1(L) ‖ Fk–1(R)), where L and R are the left and right halves of x. Show that each Fk is collision resistant.

  3. Show that if F1 is first pre-image resistant, then so is each Fk.

  4. Define an n-bit hash function H as follows. Let x be a bit string of length l. If l < n, take k := 1, else choose k ∈ ℕ such that 2^{k–1}n ≤ l < 2^k n. Construct the string y := x ‖ 1 ‖ 00 . . . 0 of length 2^k n and define H(x) := Fk(y). Is H collision resistant? [H] (Appending a one-bit at the end of x delimits x and thereby prevents trivial collisions.)

A.24
  1. Let F1 : {0, 1}^{m1} → {0, 1}^{n1} and F2 : {0, 1}^{m2} → {0, 1}^{n2} be cryptographic compression functions. Show that F : {0, 1}^{m1+m2} → {0, 1}^{n1+n2} defined as F(L ‖ R) := F1(L) ‖ F2(R) (where L ∈ {0, 1}^{m1} and R ∈ {0, 1}^{m2}) is again a cryptographic compression function.

  2. The hash function H derived from DES (Section A.4.1) produces 64-bit hash values. For reasonable security, we require n-bit hash values with n at least 128. Use Part (a) to propose a method to make H achieve this desired level of security.

A.25Assume that in the SHA-1 algorithm the designers opted for Algorithm A.19 with the following minor modifications: They defined fj as fj(x, y, z) := xyz for all and they replaced all costly mod 232 addition operations (+) by cheap bit-wise XOR operations (⊕). Do you sense anything wrong with this design? [H]

B. Key Exchange in Sensor Networks

B.1Introduction
B.2Security Issues in a Sensor Network
B.3The Basic Bootstrapping Framework
B.4The Basic Random Key Predistribution Scheme
B.5Random Pairwise Scheme
B.6Polynomial-pool-based Key Predistribution
B.7Matrix-based Key Predistribution
B.8Location-aware Key Predistribution

One of the keys to happiness is a bad memory.

—Rita Mae Brown

That theory is worthless. It isn’t even wrong!

—Wolfgang Pauli

You’re only as sick as your secrets.

—Anonymous

B.1. Introduction

Public-key cryptography is not a solution to every security problem. Asymmetric routines are bulky and slow, and, in practice, augment symmetric cryptography by eliminating the need for prior secret establishment of keys between communicating parties. On a workstation of today’s computing technology, this is an interesting and acceptable breakthrough. A 1 GHz processor runs one public-key encryption or key-exchange primitive in tens to hundreds of milliseconds, using at least hundreds of kilobytes of memory. That is reasonable for most applications, given that the routines are invoked rather infrequently.

Now, imagine a situation, where many tiny computing nodes, called sensor nodes, are scattered in an area for the purpose of sensing some data and transmitting the data to nearby base stations for further processing. This transmission is done by short-range radio communications. The base stations are assumed to be computationally well-equipped, but the sensor nodes are resource-starved. Such networks of sensor nodes are used in many important applications including tracking of objects in an enemy’s area for military purposes and scientific, engineering and medical explorations like wildlife monitoring, distributed seismic measurement, pollution tracking, monitoring fire and nuclear power plants and tracking patients. In some cases, mostly for military and medical applications, data collected by sensor nodes need to be encrypted before transmitting to neighbouring nodes and base stations.

Evidently, one has to resort to symmetric-key cryptography in order to meet the security needs of a sensor network. This appendix provides an overview of some key exchange schemes suitable for sensor networks.

B.2. Security Issues in a Sensor Network

Several issues make secure communication in sensor networks different from that in usual networks:

Limited resources in sensor nodes

Each sensor node contains a primitive processor with very low computing speed and only a small amount of memory. The popular Atmel ATmega 128L, for example, is an 8-bit RISC processor clocked at 4 MHz, with 128 kbytes of programmable flash memory and only 4 kbytes of RAM. The processor has no integer division instruction, and supports multiplication only on 8-bit operands. One requires tens of minutes to several hours for performing a single RSA or Diffie–Hellman exponentiation at cryptographic key sizes.

Limited lifetime of sensor nodes

Each sensor node is battery-powered and is expected to operate for only a few days. As deployed sensor nodes die, it becomes necessary to add fresh nodes to the network to continue the data collection operation. This calls for dynamic management of security objects (like keys).

Limited communication ability of sensor nodes

Sensor nodes communicate with each other and the base stations by wireless radio transmission at low bandwidth and over small communication ranges. For typical motes built around the Atmel ATmega 128L, the maximum bandwidth is 40 kbps, and the communication range is at most 100 feet (about 30 m).

Moreover, the deployment area may have irregularities (like physical obstacles) that further limit the communication abilities of the nodes. One therefore expects that a deployed sensor node can directly communicate with only a few other nodes in the network.

Possibility of node capture

A sensor network is vulnerable to capture of nodes by the enemy. The captured nodes may be physically destroyed, or utilized to send misleading signals and/or disrupt the normal activity of the network. As a result, no node should place full trust in the nodes with which it communicates. The relevant security goal in this context is that the captured nodes should not divulge to the enemy enough secrets to jeopardize the communication among the uncaptured nodes.

Lack of knowledge about deployment configuration

In many situations (like scattering of nodes from airplanes or trucks), the post-deployment configuration of the sensor network is not known a priori. It is unreasonable to use security algorithms that have strong dependence on locations of nodes in the network. For example, each sensor node u is expected to have only a few neighbours with which it can directly communicate. This is precisely the set of nodes with which u needs to share keys. However, the list cannot be determined before the actual deployment. An approximate knowledge of the locations of the nodes may strengthen the protocols, but robustness for handling run-time variations must be built in the protocols.

Mobility of sensor nodes

Sensor nodes may be static or mobile. Mobile nodes change the network configurations (like the lists of neighbours) as functions of time and call for time-varying security tools.

Still, sensor nodes need to communicate secretly. The clear impracticality of using public-key routines forces one to use symmetric ciphers. But setting up symmetric keys among communicating nodes is a difficult task. The number n of nodes in a sensor network can range up to several hundred thousand. Storing a symmetric key for each pair of nodes is impossible, since each sensor would then need memory large enough to hold n − 1 keys. At the other extreme, every communication may use a single network-wide symmetric key; in that case, the capture of a single node renders communication over the entire network completely insecure.

The plot thickens. There are graceful ways out. A host of algorithms has recently been proposed to address key establishment issues in sensor networks. In the rest of this appendix, we provide a quick survey of these tools. For the sake of simplicity, we assume here that our sensor network is static, that is, the nodes have no (or negligibly small) mobility. Though the schemes described below may be adapted to mobile networks, the required modifications are not necessarily easy, and the current literature does not seem to be ready to take mobility into account.

We continue to deal with sensor processors of the capability of Atmel ATmega 128L. In practice, better processors (with speed, storage and cost roughly one order of magnitude higher) are available. We assume that the size (number of nodes) n of a sensor network is (usually) not bigger than a million, and also that a sensor node has of the order of 100 neighbours in its communication range.

B.3. The Basic Bootstrapping Framework

Key establishment in a sensor network is effected by a three-stage process called bootstrapping. Subsequent node-to-node communication uses the keys established during the bootstrapping phase. The three stages of bootstrapping are as follows:

Key predistribution

This step is carried out before the deployment of the sensors. A key set-up server chooses a pool 𝒦 of randomly generated keys and assigns to each sensor node ui a subset Ki ⊆ 𝒦. The set Ki is called the key ring of the node ui. The key predistribution algorithms essentially differ in the ways the sets 𝒦 and Ki are selected. Each key is associated with an ID that need not be kept secret and can even be transmitted in plaintext. Similarly, each sensor node is given a unique ID which need not be maintained secretly.

Direct key establishment

Immediately after deployment, each sensor node tries to determine all other sensor nodes with which it can communicate directly and secretly. Two nodes that are within the communication ranges of one another are called physical neighbours, whereas two nodes sharing one (or more) key(s) in their key rings are called key neighbours. Two nodes can secretly (and directly) communicate with one another if and only if they are both physical and key neighbours; let us call such pairs direct neighbours.

In the direct key establishment phase, each sensor node u locates its direct neighbours. To that end u broadcasts its own ID and the IDs of the keys in its key ring. Each physical neighbour v of u responds by mentioning the matching key IDs, if any, stored in the key ring of v. This is how u identifies its direct neighbours.

If sending unencrypted key IDs poses a potential threat to the security of the network, each node u can instead encrypt some fixed plaintext message under each of the keys in its ring and broadcast the corresponding ciphertexts in place of the key IDs. Those physical neighbours of u that can decrypt one of the transmitted ciphertexts using a key in their own key rings establish themselves as direct neighbours of u.
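As a toy illustration of this exchange (the node and key IDs below are invented; real nodes broadcast them over radio), the matching step amounts to a set intersection on key IDs:

```python
# Hypothetical sketch of direct key establishment: each node broadcasts the
# IDs of the keys in its ring, and two physical neighbours become direct
# neighbours precisely when their broadcast ID sets intersect.

def shared_key_ids(ring_u, ring_v):
    """Key IDs common to two key rings (modelled as sets of IDs)."""
    return ring_u & ring_v

ring_u = {17, 42, 101, 256}        # key ring of node u (IDs only; keys stay secret)
ring_v = {42, 99, 256, 300}        # key ring of a physical neighbour v

common = shared_key_ids(ring_u, ring_v)
are_direct_neighbours = bool(common)   # u and v share keys 42 and 256
```

Only the public IDs travel over the air; the keys themselves never leave the nodes.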

Path key establishment

This is an optional stage and, if executed, adds to the connectivity of the network. Suppose that two physical neighbours u and v fail to establish a direct link between them in the direct key establishment phase. But there exists a path u = u0, u1, u2, . . . , uh–1, uh = v in the network with each ui a direct neighbour of ui+1 (for i = 0, 1, . . . , h – 1). The node u then generates a random key k, encrypts k with the key shared between u and u1 and sends the encrypted key to u1. Subsequently, u1 retrieves k by decryption, encrypts k by the key shared by u1 and u2 and sends this encrypted version of k to u2. This process is repeated until the key k reaches the desired destination v. Now, u and v can communicate secretly and directly using k and thereby become direct neighbours.

The main difficulty in this process is the discovery of a path between u and v. Discovery can be initiated by u broadcasting a message reflecting its desire to communicate with v. Let u1 be a direct neighbour of u. If u1 is also a direct neighbour of v, a path between u and v is discovered. Otherwise, u1 retransmits u's request to its own direct neighbours. This process is repeated until a path is established between u and v, or the number of hops exceeds a certain limit. Note that path discovery may incur substantial communication overhead, so the maximum number h of hops allowed needs to be fixed at a modest value. Typically, h = 2 or 3 is recommended.

A bootstrapping algorithm, or more precisely, a key predistribution algorithm must fulfill the following requirements. These requirements often turn out to be mutually contradictory. A key predistribution scheme attempts to achieve suitable trade-offs among them.

Compactness

Each key ring should be small enough to fit in a sensor node's memory. Typically, 50–200 cryptographic keys (say, 128-bit keys of block ciphers) can be stored in each node. This number sits between the two extremes of n − 1 keys (a distinct key for each pair of nodes) and a single master key for the entire network.

Randomness

The key rings of different nodes are to be chosen randomly from a big pool, so that the overlap between the rings of any two nodes remains small.

Network connectivity

The resulting network should be connected: the undirected graph G = (V, E), with V comprising the nodes in the network and E containing a link (u, v) if and only if u and v are direct neighbours, must be connected (or at least connected with high probability).

Resilience against node capture

Ideally, the capture of any number of nodes must not divulge the secret key(s) between uncaptured direct neighbours. Practically, the fraction of communication links among uncaptured nodes, that are compromised because of node captures, must be small, at least as long as the fraction of nodes that are captured is not too high.

Scalability

Arbitrarily (but not impractically) big networks should be supported.

Future addition of nodes

One should allow new nodes to join the network at any point of time after the initial deployment, for example, to replenish captured, faulty and dead nodes.

Additional requirements may also be conceived of in order to take curative measures against active attacks and/or faults. However, a study of active attacks and of countermeasures against those is beyond the scope of our treatment here.

Detection of bad nodes

There should be a mechanism to detect the presence and identities of dead, malfunctioning and rogue nodes. Here, a rogue node stands for a captured node that is used by the enemy to disrupt the natural working of the network. Active attacks mountable by the enemy include transmission of unauthorized and misleading data across the network, making neighbours always busy and letting them run out of battery sooner than the expected lifetime (sleep deprivation attack), and so on.

Revocation of bad nodes

Faulty and rogue nodes must be pruned out of the network before they can cause sizeable harm.

Resilience against node replication

Captured nodes can be replicated and the copies deployed by the enemy with the intention that these added nodes outnumber the legitimate nodes and eventually take control of the network. There should be a strategy to detect and cure replication of malicious nodes.

We now concentrate on some concrete realizations of the bootstrapping scheme. The optional third stage (path key establishment) will often be excluded from our discussion, because there are few algorithm-specific issues in this stage.

Before we introduce specific algorithms, let us summarize the notations we are going to use in the rest of this appendix:

n = Number of nodes in the sensor network
n′ = (Expected) number of nodes in the physical neighbourhood of each node
d = Degree of connectivity of each node in the key/direct neighbourhood graph
Pc = Global connectivity (a high probability like 0.9999)
p′ = Local connectivity (probability that two physical neighbours share a key)
M = Size of the key pool
m = Size of the key ring of each node (in number of cryptographic keys)
𝔽q = The underlying field for the poly-pool and the matrix-pool schemes
S = Size of the polynomial (or matrix) pool
s = Number of polynomial (or matrix) shares in the key ring of each node
t = Degree of a polynomial (or dimension of a matrix)
c = Number of nodes captured
Pe = Probability of successful eavesdropping, expressed as a function of c

B.4. The Basic Random Key Predistribution Scheme

The paper [88] by Eschenauer and Gligor is a pioneering work on bootstrapping in sensor networks. Their scheme, henceforth referred to as the EG scheme, is essentially the basic bootstrapping method just described.

The key set-up server starts with a pool 𝒦 of randomly generated keys. The number M of keys in 𝒦 is taken to be a small multiple of the network size n. For each sensor node u to be deployed, a random subset of m keys from 𝒦 is selected and given to u as its key ring. Upon deployment, each node discovers its direct neighbours as specified in the generic description. We now explain how the parameters M and m are to be chosen so as to make the resulting network connected with high probability.

Let us first look at the key neighbourhood graph Gkey on the n sensor nodes, in which a link exists between two nodes if and only if these nodes are key neighbours. Let p denote the probability that a link exists between two randomly selected nodes of this graph. A result on random graphs due to Erdős and Rényi indicates that in the limit n → ∞, the probability that Gkey is connected is

Equation B.1

    Pc = e^(−e^(−a)),  where p = (ln n)/n + a/n for a real constant a.
We fix Pc at a high value, say, 0.9999 (so that a = −ln(−ln Pc)), and express the expected degree of each node in Gkey as

Equation B.2

    d = p(n − 1) = ((n − 1)/n) (ln n − ln(−ln Pc)).
In practice, we should also bring physical neighbourhood into consideration and look at the direct neighbourhood graph G = Gdirect on the n deployed sensor nodes. In this graph, two nodes are connected by an edge if and only if they are direct neighbours. G is not random, since it depends on the geographical distribution of the nodes in the deployment area. However, we assume that the above result for random graphs continues to hold for G too. In particular, we fix the degree of direct connectivity of each node to be (at least) d and require

Equation B.3

    p′ ≥ d/n′,

where n′ denotes the expected number of physical neighbours of each node, and where p′ is the probability that two physical neighbours share one or more keys in their key rings. (Pc is often called the global connectivity and p′ the local connectivity.)

For the determination of p′, we first note that there is a total of C(M, m) key rings of size m that can be chosen from the pool 𝒦 of size M (here C(a, b) denotes the binomial coefficient "a choose b"). For a fixed ring Ki, the total number of ways of choosing a ring Kj such that Kj does not share a key with Ki is equal to the number of ways of choosing m keys from the M − m keys outside Ki. This number is C(M − m, m). It then follows that

Equation B.4

    p′ = 1 − C(M − m, m)/C(M, m).
Equations (B.2), (B.3) and (B.4) dictate how the key-pool size M is to be chosen, given the values of n, n′ and m.

Example B.1.

As a specific numerical example, consider a sensor network with n = 10,000 nodes. For the desired probability Pc = 0.9999 of connectedness of Gkey, we use Equation (B.2) to obtain the desired degree d as d ≥ 18.419. Let us take d = 20. Now, suppose that the expected number of physical neighbours of each deployed node is n′ = 50. By Equation (B.3), we then require p′ = d/n′ = 0.4. Finally, assume that each sensor can hold m = 150 keys in its memory. Equation (B.4) indicates that we should have M ≤ 44,195 in order to ensure p′ ≥ 0.4. In particular, we may take M = 40,000.
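The arithmetic of this example can be checked with a short script built from Equations (B.2)–(B.4); `math.comb` evaluates the binomial coefficients exactly (the function names are ours, chosen for readability):

```python
import math

def desired_degree(n, Pc):
    """Equation (B.2): d = ((n-1)/n) * (ln n - ln(-ln Pc))."""
    return ((n - 1) / n) * (math.log(n) - math.log(-math.log(Pc)))

def local_connectivity(M, m):
    """Equation (B.4): p' = 1 - C(M-m, m)/C(M, m)."""
    return 1 - math.comb(M - m, m) / math.comb(M, m)

n, Pc, n_prime, m = 10_000, 0.9999, 50, 150
d = desired_degree(n, Pc)                  # ≈ 18.419; round up to d = 20
p_required = 20 / n_prime                  # Equation (B.3): p' >= d/n' = 0.4
p_actual = local_connectivity(40_000, m)   # comfortably above 0.4
```

Larger pools lower the local connectivity, which is why M must be bounded above.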

Let us now study the resilience of the EG scheme against node captures. Assume that c nodes are captured at random from the network and that u and v are two uncaptured nodes that are direct neighbours. We compute the probability Pe that an eavesdropper can decipher encrypted communication between u and v based on the knowledge of the keys available from the c captured key rings. Clearly, smaller values of Pe indicate higher resilience against node captures.

Suppose that u and v use the key k for communication between them. Then, Pe is equal to the probability that k resides in one of the key rings of the c captured nodes. Since each key ring consists of m keys randomly chosen from a pool of M keys, the probability that a particular key k is not available in a key ring is 1 − m/M, and consequently the probability that k does not appear in any of the c compromised key rings is (1 − m/M)^c. Thus, the probability of successful eavesdropping is

    Pe = 1 − (1 − m/M)^c.

Example B.2.

As in Example B.1, take n = 10,000, n′ = 50, m = 150 and M = 40,000. If c = 100 nodes are captured, the fraction of compromised communication is Pe ≈ 0.313. Thus, a capture of only 100 nodes leads to a compromise of about one-third of the traffic. That is not a satisfactory figure. We need better algorithms.
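The figure 0.313 follows directly from the formula Pe = 1 − (1 − m/M)^c; a two-line check:

```python
def eg_eavesdrop_probability(c, m, M):
    """Pe = 1 - (1 - m/M)**c for the EG scheme after c node captures."""
    return 1 - (1 - m / M) ** c

Pe = eg_eavesdrop_probability(100, 150, 40_000)   # ≈ 0.313, as in Example B.2
```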

B.4.1. The q-composite Scheme

Chan et al. [44] propose several modifications of the basic EG scheme in order to improve upon the resilience of the network against node capture. The q-composite scheme, henceforth abbreviated as the qC scheme, is based on the requirement of a bigger overlap of key rings for enabling nodes to communicate.

As in the EG scheme, the key set-up server chooses a pool 𝒦 of M random keys and loads the key ring of each node with a random subset of 𝒦 of size m. Let the network consist of n nodes.

In the direct key establishment phase, each node u discovers all its physical neighbours that share q or more keys with u, where q is a predetermined system-wide parameter. Those physical neighbours are now called direct neighbours of u. Let v be a direct neighbour of u, and let q′ ≥ q be the actual number of keys shared by u and v. Call these keys k1, k2, . . . , kq′. The nodes use the key

k := H(k1 ‖ k2 ‖ · · · ‖ kq′)

for future communication, where ‖ denotes string concatenation and H is a hash function. A pair of physical neighbours that share fewer than q predistributed keys do not communicate directly.

Recall that for the basic EG scheme q = 1 and the key k for communication between direct neighbours is taken to be one shared key instead of a hash value of all shared keys. The motivation behind going for the qC scheme is that requiring a bigger overlap between the key rings of a pair of physical neighbours leads to a smaller probability Pe of successful eavesdropping, since now the eavesdropper has to possess the knowledge of at least q shared keys (not just one). However, the requirement of q (or more) matching keys between communicating nodes restricts the key pool size M more than the EG scheme, and consequently a capture of fewer nodes reveals a bigger fraction of the total key pool to the eavesdropper. Chan et al. [44] report that the best trade-off is achieved for the value q = 2 or 3.
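A sketch of this key derivation, with SHA-256 standing in for the unspecified hash function H; sorting the shared keys by key ID fixes a canonical concatenation order, an assumption the description leaves implicit:

```python
import hashlib

def qc_session_key(shared_keys):
    """Derive k := H(k1 || k2 || ... || kq') over the q' shared keys.

    shared_keys is a list of (key_id, key_bytes) pairs discovered during
    direct key establishment.  Sorting by key ID ensures both endpoints
    concatenate in the same order (our assumption); SHA-256 plays H.
    """
    h = hashlib.sha256()
    for _, key in sorted(shared_keys):
        h.update(key)
    return h.digest()

# The two endpoints may discover the shared keys in different orders,
# yet they derive the same 256-bit session key.
at_u = [(42, b"K" * 16), (7, b"J" * 16), (13, b"L" * 16)]
at_v = [(13, b"L" * 16), (42, b"K" * 16), (7, b"J" * 16)]
```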

Let us now derive the explicit expressions for M and Pe. Equations (B.1), (B.2) and (B.3) hold for the qC scheme with the sole exception that now the interpretation of the probability p′ of direct neighbourhood is different. There is a total of C(M, m)² ways of choosing an ordered pair of key rings of size m from a pool of M keys. Let us compute the number of such pairs of key rings sharing exactly r keys. First, the r shared keys can be chosen in C(M, r) ways. Out of the remaining M − r keys, the remaining m − r keys for the first ring can be chosen in C(M − r, m − r) ways. Finally, the remaining m − r keys for the second ring can be chosen in C(M − m, m − r) ways from the M − m keys not present in the first ring. Thus, we have

    p(r) = C(M, r) C(M − r, m − r) C(M − m, m − r) / C(M, m)²,

that is,

    p′ = 1 − (p(0) + p(1) + · · · + p(q − 1))

is the equivalent of Equation (B.4) for the qC scheme.

Example B.3.

As in Example B.1, consider n = 10,000, n′ = 50, m = 150. For d = 20, we require p′ ≥ 0.4. This, in turn, demands M ≤ 16,387 for q = 2 and M ≤ 9,864 for q = 3. Compare these with the requirement M ≤ 44,195 for the EG scheme.
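These pool-size bounds can be reproduced from the counting argument above; Python's exact big-integer arithmetic handles the huge binomial coefficients without overflow (the loose tolerances allow for rounding at the threshold):

```python
import math

def p_overlap(r, m, M):
    """Probability that two random m-key rings from an M-key pool share
    exactly r keys: the ordered-pair count divided by C(M, m)**2."""
    return (math.comb(M, r) * math.comb(M - r, m - r)
            * math.comb(M - m, m - r)) / math.comb(M, m) ** 2

def qc_local_connectivity(q, m, M):
    """p' = 1 - (p(0) + ... + p(q-1)): rings share at least q keys."""
    return 1 - sum(p_overlap(r, m, M) for r in range(q))
```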

Let us now calculate the probability Pe of successfully deciphering the communication between two uncaptured nodes u and v, given that c nodes are already captured by the eavesdropper. Let q′ ≥ q be the actual number of keys shared by u and v; given that u and v are direct neighbours, this happens with probability p(q′)/p′. Each of these common keys is available to the eavesdropper with probability 1 − (1 − m/M)^c. It follows that

    Pe = Σ_{q′=q}^{m} (1 − (1 − m/M)^c)^{q′} · p(q′)/p′.

Example B.4.

Let us continue with the network of Examples B.1, B.2 and B.3. The following table summarizes the probabilities Pe for various values of c. For the EG scheme, we take M = 40,000, whereas for the qC scheme, we take M = 16,000 for q = 2 and M = 9,800 for q = 3.

                                      Pe
Scheme   c = 10   c = 20   c = 30   c = 40   c = 50   c = 75   c = 100   c = 150
EG       0.037    0.072    0.107    0.140    0.171    0.246    0.313     0.431
2C       0.005    0.019    0.041    0.068    0.101    0.196    0.300     0.499
3C       0.002    0.011    0.032    0.066    0.111    0.255    0.413     0.678

This table indicates that when the number of nodes captured is small, the qC scheme outperforms the EG scheme. However, for large values of c, the effects of smaller values of the key-pool size show up, leading to a poorer performance of the qC schemes compared to the EG scheme.
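The table entries can be regenerated from the expression for Pe given earlier in this section; here we use the equivalent hypergeometric form of p(r), which is simpler to code:

```python
import math

def p_overlap(r, m, M):
    """P[two random m-key rings share exactly r keys]: hypergeometric
    form, equivalent to the ordered-pair counting in the text."""
    return math.comb(m, r) * math.comb(M - m, m - r) / math.comb(M, m)

def qc_eavesdrop_probability(c, q, m, M):
    """Pe = sum over q' >= q of (1-(1-m/M)**c)**q' * p(q')/p'."""
    p_key = 1 - (1 - m / M) ** c               # one given key is exposed
    terms = [p_overlap(r, m, M) for r in range(q, m + 1)]
    p_local = sum(terms)                       # p' = P[at least q shared keys]
    return sum(p_key ** r * t for r, t in zip(range(q, m + 1), terms)) / p_local
```

For instance, with M = 16,000, q = 2 and c = 100 captures, this evaluates to about 0.300, the table's 2C entry.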

B.4.2. Multi-path Key Reinforcement

Another way to improve the resilience of the network against node captures is the multi-path key reinforcement scheme, proposed again by Chan et al. [44]. As in the EG scheme, sensor nodes are deployed, each with m keys in its key ring chosen randomly from a pool of M keys. Let u and v establish themselves as direct neighbours sharing the key k. Instead of using k itself as the key for future communication, the nodes try to locate several pairwise node-disjoint paths between them. Such a path u = v0, v1, . . . , vl = v consists of pairs of direct neighbours (vi, vi+1) for i = 0, . . . , l − 1. A randomly generated key is then routed securely along each such path from u to v.

Assume that r node-disjoint paths between u and v are discovered and that the random keys k′1, k′2, . . . , k′r are transferred securely along these paths. The nodes u and v then use the key

    k′ := k ⊕ k′1 ⊕ k′2 ⊕ · · · ⊕ k′r

for future communication.

The reason why this scheme improves resilience against node captures is that even if the original key k resides in the memory of a captured node, the new key k′ is computable by the adversary if and only if she can also obtain all of the r session secrets k′1, k′2, . . . , k′r. The bigger r is, the more difficult it is for the adversary to eavesdrop on all of the r node-disjoint paths. On the other hand, if the lengths of these paths are large, the probability of eavesdropping at some link of the paths increases. Moreover, increasing the lengths of the paths incurs a bigger communication overhead. The proponents of the scheme recommend only 2-hop multi-path key reinforcement.
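A sketch of the combination step, assuming the reinforced key is formed by XOR-ing the directly established key with the per-path secrets; the adversary recovers k′ only with k and every path secret in hand:

```python
import secrets
from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def reinforced_key(k, path_secrets):
    """k' = k XOR k'_1 XOR ... XOR k'_r (XOR order does not matter).

    k is the key from direct key establishment; path_secrets are the
    random values routed along the r node-disjoint paths."""
    return reduce(xor_bytes, path_secrets, k)

k = secrets.token_bytes(16)                           # directly shared key
path_secrets = [secrets.token_bytes(16) for _ in range(3)]  # r = 3 paths
k_new = reinforced_key(k, path_secrets)
```

Both endpoints hold k and receive the same path secrets, so both compute the same k′ regardless of arrival order.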

We do not go into the details of the analysis of the multi-path key reinforcement scheme, but refer the reader to Chan et al. [44]. We only note that though it is possible to use multi-path key reinforcement for the q-composite scheme, it is not a lucrative option. The smaller size of the key pool for the q-composite scheme tends to nullify the effects of multi-path key reinforcement.

B.5. Random Pairwise Scheme

A pairwise key predistribution scheme offers perfect resilience against node captures; that is, the capture of any number c of nodes does not reveal any information about the secrets used by uncaptured nodes. This corresponds to Pe = 0 irrespective of c. This desirable property is achieved by placing each key in the key rings of only two nodes. Moreover, the sharing of a key k between a unique pair of nodes u and v implies that these nodes can authenticate themselves to one another: no other node possesses k, so none can pass itself off as u to v or as v to u.

Pairwise keys can be distributed to nodes in many ways. Here, we deal with random distribution. Let m denote the size of the key ring of each sensor node. For each node u in the network, the key set-up server randomly selects m other nodes v1, . . . , vm and distributes a new random key ki to each of the pairs (u, vi) for i = 1, . . . , m. This distribution mechanism should also ensure that any two nodes u, v in the network share at most one key. If k is given to u and v, the set-up server also attaches the ID of v to the copy of k in the key ring of u, and the ID of u to the copy of k in the key ring of v.

In the direct key establishment phase, each node u broadcasts its own ID. Each physical neighbour v of u that finds the ID of u stored against a key in its key ring identifies u as a direct neighbour, along with the unique key shared by u and v.

The analysis of the random pairwise scheme is a bit tricky. Here, the global connectivity graph Gkey is m-regular, that is, each node has degree exactly m, and we cannot expect to maintain this degree locally too. On the other hand, it is reasonable to assume under a random deployment model that the fraction of nodes with which a given node shares pairwise keys remains the same both locally and globally. More precisely, we equate p′ with p, that is,

Equation B.5

    d/n′ = m/(n − 1) ≈ m/n.
Here, d denotes the desired local degree of a node. Equation (B.2) gives the formula for d in terms of the global connectivity Pc. For Pc = 0.9999, we have d = 16.11 for n = 1,000, d = 18.42 for n = 10,000, d = 20.72 for n = 100,000, and d = 23.03 for n = 1,000,000. That is, the value of d does not depend heavily on n, as long as n ranges over practical values. In particular, one may fix d = 20 (or d = 25 more conservatively) for all applications.

Equation (B.5) implies

    n ≈ mn′/d.

This equation reflects the drawback of the random pairwise scheme. The value m is limited by the memory of a sensor node, n′ is dictated by the density of nodes in the deployment area, and d can be taken as a constant; so the network size n is bounded above by the quantity mn′/d, called the maximum supportable network size. The basic scheme (and its variants) support networks of arbitrarily large sizes, whereas the random pairwise scheme offers only limited support.

Example B.5.

Take m = 150, n′ = 50 and d = 20. The maximum supportable network size is then mn′/d = (150 × 50)/20 = 375. This is too small to be useful. We require modifications of the random pairwise scheme in order to be able to use it in practice.

B.5.1. Multi-hop Range Extension

Since m and d are limited by hard constraints, the only way to increase the maximum supportable network size is to increase the effective size n′ of the physical neighbourhood of a node. The multi-hop range extension strategy accomplishes that. In the direct key establishment phase, each node u broadcasts its ID. Each physical neighbour v of u re-broadcasts the ID of u, each physical neighbour w of v re-broadcasts it again, and so on, for a predetermined number r of hops. Any node u′ reachable from u in ≤ r hops and sharing a pairwise key with u can now establish a path of secure communication with u. During a future communication between u and u′, the intermediate nodes in the path simply forward messages encrypted by the pairwise key between u and u′. Using r hops thereby increases the effective radius of physical neighbourhood by a factor of r, and consequently the number of effective neighbours of each node gets multiplied by a factor of r². Thus, the maximum supportable network size now becomes

    n ≈ m(r²n′)/d.

For r = 3 and for the parameters of Example B.5, this size attains the more decent value of 3375.
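The bound, together with the r²-fold gain from range extension, is simple arithmetic:

```python
def max_network_size(m, n_prime, d, r=1):
    """n <= m * (r**2 * n') / d: the maximum supportable network size,
    with r-hop range extension multiplying the neighbourhood by r**2."""
    return (m * r * r * n_prime) // d

single_hop = max_network_size(150, 50, 20)        # 375  (Example B.5)
three_hop = max_network_size(150, 50, 20, r=3)    # 3375 (3-hop extension)
```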

Increasing r incurs some cost. First, the communication overhead increases quadratically with r. Second, since intermediate nodes in a multi-hop path simply retransmit messages without authentication, chances of specific active attacks at these nodes increase. Large values of r are, therefore, discouraged.

B.6. Polynomial-pool-based Key Predistribution

Liu and Ning's polynomial-pool-based key predistribution scheme (abbreviated as the poly-pool scheme) [181, 183] is based on the idea presented by Blundo et al. [28]. Let 𝔽q be a finite field with q just large enough to accommodate a symmetric encryption key. For a 128-bit block cipher, one may take q to be the smallest prime larger than 2^128 (prime field) or 2^128 itself (extension field of characteristic 2). Let f(X, Y) ∈ 𝔽q[X, Y] be a bivariate polynomial that is assumed to be symmetric, that is, f(X, Y) = f(Y, X). Let t be the degree of f in each of X and Y. A polynomial share of f is a univariate polynomial f(α)(X) := f(X, α) for some element α ∈ 𝔽q. Two shares f(α) and f(β) of the same polynomial f satisfy

Equation B.6

    f(α)(β) = f(β, α) = f(α, β) = f(β)(α).

Thus, if the shares f(α) and f(β) are given to two nodes, they can come up with the common value f(α, β) as a shared secret between them.

Given t + 1 or more shares of f, one can reconstruct f(X, Y) uniquely using Lagrange's interpolation formula (Exercise 2.53). On the other hand, if only t or fewer shares are available, there are many (at least q) possibilities for f, and it is impossible to determine f uniquely. So the disclosure of up to t shares does not reveal the polynomial f to an adversary, and uncompromised shared keys based on f remain secure.
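The whole mechanism fits in a few lines of Python. This is a toy sketch with parameters of our own choosing (a Mersenne prime field and degree t = 3); the symmetric coefficient matrix c[i][j] = c[j][i] is what enforces f(X, Y) = f(Y, X):

```python
import random

q = 2**127 - 1          # a Mersenne prime; field elements double as keys

def random_symmetric_poly(t, rng):
    """Coefficients c[i][j] of f(X, Y) = sum c[i][j] X^i Y^j with
    c[i][j] = c[j][i], which makes f symmetric: f(X, Y) = f(Y, X)."""
    c = [[0] * (t + 1) for _ in range(t + 1)]
    for i in range(t + 1):
        for j in range(i, t + 1):
            c[i][j] = c[j][i] = rng.randrange(q)
    return c

def share(c, alpha):
    """The share f(X, alpha): coefficient of X^i is sum_j c[i][j] alpha^j."""
    return [sum(cij * pow(alpha, j, q) for j, cij in enumerate(row)) % q
            for row in c]

def eval_poly(coeffs, x):
    return sum(a * pow(x, i, q) for i, a in enumerate(coeffs)) % q

rng = random.Random(2024)
f = random_symmetric_poly(3, rng)
u, v = 12345, 67890                      # node IDs, elements of F_q
key_at_u = eval_poly(share(f, u), v)     # u computes f(v, u)
key_at_v = eval_poly(share(f, v), u)     # v computes f(u, v)
# By symmetry, key_at_u == key_at_v: the pairwise key f(u, v).
```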

Using a single polynomial for the entire network is not a good proposal, since t is limited by memory constraints in a sensor node. In order to increase resilience against node captures, many bivariate polynomials need to be used, and shares of random subsets of this polynomial pool are assigned to the key rings of individual nodes. This is how the poly-pool scheme works. If the degree t equals 0, this scheme degenerates to the EG scheme.

The key set-up server first selects a random pool ℱ of S symmetric bivariate polynomials in 𝔽q[X, Y], each of degree t in X and Y. Some IDs are also generated for the nodes in the network. For each node u in the network, s polynomials f1, f2, . . . , fs are randomly picked from ℱ, and the polynomial shares f1(X, α), f2(X, α), . . . , fs(X, α) are loaded in the key ring of u, where α is the ID of u. Each key ring now requires space for storing s(t + 1) log q bits, that is, for storing m := s(t + 1) symmetric keys.

Upon deployment, each node u broadcasts the IDs of the polynomials whose shares reside in its key ring. Each physical neighbour v of u that has shares of some common polynomial(s) establishes itself as a direct neighbour of u. The exact pairwise key k between u and v is then calculated using Equation (B.6). If broadcasting polynomial IDs in plaintext is too unsafe, each node u can send some message encrypted by potential pairwise keys based on its polynomial shares. Those physical neighbours that can decrypt one of these encrypted messages have shares of common polynomials.

Like the EG scheme, the poly-pool scheme can be analysed under the framework of random graphs. Equations (B.1), (B.2) and (B.3) continue to hold under the poly-pool scheme. However, in this case the local connection probability p′ is computed as

Equation B.7

    p′ = 1 − C(S − s, s)/C(S, s).
Given constraints on the network and the nodes, the desired size S of the polynomial pool can be determined from this formula.
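Equation (B.7) is cheap to evaluate. For instance, with s = 3 shares per node, a pool of S = 20 polynomials yields p′ ≈ 0.4035, just above the 0.4 threshold of our running examples, while S = 21 already falls below it:

```python
import math

def poly_pool_connectivity(S, s):
    """Equation (B.7): p' = 1 - C(S - s, s)/C(S, s)."""
    return 1 - math.comb(S - s, s) / math.comb(S, s)

p_20 = poly_pool_connectivity(20, 3)   # 1 - 680/1140 ≈ 0.4035
p_21 = poly_pool_connectivity(21, 3)   # 1 - 816/1330 ≈ 0.3865
```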

Let us now compute the probability Pe of compromise of communication between two uncaptured nodes u, v as a function of the number c of captured nodes. If c ≤ t, the eavesdropper cannot gather enough polynomial shares to learn about any polynomial in ℱ, that is, Pe = 0. So assume that c > t, and let pr denote the probability that exactly r shares of a given polynomial f (say, the one whose shares are used by the two uncaptured nodes u, v) are available in the key rings of the c captured nodes. The probability that a share of f is present in a key ring is s/S, and so (by the binomial distribution)

Equation B.8

    pr = C(c, r) (s/S)^r (1 − s/S)^(c − r).

Since t + 1 or more shares of f are required for the determination of f, we have

Equation B.9

    Pe = Σ_{r=t+1}^{c} pr.
Example B.6.

Let n = 10,000 (network size), n′ = 50 (expected size of physical neighbourhood of a node), m = 150 (key ring size in number of symmetric keys) and Pc = 0.9999 (global connectivity). Let us plan to choose bivariate polynomials of degree t = 49, so that each key ring can hold s = 3 polynomial shares.

For the determination of S, we first compute d = 20 as in Example B.1. We then require p′ ≥ d/n′ = 0.4. The biggest size S satisfying this bound, derived from Equation (B.7), is S = 20.

The following table lists the probability Pe for various values of c.

c     50          100         150        200        250      300    350    400
Pe    6.38×10−42  2.30×10−16  1.70×10−8  1.52×10−4  0.0196   0.231  0.668  0.932

The table shows substantial improvement in resilience against node capture as achieved by the poly-pool scheme over the EG and qC schemes.
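The table can be reproduced from Equations (B.8) and (B.9); the exact binomial tail is a one-liner with `math.comb`:

```python
import math

def poly_pool_eavesdrop(c, s, S, t):
    """Pe = sum_{r=t+1}^{c} C(c, r) (s/S)**r (1 - s/S)**(c - r):
    the chance that more than t shares of f leak from c captured rings."""
    p = s / S
    return sum(math.comb(c, r) * p ** r * (1 - p) ** (c - r)
               for r in range(t + 1, c + 1))

Pe_50 = poly_pool_eavesdrop(50, 3, 20, 49)     # = (3/20)**50 ≈ 6.4e-42
Pe_300 = poly_pool_eavesdrop(300, 3, 20, 49)   # ≈ 0.231, matching the table
```

For c = 50 only the single term r = 50 survives, which is why Pe collapses to (s/S)^50.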

B.6.1. Pairwise Key Predistribution

The poly-pool scheme can be made pairwise by allowing no more than t + 1 shares of any polynomial to be distributed among the nodes. The best that the adversary can achieve is a capture of nodes with all these t + 1 shares and a subsequent determination of the corresponding bivariate polynomial. But this knowledge does not help the adversary, since no other node in the network uses a share of this compromised polynomial. That is, two uncaptured nodes continue to communicate with perfect secrecy.

However, like the random pairwise scheme, the pairwise poly-pool scheme suffers from the drawback that the maximum supportable network size is now limited by the quantity ⌊S(t + 1)/s⌋. For the parameters of Example B.6, this size turns out to be an impractically low 333.

B.6.2. Grid-based Key Predistribution

The grid-based key predistribution considerably enhances the resilience of the network against node captures. To start with, let us play a bit with Example B.6.

Example B.7.

Take n = 10,000, n′ = 50 and m = 150. We calculated that the optimal value of S that keeps the network connected with high probability is S = 20. Now, let us instead take a much bigger value of S, say, S = 200. First, let us look at the brighter side of this choice. The probability Pe is listed in the following table as a function of c.

c     500          1000         1500        2000        2500    3000   3500   4000
Pe    1.90×10^–25  4.88×10^–13  3.10×10^–7  4.68×10^–4  0.0282  0.245  0.655  0.917

That is a dramatic improvement in the resilience figures. It, however, comes at a cost. The optimal value S = 20 was selected in Example B.6 in order to achieve a desired connectivity in the network. With S = 200, the probability p′ reduces from 0.404 to about 0.045, and each node is expected to have only about 2 direct neighbours. As a result, the network is likely to remain disconnected with high probability.
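A quick computation (a Python fragment of our own) makes the collapse in connectivity concrete:

```python
from math import comb

S, s, n_nbrs = 200, 3, 50   # pool size, shares per ring, physical neighbours
p_local = 1 - comb(S - s, s) / comb(S, s)   # Equation (B.7)
print(round(n_nbrs * p_local, 1))           # expected direct neighbours: 2.2
```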

The grid-based key predistribution allocates polynomial shares cleverly to the nodes so as to achieve resilience figures of the last example with a reasonable guarantee that the resulting network remains connected. Let n be the size of the network and take σ = ⌈√n⌉. For the sake of simplicity, let us assume that n = σ². The n nodes are then placed on a σ × σ square grid. The node at the (i, j)-th grid location (where 0 ≤ i, j ≤ σ – 1) is identified by the pair (i, j). The set-up server generates 2σ random symmetric bivariate polynomials f_i^r(X, Y) and f_i^c(X, Y), 0 ≤ i ≤ σ – 1, each of degree t in both X and Y. The polynomial f_i^r corresponds to the i-th row and the polynomial f_j^c to the j-th column in the grid. The key ring of the node at location (i, j) in the grid is given the two polynomial shares f_i^r(j, Y) and f_j^c(i, Y). The memory required for this is equivalent to the storage for 2(t + 1) symmetric keys.

Now, look at the key establishment phase. Let two nodes u, v with IDs (i, j) and (i′, j′) be physical neighbours after deployment. First, consider the simple case i = i′. Both the nodes have shares of the row polynomial f_i^r and can arrive at the common secret value f_i^r(j, j′) using the column identities of one another. Similarly, if j = j′, the nodes can compute the shared secret f_j^c(i, i′). It follows that each node can establish keys directly with 2(σ – 1) other nodes in the network. That is, however, a tiny fraction of the entire network.

Assume now that i ≠ i′ and j ≠ j′. If the node w with identity either (i, j′) or (i′, j) is in the physical neighbourhood of both u and v, then there is a secure link between u and w, and also one between w and v. The nodes u and v can then establish a path key via the intermediate node w.

So suppose also that neither (i, j′) nor (i′, j) resides in the communication ranges of both u and v. Consider the nodes w1 := (i, k) and w2 := (i′, k) for some k ≠ j, j′. Suppose further that w1 is in the physical neighbourhood of u, w2 in that of w1 and v in that of w2. But then there is a secure u, v-path comprising the links uw1, w1w2 and w2v. Similarly, the nodes (k, j) and (k, j′) for each k ≠ i, i′ can help u and v establish a path key. To sum up, there are 2(σ – 2) potential three-hop paths between u and v.
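These candidate paths can be enumerated mechanically; the following Python helper is our own illustrative sketch, not part of the original scheme description:

```python
def key_path_candidates(u, v, sigma):
    # u = (i, j) and v = (i2, j2), assumed to satisfy i != i2 and j != j2
    (i, j), (i2, j2) = u, v
    # two-hop paths: one intermediary sharing a row/column with each end-point
    two_hop = [(i, j2), (i2, j)]
    # three-hop paths: (i,k)-(i2,k) for k != j, j2, and (k,j)-(k,j2) for k != i, i2
    three_hop = [((i, k), (i2, k)) for k in range(sigma) if k not in (j, j2)]
    three_hop += [((k, j), (k, j2)) for k in range(sigma) if k not in (i, i2)]
    return two_hop, three_hop

two, three = key_path_candidates((0, 1), (2, 3), sigma=100)
print(len(three))   # 2 * (sigma - 2) = 196
```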

If all these three-hop paths fail, one may go for four-hop, five-hop, . . . paths, but at the cost of increased communication overhead. As argued in Liu and Ning [181, 183], exploring paths with ≤ 3 hops is expected to give the network high connectivity.

For the grid-based scheme, we have S = 2σ (the key pool size) and s = 2 (the number of polynomial shares in each node’s key ring). Thus, the probability Pe can now be derived like Equations (B.8) and (B.9) as

Pe = 1 – (p0 + p1 + · · · + pt) = pt+1 + pt+2 + · · · + pc,

where pr = C(c, r)(1/σ)^r (1 – 1/σ)^(c–r), since a share of a fixed polynomial is present in a captured key ring with probability s/S = 2/(2σ) = 1/σ.

Example B.8.

Take n = 10,000 and m = 150. Since each node has to store only two polynomial shares, we now take t = 74. Moreover, σ = 100, that is, the size of the polynomial pool is S = 200. The probability Pe can now be tabulated as a function of c (number of nodes captured) as follows:

c     1000         2000         3000         4000        5000        6000    7000
Pe    2.45×10^–40  1.99×10^–21  2.68×10^–12  4.35×10^–7  5.41×10^–4  0.0334  0.290

This is an impressive performance. The capture of even 60 per cent of the nodes leads to a compromise of only 3.34 per cent of the communication among uncaptured nodes.

This robustness of the grid-based distribution comes at a cost, though. The path key establishment stage is communication-intensive and is mandatory for ensuring good connectivity. Moreover, this stage is based on the assumption that not many nodes are captured during bootstrapping. If this assumption cannot be enforced, the scheme forfeits much of its expected resilience guarantees.

B.7. Matrix-based Key Predistribution

The matrix-based key predistribution scheme is derived from the idea proposed by Blom [25]. It is similar to the polynomial-based key predistribution and employs symmetric matrices (in place of symmetric polynomials). Let 𝔽q be a finite field with q just large enough to accommodate a symmetric key, and let G be a t × n matrix over 𝔽q, where t is determined by the memory of a sensor node and n is the number of nodes in the network. The matrix G need not be kept secret; anybody, even the enemies, may know G. We only require G to have rank t, that is, any t columns of G must be linearly independent. If g is a primitive element of 𝔽q, the following matrix is recommended.

Equation B.10

G := (gij), where gij := g^(ij) for 1 ≤ i ≤ t, 1 ≤ j ≤ n,

so that the j-th column of G is (g^j, g^(2j), . . . , g^(tj)).


In a memory-starved environment, this G has a compact representation, since its j-th column is uniquely identified by the value g^j. The remaining elements in the column can be easily computed by performing a few multiplications.

Let D be a secret t × t symmetric matrix, and A the n × t matrix defined by:

A := (DG)^T = G^T D^T = G^T D.

Finally, define the n × n matrix

K := AG.

It follows that K = AG = (G^T D)G = G^T DG, and so K^T = G^T D^T G = G^T DG = K, that is, K is a symmetric matrix. If the (i, j)-th element of K is denoted by kij, we have kij = kji, and this common value can be used as a pairwise key between the i-th and j-th nodes.

Let the (i, j)-th element of A be denoted by aij for 1 ≤ i ≤ n and 1 ≤ j ≤ t. Also let gij, 1 ≤ i ≤ t and 1 ≤ j ≤ n, denote the (i, j)-th element of G. But then the pairwise key kij = kji is expressed as:

kij = ai1 g1j + ai2 g2j + · · · + ait gtj.

Thus, the i-th row of A and the j-th column of G suffice for the i-th node to compute kij. Similarly, the j-th row of A and the i-th column of G allow the j-th node to compute kji. In view of this, every node, say, the i-th node, is required to store the i-th row of A and the i-th column of G. If G is as in Equation (B.10), only g^i needs to be stored instead of the full i-th column of G. Thus, the storage of t + 1 elements of 𝔽q (equivalent to t + 1 symmetric keys) suffices.
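The entire construction is small enough to express directly. The following Python sketch is ours, with toy parameters (the field is far too small for real keys); it builds G as in Equation (B.10), a random symmetric D, and checks that the two end-points derive the same key:

```python
import random

q = 8191                 # a small prime; the field here is just integers mod q
t, n = 3, 8              # security threshold and number of nodes
g = 17                   # assumed to generate distinct powers g^1, ..., g^n mod q

# G as in Equation (B.10): (i, j) entry is g^(i*j), 1 <= i <= t, 1 <= j <= n
G = [[pow(g, (i + 1) * (j + 1), q) for j in range(n)] for i in range(t)]

# D: the secret symmetric t x t matrix held only by the set-up server
D = [[0] * t for _ in range(t)]
for i in range(t):
    for j in range(i, t):
        D[i][j] = D[j][i] = random.randrange(q)

# A = G^T D (n x t); node i is preloaded with row i of A (plus the value g^(i+1))
A = [[sum(G[r][i] * D[r][c] for r in range(t)) % q for c in range(t)]
     for i in range(n)]

def pairwise_key(i, j):
    # k_ij = sum_r a_ir * g_rj: node i's private row times node j's public column
    return sum(A[i][r] * G[r][j] for r in range(t)) % q

assert pairwise_key(2, 5) == pairwise_key(5, 2)   # k_ij = k_ji
```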

During direct key establishment, two physical neighbours exchange their respective columns of G for the computation of the common key. Since G is allowed to be a public knowledge, this communication does not reveal secret information to the adversary.

Suppose that the adversary gains knowledge of some t′ ≥ t rows of A (say, by capturing nodes). We also assume that the matrix G is completely known to the adversary. The adversary picks any t known rows of A and constructs a t × t matrix A′ comprising these rows. But then A′ = (G′)^T D, where G′ is the corresponding t × t submatrix of G. Since G is assumed to be of rank t, G′ is invertible, and so the secret matrix D = ((G′)^T)^(–1) A′ can be easily computed. Conversely, if D is known to the adversary, she can compute A and, in particular, any t′ ≥ t rows of A.

If only t′ < t rows are known to the adversary, then any choice of values for t – t′ additional rows of A yields a consistent value of the matrix D, from which the remaining rows of A can then be constructed. In other words, D cannot be uniquely recovered from a knowledge of fewer than t rows of A. This task is infeasible too, since there is an astronomical number of choices for assigning values to the elements of the t – t′ unknown rows of A.

To sum up, the matrix-based key predistribution scheme is completely secure if fewer than t nodes are captured. On the other hand, if t or more nodes are captured, then the system is completely compromised. Thus, the resilience of this scheme against node capture is determined solely by t and is independent of the size n of the network. The parameter t, in turn, is restricted by the memory of a sensor node (a node has to store t + 1 elements of 𝔽q).

In order to overcome this difficulty, Du et al. [79] propose a matrix-pool-based scheme. Here, S matrices A1, A2, . . . , AS are computed from S pairwise different secret matrices D1, D2, . . . , DS. The same G may be used for all these key spaces. Each node is given shares (that is, rows) of s matrices randomly chosen from the pool {A1, A2, . . . , AS}. The resulting details of the matrix-pool-based scheme are quite analogous to those pertaining to the polynomial-pool-based scheme described in the earlier section, and are omitted here.

B.8. Location-aware Key Predistribution

The key predistribution algorithms discussed so far are based on a random deployment model. In practice, the deployment model (like the expected location of each node and the overall geometry of the deployment area) may be known a priori. This knowledge can be effectively exploited to tune the key predistribution algorithms so as to achieve better connectivity and higher resilience against node capture. As an example, consider sensor nodes deployed from airplanes in groups or scattered uniformly from trucks. Since the approximate tracks of these vehicles are planned a priori, the key rings of the nodes can be loaded appropriately to achieve the expected performance enhancements.

Two nodes that are in the physical neighbourhoods of one another need only share a pairwise key. Therefore, the basic objective of designing location-aware schemes is to predistribute keys in such a way that two nodes that are expected to remain close in the deployment area are given common pairwise keys, whereas two nodes that are expected to be far away after deployment need not share any pairwise key. The actual deployment locations of the nodes cannot usually be predicted accurately. Nonetheless, an approximate knowledge of the locations can boost the performance of the network considerably. The smaller the errors between the expected and actual locations of the nodes are, the better a location-aware scheme is expected to perform.

B.8.1. Closest Pairwise Keys Scheme

Liu and Ning [182] propose a modification of the random pairwise key scheme (Section B.5) based on deployment knowledge. Let there be n sensor nodes in the network with each node capable of storing m cryptographic keys. The expected deployment location of each node is provided to the key set-up server. For each node u in the network, the server determines m other nodes whose expected locations of deployment are closest to that of u and for which pairwise keys with u have not already been established. For every such node v, a new random key kuv is generated. The key-plus-ID combination (kuv, v) is loaded in u’s key ring, whereas the pair (kuv, u) is loaded in v’s key ring.

This natural and simple-minded strategy provides complete security against node captures, as it is a pairwise key distribution scheme. Now, there is no limitation on the maximum supportable network size (under the reasonable assumption that there are far fewer than 2^l nodes in the network, where l is the bit length of a cryptographic key, say, 64 or 128). Moreover, the incorporation of deployment knowledge increases the connectivity of the network. In order to analyse this gain, we first introduce some formal notation.

For the sake of simplicity, we assume that the deployment region is two-dimensional, so that every point in that region is expressed by two coordinates x and y. Let u be a sensor node whose expected deployment location is (ux, uy) and whose actual deployment location is (u′x, u′y). This corresponds to a deployment error of eu = √((u′x – ux)² + (u′y – uy)²). The actual location (or equivalently the error eu) is modelled as a continuous random variable that can assume values in ℝ². The probability density function fu of (u′x, u′y) characterizes the pattern of deployment error. One possibility is to assume that (u′x, u′y) is uniformly distributed within a circle with centre at (ux, uy) and of radius ∊, called the maximum deployment error. We then have:

Equation B.11

fu(x, y) = 1/(π∊²) if (x – ux)² + (y – uy)² ≤ ∊², and fu(x, y) = 0 otherwise.


An arguably more realistic strategy is to model (u′x, u′y) as a random variable following the two-dimensional normal (Gaussian) distribution with mean (ux, uy) and variance σ². The corresponding density function is:

fu(x, y) = (1/(2πσ²)) e^(–((x – ux)² + (y – uy)²)/(2σ²)).

Let u and v be two deployed nodes. We assume that each node has a communication range of ρ. We also make the simplifying assumption that the different nodes are deployed independently, that is, (u′x, u′y) and (v′x, v′y) are independent random variables. The probability that u and v lie in the communication ranges of one another can be expressed as a function of the expected locations (ux, uy) and (vx, vy) as:

p(u, v) = ∫∫∫∫_C fu(x1, y1) fv(x2, y2) dx1 dy1 dx2 dy2.

Here, the integral is over the region C of ℝ⁴ defined by (x1 – x2)² + (y1 – y2)² ≤ ρ².
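This four-dimensional integral rarely has a closed form, but it is straightforward to estimate by simulation. The Monte Carlo sketch below (our own, under the uniform-error model of Equation (B.11)) samples actual locations and counts how often the two nodes land within range ρ:

```python
import random

def estimate_p(u, v, eps, rho, trials=10_000):
    # u, v: expected locations; eps: maximum deployment error; rho: radio range
    def actual(x, y):
        # rejection-sample a point uniformly in the disc of radius eps
        while True:
            dx, dy = random.uniform(-eps, eps), random.uniform(-eps, eps)
            if dx * dx + dy * dy <= eps * eps:
                return x + dx, y + dy
    hits = 0
    for _ in range(trials):
        (ax, ay), (bx, by) = actual(*u), actual(*v)
        if (ax - bx) ** 2 + (ay - by) ** 2 <= rho ** 2:
            hits += 1
    return hits / trials
```

With coincident expected locations and ∊ much smaller than ρ, the estimate is 1, consistent with the observation below that p′ ≈ 1 for small deployment errors.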

Let n′ denote the number of physical neighbours of u (or of any sensor node). We know that u shares pairwise keys with exactly m nodes. We assume that these key neighbours of u are distributed uniformly in a circle centred at u and of radius ρ′. The expected value of ρ′ is:

ρ′ = ρ √(m/n′),

since, at the uniform node density n′/(πρ²), the m closest nodes are expected to occupy a disc of area mπρ²/n′.

Let v be a key neighbour of u. The probability that v lies in the physical neighbourhood of u is given by

p(u) = (1/(πρ′²)) ∫∫_{C′} p(u, v) dvx dvy,

where C′ is the region (vx – ux)² + (vy – uy)² ≤ ρ′². Therefore, u is expected to have m × p(u) direct neighbours. Since the size of the physical neighbourhood of u is n′, the local connectivity, that is, the probability that u can establish a pairwise key with a physical neighbour, is given by

p′ = m p(u)/n′.

In general, it is difficult to compute the above integrals. Liu and Ning [182] compute the probability p′ for the density function given by Equation (B.11) and establish that p′ ≈ 1 for small deployment errors, namely ∊ ≤ ρ. As ∊ increases, p′ gradually reduces to the corresponding probability for the random pairwise scheme.

In order to add sensor nodes at a later point of time, the key set-up server again uses deployment knowledge. The key rings of the new nodes are loaded based on the expected deployment locations of these nodes and on the (expected or known) locations of the deployed nodes. Pairwise keys between the new and the deployed nodes are communicated to the deployed nodes over secure channels (routing through uncompromised nodes).

B.8.2. Location-aware Polynomial-pool-based Scheme

Several variants of the closest pairwise keys scheme have been proposed. Liu and Ning themselves propose an extension based on pseudorandom functions [182]. Du et al. propose a variant of the basic (EG) scheme based on a specific model of deployment [80]. We end this section by briefly outlining a location-aware adaptation of the polynomial-pool-based scheme (Section B.6).

For simplicity, let us assume that the deployment region is a rectangular area. This region is partitioned into a 2-dimensional array of rectangular cells. Let the partition consist of R rows and C columns. The cell located at the i-th row and the j-th column is denoted by Ci,j. The neighbours of the cell Ci,j are taken to be the four adjacent cells: Ci–1,j, Ci+1,j, Ci,j–1, Ci,j+1.

The key set-up server first decides on a finite field 𝔽q with q just big enough to accommodate a cryptographic key. The server also chooses R × C random symmetric bivariate polynomials fi,j(X, Y), 1 ≤ i ≤ R, 1 ≤ j ≤ C. The polynomial fi,j is meant for the cell Ci,j. The degree t (in both X and Y) of each fi,j is so chosen that each sensor node has sufficient memory to store the shares of five such polynomials.

Let u be a node to be deployed and let the expected deployment location of u lie in the cell Ci,j called the home cell of u. The key ring of u is loaded with the shares (evaluated at u) of the five polynomials corresponding to the home cell and its four neighbouring cells. More precisely, u gets the five shares: fi,j(X, u), fi–1,j(X, u), fi+1,j(X, u), fi,j–1(X, u), and fi,j+1(X, u). The set-up server also stores in u’s memory the ID (i, j) of its home cell.

In the direct key establishment phase, each node u broadcasts the ID (i, j) of its home cell (or some messages encrypted by potential pairwise keys). Those physical neighbours whose home cells are either the same as or neighbouring to that of u can establish pairwise keys with u.
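As a toy illustration (function names are ours, not from the text), direct key establishment amounts to intersecting the sets of polynomial IDs loaded from the two home cells and their neighbours:

```python
def loaded_polys(i, j):
    # polynomial IDs placed in the key ring of a node with home cell C(i, j):
    # the home cell plus its four adjacent cells
    return {(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)}

def can_pair(cell_u, cell_v):
    # direct key establishment succeeds iff the nodes share some polynomial
    return bool(loaded_polys(*cell_u) & loaded_polys(*cell_v))

print(can_pair((5, 5), (5, 6)))   # neighbouring home cells -> True
print(can_pair((5, 5), (8, 5)))   # distant home cells -> False
```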

An analysis of the performance of this location-aware poly-pool-based scheme can be carried out along similar lines to the closest pairwise scheme. We leave out the details here and refer the reader to Liu and Ning [182].

C. Complexity Theory and Cryptography

C.1 Introduction
C.2 Provably Difficult Computational Problems Are not Suitable
C.3 One-way Functions and the Complexity Class UP

. . . complexity turns out to be most elusive precisely where it would be most welcome.

—C. H. Papadimitriou [229]

Real knowledge is to know the extent of one’s ignorance.

—Confucius

The complex develops out of the simple.

—Colin Wilson

C.1. Introduction

It is worthwhile to ask the question why public-key cryptography must be based on problems that are only believed to be difficult. Complexity theory suggests concrete examples of provably intractable problems. This appendix provides a brief conceptual explanation why these provably difficult problems cannot be used for building cryptographic protocols. We may consequently conclude that at present we cannot prove a public-key cryptosystem to be secure. That is bad news, but we have to live with it.

Here, we make no attempt to furnish definitions of formal complexity classes. The excellent books by Papadimitriou [229] and by Sipser [280] can be consulted for that purpose. Here is a list of the complexity classes that we require for our discussion. The relationships between these classes are depicted in Figure C.1. All the containments shown in this figure are conjectured to be proper. With an abuse of notation, we identify functional problems with decision problems.

Table C.1. Some complexity classes
Class       Brief description
P           Languages accepted by deterministic polynomial-time Turing machines
NP          Languages accepted by non-deterministic polynomial-time Turing machines
coNP        Complements of languages in NP
UP          Languages accepted by unambiguous polynomial-time Turing machines
PSPACE      Languages accepted by polynomial-space Turing machines
EXPTIME     Languages accepted by deterministic exponential-time Turing machines
EXPSPACE    Languages accepted by exponential-space Turing machines

Figure C.1. Relations between complexity classes


C.2. Provably Difficult Computational Problems Are not Suitable

The P =? NP problem, arguably the deepest unsolved problem in theoretical computer science, may be suspected to have some bearing on public-key cryptography. Under the assumption that P ≠ NP, one may feel tempted to use NP-complete problems for building secure cryptosystems. Unfortunately, this tempting invitation does not prove fruitful. Several cryptosystems based on NP-complete problems have been broken, and that is not really a surprise.

It may be the case that P = NP, and, if so, all NP-complete problems are solvable in polynomial time. It may, therefore, seem advisable to select problems that lie outside NP, that is, in strictly bigger complexity classes. By the time and space hierarchy theorems, we have P ⊊ EXPTIME and PSPACE ⊊ EXPSPACE. Both EXPTIME and EXPSPACE have complete problems. An EXPTIME-complete problem cannot be solved in polynomial time, whereas an EXPSPACE-complete problem cannot be solved in polynomial space (and hence not in polynomial time either). How about using these complete problems for designing cryptosystems? The idea may sound interesting, but these provably exponential problems turn out to be even poorer candidates, perhaps irrelevant, for use in cryptography.

Let fe and fd be the encryption and decryption transforms for a public-key cryptosystem. We assume that the set of plaintext messages and the set of ciphertext messages are both finite. (Public-key cryptosystems are like block ciphers in this respect.) Moreover, since a ciphertext c = fe(m, e) is computable in polynomial time, the length of c is bounded by a polynomial in the length of m. An intruder can non-deterministically guess messages m (from the finite space) and check if c = fe(m, e) to validate the correctness of the guess. It, therefore, follows that deciphering a ciphertext message (with no additional information) is a problem in NP. That is the reason why we should not look beyond NP.

However, the full class NP, in particular, the most difficult (that is, complete) problems of NP, may be irrelevant for cryptography, as we argue in the next section. In other words, for building cryptosystems we expect to effectively exploit problems that are believed to be easier than NP-complete ones. Both the integer factoring and the discrete log problems are in the class NP ∩ coNP. We have P ⊆ NP ∩ coNP, and it is widely believed that this containment is proper. Also, NP ∩ coNP is not known (nor expected) to have complete problems. Even if P ≠ NP ∩ coNP, the factoring and discrete log problems need not be outside P, since we are unlikely to produce completeness proofs for them. Only historical evidence exists in favour of the belief that these two problems are difficult. The situation may change tomorrow. Complexity theory does not offer any formal protection.

Exercise Set C.2

C.1 Prove that the primality testing problem

PRIME := {n ∈ ℕ | n is prime}

is in NP ∩ coNP.

(Remark: The AKS algorithm is a deterministic poly-time primality testing algorithm and therefore PRIME is in P and so trivially in NP ∩ coNP too. It can, however, be independently proved that primes have succinct certificates.)

C.2 Consider the decision version of the integer factorization problem:

DIFP := {(n, k) | n has a divisor d with 1 < d ≤ k < n}.

  1. Prove that DIFP ∈ NP ∩ coNP.

  2. Given a poly-time algorithm for DIFP, design a poly-time algorithm that factors an integer (that is, that solves the functional problem IFP).

C.3 Let G be a finite cyclic multiplicative group with a generator g. Assume that one can compute products in G in polynomial time. Consider the decision version of the discrete log problem in G:

DDLP := {(a, k) | a ∈ G and indg a ≤ k}.

Here, indices (indg a) are assumed to lie between 0 and (#G) – 1.

  1. Prove that DDLP ∈ NP ∩ coNP.

  2. Given a poly-time algorithm for DDLP, design a poly-time algorithm that computes indices in G (that is, that solves the functional problem DLP in G).

C.3. One-way Functions and the Complexity Class UP

Any public-key encryption behaves like a one-way function, easy to compute but difficult to invert.

Definition C.1.

Let Σ be an alphabet (a finite set of symbols). One may assume, without loss of generality, that Σ = {0, 1}. Let Σ* denote the set of all strings over Σ. A function f : Σ* → Σ* is called a one-way function, if it satisfies the following properties.

  1. f must be injective, that is, for every β ∈ Σ*, the inverse f–1(β), if existent, is unique.

  2. For some real constant k > 0, we have |α|^(1/k) ≤ |f(α)| ≤ |α|^k for all α ∈ Σ*. (Here, |α| denotes the length of a string α ∈ Σ*.)

  3. f can be computed in deterministic polynomial time, that is, f ∈ P (recall that we identify functional problems with decision problems).

  4. f–1 must not be computable in polynomial time[1], that is, f–1 ∉ P. In view of Property (2), we have f–1 ∈ NP. So we require f–1 ∈ NP \ P.

    [1] A stronger (but essential) requirement is that f–1 must not be computable by polynomial-time probabilistic algorithms.

Property (1) ensures unique decryption. Property (2) implies that the length of f(α) is polynomially bounded both above and below by the length of α. Property (3) suggests ease of encryption, whereas Property (4) suggests difficulty of decryption.

We do not know whether there exists a one-way function. The following functions are strongly suspected to be one-way. However, we do not seem to have any clues about how we can prove these functions to be one-way.

Example C.1.
  1. The function that multiplies two primes p, q with p < q is believed to be one-way. Computing its inverse is the RSA integer factoring problem.

  2. The discrete exponentiation function in a finite field 𝔽q, that maps x, 0 ≤ x ≤ q – 2, to g^x for some fixed primitive element g of 𝔽q, is suspected to be one-way. Its inverse is the discrete logarithm function.

  3. The RSA encryption function m ↦ m^e (mod n) for some fixed parameters n, e is alleged to be one-way. Its inverse is RSA decryption.
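The asymmetry in these examples can be felt even at toy sizes: the forward map of Example C.1(2) is a few modular squarings, while the obvious inversion strategy enumerates exponents. The Python fragment below is our own sketch with toy parameters:

```python
p = 2**61 - 1                     # a Mersenne prime (toy modulus)
g = 3                             # assumed base

x = 20                            # the "secret" exponent (kept tiny here)
y = pow(g, x, p)                  # forward direction: fast modular exponentiation

def discrete_log_by_search(g, y, p, bound):
    # the naive inverse: exhaustive search, exponential in the bit length of x
    acc = 1
    for e in range(bound):
        if acc == y:
            return e
        acc = acc * g % p
    return None

print(discrete_log_by_search(g, y, p, 1000))  # 20
```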

It is evident that if P = NP, there cannot exist one-way functions. The converse of this is not true, that is, even if P ≠ NP, there may exist no one-way functions.

Definition C.2.

A non-deterministic Turing machine which has at most one accepting branch of computation for every input string is called an unambiguous Turing machine. The class of languages accepted by poly-time unambiguous Turing machines is denoted by UP.

Clearly, P ⊆ UP ⊆ NP. Both containments are believed to be proper. The importance of the class UP stems from the following result:

Theorem C.1.

There exists a one-way function if and only if P ≠ UP.

Therefore, the P =? UP question is relevant for cryptography, and not the P =? NP question. The class UP is not known (nor expected) to have complete problems. So locating a one-way function may be a difficult task. But at the minimum we are now on the right track.[2] Complexity theory helped us shift our attention from NP (or bigger classes) to UP.

[2] Well, hopefully!

In order to use a one-way function f for cryptographic purposes, we require additional properties of f. Computing f–1 must be difficult for an intruder, whereas the same computation ought to be easy to the legitimate recipient. Thus, f must support poly-time inversion, provided that some secret piece of information (the trapdoor) is available during the computation of the inverse. A one-way function with a trapdoor is called a trapdoor one-way function.

The first two functions of Example C.1 do not have obvious trapdoors and so cannot be straightaway used for designing cryptosystems. The third function (RSA encryption) has the requisite trapdoor, namely, the decryption exponent d satisfying ed ≡ 1 (mod φ(n)).
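A toy numerical illustration (the classic textbook-sized parameters below are ours and far too small for real security): with the trapdoor d, inverting the RSA function is as easy as applying it.

```python
p, q = 61, 53                  # toy primes
n = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent, gcd(e, phi) = 1
d = pow(e, -1, phi)            # the trapdoor: d with ed ≡ 1 (mod phi)

m = 65
c = pow(m, e, n)               # encryption: easy for everyone
assert pow(c, d, n) == m       # decryption: easy only with the trapdoor d
```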

The hunt for a theoretical foundation does not end here. It begins. Most of complexity theory deals with the worst-case complexities of problems, rather than their average or expected complexities. A one-way function, even if existent, may be difficult to invert for only a few instances, whereas cryptography demands the inversion problem to be difficult for most instances. A function meeting even this cryptographic demand need not be suitable, since there may be reductions that map hard instances to easy instances. Moreover, the trapdoors themselves may inject vulnerabilities and prepare room for quick attacks.

There still remains a long way to go!

Exercise Set C.3

C.4 Let f : Σ* → Σ* be a function with the property that f(f(α)) = f(α) for every α ∈ Σ*. Argue that f is not a one-way function.
C.5 Design unambiguous polynomial time Turing machines for computing the inverses of the functions described in Example C.1.
C.6 Show that if there exists a bijective one-way function, then NP ∩ coNP ≠ P. [H]

D. Hints to Selected Exercises

The greatest thing in family life is to take a hint when a hint is intended and not to take a hint when a hint isn’t intended.

—Robert Frost

Teachers open the door, but you must enter by yourself.

—Chinese Proverb

Imagination grows by exercise, and contrary to common belief, is more powerful in the mature than in the young.

—W. Somerset Maugham

2.11 (a)Apply Theorem 2.3 to the restriction to H of the canonical homomorphism GG/K.
2.11 (b)Apply Theorem 2.3 to the canonical homomorphism G/HG/K, aHaK, .
2.14 (c)Consider the canonical surjection GG/H.
2.17 (a)Let ij and . Then ord g divides both and and so is equal to 1, that is, g = e. Now let hi, and with . But then . Thus #(HiHj) = (#Hi)(#Hj). Generalize this argument to show that #(H1 · · · Hr) = n.
2.18First consider the special case #G = pr for some and . For each , the order ordG g is of the form psg for some sgr. Let s be the maximum of the values sg, . Take any element with ordG h = ps. Then e, h, . . . , hps–1 are all the elements x that satisfy xps = e. But by the choice of s every element satisfies xps = e. Hence we must have s = r. This proves the assertion for the special case. For the general case, use this special case in conjunction with Exercise 2.17.
2.19 (b) Show that H1 × · · · × Hr → G, (h1, . . . , hr) ↦ h1 · · · hr, is a group isomorphism.
2.23Use Zorn’s lemma.
2.24 (c)Let be the intersection of all prime ideals of R. First show that . To prove the reverse inclusion take and consider the set S of all non-unit ideals of R such that for all . If f is a non-unit, the set S is non-empty and by Zorn’s lemma has a maximal element, say . Show that is a prime ideal of R.
2.25For , the map RR, bab, is injective and hence surjective by Exercise 2.4.
2.30Apply the isomorphism theorem to the canonical surjection , .
2.33[(1)⇒(2)] Let be an ascending chain of ideals of R. Consider the ideal which is finitely generated by hypothesis.

[(3)⇒(1)] Let be an ideal of R. Consider the set of all finitely generated ideals of R contained in .

2.36 Use the pigeon-hole principle: If there are n + 1 pigeons in n holes, then there exists at least one hole containing more than one pigeon.
2.37 Consider the integer t satisfying 2^t ≤ n < 2^(t+1).
2.39 (e) 1² ≡ (n – 1)² (mod n).
2.39 (f)Apply Wilson’s theorem.
2.40Use Fermat’s little theorem.
2.41Use Wilson’s theorem or Euler’s criterion.
2.45Reduce to the case y2 ≡ α (mod p).
2.49 (a)Consider the canonical group homomorphism and the fact that a surjective group homomorphism from a cyclic group G onto G′ implies that G′ is cyclic.
2.49 (b)Let be a primitive element modulo p. The residue class of a in has order k(p – 1) for some . Show that the order of b := p + 1 modulo pe is pe–1. So the order of akb modulo pe is pe–1(p – 1) = φ(pe).
2.50Use the Chinese remainder theorem in conjunction with Exercises 2.20 and 2.49.
2.53Take . The interpolating polynomial is . Use Exercise 2.52 to establish the uniqueness.
2.56 (b) is irreducible in if and only if f(X + 1) is irreducible in .
2.58Use the fundamental theorem of algebra.
2.63Consider the set of all linearly independent subsets of V that contain T. Show that every chain in has an upper bound in . By Zorn’s Lemma, there exists a maximal element . Show that S generates V.
2.64 (b)Use Exercise 2.63.
2.68Let p1, . . . , pn be n distinct primes. Take and ai := a/pi for i = 1, . . . , n.
2.72 (a)If N is the -submodule of generated by ai/bi, i = 1, . . . , n, with gcd(ai, bi) = 1, then for any prime p that does not divide b1 · · · bn we have 1/pN.
2.72 (b)Any two distinct elements of are linearly dependent over . Now use Exercise 2.69.
2.74 (b)Let the conjugates of over F be α1 = α, α2, . . . , αn. Since is injective, it follows from (a) that makes a permutation of α1, . . . , αn. So is surjective.
2.75 (a)Use Exercise 2.61.
2.76 (b)The if part follows from Exercise 2.61. For proving the only if part, take . If the polynomial f(X) := Xpa splits over F, we are done. So suppose that there exists an irreducible divisor of f(X) of degree ≥ 2. By the separability of F, there exist two distinct roots α, β of g(X). Let K := F (α, β). Show that the Frobenius map , , is an endomorphism of K. Also there exists a field isomorphism τ : F (α) → F (β) which fixes F element-wise and takes α ↦ β. But then . Since any field homomorphism is injective, α equals β, a contradiction. Thus no g(X) chosen as above can exist.
2.77 (a) Let g(X) be an irreducible polynomial with g(α) = 0 for some α ∈ K. Let β be another root of g. We show that β ∈ K. By Lemma 2.5, there is an isomorphism μ : F(α) → F(β). Clearly, K is the splitting field of f over F(α). Let K′ be the splitting field of μ*(f) over F(β). By Proposition 2.33, K ≅ K′. If γ1, . . . , γd are the roots of f, then K′ ≅ F(β, γ1, . . . , γd) = K(β). But then K ≅ K(β).
2.78 (a) Consider transcendental numbers.
2.78 (b) Let . For , we have , implying that for a, with ab. Now assume for some . Choose a rational number b with . Then , a contradiction. Thus . Similarly .
2.80 Use the binomial theorem and induction on n.
2.82 Follow the proof of Theorem 2.37.
2.90 See Example 2.18.
2.91 (b) By the fundamental theorem of Galois theory, # . Now show that are distinct -automorphisms of .
2.92 (a) Assume r > 1. We have the extensions , where is the splitting field of f over and hence over . Consider the minimal polynomial of a root of f over . Conversely, let f be reducible over . Choose an irreducible factor h of f with deg h = s < d. Now h has one (and hence all) roots in and, therefore, d | sm.
2.93 Use Corollary 2.18.
2.98 In each case, the defining polynomial is quadratic in Y, with coefficients in K[X]. If this polynomial admits a non-trivial factorization, one can reach a contradiction by considering the X-degrees of the coefficients of Y^1 and Y^0.
2.103 For simplicity, consider the case char K ≠ 2, 3. Show that the curves Y^2 + Y = X^3 and Y^2 = X^3 + X have j-invariants 0 and 1728 respectively. Finally, for j ≠ 0, 1728, exhibit a curve with j-invariant j. One must also argue that these are actually elliptic curves, that is, have non-zero discriminants.
2.111 Use Theorem 2.51.
2.112 (a) Pair a point with its opposite. This pairing fails for points of orders 1 and 2.
2.112 (c) Consider the elliptic curve E : Y^2 = X^3 + 3 over . We have , whereas X^3 + 3 is irreducible modulo 13.
2.113 (a) Every element of has a unique square root.
2.115 (a) Use Theorem 2.49 or Exercise 2.17.
2.115 (b) Use Theorem 2.50.
2.115 (c) The trace of Frobenius at q is 0 in this case. Now, use Theorem 2.50.
2.123 Factor N(G) in .
2.127 Let . For each i, write , . But then det , where , δij being the Kronecker delta.
2.128 (b) Use Part (a) and Exercise 2.126(c).
2.128 (c) Let . By Exercise 2.130, is integral over . Let be the ideal generated by in and let and be the ideals of generated respectively by and . Now, use Part (b).
2.133 (b) In a PID, non-zero prime ideals are maximal.
2.137 (a) Since and are maximal, we have , that is, a1 + a2 = 1 for some and . Now use the fact that (a1 + a2)^(e1+e2) = 1.
2.137 (b) Use the CRT.
2.138 (a) Since is invertible, for some fractional ideal .
2.140 (a) For , let constitute a complete residue system of modulo . Then also form a complete residue system of modulo .
2.142 (d) Take in Part (b).
2.143 (a) Reduce modulo 4.
2.143 (c) Let divide this gcd. Then divides 2y and . Take norms.
2.144 (b) Look at the expansion of a – 1 in base p. More precisely, let a < p^N for some N. Then –a = (p^N – a) – p^N = [(p^N – 1) – (a – 1)] – p^N.
2.152 (c) First show that .
2.153 Use unique factorization of rationals.
2.154 Show by induction on n that p^(n+1) divides a^(p^(n+1)) – a^(p^n) for all .
2.161 There exists an irreducible polynomial of every degree.
3.7 The forward implication is obvious. For the reverse implication, use Proposition 2.5.
3.18 (b) Consider the binary expansion of m.
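The binary expansion of an exponent is the basis of square-and-multiply (repeated squaring). Assuming the exercise concerns fast exponentiation, a minimal sketch of the idea (function name ours):

```python
def power_mod(a, m, n):
    """Compute a^m mod n by scanning the binary expansion of m."""
    result = 1
    base = a % n
    while m > 0:
        if m & 1:                     # current binary digit of m is 1
            result = (result * base) % n
        base = (base * base) % n      # square for the next binary digit
        m >>= 1
    return result
```

The loop performs O(log m) squarings, one multiplication per 1-bit of m.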
3.19 If n is a pseudoprime to base a and not a pseudoprime to base b, then n is not a pseudoprime to base ab.
3.20 (a) If p^2 | n for some prime p, take a with ord_n(a) = p. If n is square-free, consider a prime divisor p of n and take a with and a ≡ 1 (mod n/p).
3.20 (b) If n is an Euler pseudoprime to base a and not an Euler pseudoprime to base b, then n is not an Euler pseudoprime to base ab.
3.21 (a) Let n = p1^α1 · · · pr^αr be the prime factorization of n. For odd pi, the group of units modulo pi^αi is cyclic of order pi^(αi–1)(pi – 1) and hence contains an element of order pi – 1.
3.21 (b) ord_n(–1) = 2.
3.21 (c) Let v_p(n) ≥ 2 for some odd prime p. Construct an element a with ord_n(a) = p.
3.28 Proceed by induction on i = 1, . . . , r. For 1 ≤ i ≤ r, define νi := n1 · · · ni and let bi be a solution of the congruences bi ≡ aj (mod nj) for j = 1, . . . , i. If i < r, use the combining formula given in Section 2.5 to find bi+1 such that bi+1 ≡ bi (mod νi) and bi+1 ≡ ai+1 (mod ni+1).
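The incremental combination in this hint can be sketched as follows (a minimal illustration with pairwise coprime moduli; function names are ours):

```python
def ext_gcd(a, b):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y = g."""
    if b == 0:
        return a, 1, 0
    g, x, y = ext_gcd(b, a % b)
    return g, y, x - (a // b) * y

def crt(residues, moduli):
    """Solve x ≡ a_i (mod n_i) incrementally, as in the induction on i.
    Returns (x, ν) with 0 <= x < ν = n_1 ... n_r."""
    b, nu = residues[0], moduli[0]
    for a, n in zip(residues[1:], moduli[1:]):
        g, u, v = ext_gcd(nu, n)        # u*nu + v*n == 1 (moduli coprime)
        # b_new ≡ b (mod nu) and b_new ≡ a (mod n)
        b = (b * v * n + a * u * nu) % (nu * n)
        nu *= n
    return b, nu
```

Each step plays the role of computing b_{i+1} from b_i via the combining formula.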
3.31 Apply Newton’s iteration to compute a zero of x^2 – n.
3.32 (a) Apply Newton’s iteration to compute a zero of x^k – n.
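Newton’s iteration for a zero of x^k – n, carried out with integer arithmetic, yields the integer k-th root (k = 2 gives the square root of Exercise 3.31). A hedged sketch, not the book’s exact algorithm:

```python
def kth_root(n, k):
    """Largest integer x with x**k <= n, via Newton's iteration on
    f(x) = x**k - n: x <- ((k-1)*x + n // x**(k-1)) // k."""
    if n < 2:
        return n
    x = 1 << (n.bit_length() // k + 1)   # initial overestimate of the root
    while True:
        y = ((k - 1) * x + n // x ** (k - 1)) // k
        if y >= x:                       # iterates stop decreasing: done
            return x
        x = y
```

Starting from an overestimate, the iterates decrease monotonically to the floor of the real k-th root.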
3.34 (b) The updating d(X) := d(X) – Xisb(X) needs to consider only the non-zero words of b.
3.36 (b) First consider b = 0 and note that the roots of X^((q–1)/2) – 1 (resp. X^((q–1)/2) + 1) are all the quadratic residues (resp. non-residues).
3.36 (c) First consider b = 0.
3.40 For , we have ord(a) | m and for each i = 1, . . . , r the multiplicity v_{pi}(ord(a)) is the smallest of the non-negative integers k satisfying .
3.41 (a) Use the CRT.
3.43 (a) Use the CRT and the fact that for an odd prime r ≡ 3 (mod 4).
4.1 (a) Using the CRT, reduce to the case that n is prime. Then the map x ↦ x^a is bijective ⇔ its restriction to the non-zero residues is bijective. Now, if gcd(a, φ(n)) = 1, the inverse of x ↦ x^a is x ↦ x^b, where ab ≡ 1 (mod φ(n)). On the other hand, if q is a prime divisor of gcd(a, φ(n)), choose an element y with ord(y) = q. But then y^a ≡ 1 (mod n), that is, the map is not injective. This exercise provides the foundation for the RSA cryptosystem.
4.1 (b) In view of the CRT, reduce to the case n = p^α for a prime p and α > 1. Then (p^(α–1))^a ≡ 0 (mod n).
4.6 Consider the integral .
4.9 Use the CRT and lifting.
4.10 For proving , let n be an odd composite integer, choose a random y and compute a square root x of y^2 modulo n. By Exercise 4.9, the probability that x ≡ ±y (mod n) is at most 1/2.
4.12 (d) Eliminate a from T(a, b, c) using a + b + c = 0. For each fixed c, allow b to vary and use a sieve to find all the values of b for which T(a, b, c) is smooth for the fixed c.
4.13 You may use the prime number theorem and the fact that the sum of the reciprocals of the first t primes grows asymptotically as ln ln t.
4.15 If a < a1 or a > am, then no such i exists. So assume that a1 ≤ a ≤ am and let d := ⌊(1 + m)/2⌋. If a = ad, return d; else if a < ad, recursively search for a among a1, . . . , ad–1, and if a > ad, recursively search for a among ad+1, . . . , am.
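The recursion in this hint is ordinary binary search. A minimal sketch using the hint’s 1-based indexing (helper name ours):

```python
def find(a, arr, lo=None, hi=None):
    """Recursive binary search in a sorted list arr (1-indexed as in the
    hint). Returns an index i with arr[i-1] == a, or None if a is absent."""
    if lo is None:
        lo, hi = 1, len(arr)
    if lo > hi or a < arr[lo - 1] or a > arr[hi - 1]:
        return None                      # a lies outside the current range
    d = (lo + hi) // 2                   # midpoint, as d := floor((lo+hi)/2)
    if a == arr[d - 1]:
        return d
    if a < arr[d - 1]:
        return find(a, arr, lo, d - 1)   # search the left half
    return find(a, arr, d + 1, hi)       # search the right half
```

Each call halves the range, so the search takes O(log m) comparisons.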
4.16 (a) Use Lagrange’s interpolation formula (Exercise 2.53).
4.18 (a) One may precompute the values σi := p rem qi, i = 1, . . . , t. Note that qi | (gα + kp) if and only if ρk,i = 0.
4.19 (a) Use the approximation T(c1, c2) ≈ (c1 + c2)H.
4.21 (c) T(a, b, c) = –b2c(x + cy)b + (zc2x).
4.21 (d) Imitate the second stage of the LSM.
4.23 Let the factor base consist of all irreducible polynomials over of degrees ≤ m together with the polynomials of the form X^k + h(X) with deg h ≤ m. The optimal running time of this algorithm corresponds to .
4.24 (b) is square-free.
4.24 (c) Use the fact that X^m – 1 = (X^(m/p^(v_p(m))) – 1)^(p^(v_p(m))).
4.24 (d) Use Theorem 2.39.
4.25 (a) Look at the roots of the polynomials on the two sides.
4.25 (c) If ord ω = m, then ord(–ω) = 2m.
4.25 (d) ω, ω^q, . . . , ω^(q^(l–1)) are all the roots of the minimal polynomial of ω over .
4.26 (b) Use the Mordell–Weil theorem.
4.26 (c) Use Theorem 4.2.
5.2 (a) Solve the simultaneous congruences x ≡ ci (mod ni), i = 1, . . . , e, and then take the integer e-th root of the solution x, 1 ≤ x ≤ n1 · · · ne.
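For small e and unpadded messages this is the classic low-exponent broadcast attack: since m^e < n1 · · · ne, the CRT solution equals m^e over the integers. A hedged sketch combining the two steps (requires Python 3.8+ for `pow(x, -1, n)`; function name ours):

```python
def broadcast_attack(cts, mods, e):
    """Recover m from c_i = m^e mod n_i for e pairwise coprime moduli n_i:
    CRT the ciphertexts, then take the integer e-th root."""
    N = 1
    for n in mods:
        N *= n
    # CRT: x ≡ c_i (mod n_i), 0 <= x < N; here x equals m^e exactly
    x = 0
    for c, n in zip(cts, mods):
        Ni = N // n
        x = (x + c * Ni * pow(Ni, -1, n)) % N
    # integer e-th root of x by Newton's iteration
    r = 1 << (x.bit_length() // e + 1)
    while True:
        s = ((e - 1) * r + x // r ** (e - 1)) // e
        if s >= r:
            return r
        r = s
```

Salting the plaintexts, as part (b) suggests, defeats this attack.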
5.2 (b) Append (different) pseudorandom bit strings to m before encryption. This process is often referred to as salting.
5.3 (a) In view of the Chinese remainder theorem, reduce to the case n = p^r for a prime p.
5.4 ue1 + ve2 = 1 for some integers u, v.
5.6 If the same session key is used to generate the ciphertext pairs (r1, s1) and (r2, s2) on two plaintext messages m1 and m2, then m1/m2 = s1/s2.
5.7 (c) Let x = (xl–1 . . . x1x0)2. Define x′ := (xl–1 . . . x2x1)2 and y′ := g^x′ (mod p). Then, y ≡ y′^2 g^x0 (mod p). Since x0 is easily computable, y′ can be obtained by computing a square root of y modulo p. Argue that a call to the oracle helps us choose the correct square root y′ of y. Now, use recursion.
5.8 Let g′ be any randomly chosen generator, where q := p^h. One computes for i = 0, 1, . . . , p – 1. We then have the equality of the sets

modulo q – 1, where l := ind_{g′} g. But then for each i we have a (yet unknown) j such that . Show that by trying all possibilities for i and j one can effectively recover l and hence g = g′^l and hence π.

5.9 Let g′ and l be as in Exercise 5.8. Now, we have the equality of the sets

modulo q – 1.

5.11 (mod β) are polynomials with small coefficients.
5.15 (a) If Alice generates the signatures (M1, s1) and (M2, s2) on two messages M1 and M2, then her signature on a message M with H(M) ≡ H(M1)H(M2) (mod n) is s1s2 (mod n). Thus, without knowing the private key of Alice, an intruder can generate a valid signature (M, s1s2) of Alice, provided that such an M can be computed. Of course, here the intruder has little control over the message M. The PKCS standards from RSA Laboratories add some redundancy to the hash function output before signing. The product of two hash values with redundancy is, in general, not expected to have the redundancy. This increases the security of the scheme against existential forgeries beyond that provided by the first pre-image resistance of the underlying hash function.
5.15 (b) For any s, a valid signature is (M, s), where H(M) ≡ s^2 (mod n).
5.15 (c) Choose random integers u, v with gcd(v, n) = 1 and take d′ := u + dv. Of course, d and hence d′ are unknown to Carol, but she can compute s ≡ g^d′ = g^u (g^d)^v and t ≡ –H(s) v^(–1) (mod n). But then (M, s, t) is a valid ElGamal signature on a message M for which H(M) ≡ tu (mod n).
5.16 Obviously, c itself could be a possible choice, but that is not random and Bob might refuse to sign c. Carol should hide c as c r^e (mod n) for some randomly chosen r known to her.
5.23 (a) By the CRT.
5.25 (a) Replace the random challenge of the verifier by the hash value of the string obtained by concatenating the message to be signed with the witness.
5.26 (d) Bob finds a random b′ and sends a := (b′)^2 (mod n) to Alice. But then Alice’s response b yields a non-trivial factor gcd(b – b′, n) of n.
7.5 (mod n) and mse (mod n).
7.9 (a) Use Exercise 2.44(b).
7.9 (c) Again use Exercise 2.44(b).
7.9 (d) Use Part (c) in conjunction with the CRT, and separately consider the three cases v2(p – 1) = v2(q – 1), v2(p – 1) > v2(q – 1) and v2(p – 1) < v2(q – 1).
A.2 for all X, J. One does not have to look at the S-boxes for proving this.
A.9 (c) For i = 0, 1, 2, 3, 4Nr, 4Nr + 1, 4Nr + 2, 4Nr + 3, take . For other values of i, take .
A.14 (b) Let DL(X) := X^d CL(1/X) = a0 + a1X + a2X^2 + · · · + ad–1X^(d–1) + X^d. Consider the quotient algebra A, where x := X + 〈DL(X)〉. The linear transformation λx : A → A defined by g(x) ↦ xg(x) has the matrix ΔL with respect to the polynomial basis (1, x, . . . , x^(d–1)). If f is the minimal polynomial of λx, then [f(λx)](1) = f(x) = 0. Now, use the fact that 1, x, . . . , x^(d–1) are linearly independent.
A.16 (b) [only if] Take σ ≠ 00 · · · 01. Since σ is non-zero, si = 1 for some i. Construct an LFSR with d – 1 stages initialized to s0s1 · · · sd–2 to generate σ.
A.19 Suppose that we want to compute a second pre-image for H2(x). If , any is a second pre-image for H2(x). If , computing a second pre-image for H2(x) is equivalent to computing a second pre-image for H(x). The density of the (finite) set S is 0 in the (infinite) set of all bit strings. Thus, H2 is second pre-image resistant. On the other hand, for any two distinct x, x′ ∈ S we have a collision (x, x′) for H2.
A.20 Collision resistance of H implies that of H3. On the other hand, for a positive fraction (half) of the (n + 1)-bit strings y, it is easy to compute a pre-image of y under H3.
A.21 If y is a square root of a modulo m, then so is m – y.
A.22 Use the birthday paradox (Exercise 2.172).
A.23 (d) Let L := F1(L′) and R := F1(R′) with both R and R′ non-zero. Then, F1(LR) = F2(L′R′).
A.25 Let h(i) denote the column vector of dimension 160 having the bits of H(i) as its elements and m(i) the column vector of dimension 512 + 160 = 672 having the bits of M(i) and of H(i) as its elements. Show that the modified design of SHA-1 leads to the relation h(i) ≡ Am(i–1) + c (mod 2) for some constant 160 × 672 matrix A and some constant vector c. So what then?
C.6 For α, β ∈ Σ*, call α ≤ β if and only if |α| < |β|, or |α| = |β| and α is lexicographically smaller than β. This ≤ produces a well-ordering of Σ*. For a one-way function f, look at the language for some with γ ≤ β}.

 

References

If you steal from one author, it’s plagiarism; if you steal from many, it’s research.

—Wilson Mizner

Literature is the question minus the answer.

—Roland Barthes

Everything that can be invented, has been invented.

—Charles H. Duell, 1899

[1] Adkins, W. A. and S. H. Weintraub (1992). Algebra: An Approach via Module Theory. Graduate Texts in Mathematics, 136. New York: Springer.

[2] Adleman, L. M., J. DeMarrais and M.-D. A. Huang (1994). “A Subexponential Algorithm for Discrete Logarithms over the Rational Subgroup of the Jacobians of Large Genus Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-I, Lecture Notes in Computer Science, 877. pp. 28–40. Berlin/Heidelberg: Springer.

[3] Adleman, L. M. and M.-D. A. Huang (1992). “Primality Testing and Two Dimensional Abelian Varieties over Finite Fields”, Lecture Notes in Mathematics, 1512. Berlin: Springer.

[4] Adleman, L. M., C. Pomerance and R. S. Rumely (1983). “On Distinguishing Prime Numbers from Composite Numbers”, Annals of Mathematics, 117: 173–206.

[5] Agrawal, M., N. Kayal and N. Saxena (2002), “PRIMES Is in P” [online document]. Available at http://www.cse.iitk.ac.in/users/manindra/algebra/primality_v6.pdf (October 2008).

[6] * Ahlfors, L. V. (1966). Complex Analysis. New York: McGraw-Hill.

[7] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1974). The Design and Analysis of Computer Algorithms. Reading, Massachusetts: Addison-Wesley.

[8] * Aho, A. V., J. E. Hopcroft and J. D. Ullman (1983). Data Structures and Algorithms. Reading, Massachusetts: Addison-Wesley.

[9] Aigner, M. and E. Oswald (2007), “Power Analysis Tutorial” [online document]. Available at http://www.iaik.tugraz.at/content/research/implementation_attacks/introduction_to_impa/dpa_tutorial.pdf (October 2008).

[10] Akkar, M.-L., R. Bevan, P. Dischamp and D. Moyart (2000). “Power Analysis, What Is Now Possible”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 489–502. Berlin/Heidelberg: Springer.

[11] Anderson, R. and M. Kuhn (1997). “Low Cost Attacks on Tamper Resistant Devices”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 125–136. Berlin/Heidelberg: Springer.

[12] * Apostol, T. M. (1976). Introduction to Analytic Number Theory. Undergraduate Texts in Mathematics. New York: Springer.

[13] Arnold, V. I. (1999). “Polymathematics: Is Mathematics a Single Science or a Set of Arts?”, in V. Arnold, M. Atiyah, P. Lax and B. Mazur (eds.), Mathematics: Frontiers and Perspectives, pp. 403–416. Providence, Rhode Island: American Mathematical Society.

[14] Atiyah, M. F. and I. G. MacDonald (1969). Introduction to Commutative Algebra. Reading, Massachusetts: Addison-Wesley.

[15] Aumüller, C., P. Bier, W. Fischer, P. Hofreiter and J.-P. Seifert (2002), “Fault Attacks on RSA with CRT: Concrete Results and Practical Countermeasures” [online document]. Available at http://eprint.iacr.org/2002/073 (October 2008).

[16] Balasubramanian, R. and N. Koblitz (1998). “The Improbability that an Elliptic Curve has Subexponential Discrete Log Problem under the Menezes-Okamoto Vanstone Algorithm”, Journal of Cryptology, 11: 141–145.

[17] Bao, F., R. H. Deng, Y. Han, A. B. Jeng, A. D. Narasimhalu, T.-H. Ngair (1997). “Breaking Public Key Cryptosystems on Tamper Resistant Devices in the Presence of Transient Faults”, Security Protocols—5th International Workshop, Lecture Notes in Computer Science, 1361. pp. 115–124. Berlin/Heidelberg: Springer.

[18] Bellare, M. and P. Rogaway (1995). “Optimal Asymmetric Encryption—How to Encrypt with RSA”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 92–111. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/oaep.html (October 2008).

[19] Bellare, M. and P. Rogaway (1996). “The Exact Security of Digital Signatures: How to Sign with RSA and Rabin”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 399–416. Berlin/Heidelberg: Springer. A revised version is available at http://www-cse.ucsd.edu/users/mihir/papers/exactsigs.html (October 2008).

[20] Bennett, C. H. and G. Brassard (1984). “Quantum Cryptography: Public Key Distribution and Coin Tossing”, pp. 175–179. Proceedings of the IEEE International Conference on Computers, Systems and Signal Processing, Bangalore, India, December.

[21] Berlekamp, E. R. (1968). Algebraic Coding Theory. New York: McGraw-Hill.

[22] Biham, E. and A. Shamir (1997). “Differential Fault Analysis of Secret Key Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 513–528. Berlin/Heidelberg: Springer.

[23] Blake, I. F., R. Fuji-Hara, R. C. Mullin and S. A. Vanstone (1984). “Computing Logarithms in Finite Fields of Characteristic Two”, SIAM Journal of Algebraic and Discrete Methods, 5: 276–285.

[24] Blake, I. F., G. Seroussi and N. P. Smart (1999). Elliptic Curves in Cryptography. Cambridge: Cambridge University Press.

[25] Blom, R. (1985). “An Optimal Class of Symmetric Key Generation Systems”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 335–338. Berlin/Heidelberg: Springer.

[26] Blum, L., M. Blum, and M. Shub (1986). “A Simple Unpredictable Pseudo-Random Number Generator”, SIAM Journal on Computing, 15: 364–383.

[27] Blum, M. and S. Goldwasser (1985). “An Efficient Probabilistic Public Key Encryption Scheme Which Hides All Partial Information”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 289–299. Berlin/Heidelberg: Springer.

[28] Blundo, C., A. De Santis, A. Herzberg, S. Kutten, U. Vaccaro and M. Yung (1993). “Perfectly-Secure Key Distribution for Dynamic Conferences”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 471–486. Berlin/Heidelberg: Springer.

[29] Boneh, D. (1999). “Twenty Years of Attacks on the RSA Cryptosystem”, Notices of the American Mathematical Society, 46 (2): 203–213.

[30] Boneh, D., R. A. DeMillo and R. J. Lipton (1997). “On the Importance of Checking Cryptographic Protocols for Faults”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 37–51. Berlin/Heidelberg: Springer.

[31] Boneh, D., R. A. DeMillo and R. J. Lipton (2001). “On the Importance of Eliminating Errors in Cryptographic Computations”, Journal of Cryptology, 14 (2): 101–119.

[32] Boneh, D. and G. Durfee (1999). “Cryptanalysis of RSA with Private Key d Less Than N0.292”, Advances in Cryptology—EUROCRYPT ’99, Lecture Notes in Computer Science, 1592. pp. 1–11. Berlin/Heidelberg: Springer.

[33] Boneh, D., G. Durfee and Y. Frankel (1998). “Exposing an RSA Private Key Given a Small Fraction of Its Bits”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 25–34. Berlin/Heidelberg: Springer.

[34] Boneh, D. and M. K. Franklin (2001). “Identity-based Encryption from the Weil Pairing”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 213–229. Berlin/Heidelberg: Springer.

[35] Boneh, D. and M. K. Franklin (2003). “Identity-based Encryption from the Weil Pairing”, SIAM Journal of Computing, (32) 3: 586–615.

[36] Bressoud, D. M. (1989). Factorization and Primality Testing. Undergraduate Texts in Mathematics. New York: Springer.

[37] * Buchmann, J. A. (2004). Introduction to Cryptography. Undergraduate Texts in Mathematics. New York: Springer.

[38] Buchmann, J. A. et al. (2004), “The Number Field Cryptography Project” [online document]. Available at http://www.informatik.tu-darmstadt.de/TI/Forschung/nfc.html (October 2008).

[39] Buchmann, J. A. and S. Hamdy (2001). “A Survey on IQ Cryptography”. Technical report TI-4/01, TU Darmstadt, Fachbereich Informatik.

[40] Buchmann, J. A. and D. Weber (2000). “Discrete Logarithms: Recent Progress”, in J. Buchmann, T. Hoeholdt, H. Stichtenoth and H. Tapia-Recillas (eds.), Coding Theory, Cryptography and Related Areas, pp. 42–56. Proceedings of an International Conference on Coding Theory, Cryptography and Related Areas, Guanajuato, Mexico, April 1998.

[41] Buhler, J., H. W. Lenstra and C. Pomerance (1993). “Factoring Integers with the Number Field Sieve”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 50–94. Berlin: Springer.

[42] * Burton, D. M. (1998). Elementary Number Theory, 4th ed. New York: McGraw-Hill.

[43] Cantor, D. G. (1994). “On the Analogue of Division Polynomials for Hyperelliptic Curves”, Journal für die reine und angewandte Mathematik, 447: 91–145.

[44] Chan, H., A. Perrig and D. Song (2003). “Random Key Predistribution Schemes for Sensor Networks”, pp. 197–213. Proceedings of the 24th IEEE Symposium on Research in Security and Privacy, Berkeley, California, 11–14 May.

[45] Chari, S., C. S. Jutla, J. R. Rao, and P. Rohatgi (1999). “Towards Sound Approaches to Counteract Power-Analysis Attacks”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 398–412. Berlin/Heidelberg: Springer.

[46] Charlap, L. S. and R. Coley (1990). “An Elementary Introduction to Elliptic Curves II”, CCR Expository Report 34.

[47] Charlap, L. S. and D. P. Robbins (1988). “An Elementary Introduction to Elliptic Curves”, CRD Expository Report 31.

[48] Chaum, D. (1983). “Blind Signatures for Untraceable Payments”, Advances in Cryptology—CRYPTO ’82. pp. 199–203. New York: Plenum Press.

[49] Chaum, D. (1985). “Security Without Identification: Transaction System to Make Big Brother Obsolete”, Communications of the ACM, 28 (10): 1030–1044.

[50] Chaum, D. (1989). “Privacy Protected Payments: Unconditional Payer and/or Payee Untraceability”, Smart Card 2000: The Future of IC Cards, pp. 69–93. Amsterdam: North-Holland.

[51] Chaum, D. (1990). “Zero-Knowledge Undeniable Signatures”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 473. pp. 458–464. Berlin/Heidelberg: Springer.

[52] Chaum, D. and H. van Antwerpen (1989). “Undeniable Signatures”, Advances in Cryptology—CRYPTO ’89, Lecture Notes in Computer Science, 435. pp. 212–217. Berlin/Heidelberg: Springer.

[53] Chaum, D., E. van Heijst and B. Pfitzmann (1991). “Cryptographically Strong Undeniable Signatures, Unconditionally Secure for the Signer”, Advances in Cryptology—CRYPTO ’91, Lecture Notes in Computer Science, 576. pp. 470–484. Berlin/Heidelberg: Springer.

[54] Chor, B. and R. L. Rivest (1988). “A Knapsack Type Cryptosystem Based on Arithmetic in Finite Fields”, IEEE Transactions on Information Theory, 34: 901–909.

[55] Clavier, C., J.-S. Coron and N. Dabbous (2000). “Differential Power Analysis in the Presence of Hardware Countermeasures”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 252–263. Berlin/Heidelberg: Springer.

[56] Cohen, H. (1993). A Course in Computational Algebraic Number Theory. Graduate Texts in Mathematics, 138. New York: Springer.

[57] Coppersmith, D. (1984). “Fast Evaluation of Logarithms in Fields of Characteristic Two”, IEEE Transactions on Information Theory, 30: 587–594.

[58] Coppersmith, D. (1994). “Solving Homogeneous Equations over GF[2] via Block Wiedemann Algorithm”, Mathematics of Computation, 62: 333–350.

[59] Coppersmith, D., A. M. Odlyzko and R. Schroeppel (1986). “Discrete Logarithms in GF (p)”, Algorithmica, 1: 1–15.

[60] Coppersmith, D. and S. Winograd (1982). “On the Asymptotic Complexity of Matrix Multiplication”, SIAM Journal of Computing, 11 (3): 472–492.

[61] * Cormen, T. H., C. E. Lieserson, R. L. Rivest and C. Stein (2001). Introduction to Algorithms, 2nd ed. Cambridge, Massachusetts: MIT Press.

[62] Coron, J.-S. (1999). “Resistance Against Differential Power Analysis for Elliptic Curve Cryptosystems”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1965. pp. 292–302. Berlin/Heidelberg: Springer.

[63] Coron, J.-S., L. Goubin (2000). “On Boolean and Arithmetic Masking Against Differential Power Analysis”, Cryptographic Hardware and Embedded Systems—CHES 2000, Lecture Notes in Computer Science, 1965. pp. 231–237. Berlin/Heidelberg: Springer.

[64] Coster, M. J., A. Joux, B. A. LaMacchia, A. M. Odlyzko, C. P. Schnorr and J. Stern (1992). “Improved Low-Density Subset Sum Algorithms”, Computational Complexity, 2: 111–128.

[65] Coster, M. J., B. A. LaMacchia, A. M. Odlyzko and C. P. Schnorr (1991). “An Improved Low-Density Subset Sum Algorithm”, Advances in Cryptology—EUROCRYPT ’91, Lecture Notes in Computer Science, 547. pp. 54–67. Berlin/Heidelberg: Springer.

[66] Courtois, N. (2003). “Fast Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 177–194. Berlin/Heidelberg: Springer.

[67] Courtois, N. and W. Meier (2003). “Algebraic Attacks on Stream Ciphers with Linear Feedback”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 345–359. Berlin/Heidelberg: Springer.

[68] Courtois, N. and J. Pieprzyk (2003). “Cryptanalysis of Block Ciphers with Overdefined Systems of Equations”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 267–287. Berlin/Heidelberg: Springer.

[69] Crandall, R. and C. Pomerance (2001). Prime Numbers: A Computational Perspective. New York: Springer.

[70] Crépeau, C. and A. Slakmon (2003). “Simple Backdoors for RSA Key Generation”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 403–416. Berlin/Heidelberg: Springer.

[71] Daemen, J. and V. Rijmen (2002). The Design of Rijndael: AES—The Advanced Encryption Standard. New York: Springer.

[72] Das, A. (1999). Galois Field Computations: Implementation of a Library and a Study of the Discrete Logarithm Problem [dissertation]. Bangalore, India: Indian Institute of Science.

[73] Das, A. and C. E. Veni Madhavan (1999). “Performance Comparison of Linear Sieve and Cubic Sieve Algorithms for Discrete Logarithms over Prime Fields”, Algorithms and Computation, ISAAC ’99, Lecture Notes in Computer Science, 1741. pp. 295–306. Berlin/Heidelberg: Springer.

[74] * Delfs, H. and H. Knebl (2007). Introduction to Cryptography: Principles and Applications, 2nd ed. Berlin and New York: Springer.

[75] Deutsch, D. (1985). “Quantum Theory, the Church-Turing Principle and the Universal Quantum Computer”. Proceedings of the Royal Society of London, Series A, 400. pp. 97–117.

[76] Deutsch, D. (1998). The Fabric of Reality: The Science of Parallel Universes—and Its Implications. London: Penguin.

[77] Dhem, J.-F., F. Koeune, P.-A. Leroux, P. Mestré, J.-J. Quisquater and J.-L. Willems (2000). “A Practical Implementation of the Timing Attack”, in J.-J. Quisquater and B. Schneier (eds.), Smart Card: Research and Applications, Lecture Notes in Computer Science, 1820. Proceedings of the Third Working Conference on Smart Card Research and Advanced Applications—CARDIS ’98, Louvain-la-Neuve, Belgium, 14–16 September 1998. Springer.

[78] Diffie, W. and M. Hellman (1976). “New Directions in Cryptography”, IEEE Transactions on Information Theory, 22: 644–654.

[79] Du, W., J. Deng, Y. S. Han and P. K. Varshney (2003). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 42–51. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, 27–30 October.

[80] Du, W., J. Deng, Y. S. Han, S. Chen and P. K. Varshney (2004). “A Key Management Scheme for Wireless Sensor Networks Using Deployment Knowledge”. Proceedings of IEEE INFOCOM 2004, Hong Kong, 7–11 March.

[81] * Dummit, D. and R. Foote (2004). Abstract Algebra, 3rd ed. Somerset, New Jersey: John Wiley & Sons.

[82] Durfee, G. and P. Q. Nguyen (2000). “Cryptanalysis of the RSA Schemes with Short Secret Exponent from Asiacrypt ’99”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 30–44. Berlin/Heidelberg: Springer.

[83] Dusart, P. (1999). “The kth Prime Is Greater than k(ln k+ln ln k–1) for k > 2”, Mathematics of Computation, 68: 411–415.

[84] ElGamal, T. (1985). “A Public-Key Cryptosystem and a Signature Scheme Based on Discrete Logarithms”, IEEE Transactions on Information Theory, 31: 469–472.

[85] Elkies, N. D. (1998). “Elliptic and Modular Curves over Finite Fields and Related Computational Issues”, AMS/IP Studies in Advanced Mathematics, 7: 21–76.

[86] Enge, A. (1999). “Computing Discrete Logarithms in High-Genus Hyperelliptic Jacobians in Provably Subexponential Time”. Technical report CORR 99-04, University of Waterloo, Canada.

[87] Enge, A. and P. Gaudry (2002). “A General Framework for Subexponential Discrete Logarithm Algorithms”, Acta Arithmetica, 102 (1): 83–103.

[88] Eschenauer, L. and V. D. Gligor (2002). “A Key-Management Scheme for Distributed Sensor Networks”. Proceedings of the 9th ACM Conference on Computer and Communication Security, pp. 41–47. Washington D.C., USA, 18–22 November.

[89] * Esmonde, J. and M. Ram Murty (1999). Problems in Algebraic Number Theory. Graduate Texts in Mathematics, 190. New York: Springer.

[90] Fiat, A. and A. Shamir (1987). “How to Prove Yourself: Practical Solutions to Identification and Signature Problems”, Advances in Cryptology—CRYPTO ’86, Lecture Notes in Computer Science, 263. pp. 186–194. Berlin/Heidelberg: Springer.

[91] Feige, U., A. Fiat, and A. Shamir (1988). “Zero-Knowledge Proofs of Identity”, Journal of Cryptology, 1: 77–94.

[92] * Feller, W. (1966). Introduction to Probability Theory and Its Applications, 3rd ed. New York: John Wiley & Sons.

[93] Ferguson, N., J. Kelsey, S. Lucks, B. Schneier, M. Stay, D. Wagner and D. Whiting (2000). “Improved Cryptanalysis of Rijndael”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 213–230. Berlin/Heidelberg: Springer.

[94] Fouquet, M., P. Gaudry and R. Harley (2000). “An Extension of Satoh’s Algorithm and Its Implementation”, Journal of Ramanujan Mathematical Society, 15: 281–318.

[95] Fouquet, M., P. Gaudry and R. Harley (2001). “Finding Secure Curves with the Satoh-FGH Algorithm and an Early-Abort Strategy”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. Berlin/Heidelberg: Springer.

[96] * Fraleigh, J. B. (1998). A First Course in Abstract Algebra, 6th ed. Reading, Massachusetts: Addison-Wesley.

[97] Fujisaki, E., T. Kobayashi, H. Morita, H. Oguro, T. Okamoto, S. Okazaki, D. Pointcheval and S. Uchiyama (1999). “EPOC: Efficient Probabilistic Public-Key Encryption”, contribution to IEEE P1363a.

[98] Fujisaki, E., T. Okamoto, D. Pointcheval, J. Stern (2001). “RSA-OAEP is Secure under the RSA Assumption”, Advances in Cryptology—CRYPTO 2001, Lecture Notes in Computer Science, 2139. pp. 260–274. Berlin/Heidelberg: Springer.

[99] Fulton, W. (1969). Algebraic Curves. Mathematics Lecture Notes Series. New York: W. A. Benjamin.

[100] Galbraith, S. D. (2003). “Weil Descent of Jacobians”, Discrete Applied Mathematics, 128 (1): 165–180.

[101] Galbraith, S. D., F. Hess and N. P. Smart (2002). “Extending the GHS Weil Descent Attack”, Advances in Cryptology—EUROCRYPT 2002, Lecture Notes in Computer Science, 2332. pp. 29–44. Berlin/Heidelberg: Springer.

[102] Galbraith, S. D., W. Mao, and K. G. Paterson (2002). “RSA-based Undeniable Signatures for General Moduli”, Topics in Cryptology—CT-RSA 2002, Lecture Notes in Computer Science, 2271. pp. 200–217. Berlin/Heidelberg: Springer.

[103] Gathen, J. von zur and J. Gerhard (1999). Modern Computer Algebra. Cambridge: Cambridge University Press.

[104] Gathen, J. von zur and V. Shoup (1992). “Computing Frobenius Maps and Factoring Polynomials”, pp. 97–105. Proceedings of the 24th Annual ACM Symposium on Theory of Computing, Victoria, British Columbia, Canada.

[105] Gaudry, P. (2000). “An Algorithm for Solving the Discrete Log Problem on Hyperelliptic Curves”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 19–34. Berlin/Heidelberg: Springer.

[106] Gaudry, P. and R. Harley (2000). “Counting Points on Hyperelliptic Curves over Finite Fields”, Algorithmic Number Theory—ANTS-IV, Lecture Notes in Computer Science, 1838. pp. 313–332. Berlin/Heidelberg: Springer.

[107] Gaudry, P., F. Hess and N. P. Smart (2002). “Constructive and Destructive Facets of Weil Descent on Elliptic Curves”, Journal of Cryptology, 15 (1): 19–46.

[108] Geddes, K. O., S. R. Czapor and G. Labahn (1992). Algorithms for Computer Algebra. Boston: Kluwer Academic Publishers.

[109] Gennaro, R., H. Krawczyk and T. Rabin (2000). “RSA-based Undeniable Signatures”, Journal of Cryptology, 13 (4): 397–416.

[110] Gentry, C., J. Jonsson, M. Szydlo and J. Stern (2001). “Cryptanalysis of the NTRU Signature Scheme (NSS) from Eurocrypt 2001”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 1–20. Berlin/Heidelberg: Springer.

[111] Gentry, C. and M. Szydlo (2002). “Cryptanalysis of the NTRU Signature Scheme”, Advances in Cryptology—EUROCRYPT ’02, Lecture Notes in Computer Science, 2332. pp. 299–320. Berlin/Heidelberg: Springer.

[112] Gilbert, H. and M. Minier (2000). “A Collision Attack on Seven Rounds of Rijndael”, pp. 230–241. Proceedings of the 3rd AES Conference, NIST, New York, April 2000.

[113] * Goldreich, O. (2001). Foundations of Cryptography, Volume 1: Basic Tools. Cambridge: Cambridge University Press.

[114] * Goldreich, O. (2004). Foundations of Cryptography, Volume 2: Basic Applications. Cambridge: Cambridge University Press.

[115] Goldreich, O., S. Goldwasser and S. Halevi (1997). “Public-key Cryptosystems from Lattice Reduction Problems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 112–131. Berlin/Heidelberg: Springer.

[116] Goldwasser, S. and J. Kilian (1986). “Almost All Primes Can Be Quickly Certified”, pp. 316–329. Proceedings of the 18th Annual ACM Symposium on Theory of Computing, Berkeley, California.

[117] Goldwasser, S. and S. Micali (1984). “Probabilistic Encryption”, Journal of Computer and Systems Sciences, 28: 270–299.

[118] Gordon, D. M. (1985). “Strong Primes are Easy to Find”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 216–223. Berlin/Heidelberg: Springer.

[119] Gordon, D. M. (1993). “Discrete Logarithms in GF (p) Using the Number Field Sieve”, SIAM Journal of Discrete Mathematics, 6: 124–138.

[120] Gordon, D. M. and K. S. McCurley (1992). “Massively Parallel Computation of Discrete Logarithms”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 312–323. Berlin/Heidelberg: Springer.

[121] Grinstead, C. M. and J. L. Snell (1997). Introduction to Probability, 2nd revised ed. Providence, Rhode Island: American Mathematical Society. The book is also available at http://www.dartmouth.edu/~chance/book.html (October 2008).

[122] Guillou, L. C. and J.-J. Quisquater (1988). “A Practical Zero-Knowledge Protocol Fitted to Security Microprocessor Minimizing Both Transmission and Memory”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 123–128. Berlin/Heidelberg: Springer.

[123] Hankerson, D., A. J. Menezes and S. Vanstone (2004). Guide to Elliptic Curve Cryptography. New York: Springer.

[124] Hartshorne, R. (1977). Algebraic Geometry. Graduate Texts in Mathematics, 52. New York, Heidelberg and Berlin: Springer.

[125] * Herstein, I. N. (1975). Topics in Algebra. New York: John Wiley & Sons.

[126] Hess, F., G. Seroussi and N. P. Smart (2000). “Two Topics in Hyperelliptic Cryptography”. HP Labs technical report HPL-2000-118.

[127] * Hoffman, K. and R. Kunze (1971). Linear Algebra. Englewood Cliffs, New Jersey: Prentice-Hall.

[128] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2003). “NTRUSign: Digital Signatures Using the NTRU Lattice”, Topics in Cryptology—CT-RSA 2003, Lecture Notes in Computer Science, 2612. pp. 122–140. Berlin/Heidelberg: Springer.

[129] Hoffstein, J., N. Howgrave-Graham, J. Pipher, J. H. Silverman and W. White (2005). “Performance Improvements and a Baseline Parameter Generation Algorithm for NTRUSign”, Workshop on Mathematical Problems and Techniques in Cryptology, Barcelona, Spain, June 2005. Also available at http://www.ntru.com/cryptolab/articles.htm (October 2008).

[130] Hoffstein, J., J. Pipher and J. H. Silverman (1998). “NTRU: A Ring-Based Public Key Cryptosystem”, Algorithmic Number Theory—ANTS-III, Lecture Notes in Computer Science, 1423. pp. 267–288. Berlin/Heidelberg: Springer.

[131] Hoffstein, J., J. Pipher and J. H. Silverman (2001). “NSS: An NTRU Lattice-Based Signature Scheme”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 211–228. Berlin/Heidelberg: Springer.

[132] Horster, P., M. Michels and H. Petersen (1994). “Meta-ElGamal Signature Schemes”. Technical report TR-94-5-F, Department of Computer Science, Technische Universität, Chemnitz-Zwickau.

[133] * Hungerford, T. W. (1974). Algebra, 5th ed. Graduate Texts in Mathematics, 73. Berlin: Springer.

[134] IEEE (2008), “Standard Specifications for Public-Key Cryptography” [online document]. Available at http://grouper.ieee.org/groups/1363/index.html (October 2008).

[135] IETF (2008), “The Internet Engineering Task Force” [online document]. Available at http://www.ietf.org/ (October 2008).

[136] * Ireland, K. and M. Rosen (1990). A Classical Introduction to Modern Number Theory. Graduate Texts in Mathematics, 84. New York: Springer.

[137] Izu, T., B. Möller and T. Takagi (2002). “Improved Elliptic Curve Multiplication Methods Resistant Against Side Channel Attacks”, Progress in Cryptology—INDOCRYPT 2002, Lecture Notes in Computer Science, 2551. pp. 296–313. Berlin/Heidelberg: Springer.

[138] Izu, T. and T. Takagi (2002). “A Fast Parallel Elliptic Curve Multiplication Resistant Against Side Channel Attacks”, Public Key Cryptography—PKC 2002, Lecture Notes in Computer Science, 2274. pp. 280–296. Berlin/Heidelberg: Springer. An improved version of this paper is published as the technical report CORR 2002-03 of the Centre for Applied Cryptographic Research, University of Waterloo, Canada, and is available at http://www.cacr.math.uwaterloo.ca/ (October 2008).

[139] Jacobson, M. J., N. Koblitz, J. H. Silverman, A. Stein and E. Teske (2000). “Analysis of the Xedni Calculus Attack”, Designs, Codes and Cryptography, 20: 41–64.

[140] Janusz, G. J. (1995). Algebraic Number Fields. Providence, Rhode Island: American Mathematical Society.

[141] Johnson, D. and A. Menezes (1999). “The Elliptic Curve Digital Signature Algorithm (ECDSA)”. Technical report CORR 99-34, Department of Combinatorics and Optimization, University of Waterloo, Canada. Also published in International Journal on Information Security (2001), 1: 36–63.

[142] Joye, M., A. K. Lenstra and J.-J. Quisquater (1999). “Chinese Remaindering Based Cryptosystems in the Presence of Faults”, Journal of Cryptology, 12 (4): 241–246.

[143] Kaltofen, E. and V. Shoup (1995). “Subquadratic-Time Factoring of Polynomials over Finite Fields”, pp. 398–406. Proceedings of the 27th Annual ACM Symposium on Theory of Computing, Las Vegas, Nevada.

[144] Kampkötter, W. (1991). Explizite Gleichungen für Jacobische Varietäten hyperelliptischer Kurven [dissertation]. Essen: Gesamthochschule.

[145] Katz, J. and Y. Lindell (2007). Introduction to Modern Cryptography. Boca Raton, Florida; London and New York: CRC Press.

[146] Kaye, P. and C. Zalka (2004), “Optimized Quantum Implementation of Elliptic Curve Arithmetic over Binary Fields” [online document]. Available at http://arxiv.org/abs/quant-ph/0407095 (October 2008).

[147] * Knuth, D. E. (1997). The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Reading, Massachusetts: Addison-Wesley.

[148] Ko, K. H., S. J. Lee, J. H. Cheon, J. W. Han, J. S. Kang and C. S. Park (2000). “New Public-Key Cryptosystem Using Braid Groups”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 166–183. Berlin/Heidelberg: Springer.

[149] Koblitz, N. (1984). p-adic Numbers, p-adic Analysis, and Zeta-Functions, 2nd ed. Graduate Texts in Mathematics, 58. New York, Heidelberg and Berlin: Springer.

[150] Koblitz, N. (1987). “Elliptic Curve Cryptosystems”, Mathematics of Computation, 48: 203–209.

[151] Koblitz, N. (1989). “Hyperelliptic Cryptosystems”, Journal of Cryptology, 1: 139–150.

[152] Koblitz, N. (1993). Introduction to Elliptic Curves and Modular Forms, 2nd ed. Graduate Texts in Mathematics, 97. Berlin: Springer.

[153] * Koblitz, N. (1994). A Course in Number Theory and Cryptography, 2nd ed. New York: Springer.

[154] Koblitz, N. (1998). Algebraic Aspects of Cryptography. New York: Springer.

[155] Kocher, P. C. (1996). “Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 104–113. Berlin/Heidelberg: Springer.

[156] Kocher, P. C., J. Jaffe and B. Jun (1999). “Differential Power Analysis”, Advances in Cryptology—CRYPTO ’99, Lecture Notes in Computer Science, 1666. pp. 388–397. Berlin/Heidelberg: Springer.

[157] Lagarias, J. C. and A. M. Odlyzko (1985). “Solving Low-Density Subset Sum Problems”, Journal of ACM, 32: 229–246.

[158] LaMacchia, B. A. and A. M. Odlyzko (1991a). “Computation of Discrete Logarithms in Prime Fields”, Designs, Codes and Cryptography, 1: 46–62.

[159] LaMacchia, B. A. and A. M. Odlyzko (1991b). “Solving Large Sparse Linear Systems over Finite Fields”, Advances in Cryptology—CRYPTO ’90, Lecture Notes in Computer Science, 537. pp. 109–133. Berlin/Heidelberg: Springer.

[160] Lang, S. (1994). Algebraic Number Theory. Graduate Texts in Mathematics, 110. New York: Springer.

[161] Law, L., A. Menezes, A. Qu, J. Solinas and S. Vanstone (1998). “An Efficient Protocol for Authenticated Key Agreement”. Technical report CORR 98-05, Department of Combinatorics and Optimization, University of Waterloo, Canada.

[162] Lehmer, D. H. and R. E. Powers (1931). “On Factoring Large Numbers”, Bulletin of the AMS, 37: 770–776.

[163] Lenstra, A. K., E. Tromer, A. Shamir, W. Kortsmit, B. Dodson, J. Hughes and P. Leyland (2003). “Factoring Estimates for a 1024-Bit RSA Modulus”, Advances in Cryptology—ASIACRYPT 2003, Lecture Notes in Computer Science, 2894. pp. 55–74. Berlin/Heidelberg: Springer.

[164] Lenstra, A. K. and H. W. Lenstra (1990). “Algorithms in Number Theory”, in J. van Leeuwen (ed.), Handbook of Theoretical Computer Science, Volume A, pp. 675–715, Amsterdam: Elsevier.

[165] Lenstra, A. K. and H. W. Lenstra (ed.) (1993). The Development of the Number Field Sieve. Lecture Notes in Mathematics, 1554. Berlin: Springer.

[166] Lenstra, A. K., H. W. Lenstra and L. Lovasz (1982). “Factoring Polynomials with Rational Coefficients”, Mathematische Annalen, 261: 515–534.

[167] Lenstra, A. K., H. W. Lenstra, M. S. Manasse and J. M. Pollard (1990). “The Number Field Sieve”, pp. 564–572. Proceedings of the 22nd Annual ACM Symposium on Theory of Computing, Baltimore, Maryland, USA, 13–17 May.

[168] Lenstra, A. K. and A. Shamir (2000). “Analysis and Optimization of the TWINKLE Factoring Device”, Advances in Cryptology—EUROCRYPT 2000, Lecture Notes in Computer Science, 1807. pp. 35–52. Berlin/Heidelberg: Springer.

[169] Lenstra, A. K., A. Shamir, J. Tomlinson and E. Tromer (2002). “Analysis of Bernstein’s Factorization Circuit”, Advances in Cryptology—ASIACRYPT 2002, Lecture Notes in Computer Science, 2501. pp. 1–26. Berlin/Heidelberg: Springer.

[170] Lenstra, A. K. and E. R. Verheul (2000a). “The XTR Public Key System”, Advances in Cryptology—CRYPTO 2000, Lecture Notes in Computer Science, 1880. pp. 1–20. Berlin/Heidelberg: Springer.

[171] Lenstra, A. K. and E. R. Verheul (2000b). “Key Improvements to XTR”, Advances in Cryptology—ASIACRYPT 2000, Lecture Notes in Computer Science, 1976. pp. 220–233. Berlin/Heidelberg: Springer.

[172] Lenstra, A. K. and E. R. Verheul (2001a). “An Overview of the XTR Public Key System”, pp. 151–180. Proceedings of the Public Key Cryptography and Computational Number Theory Conference, Warsaw, Poland, 2000. Berlin: Verlag Walter de Gruyter.

[173] Lenstra, A. K. and E. R. Verheul (2001b). “Fast Irreducibility and Subgroup Membership Testing in XTR”, Public Key Cryptography—PKC 2001, Lecture Notes in Computer Science, 1992. pp. 73–86. Berlin/Heidelberg: Springer.

[174] Lenstra, H. W. (1987). “Factoring Integers with Elliptic Curves”, Annals of Mathematics, 126: 649–673.

[175] Lenstra, H. W. and C. Pomerance (2005), “Primality Testing with Gaussian Periods” [online document]. Available at http://www.math.dartmouth.edu/~carlp/PDF/complexity12.pdf (October 2008).

[176] Lercier, R. (1997). “Finding Good Random Elliptic Curves for Cryptosystems Defined over GF(2^n)”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 379–392. Berlin/Heidelberg: Springer.

[177] Lercier, R. and D. Lubicz (2003). “Counting Points on Elliptic Curves over Finite Fields of Small Characteristic in Quasi Quadratic Time”, Advances in Cryptology—EUROCRYPT 2003, Lecture Notes in Computer Science, 2656. pp. 360–373. Berlin/Heidelberg: Springer.

[178] Libert, B. and J.-J. Quisquater (2003), “New Identity Based Signcryption Schemes from Pairings” [online document]. Available at http://eprint.iacr.org/2003/023/ (October 2008).

[179] Lidl, R. and H. Niederreiter (1984). Finite Fields, Encyclopedia of Mathematics and Its Applications, 20. Cambridge: Cambridge University Press.

[180] Lidl, R. and H. Niederreiter (1994). Introduction to Finite Fields and Their Applications. Cambridge: Cambridge University Press.

[181] Liu, D. and P. Ning (2003a). “Establishing Pairwise Keys in Distributed Sensor Networks”, pp. 52–61. Proceedings of the 10th ACM Conference on Computer and Communication Security, Washington D.C., USA, October 2003.

[182] Liu, D. and P. Ning (2003b). “Location-Based Pairwise Key Establishments for Static Sensor Networks”, pp. 72–82. Proceedings of the 1st ACM Workshop on Security in Ad Hoc and Sensor Networks, Fairfax, Virginia, 31 October 2003.

[183] Liu, D., P. Ning and R. Li (2005). “Establishing Pairwise Keys in Distributed Sensor Networks”, ACM Transactions on Information and System Security, (8) 1: 41–77.

[184] Lucks, S. (2000). “Attacking Seven Rounds of Rijndael Under 192-bit and 256-bit Keys”, pp. 215–229. Proceedings of the 3rd Advanced Encryption Standard Candidate Conference, New York, April 2000.

[185] Malone-Lee, J. (2002), “Identity-Based Signcryption” [online document]. Available at http://eprint.iacr.org/2002/098/ (October 2008).

[186] Mao, W. (2001). “New Zero-Knowledge Undeniable Signatures—Forgery of Signature Equivalent to Factorisation”. Hewlett-Packard technical report HPL-2001-36.

[187] Mao, W. and K. G. Paterson (2000). “Convertible Undeniable Standard RSA Signatures”. Hewlett-Packard technical report HPL-2000-148.

[188] Matsumoto, T. and H. Imai (1988). “Public Quadratic Polynomial-Tuples for Efficient Signature-Verification and Message-Encryption”, Advances in Cryptology—EUROCRYPT ’88, Lecture Notes in Computer Science, 330. pp. 419–453. Berlin/Heidelberg: Springer.

[189] McCurley, K. S. (1990). “The Discrete Logarithm Problem”, in C. Pomerance and S. Goldwasser (eds.), Cryptology and Computational Number Theory: American Mathematical Society Short Course, Boulder, Colorado, 6–7 August 1989. Proceedings of Symposia in Applied Mathematics, 42. pp. 49–74. Providence, Rhode Island: American Mathematical Society.

[190] McEliece, R. J. (1978). “A Public-Key Cryptosystem Based on Algebraic Coding Theory”. DSN progress report 42–44, Jet Propulsion Laboratory, California Institute of Technology, pp. 114–116.

[191] Menezes, A. J. (ed.) (1993). Applications of Finite Fields. Boston: Kluwer Academic Publishers.

[192] Menezes, A. J. (1993). Elliptic Curve Public Key Cryptosystems. The Springer International Series in Engineering and Computer Science, 234. Springer. Available at http://books.google.co.in/books?id=bIb54ShKS68C (October 2008).

[193] Menezes, A. J., T. Okamoto and S. Vanstone (1993). “Reducing Elliptic Curve Logarithms to a Finite Field”, IEEE Transactions on Information Theory, 39: 1639–1646.

[194] Menezes, A. J., P. van Oorschot and S. Vanstone (1997). Handbook of Applied Cryptography. Boca Raton, Florida: CRC Press.

[195] Menezes, A. J., Y. Wu and R. Zuccherato (1996). “An Elementary Introduction to Hyperelliptic Curves”. CACR technical report CORR 96-19, University of Waterloo, Canada.

[196] Merkle, R. C. and M. E. Hellman (1978). “Hiding Information and Signatures in Trapdoor Knapsacks”, IEEE Transactions on Information Theory, 24 (5): 525–530.

[197] Mermin, N. D. (2003). “From Cbits to Qbits: Teaching Computer Scientists Quantum Mechanics”, American Journal of Physics, 71: 23–30.

[198] Mermin, N. D. (2006), “Phys481-681-CS483 Lecture Notes and Homework Assignments” [online document]. Available at http://people.ccmr.cornell.edu/~mermin/qcomp/CS483.html (October 2008).

[199] Messerges, T. S. (2000). “Securing the AES Finalists Against Power Analysis Attacks”, Fast Software Encryption—FSE 2000, Lecture Notes in Computer Science, 1978. pp. 150–164. Berlin/Heidelberg: Springer.

[200] Messerges, T. S., E. A. Dabbish and R. H. Sloan (1999). “Power Analysis Attacks of Modular Exponentiation in Smartcards”, Cryptographic Hardware and Embedded Systems—CHES 1999, Lecture Notes in Computer Science, 1717. pp. 144–157. Berlin/Heidelberg: Springer.

[201] Messerges, T. S., E. A. Dabbish and R. H. Sloan (2002). “Examining Smart-Card Security Under the Threat of Power Analysis Attacks”, IEEE Transactions on Computers, 51 (4): 541–552.

[202] Michels, M. and M. Stadler (1997). “Efficient Convertible Undeniable Signature Schemes”, pp. 231–244. Proceedings of the 4th International Workshop on Selected Areas in Cryptography, Ottawa, Canada.

[203] Mignotte, M. (1992). Mathematics for Computer Algebra. New York: Springer.

[204] Miller, G. L. (1976). “Riemann’s Hypothesis and Tests for Primality”, Journal of Computer and System Sciences, 13: 300–317.

[205] Miller, V. (1986). “Uses of Elliptic Curves in Cryptography”, Advances in Cryptology—CRYPTO ’85, Lecture Notes in Computer Science, 218. pp. 417–426. Berlin/Heidelberg: Springer.

[206] Möller, B. (2001). “Securing Elliptic Curve Point Multiplication Against Side-Channel Attacks”, Information Security Conference, Lecture Notes in Computer Science, 2200. pp. 324–334. Berlin/Heidelberg: Springer.

[207] Mollin, R. A. (1998). Fundamental Number Theory with Applications. Boca Raton, Florida: Chapman & Hall/CRC.

[208] Mollin, R. A. (1999). Algebraic Number Theory. Boca Raton, Florida: Chapman & Hall/CRC.

[209] Mollin, R. A. (2001). An Introduction to Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.

[210] Montgomery, P. L. (1985). “Modular Multiplication Without Trial Division”, Mathematics of Computation, 44: 519–521.

[211] Montgomery, P. L. (1994). “A Survey of Modern Integer Factorization Algorithms”, CWI Quarterly, 7 (4): 337–366.

[212] Montgomery, P. L. (1995). “A Block Lanczos Algorithm for Finding Dependencies over GF(2)”, Advances in Cryptology—EUROCRYPT ’95, Lecture Notes in Computer Science, 921. pp. 106–120. Berlin/Heidelberg: Springer.

[213] Morrison, M. A. and J. Brillhart (1975). “A Method of Factoring and a Factorization of F7”, Mathematics of Computation, 29: 183–205.

[214] * Motwani, R. and P. Raghavan (1995). Randomized Algorithms. Cambridge: Cambridge University Press.

[215] Muir, J. A. (2001). Techniques of Side Channel Cryptanalysis [dissertation]. Canada: University of Waterloo. Available at http://www.uwspace.uwaterloo.ca/bitstream/10012/1098/1/jamuir2001.pdf (October 2008).

[216] Neukirch, J. (1999). Algebraic Number Theory. Berlin and Heidelberg: Springer.

[217] Nguyen, P. Q. (2006), “A Note on the Security of NTRUSign” [online document]. Available at http://eprint.iacr.org/2006/387 (October 2008).

[218] * Nielsen, M. A. and I. L. Chuang (2000). Quantum Computation and Quantum Information. Cambridge: Cambridge University Press.

[219] NIST (2001), “Advanced Encryption Standard” [online document]. Available at http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf (October 2008).

[220] NIST (2006), “Digital Signature Standard (DSS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_186-3/Draft-FIPS-186-3%20_March2006.pdf (October 2008).

[221] NIST (2007a), “Federal Information Processing Standards” [online document]. Available at http://csrc.nist.gov/publications/PubsFIPS.html (October 2008).

[222] NIST (2007b), “Secure Hash Standard (SHS)” [online document]. Available at http://csrc.nist.gov/publications/drafts/fips_180-3/draft_fips-180-3_June-08-2007.pdf (October 2008).

[223] Nyberg, K. and R. A. Rueppel (1993). “A New Signature Scheme Based on the DSA Giving Message Recovery”, pp. 58–61. Proceedings of the 1st ACM Conference on Computer and Communications Security, Fairfax, Virginia, 3–5 November.

[224] Nyberg, K. and R. A. Rueppel (1995). “Message Recovery for Signature Schemes Based on the Discrete Logarithm Problem”, Advances in Cryptology—EUROCRYPT ’94, Lecture Notes in Computer Science, 950. pp. 182–193. Berlin/Heidelberg: Springer.

[225] Odlyzko, A. M. (1985). “Discrete Logarithms and Their Cryptographic Significance”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 224–314. Berlin/Heidelberg: Springer.

[226] Odlyzko, A. M. (2000). “Discrete Logarithms: The Past and the Future”, Designs, Codes and Cryptography, 19: 129–145.

[227] Okamoto, T. (1992). “Provably Secure and Practical Identification Schemes and Corresponding Signature Schemes”, Advances in Cryptology—CRYPTO ’92, Lecture Notes in Computer Science, 740. pp. 31–53. Berlin/Heidelberg: Springer.

[228] Okamoto, T., E. Fujisaki and H. Morita (1998). “TSH-ESIGN: Efficient Digital Signature Scheme Using Trisection Size Hash”, submission to IEEE P1363a.

[229] Papadimitriou, C. H. (1994). Computational Complexity. Reading, Massachusetts: Addison-Wesley.

[230] Park, S., T. Kim, Y. An and D. Won (1995). “A Provably Entrusted Undeniable Signature”, pp. 644–648. IEEE Singapore International Conference on Network/International Conference on Information Engineering (SICON/ICIE ’95).

[231] Patarin, J. (1995). “Cryptanalysis of the Matsumoto and Imai Public Key Scheme of Eurocrypt’88”, Advances in Cryptology—CRYPTO ’95, Lecture Notes in Computer Science, 963. pp. 248–261. Berlin/Heidelberg: Springer.

[232] Patarin, J. (1996). “Hidden Fields Equations (HFE) and Isomorphisms of Polynomials (IP): Two New Families of Asymmetric Algorithms”, Advances in Cryptology—EUROCRYPT ’96, Lecture Notes in Computer Science, 1070. pp. 33–48. Berlin/Heidelberg: Springer.

[233] Pirsig, R. M. (1974). Zen and the Art of Motorcycle Maintenance: An Inquiry into Values. London: Bodley Head.

[234] Pohlig, S. and M. Hellman (1978). “An Improved Algorithm for Computing Logarithms over GF (p) and its Cryptographic Significance”, IEEE Transactions on Information Theory, 24: 106–110.

[235] Pohst, M. and H. Zassenhaus (1989). Algorithmic Algebraic Number Theory, Encyclopaedia of Mathematics and Its Applications, 30. Cambridge: Cambridge University Press.

[236] Pointcheval, D. and J. Stern (1996). “Provably Secure Blind Signature Schemes”, Advances in Cryptology—ASIACRYPT ’96, Lecture Notes in Computer Science, 1163. pp. 252–265. Berlin/Heidelberg: Springer.

[237] Pointcheval, D. and J. Stern (2000). “Security Arguments for Digital Signatures and Blind Signatures”, Journal of Cryptology, 13 (3): 361–396.

[238] Pollard, J. M. (1974). “Theorems on Factorization and Primality Testing”, Proceedings of the Cambridge Philosophical Society, 76 (2): 521–528.

[239] Pollard, J. M. (1975). “A Monte Carlo Method for Factorization”, BIT, 15 (3): 331–334.

[240] Pollard, J. M. (1993). “Factoring with Cubic Integers”, in A. K. Lenstra and H. W. Lenstra (eds.), The Development of the Number Field Sieve, Lecture Notes in Mathematics, 1554. pp. 4–10. Berlin: Springer.

[241] Pomerance, C. (1985). “The Quadratic Sieve Factoring Algorithm”, Advances in Cryptology—EUROCRYPT ’84, Lecture Notes in Computer Science, 209. pp. 169–182. Berlin/Heidelberg: Springer.

[242] Pomerance, C. (2008). “Elementary Thoughts on Discrete Logarithms”, pp. 385–396. in J. P. Buhler and P. Stevenhagen (eds.), Surveys in Algorithmic Number Theory, Publications of the Research Institute for Mathematical Sciences, 44. New York: Cambridge University Press.

[243] Preskill, J. (1998). “Quantum Computing: Pro and Con”, Proceedings of the Royal Society of London, A454: 469–486.

[244] Preskill, J. (2007), “Course Information for Quantum Computation” [online document]. Available at http://theory.caltech.edu/people/preskill/ph219/ (October 2008).

[245] Proos, J. and C. Zalka (2004), “Shor’s Discrete Logarithm Quantum Algorithm for Elliptic Curves” [online document]. Available at http://arxiv.org/abs/quant-ph/0301141 (October 2008).

[246] Rabin, M. O. (1979). “Digitalized Signatures and Public-Key Functions as Intractable as Factorization”. Technical report MIT/LCS/TR-212, MIT Laboratory for Computer Science, Massachusetts.

[247] Rabin, M. O. (1980a). “Probabilistic Algorithms in Finite Fields”, SIAM Journal of Computing, 9: 273–280.

[248] Rabin, M. O. (1980b). “Probabilistic Algorithm for Testing Primality”, Journal of Number Theory, 12: 128–138.

[249] Ram Murty, M. (2001). Problems in Analytic Number Theory. New York: Springer.

[250] Raymond, J.-F. and A. Stiglic (2000), “Security Issues in the Diffie-Hellman Key Agreement Protocol” [online document]. Available at http://crypto.cs.mcgill.ca/~stiglic/Papers/dhfull.pdf (October 2008).

[251] Ribenboim, P. (2001). Classical Theory of Algebraic Numbers. Universitext. New York: Springer.

[252] Rivest, R. L., A. Shamir, and L. M. Adleman (1978). “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems”, Communications of the ACM, 21 (2): 120–126.

[253] Rosser, J. and L. Schoenfeld (1962). “Approximate Formulas for Some Functions of Prime Numbers”, Illinois Journal of Mathematics, 6: 64–94.

[254] RSA Security Inc. (2008), “Public-Key Cryptography Standards” [online document]. Available at http://www.rsa.com/rsalabs/node.asp?id=2124 (October 2008).

[255] Sakurai, J. J. (1994). Modern Quantum Mechanics. Revised by San-Fu Tuan, Reading, Massachusetts: Addison-Wesley.

[256] Satoh, T. (2000). “The Canonical Lift of an Ordinary Elliptic Curve over a Finite Field and Its Point Counting”, Journal of Ramanujan Mathematical Society, 15: 247–270.

[257] Satoh, T. and K. Araki (1998). “Fermat Quotients and the Polynomial Time Discrete Log Algorithm for Anomalous Elliptic Curves”, Commentarii Mathematici Universitatis Sancti Pauli, 47: 81–92.

[258] Schiff, L. I. (1968). Quantum Mechanics, 3rd ed. New York: McGraw-Hill.

[259] Schindler, W., F. Koeune and J.-J. Quisquater (2001). “Unleashing the Full Power of Timing Attack”. Technical report CG-2001/3, Université Catholique de Louvain, Belgium. Available at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.23.6622.

[260] Schirokauer, O. (1993). “Discrete Logarithms and Local Units”, Philosophical Transactions of the Royal Society of London, Series A, 345: 409–423.

[261] Schirokauer, O., D. Weber, and T. Denny (1996). “Discrete Logarithms: The Effectiveness of the Index Calculus Method”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.

[262] * Schneier, B. (2006). Applied Cryptography, 2nd ed. New York: John Wiley & Sons.

[263] Schnorr, C. P. (1991). “Efficient Signature Generation for Smart Cards”, Journal of Cryptology, 4: 161–174.

[264] Schoof, R. (1995). “Counting Points on Elliptic Curves over Finite Fields”, Journal de Théorie des Nombres de Bordeaux, 7: 219–254.

[265] Semaev, I. A. (1998). “Evaluation of Discrete Logarithms on Some Elliptic Curves”, Mathematics of Computation, 67: 353–356.

[266] Shamir, A. (1984). “A Polynomial-Time Algorithm for Breaking the Basic Merkle-Hellman Cryptosystem”, IEEE Transactions on Information Theory, 30: 699–704.

[267] Shamir, A. (1984). “Identity-Based Cryptosystems and Signature Schemes”, Advances in Cryptology—CRYPTO ’84, Lecture Notes in Computer Science, 196. pp. 47–53. Berlin/Heidelberg: Springer.

[268] Shamir, A. (1997). “How to Check Modular Exponentiation”, presented at the rump session of Advances in Cryptology—EUROCRYPT ’97, May.

[269] Shamir, A. (1999). “Factoring Large Numbers with the TWINKLE Device”, Cryptographic Hardware and Embedded Systems—CHES ’99, Lecture Notes in Computer Science, 1717. pp. 2–12. Berlin/Heidelberg: Springer.

[270] Shamir, A. and E. Tromer (2003). “Factoring Large Numbers with the TWIRL Device”, Advances in Cryptology—CRYPTO 2003, Lecture Notes in Computer Science, 2729. pp. 1–26. Berlin/Heidelberg: Springer.

[271] Shor, P. W. (1997). “Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer”, SIAM Journal of Computing, 26: 1484–1509.

[272] Shoup, V. (1990). “On the Deterministic Complexity of Factoring Polynomials over Finite Fields”, Information Processing Letters, 33: 261–267.

[273] Shparlinski, I. E. (1991). “On Some Problems in the Theory of Finite Fields”, Russian Mathematical Surveys, 46 (1): 199–240.

[274] Shparlinski, I. E. (1992). Computational and Algorithmic Problems in Finite Fields, Mathematics and its Applications, 88. Kluwer Academic Publishers.

[275] * Silverman, J. H. (1986). The Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 106. Berlin and New York: Springer.

[276] Silverman, J. H. (1994). Advanced Topics in the Arithmetic of Elliptic Curves. Graduate Texts in Mathematics, 151. New York: Springer.

[277] Silverman, J. H. (2000). “The Xedni Calculus and the Elliptic Curve Discrete Logarithm Problem”, Designs, Codes and Cryptography, 20: 5–40.

[278] Silverman, J. H. and J. Suzuki (1998). “Elliptic Curve Discrete Logarithms and the Index Calculus”, Advances in Cryptology—ASIACRYPT ’98, Lecture Notes in Computer Science, 1514. pp. 110–125. Berlin/Heidelberg: Springer.

[279] Silverman, R. D. (1987). “The Multiple Polynomial Quadratic Sieve”, Mathematics of Computation, 48: 329–339.

[280] * Sipser, M. (1997). Introduction to the Theory of Computation, 2nd ed. Boston: PWS Publishing Company.

[281] Skjernaa, B. (2003). “Satoh’s Algorithm in Characteristic 2”, Mathematics of Computation, 72: 477–487.

[282] Smart, N. P. (1999). “The Discrete Logarithm Problem on Elliptic Curves of Trace One”, Journal of Cryptology, 12: 193–196.

[283] Smart, N. P. (2002). Cryptography: An Introduction. New York: McGraw-Hill. The 2nd edition of this book is available online at http://www.cs.bris.ac.uk/~nigel/Crypto_Book/ (October 2008).

[284] Smith, P. J. (1993). “LUC Public-Key Encryption: A Secure Alternative to RSA”, Dr. Dobb’s Journal, 18 (1): 44–49.

[285] Smith, P. J. and M. J. J. Lennon (1993). “LUC: A New Public Key System”, IFIP Transactions, A 37. pp. 103–117. Proceedings of the IFIP TC11, 9th International Conference on Information Security. Computer Security. Amsterdam: North-Holland Co.

[286] Smith, P. J. and C. Skinner (1995). “A Public-Key Cryptosystem and Digital Signature System Based on the Lucas Function Analogue to Discrete Logarithms”, Advances in Cryptology—ASIACRYPT ’94, Lecture Notes in Computer Science, 917. pp. 357–364. Berlin/Heidelberg: Springer.

[287] Solovay, R. and V. Strassen (1977). “A Fast Monte Carlo Test for Primality”, SIAM Journal on Computing, 6: 84–86.

[288] * Stallings, W. (2006). Cryptography and Network Security, 4th ed. Upper Saddle River, New Jersey: Prentice-Hall.

[289] Stam, M. and A. K. Lenstra (2001). “Speeding up XTR”, Advances in Cryptology—ASIACRYPT 2001, Lecture Notes in Computer Science, 2248. pp. 125–143. Berlin/Heidelberg: Springer.

[290] Stein, A. and E. Teske (2005). “Optimized Baby Step-Giant Step Methods”, Journal of the Ramanujan Mathematical Society, 20 (1): 27–58.

[291] * Stinson, D. (2005). Cryptography: Theory and Practice, 3rd ed. Boca Raton, Florida: CRC Press.

[292] Strassen, V. (1969). “Gaussian Elimination Is not Optimal”, Numerische Mathematik, 13: 354–356.

[293] Stucki, D., N. Gisin, O. Guinnard, G. Ribordy and H. Zbinden (2002). “Quantum Key Distribution over 67 km with a Plug & Play System”, New Journal of Physics, 4: 41.1–41.8.

[294] Sun, H.-M., W.-C. Yang and C.-S. Laih (1999). “On the Design of RSA with Short Secret Exponent”, Advances in Cryptology—ASIACRYPT ’99, Lecture Notes in Computer Science, 1716. pp. 150–164. Berlin/Heidelberg: Springer.

[295] Swade, D. (2000). The Cogwheel Brain: Charles Babbage and the Quest to Build the First Computer. London: Little, Brown and Company.

[296] Trappe, W. and L. C. Washington (2006). Introduction to Cryptography with Coding Theory, 2nd ed. Upper Saddle River: Prentice-Hall.

[297] Verheul, E. R. (2001). “Evidence that XTR is More Secure than Supersingular Elliptic Curve Cryptosystems”, Advances in Cryptology—EUROCRYPT 2001, Lecture Notes in Computer Science, 2045. pp. 195–210. Berlin/Heidelberg: Springer.

[298] Washington, L. C. (2003). Elliptic Curves: Number Theory and Cryptography. Boca Raton, Florida: Chapman & Hall/CRC.

[299] Weber, D. (1996). “Computing Discrete Logarithms with the General Number Field Sieve”, Algorithmic Number Theory—ANTS-II, Lecture Notes in Computer Science, 1122. pp. 337–361. Berlin/Heidelberg: Springer.

[300] Weber, D. (1998). “Computing Discrete Logarithms with Quadratic Number Rings”, Advances in Cryptology—EUROCRYPT ’98, Lecture Notes in Computer Science, 1403. pp. 171–183. Berlin/Heidelberg: Springer.

[301] Weber, D. and T. Denny (1998). “The Solution of McCurley’s Discrete Log Challenge”, Advances in Cryptology—CRYPTO ’98, Lecture Notes in Computer Science, 1462. pp. 458–471. Berlin/Heidelberg: Springer.

[302] Western, A. E. and J. C. P. Miller (1968). “Tables of Indices and Primitive Roots”, Royal Society Mathematical Tables, 9. Cambridge: Cambridge University Press.

[303] Wiedemann, D. H. (1986). “Solving Sparse Linear Equations over Finite Fields”, IEEE Transactions on Information Theory, 32: 54–62.

[304] Wiener, M. J. (1990). “Cryptanalysis of Short RSA Secret Exponents”, IEEE Transactions on Information Theory, 36: 553–558.

[305] Williams, H. C. (1982). “A p + 1 Method for Factoring”, Mathematics of Computation, 39 (159): 225–234.

[306] Yang, L. T. and R. P. Brent (2001). “The Parallel Improved Lanczos Method for Integer Factorization over Finite Fields for Public Key Cryptosystems”, pp. 106–114. Proceedings of the ICPP Workshops 2001, Valencia, Spain, 3–7 September.

[307] Young, A. and M. Yung (1996). “The Dark Side of ‘Black-Box’ Cryptography, or: Should We Trust Capstone?”, Advances in Cryptology—CRYPTO ’96, Lecture Notes in Computer Science, 1109. pp. 89–103. Berlin/Heidelberg: Springer.

[308] Young, A. and M. Yung (1997a). “Kleptography: Using Cryptography Against Cryptography”, Advances in Cryptology—EUROCRYPT ’97, Lecture Notes in Computer Science, 1233. pp. 62–74. Berlin/Heidelberg: Springer.

[309] Young, A. and M. Yung (1997b). “The Prevalence of Kleptographic Attacks on Discrete-Log Based Cryptosystems”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 264–276. Berlin/Heidelberg: Springer.

[310] Zheng, Y. (1997). “Digital Signcryption or How to Achieve Cost(Signature & Encryption) << Cost(Signature) + Cost(Encryption)”, Advances in Cryptology—CRYPTO ’97, Lecture Notes in Computer Science, 1294. pp. 165–179. Berlin/Heidelberg: Springer.

[311] Zheng, Y. (1998a). “Signcryption and Its Applications in Efficient Public Key Solutions”, 1997 Information Security Workshop ISW ’97, Lecture Notes in Computer Science, 1397. pp. 291–312. Berlin/Heidelberg: Springer.

[312] Zheng, Y. (1998b). “Shortened Digital Signature, Signcryption, and Compact and Unforgeable Key Agreement Schemes”, contribution to IEEE P1363 Standard for Public Key Cryptography.

[313] Zheng, Y. and H. Imai (1998a). “Efficient Signcryption Schemes on Elliptic Curves”. Proceedings of the IFIP 14th International Information Security Conference IFIP/SEC ’98, Vienna, Austria, September 1998. Chapman & Hall.

[314] Zheng, Y. and H. Imai (1998b). “How to Construct Efficient Signcryption Schemes on Elliptic Curves”, Information Processing Letters, 68: 227–233.

[315] Zheng, Y. and T. Matsumoto (1996). “Breaking Smartcard Implementations of ElGamal Signatures and Its Variants”, presented at the rump session of Advances in Cryptology—ASIACRYPT ’96. Available at http://www.sis.uncc.edu/~yzheng/publications/ (October 2008).

[316] * Zuckerman, H. S., H. L. Montgomery and I. M. Niven (1991). An Introduction to the Theory of Numbers, 5th ed. New York: John Wiley & Sons.

Books marked by stars have Asian editions (at the time of writing this book).

Index