Image 111

Japanese

Image 5

text processing 29

japhy 246

Java 95-96, 365-403

(see also java.util.regex)

after-match data 138

anchoring bounds 388

benchmarking 235-236

BLTN 236

bugs 365, 368-369, 387, 392, 399, 403

code example 81, 209, 217, 235, 371, 375, 378-379, 381-384, 389

CSV parsing example 401

description 365

dot modes 111, 370

doubled-word example 81

JIT 236

line anchors 130, 370, 388

line terminators 370

match modes 368

match pointer 374, 383, 398, 400

matching comments 272-276

method chaining 389

method index 366

Mustang 401

object model 371-372

\p{···} 125

regex flavor 366-370

region 384-389

search and replace 378-383

Σ 110

split 395-396

strings 102

transparent bounds 387

Unicode 369

URL example 209

version covered 365

version history 365, 368-369, 392, 401

VM 236

word boundaries 134

java properties 369

\p{javaJavaIdentifierStart} 369

java.lang.Character 369

java.util.regex (see Java)

java.util.Scanner 390

Jeffs example 61-64

JfriedlsRegexLibrary 434-435

JIT

Java 236

.NET 410

JRE 236

\p{Katakana} 122, 124

keeping in sync 210-211

Keisler, H.J. 85

Kleene, Stephen 85

The Kleene Symposium 85

\kname (see named capture)

Korean text processing 29

Kunen, K. 85

£ 124

\l 290

\p{L&} 122-123, 125, 442

Java 369

Perl 288

\p{L} 121-122, 133, 368, 395

language (see also: .NET; C#; Java; MySQL; Perl; procmail; Python; Ruby; Tcl; VB.NET)

character class 10, 13

identifiers 24

\p{Latin} 122

Latin-1 encoding 29, 87, 106, 108, 123

lazy 166-167

(see also greedy)

essence 159, 168-169

favors match 167-168

vs. greedy 169, 256-257

optimization 248, 257

quantifier 141

lazy evaluation 181, 355

\L···\E 290

inhibiting 292

lc 290

lcfirst 290

leftmost match 177-179

Length

Group object method 430

Match object method 429

length-cognizance optimization 245, 247

\p{Letter} 122, 288

\p{Letter_Number} 123

$LevelN 330, 343

lex 86

$ 112

dot 111

history 87

and trailing context 182

lexer 132, 389, 399

building 315

lexical scope 299

LF 109, 370

LIFO backtracking 159

limit

backtracking 239

preg_split 466-467

recursion 249-250

line (see also string)

anchor optimization 246

vs. string 55

line anchor 112-113

mechanics of matching 150

variety of implementations 87

line anchors

Java 130, 370, 388

.NET 130

Perl 130

PHP 130

line feed 109, 370

LINE SEPARATOR 109, 123, 370

line terminators 109-111, 129-130, 370

with $ and ^ 112

Java 370

\p{Line_Separator} 123

link

matching 201

(see also URL examples)

Java 209

VB.NET 204

list context 294, 310-311

forcing 310

literal string initial string discrimination 245-248, 252, 257-259, 332, 361

literal text

exposing 255

introduced 5

mechanics of matching 149

pre-check optimization 245-248, 252, 257-259, 332, 361

literal-text mode 113, 136, 290

inhibiting 292

.NET 136

\p{Ll} 123, 406

\p{Lm} 123, 406

\p{Lo} 123, 406

local 296, 341

in embedded code 336

vs. my 297

locale 127-128

overview 87

\w 120-121

localizing 296-297

localtime 294, 319, 351

lockup (see neverending match)

locking in regex literal 352

A logical calculus of the ideas immanent in nervous activity 85

longest match finding 334-335

longest-leftmost match 148, 177-179

lookahead 133

(see also lookaround)

auto 410

introduced 60

mimic atomic grouping 174

mimic optimizations 258-259

negated

<B>···</B> 167

positive vs. negative 66

lookahead example 61-64

lookaround

backtracking 173-174

conditional 140-141

and DFAs 182

doesn’t consume text 60

introduced 59

mimicking class set operations 126

mimicking word boundaries 134

Perl 288

lookbehind 133

(see also lookaround)

Java 368

.NET 408

Perl 288

PHP 134, 443

positive vs. negative 66

unlimited 408

lookingAt method 376

loose matching (see case-insensitive mode)

Lord, Tom 183

\p{Lowercase_Letter} 123

LS 109, 123, 370

\p{Lt} 123, 406

\p{Lu} 123, 406

Lunde, Ken xxiv, 29

\p{M} 120, 122

m/···/ introduced 38

(?m) (see: enhanced line-anchor mode; mode modifier)

/m 135

(see also: enhanced line-anchor mode; mode modifier)

machine-dependent character codes 115

MacOS 115

mail processing example 53-59

makudonarudo example 165, 169, 228-232, 264

\p{Mark} 122

match 306-318

(see also: DFA; NFA)

actions 95

context 294-295, 309

list 294, 310-311

scalar 294, 310, 312-316

DFA vs. NFA 224

efficiency 179

example with backtracking 160

example without backtracking 160

lazy example 161

leftmost-longest 335

longest 334-335

m/···/

introduced 38

mechanics (see also: greedy; lazy)

.* 152

anchors 150

capturing parentheses 149

character classes and dot 149

consequences 156

greedy introduced 151

literal text 149

modes 110-113

Java 368

negating 309

neverending 222-228, 330, 340

avoiding 264-266

discovery 226-228

explanation 226-228

non-determinism 264

short-circuiting 250

solving with atomic grouping 268

solving with possessive quantifiers 268

NFA vs. DFA 156-157, 180-183

of nothing 454

position (see pos)

POSIX

Perl 335

shortest-leftmost 182

side effects 317

intertwined 43

Perl 40

speed 181

in a string 27

tag-team 132

viewing mechanics 331-332

Match Empty 433

match modes Java 368

Match (.NET) Success 96

Match object (.NET) 417

Capture 437

creating 421, 429

Groups 429

Index 429

Length 429

NextMatch 429

Result 429

Success 427

Synchronized 430

ToString 427

using 427

Value 427

match pointer Java 374, 383, 398, 400

Match (Regex object method) 421

match rejected by optimizer 363

match results Java 376

MatchCollection 422

Matcher

appendReplacement 380

appendTail 381

end 377

find 375

group 377

groupCount 377

hasAnchoringBounds 388

hasTransparentBounds 387

hitEnd 389-392

lookingAt 376

matches 376

pattern 393

quoteReplacement 379

region 384-389

region 386

regionEnd 386

regionStart 386

replaceAll 378

replaceFirst 379

replacement argument 380

requireEnd 389-392

reset 392-393

start 377

text 394

toMatchResult 377

toString 393

useAnchoringBounds 388

usePattern 393, 399

useTransparentBounds 387

Matcher object 373

reusing 392-393

$matches 450

vs. $all_matches 454

matches

unexpected 194-195

viewing all 332

matches method 376, 395

Matches (Regex object method) 422

MatchEvaluator 423-424

matching

delimited text 196-198

HTML tag 200

longest-leftmost 177-179

matching comments Java 272-276

MatchObject object (.NET) creating 422

\p{Math_Symbol} 123

Maton, William xxiv, 36

mb_ereg suite 439

MBOL 362

\p{Mc} 123

McCloskey, Mike xxiv

McCulloch, Warren 85

\p{Me} 123

mechanics viewing 331-332

metacharacter

conflicting 44-46

differing contexts 10

first-class 87, 92

introduced 5

vs. metasequence 27

metasequence defined 27

method chaining 389

Java 389

method index Java 366

mimic

$' 357

$` 357

$& 302, 357

atomic grouping 174

class set operations 126

conditional with lookaround 140

initial-character discrimination optimization 258-259

named capture 344-345

POSIX matching 335

possessive quantifiers 343-344

variable interpolation 321

word boundaries 66, 134, 341-342

minlen length 362

minus in character class 9

MSIL .NET 410

“missing” functions PHP 471

\p{Mn} 123

mode modifier 110, 135-136

mode-modified span 110, 135-136, 367, 392, 407, 446

modes introduced with egrep 14-15

\p{Modifier_Letter} 123

modifiers 372

(see also match, modes)

combining 69

example with five 316

/g 51

/i 47

“locking in” 304-305

notation 99

/osmosis 293

Perl 292-293

Perl core 292-293

with regex object 304-305

unknown 448

\p{Modifier_Symbol} 123

Morse, Ian xxiv

motto Perl 348

-Mre=debug (see use re 'debug')

multi-character quotes 165-166

Multiline (.NET) 408, 419-420, 427

multiple-byte character encoding 29

MungeRegexLiteral 342-344, 346

Mustang Java 401

my

binding 339

in embedded code 338-339

vs. local 297

MySQL

after-match data 138

DBIx::DWIW 258

version covered 91

word boundaries 134

\p{N} 122, 395

\n 49, 115-116

introduced 44

machine-dependency 115

$^N 300-301, 344-346

(?n) 408

named capture 138

mimicking 344-345

.NET 408-409

numeric names 451

PHP 450-452, 457, 476-477

with unnamed capture 409

naughty variables 356

OK for debugging 331

\p{Nd} 123, 368, 406

negated class

introduced 10-11

and lazy quantifiers 167

Tcl 112

negative lookahead (see lookahead, negative)

negative lookbehind (see lookbehind, negative)

NEL 109, 370, 407

nervous system 85

nested constructs

.NET 436

Perl 328-331, 340-341

PHP 475-478, 481

$NestedStuffRegex 339, 346

.NET xvii, 405-438

$+ 202

after-match data 138

benchmarking 237

character-class subtraction 406

code example 219

flavor overview 92

JIT 410

line anchors 130

literal-text mode 136

MSIL 410

object model 417

\p{···} 125

regex approach 96-97

regex flavor 407

search and replace 414, 423-424

URL example 204

version covered 405

word boundaries 134

(see also VB.NET)

neurophysiologists early regex study 85

neverending match 222-228, 330, 340

avoiding 264-266

discovery 226-228

explanation 226-228

non-determinism 264

short-circuiting 250

solving with atomic grouping 268

solving with possessive quantifiers 268

New Regex 96, 99, 416, 421

newline and HTTP 115

NEXT LINE 109, 370, 407

NextMatch (Match object method) 429

NFA

acronym spelled out 156

and alternation 174-175

compared with DFA 156-157, 180-183

control benefits 155

efficiency 179

essence (see backtracking)

first introduced 145

freeflowing regex 277-281

and greediness 162

implementation ease 183

introduction 153

nondeterminism 265

checkpoint 264-265

POSIX efficiency 179

testing for 146-147

theory 180

\p{Nl} 123

\N{LATIN SMALL LETTER SHARP S} 290

\N{name} 290

(see also pragma)

inhibiting 292

\p{No} 123

No Dashes Hall Of Shame 458

no re 'debug' 361

no_match_vars 357

nomenclature 27

non-capturing parentheses 45, 137-138

(see also parentheses)

Nondeterministic Finite Automaton (see NFA)

None (.NET) 421, 427

non-greedy (see lazy)

nonillion 226

nonparticipation parentheses 450, 453-454, 469

nonregular sets 180

\p{Non_Spacing_Mark} 123

non-word boundaries (see word boundaries)

“normal” 263-266

NUL 117

with dot 119

NULL 454

\p{Number} 122

/o 352-353

with regex object 354

Obfuscated Perl Contest 320

object model

Java 371-372

.NET 416-417

Object Oriented Perl 339

object-oriented handling 95-97

compile caching 244

octal escape 116, 118

vs. backreference 412-413

Perl 286

offset preg_match 453

on-demand recompilation 351

oneself example 332, 334

\p{Open_Punctuation} 123

operators Perl list 285

optimization 240-252

(see also: atomic grouping; possessive quantifiers; efficiency)

automatic possessification 251

BLTN 236

with bump-along 255

end-of-string anchor 246

excessive backtrack 249-250

hand tweaking 252-261

implicit line anchor 191

initial character discrimination 245-248, 252, 257-259, 332, 361

JIT 236, 410

lazy evaluation 181

lazy quantifier 248, 257

leading .* 246

literal-string concatenation 247

need cognizance 252

needless class elimination 248

needless parentheses 248

pre-check of required character 245-248, 252, 257-259, 332, 361

simple repetition

discussed 247-248

small quantifier equivalence 251-252

state suppression 250-251

string/line anchors 149, 181

super-linear short-circuiting 250

option

-0 36

-c 361

-Dr 363

-e 36, 53, 361

-i 53

-M 361

-Mre=debug 363

-n 36

-p 53

-w 38, 296, 326, 361

Option (.NET) 415

optional (see also quantifier)

whitespace 18

Options (Regex object method) 427

OR class set operations 125-126

Oram, Andy 5

ordered alternation 175-177

(see also alternation, ordered)

pitfalls 176

osmosis 293

/osmosis 293

\p{Other} 122

\p{Other_Letter} 123

\p{Other_Number} 123

\p{Other_Punctuation} 123

\p{Other_Symbol} 123

our 295, 336

overload pragma 342

\p{···}

Java 125

.NET 125

PHP 125

\p{P} 122

\p{^···} 288

\p{All} 125

Perl 288

\p{all} 369

panic: top_env 332

\p{Any} 125, 442

Perl 288

Papen, Jeffrey xxiv

PARAGRAPH SEPARATOR 109, 123, 370

\p{Paragraph_Separator} 123

parentheses

as \(···\) 86

and alternation 13

balanced 328-331, 340-341, 436, 475-478, 481

difficulty 193-194

capturing 137, 300

and DFAs 150, 182

introduced with egrep 20-22

mechanics 149

Perl 41

capturing only 152

counting 21

elimination optimization 248

grouping-only (see non-capturing parentheses)

limiting scope 18

named capture 138, 344-345, 408-409, 450-452, 457, 476-477

nested 328-331, 340-341, 436, 475-477, 481

non-capturing 45, 137-138

non-participating 300

nonparticipation 450, 453-454, 469

with split

.NET 409, 426

Perl 326

\p{Arrows} 124

parser 132, 389, 399

parsing regex 410

participate in match 140

Pascal 36, 59, 183

matching comments of 265

\p{Assigned} 125-126

Perl 288

patch 88

path (see backtracking)

pathname example 190-192

Pattern

CANON_EQ 108, 368

CASE_INSENSITIVE 95, 110, 368, 372

CASE_INSENSITIVE bug 392

COMMENTS 99, 219, 368, 401

compile 372

DOTALL 368, 370

flags 394

matcher 373

matches 395

MULTILINE 81, 368, 370

MULTILINE bug 387

pattern 394

quote 395

split 395-396

toString 394

UNICODE_CASE 368, 372

UNIX_LINES 368, 370

pattern argument 472

array order 462, 464

pattern arguments PHP 444, 448

pattern method 393-394

pattern modifier

A 447

D 442, 447

e 459, 465, 478

m 442

S 259, 447, 460, 467, 478-480

u 442, 447-448, 452-453

U 447

unknown errors 448

x 443, 471

X 447

pattern modifiers PHP 446-448

PatternSyntaxException 371, 373

\p{Basic_Latin} 124

\p{Box_Drawing} 124

\p{C} 122

Java 369

\p{Pc} 123, 406

\p{Cc} 123

\p{Cf} 123

\p{Cherokee} 122

\p{Close_Punctuation} 123

\p{Cn} 123, 125-126, 369, 408

Java 369

\p{Co} 123

\p{Connector_Punctuation} 123

\p{Control} 123

PCRE 91, 440

(see also PHP)

“extra stuff” 447

flavor overview 441

lookbehind 134

recursive matching 475-478

study 447

version covered 440

\w 120

web site 91

X pattern modifier 447

pcre_study 259

\p{Currency} 124

\p{Currency_Symbol} 123

\p{Cyrillic} 122, 124

\p{Pd} 123

\p{Dash_Punctuation} 123

\p{Decimal_Digit_Number} 123

\p{Dingbats} 124

\p{Pe} 123

PeakWebhosting.com xxiv

\p{Enclosing_Mark} 123

people

Aho, Alfred 86, 180

Barwise, J. 85

Byington, Ryan xxiv

Click, Cliff xxiv

Constable, Robert 85

Conway, Damian 339

Cruise, Tom 51

Filo, David 397

Fite, Liz 33

Friedl, Alfred 176

Friedl, brothers 33

Friedl, Fumie v, xxiv

birthday 11-12

Friedl, Jeffrey xxiii

Friedl, Stephen xxiv, 458

George, Kit xxiv

Gill, Stuart xxiv

Gosling, James 89

Greant, Zak xxiv

Gutierrez, David xxiv

Hazel, Philip xxiv, 91, 440

Keisler, H.J. 85

Kleene, Stephen 85

Kunen, K. 85

Lord, Tom 183

Lunde, Ken xxiv, 29

Maton, William xxiv, 36

McCloskey, Mike xxiv

McCulloch, Warren 85

Morse, Ian xxiv

Oram, Andy 5

Papen, Jeffrey xxiv

Perl Porters 90

Pinyan, Jeff 246

Pitts, Walter 85

Reinhold, Mark xxiv

Sethi, Ravi 180

Spencer, Henry 88, 182-183, 243

Thompson, Ken 85-86, 111

Tubby 265

Ullman, Jeffrey 180

Wall, Larry 88-90, 140, 363

Zawodny, Jeremy 258

Zmievski, Andrei xxiv, 440

Perl

\p{···} 125

$/ 35

context (see also match, context)

contorting 294

efficiency 347-363

flavor overview 92, 287

greatest weakness 286

history 88-90, 308

introduction 37-38

line anchors 130

modifiers 292-293

motto 348

option

-0 36

-c 361

-Dr 363

-e 36, 53, 361

-i 53

-M 361

-Mre=debug 363

-n 36

-p 53

-w 38, 296, 326, 361

regex operators 285

search and replace 318-321

Σ 110

Unicode 288

version covered 283

warnings 38

($^W variable) 297

use warnings 326, 363

Perl Porters 90

perladmin 299

\p{Pf} 123

Java 369

\p{Final_Punctuation} 123

\p{Format} 123

\p{Gujarati} 122

\p{Han} 122

\p{Hangul_Jamo} 124

\p{Hebrew} 122, 124

\p{Hiragana} 122

PHP 439-484

after-match data 138

benchmarking 234-235

callback 463, 465

CSV parsing example 480

efficiency 478-480

flavor overview 441

history 440

line anchors 130

lookbehind 134, 443

“missing” functions 471

\p{···} 125

pattern arguments 444, 448

recursive matching 475-478

regex delimiters 445, 448

search and replace 458-465

single-quoted string 444

strings 103-104

str_replace 458

study 447

Unicode 442, 447

version covered 440

\w 120

word boundaries 134

\p{Pi} 123

Java 369

\p{InArrows} 124

\p{InBasic_Latin} 124

\p{InBox_Drawing} 124

\p{InCurrency} 124

\p{InCyrillic} 124

\p{InDingbats} 124

\p{InHangul_Jamo} 124

\p{InHebrew} 124

\p{Inherited} 122

\p{Initial_Punctuation} 123

\p{InKatakana} 124

\p{InTamil} 124

\p{InTibetan} 124

Pinyan, Jeff 246

\p{IsCherokee} 122

\p{IsCommon} 122

\p{IsCyrillic} 122

\p{IsGujarati} 122

\p{IsHan} 122

\p{IsHebrew} 122

\p{IsHiragana} 122

\p{IsKatakana} 122

\p{IsLatin} 122

\p{IsThai} 122

\p{IsTibetan} 124

Pitts, Walter 85

\p{javaJavaIdentifierStart} 369

\p{Katakana} 122, 124

\p{L} 121-122, 133, 368, 395

\p{L&} 122-123, 125, 442

Java 369

Perl 288

\pL PHP 442

\p{Latin} 122

(?P<···>) 451-452, 457

(?P<name>···) (see named capture)

\p{Letter} 122, 288

\p{Letter_Number} 123

\p{Line_Separator} 123

\p{Ll} 123, 406

\p{Lm} 123, 406

\p{Lo} 123, 406

\p{Lowercase_Letter} 123

\p{Lt} 123, 406

\p{Lu} 123, 406

plus

as \+ 141

backtracking 162

greedy 141, 447

introduced 18-20

lazy 141

possessive 142

\p{M} 120, 122

\p{Mark} 122

\p{Math_Symbol} 123

\p{Mc} 123

\p{Me} 123

\p{Mn} 123

\p{Modifier_Letter} 123

\p{Modifier_Symbol} 123

\pN PHP 442

\p{N} 122, 395

(?P=name) (see named capture)

\p{Nd} 123, 368, 406

\p{Nl} 123

\p{No} 123

\p{Non_Spacing_Mark} 123

\p{Number} 122

\p{Po} 123

\p{Open_Punctuation} 123

population example 59

pos 130-133, 313-314, 316

(see also \G)

positive lookahead (see lookahead, positive)

positive lookbehind (see lookbehind, positive)

POSIX

[.···.] 128

[:···:] 127

Basic Regular Expressions 87-88

bracket expressions 127

character class 127

character class and locale 127

character equivalent 128

collating sequences 128

dot 119

empty alternatives 140

Extended Regular Expressions 87-88

superficial flavor chart 88

locale 127

overview 87

longest-leftmost rule 177-179, 335

POSIX NFA

backtracking example 229

testing for 146-147

possessive quantifier 477, 483

possessive quantifiers 142, 172-173, 477, 483

(see also atomic grouping)

automatic 251

for efficiency 259-260, 268-270, 482

mimicking 343-344

optimization 250-251

possessive quantifiers example 198, 201

postal code example 209-212

\p{Other} 122

\p{Other_Letter} 123

\p{Other_Number} 123

\p{Other_Punctuation} 123

\p{Other_Symbol} 123

£ 124

\p{P} 122

\p{Paragraph_Separator} 123

\p{Pc} 123, 406

\p{Pd} 123

\p{Pe} 123

\p{Pf} 123

Java 369

\p{Pi} 123

Java 369

\p{Po} 123

\p{Private_Use} 123

\p{Ps} 123

\p{Punctuation} 122

pragma

charnames 290

(see also \N{name})

overload 342

re 361, 363

strict 295, 336, 345

warnings 326, 363

pre-check of required character 245-248, 252, 257-259, 361

mimic 258-259

viewing 332

preg function interface 443-448

preg suite 439

“missing” functions 471

preg_grep 469-470

PREG_GREP_INVERT 470

preg_match 449-453

offset 453

preg_match_all 453-457

PREG_OFFSET_CAPTURE 452, 454, 456

preg_pattern_error 474

PREG_PATTERN_ORDER 455

preg_quote 136, 470-471

preg_regex_error 475

preg_regex_to_pattern 472-474

preg_replace 458-464

preg_replace_callback 463-465

PREG_SET_ORDER 456

preg_split 465-469

PREG_SPLIT_DELIM_CAPTURE 468-469

split limit 469

PREG_SPLIT_NO_EMPTY 468

PREG_SPLIT_OFFSET_CAPTURE 468

pre-match copy 355

prepending filename to line 79

price rounding example 51-52, 167-168

with alternation 175

with atomic grouping 170

with possessive quantifier 169

Principles of Compiler Design 180

printf 40

private vs. global Perl variables 295

\p{Private_Use} 123

procedural handling 95-97

compile caching 244

processing instructions 483

procmail 94

version covered 91

Programming Perl 283, 286, 339

promote 294-295

properties 121-123, 125-126, 288, 368-369, 442

PS 109, 123, 370

\p{S} 122

\p{Ps} 123

\p{Sc} 123-124

\p{Separator} 122

\p{Sk} 123

\p{Sm} 123

\p{So} 123

\p{Space_Separator} 123

\p{Spacing_Combining_Mark} 123

\p{Symbol} 122

\p{Tamil} 124

\p{Thai} 122

\p{Tibetan} 124

\p{Titlecase_Letter} 123

publication

Bulletin of Math. Biophysics 85

CJKV Information Processing 29

Communications of the ACM 85

Compilers — Principles, Techniques, and Tools 180

Embodiments of Mind 85

The Kleene Symposium 85

“A logical calculus of the ideas immanent in nervous activity” 85

Object Oriented Perl 339

Principles of Compiler Design 180

Programming Perl 283, 286, 339

Regular Expression Search Algorithm 85

“The Role of Finite Automata in the Development of Modern Computing Theory” 85

\p{Unassigned} 123, 125

Perl 288

\p{Punctuation} 122

\p{Uppercase_Letter} 123

Python

after-match data 138

benchmarking 238-239

line anchors 130

mode modifiers 135

regex approach 97

strings 104

version covered 91

word boundaries 134

\Z 112

\p{Z} 121-122, 368, 407

\pZ PHP 442

\p{Zl} 123

\p{Zp} 123

\p{Zs} 123

\Q Java 368, 395, 403

Qantas 11

\Q···\E 290

inhibiting 292

qed 85

qr/···/ (see also regex objects)

introduced 76

quantifier (see also: plus; star; question mark; interval; lazy; greedy; possessive quantifiers)

and backtracking 162

factor out 255

grouping for 18

multiple levels 266

optimization 247-248

and parentheses 18

possessive 477, 483

possessive quantifiers 142, 172-173, 477, 483

for efficiency 259-260, 268-270, 482

automatic 251

optimization 250-251

mimicking 343-344

question mark

as \? 141

backtracking 160

greedy 141, 447

introduced 17-18

lazy 141

possessive 142

smallest preceding subexpression 29

question mark

as \? 141

backtracking 160

greedy 141, 447

introduced 17-18

lazy 141

possessive 142

quote method 136, 395

quoted string (see double-quoted string example)

quoteReplacement method 379

quotes multi-character 165-166

r"···" 104

\r 49, 115-116

machine-dependency 115

(?R) 475

PCRE 475

PHP 475

$^R 302, 327

re 361, 363

re pragma 361, 363

reality check 226-228

recursive matching (see also dynamic regex)

Java 402

.NET 436

PCRE 475-478

PHP 475-478, 481-484

red dragon 180

Reflection 435

regex

balancing needs 186

cache 242-245, 350-352, 432, 478

compile 179-180, 350

default 308

delimiters 291-292

DFA (see DFA)

encapsulation (see regex objects)

engine analogy 143-147

vs. English 275

error checking 474

frame of mind 6

freeflowing design 277-281

history 85-91

library 76, 208

longest-leftmost match 177-179

shortest-leftmost 182

mechanics 241-242

NFA (see NFA)

nomenclature 27

operands 288-292

overloading 291, 328

inhibiting 292

problems 344

subexpression

defined 29

subroutines 476

regex approach .NET 96-97

regex delimiters PHP 445, 448

regex flavor

Java 366-370

.NET 407

regex literal 288-292, 307

inhibiting processing 292

locking in 352

parsing of 292

processing 350

regex objects 354

Regex (.NET)

CompileToAssembly 433, 435

creating

options 419-421

Escape 432

GetGroupNames 427-428

GetGroupNumbers 427-428

GroupNameFromNumber 427-428

GroupNumberFromName 427-428

IsMatch 413, 421, 431

Match 96, 414, 416, 421, 431

Matches 422, 431

object

creating 96, 416, 419-421

exceptions 419

using 96, 421

Options 427

Replace 414-415, 423-424, 431

RightToLeft 427

Split 425-426, 431

ToString 427

Unescape 433

regex objects 303-306

(see also qr/···/)

efficiency 353-354

/g 354

match modes 304-305

/o 354

in regex literal 354

viewing 305-306

regex operators Perl 285

regex overloading 292

(see also use overload)

regex overloading example 341-345

http://regex.info/ xxiv, 7, 345, 358, 451

RegexCompilationInfo 435

regex-directed matching 153

(see also NFA)

and backreferences 303

and greediness 162

Regex.Escape 136

RegexOptions

Compiled 237, 408, 410, 420, 427-428, 435

ECMAScript 406, 408, 412-413, 421, 427

ExplicitCapture 408, 420, 427

IgnoreCase 96, 99, 408, 419, 427

IgnorePatternWhitespace 99, 408, 419, 427

Multiline 408, 419-420, 427

None 421, 427

RightToLeft 408, 411-412, 420, 426-427, 429-430

Singleline 408, 420, 427

region

additional example 398

anchoring bounds 388

hitEnd 390

Java 384-389

methods that reset 385

requireEnd 390

resetting 392-393

setting one edge 386

transparent bounds 387

region method 386

regionEnd method 386

regionStart method 386

reg_match 454

regsub 100

regular expression origin of term 85

Regular Expression Search Algorithm 85

regular sets 85

Reinhold, Mark xxiv

removing whitespace 199-200

Replace (Regex object method) 423-424

replaceAll method 378

replaceFirst method 379

replacement argument 460

array order 462, 464

Java 380

PHP 459

reproductive organs 5

required character pre-check 245-248, 252, 257-259, 332, 361

requireEnd method 389-392

re-search-forward 100-101

reset method 385, 392-393

Result (Match object method) 429

RightToLeft (Regex property) 427-428

RightToLeft (.NET) 408, 411-412, 420, 426-427, 429-430

“The Role of Finite Automata in the Development of Modern Computing Theory” 85

Ruby

$ and ^ 112

after-match data 138

benchmarking 238

line anchors 130

mode modifiers 135

version covered 91

word boundaries 134

rule

earliest match wins 148-149

standard quantifiers are greedy 151-153

rx 183

\p{S} 122

s/···/···/ 50, 318-321

\s 49, 121

Emacs 128

introduction 47

Perl 288

PHP 442

(?s) (see: dot-matches-all mode; mode modifier)

\S 49, 56, 121

/s 135

(see also: dot-matches-all mode; mode modifier)

saved states (see backtracking, saved states)

SawAmpersand 358

say what you mean 195, 274

SBOL 362

\p{Sc} 123-124

scalar context 294, 310, 312-316

forcing 310

scanner 132, 389, 399

schaffkopf 33

scope lexical vs. dynamic 299

scripts 122, 288, 442

search and replace xvii

awk 100

Java 378-383

.NET 414, 423-424

Perl 318-321

PHP 458-465

Tcl 100

(see also substitution)

sed

after-match data 138

dot 111

history 87

version covered 91

word boundaries 134

Image 5

self-closing tag 481

\p{Separator} 122

server VM 236

set operations (see class, set operations)

Sethi, Ravi 180

shell 7

Σ 110

Java 110

Perl 110

simple quantifier optimization 247-248

single quotes delimiter 292, 319

Singleline (.NET) 408, 420, 427

single-quoted string PHP 444

\p{Sk} 123

\p{Sm} 123

small quantifier equivalence 251-252

\p{So} 123

\p{Space_Separator} 123

\p{Spacing_Combining_Mark} 123

span (see: mode-modified span; literal-text mode)

“special” 263-266

Spencer, Henry 88, 182-183, 243

split

with capturing parentheses

.NET 409, 426

Perl 326

PHP 468

chunk limit

Java 396

Perl 323

PHP 466

into characters 322

Java 395-396

limit 466-467

Java 396

Perl 323

PHP 466

Perl 321-326

PHP 465-469

trailing empty items 324, 468

whitespace 325

split method 395-396

Split (Regex object method) 425-426

ß 111, 128, 290

stacked data 456

standard formula for matching delimited text 196

star

backtracking 162

greedy 141, 447

introduced 18-20

lazy 141

possessive 142

start method 377

start of match (see \G)

start of word (see word boundaries)

start-of-line/string (see anchor, caret)

start-of-string anchor optimization 246, 255-256, 315

states (see also backtracking, saved states)

flushing (see: atomic grouping; look-around; possessive quantifiers)

stclass `list' 362

stock pricing example 51-52, 167-168

with alternation 175

with atomic grouping 170

with possessive quantifier 169

Strict (Option) 415

strict pragma 295, 336, 345

String

matches 376

replaceAll 378

replaceFirst 379

split 395

string (see also line)

double-quoted (see double-quoted string example)

initial string discrimination 245-248, 252, 257-259, 332, 361

vs. line 55

match position (see pos)

pos (see pos)

StringBuffer 373, 380, 382, 397

StringBuilder 373, 382, 397

strings

C# 103

Emacs 101

Java 102

PHP 103-104

Python 104

as regex 101-105, 305

Tcl 104

VB.NET 103

stripping whitespace 199-200

str_replace 458

PHP 458

study PHP 447

study 359-360

when not to use 359

subexpression defined 29

subroutines regex 476

substitution xvii

delimiter 319

s/···/···/ 50, 318-321

(see also search and replace)

substring initial substring discrimination 245-248, 252, 257-259, 332, 361

subtraction

character class 406

class (set) 126

class (simple) 125

Success

Group object method 430

Match object method 427

Sun’s regex package (see java.util.regex)

super-linear (see neverending match)

super-linear short-circuiting 250

\p{Symbol} 122

Synchronized Match object method 430

syntax class Emacs 128

System.currentTimeMillis() 236

System.Reflection 435

System.Text.RegularExpressions 413, 415

\t 49, 115-116

introduced 44

tag

matching 200-201

XML 481

tag-team matching 132, 315

\p{Tamil} 124

Tcl

[:<:] 91

benchmarking 239

dot 111, 113

flavor overview 92

hand-tweaking 243, 259

line anchors 113, 130

mode modifiers 135

regex implementation 183

regsub 100

search and replace 100

strings 104

version covered 91

word boundaries 134

temperature conversion example

Java 382

.NET 425

Perl 37, 283

PHP 444

terminators (see line terminators)

testing engine type 146-147

text method 394

text-directed matching 153

(see also DFA)

regex appearance 162

text-to-HTML example 67-77

\p{Thai} 122

then (see conditional)

theory of an NFA 180

There’s more than one way to do it 349

this|that example 133, 139, 243, 245-247, 252, 255, 260-261

Thompson, Ken 85-86, 111

thread scheduling Java benchmarking 236

\p{Tibetan} 124

tied variables 299

time() 232

time of day 26

Time::HiRes 232, 358, 360

Time.new 238

Timer() 237

timezone PHP 235

title case 110

\p{Titlecase_Letter} 123

TiVo 3

tokenizer 132, 389, 399

building 315

toMatchResult method 377

toothpicks scattered 101

tortilla 128

ToString

Group object method 430

Match object method 427

Regex object method 427

toString method 393-394

Traditional NFA testing for 146-147

trailing context 182

transmission (see also \G)

optimizations 246-247

transparent bounds 387

Java 387

Tubby 265

typographical conventions xxi

\u 117, 290, 406

\U 117

\U···\E 290

inhibiting 292

uc 290

U+C0B5 107

ucfirst 290

UCS-2 encoding 107

UCS-4 encoding 107

Ullman, Jeffrey 180

\p{Unassigned} 123, 125

Perl 288

unconditional caching 350

underscore in \w history 89

Unescape 433

Unicode

block 124

Java 369, 402

.NET 407

Perl 288

categories (see Unicode, properties)

character

combining 107, 120, 122

code point

beyond U+FFFF 109

introduced 107

multiple 108

unassigned in block 124

combining character 107, 120, 122

Java 368-369, 402-403

line terminators 109-111, 370

Java 370

loose matching (see case-insensitive mode)

.NET 407

official web site 127

overview 106-110

Perl 288

PHP 442, 447

properties 121, 369

(see also \p{···})

Java 368

list 122-123

\p{All} 125, 288

\p{Any} 125, 288, 442

\p{Assigned} 125-126, 288

Perl 288

PHP 442

\p{Unassigned} 123, 125, 288

script 122

Perl 288

PHP 442

Version 3.1 109

\w 120

whitespace and /x 288

UnicodeData.txt 290

unicore 290

unmatch 152, 161, 163

.* 165

atomic grouping 171

unrolling the loop 261-276

example 270-271, 477

general pattern 264

\p{Uppercase_Letter} 123

URL encoding 320

URL example 74-77, 201-204, 208, 260, 303-304, 306, 320, 450-451

egrep 25

Java 209

.NET 204

plucking 206-208

use charnames 290

use Config 290, 299

use English 357

use overload 342

(see also regex overloading)

use re 'debug' 361, 363

use re 'eval' 337

use strict 295, 336, 345

use Time::HiRes 358, 360

use warnings 326, 363

useAnchoringBounds method 388

usePattern method 393, 399

username example 73, 76, 98

plucking from text 71-73

in URL 74-77

useTransparentBounds method 387

using System.Text.RegularExpressions 416

UTF-16 encoding 107

UTF-8 encoding 107, 442, 447

\v 115-116, 364

\V 364

Value

Group object method 430

Match object method 427

variable names example 24

variables

after match

pre-match copy 355

binding 339

fully qualified 295

interpolation 344

naughty 356

tied 299

VB.NET xvii

code example 204, 219

comments 99

regex approach 96-97

strings 103

(see also .NET)

verbatim strings 103

Version 7 regex 183

Version 8 regex 183

version covered

Java 365

.NET 405

Perl 283

PHP 440

others 91

version history Java 365, 368-369, 392, 401

vertical tab 109, 370

Perl \s 288

vi after-match data 138

Vietnamese text processing 29

virtual machine 236

Visual Basic xvii

(see also VB.NET)

(see also .NET)

Visual Studio .NET 434

VM 236

Java 236

warming up 236

void context 294

VT 109, 370

$^W 297

\w 49, 65, 120

Emacs 129

Java 368

many different interpretations 93

Perl 288

PHP 120, 442

\W 49, 121

Wall, Larry 88-90, 140, 363

warming up Java VM 236

warnings 296

($^W variable)

Perl 297

Perl 38

temporarily turning off 297

use warnings

Perl 326, 363

warnings pragma 326, 363

while vs. foreach vs. if 320

whitespace

allowing optional 18

removing 199-200

width attribute Java example 397

wildcards filename 4

word anchor mechanics of matching 150

word boundaries 133

\<···\>

egrep 15

introduced 15

Java 134

many programs 134

mimicking 66, 134, 341-342

.NET 134

Perl 288

PHP 134

www.cpan.org 358

www.PeakWebhosting.com xxiv

www.regex.info 358

www.unixwiz.net xxiv, 458

\X 108, 120

/x 135, 288

(see also: comments and free-spacing mode; mode modifier)

history 90

introduced 72

(?x) (see: comments and free-spacing mode; mode modifier)

\x 117, 406

Perl 286

XML 483

CDATA 483

XML example 481-484

-y old grep 86

¥ 124

Yahoo! xxiv, 74, 132, 190, 206-207, 258, 314, 397

\Z 112, 129-130

(see also enhanced line-anchor mode)

Java 370

optimization 246

\p{Z} 121-122, 368, 407

\z 112, 129-130, 316, 447

(see also enhanced line-anchor mode)

optimization 246

PHP 442

Zawodny, Jeremy 258

zero-width assertions (see: anchor; lookahead; lookbehind)

ZIP code example 209-212

\p{Zl} 123

Zmievski, Andrei xxiv, 440

\p{Zp} 123

\p{Zs} 123