Japanese
text processing 29
“japhy” 246
Java (see also java.util.regex)
after-match data 138
anchoring bounds 388
benchmarking 235-236
BLTN 236
bugs 365, 368-369, 387, 392, 399, 403
code example 81, 209, 217, 235, 371, 375, 378-379, 381-384, 389
CSV parsing example 401
description 365
doubled-word example 81
JIT 236
line terminators 370
match modes 368
match pointer 374, 383, 398, 400
matching comments 272-276
method chaining 389
method index 366
Mustang 401
object model 371-372
\p{···} 125
regex flavor 366-370
region 384-389
search and replace 378-383
Σ 110
split 395-396
strings 102
transparent bounds 387
Unicode 369
URL example 209
version covered 365
version history 365, 368-369, 392, 401
VM 236
word boundaries 134
java properties 369
\p{javaJavaIdentifierStart} 369
java.lang.Character 369
java.util.regex (see Java)
java.util.Scanner 390
Jeffs example 61-64
JfriedlsRegexLibrary 434-435
JIT
Java 236
.NET 410
JRE 236
keeping in sync 210-211
Keisler, H.J. 85
Kleene, Stephen 85
The Kleene Symposium 85
\k<name> (see named capture)
Korean text processing 29
Kunen, K. 85
£ 124
\l 290
Java 369
Perl 288
language (see also: .NET; C#; Java; MySQL; Perl; procmail; Python; Ruby; Tcl; VB.NET)
identifiers 24
\p{Latin} 122
Latin-1 encoding 29, 87, 106, 108, 123
lazy 166-167
(see also greedy)
favors match 167-168
quantifier 141
\L···\E 290
inhibiting 292
lc 290
lcfirst 290
leftmost match 177-179
Length
Group object method 430
Match object method 429
length-cognizance optimization 245, 247
\p{Letter_Number} 123
lex 86
$ 112
dot 111
history 87
and trailing context 182
building 315
lexical scope 299
LIFO backtracking 159
limit
backtracking 239
preg_split 466-467
recursion 249-250
line (see also string)
anchor optimization 246
vs. string 55
line anchor 112-113
mechanics of matching 150
variety of implementations 87
line anchors
.NET 130
Perl 130
PHP 130
line terminators 109-111, 129-130, 370
with $ and ^ 112
Java 370
\p{Line_Separator} 123
link
matching 201
(see also URL examples)
Java 209
VB.NET 204
forcing 310
literal string initial string discrimination 245-248, 252, 257-259, 332, 361
literal text
exposing 255
introduced 5
mechanics of matching 149
pre-check optimization 245-248, 252, 257-259, 332, 361
literal-text mode 113, 136, 290
inhibiting 292
.NET 136
local
in embedded code 336
vs. my 297
locale 127-128
overview 87
\w 120-121
localizing 296-297
lockup (see neverending match)
locking in regex literal 352
“A logical calculus of the ideas immanent in nervous activity” 85
longest match finding 334-335
longest-leftmost match 148, 177-179
lookahead 133
(see also lookaround)
auto 410
introduced 60
mimic atomic grouping 174
mimic optimizations 258-259
negated
<B>···</B> 167
positive vs. negative 66
lookahead example 61-64
lookaround
backtracking 173-174
conditional 140-141
and DFAs 182
doesn’t consume text 60
introduced 59
mimicking class set operations 126
mimicking word boundaries 134
Perl 288
lookbehind 133
(see also lookaround)
Java 368
.NET 408
Perl 288
positive vs. negative 66
unlimited 408
lookingAt method 376
loose matching (see case-insensitive mode)
Lord, Tom 183
\p{Lowercase_Letter} 123
m/···/ introduced 38
(?m) (see: enhanced line-anchor mode; mode modifier)
/m 135
(see also: enhanced line-anchor mode; mode modifier)
machine-dependent character codes 115
MacOS 115
mail processing example 53-59
makudonarudo example 165, 169, 228-232, 264
\p{Mark} 122
match 306-318
(see also: DFA; NFA)
actions 95
DFA vs. NFA 224
efficiency 179
example with backtracking 160
example without backtracking 160
lazy example 161
leftmost-longest 335
longest 334-335
m/···/
introduced 38
mechanics (see also: greedy; lazy)
.* 152
anchors 150
capturing parentheses 149
character classes and dot 149
consequences 156
greedy introduced 151
literal text 149
modes 110-113
Java 368
negating 309
neverending 222-228, 330, 340
avoiding 264-266
discovery 226-228
explanation 226-228
non-determinism 264
short-circuiting 250
solving with atomic grouping 268
solving with possessive quantifiers 268
of nothing 454
position (see pos)
POSIX
Perl 335
shortest-leftmost 182
side effects 317
intertwined 43
Perl 40
speed 181
in a string 27
tag-team 132
viewing mechanics 331-332
Match Empty 433
match modes Java 368
Match (.NET) Success 96
Match object (.NET) 417
Capture 437
Groups 429
Index 429
Length 429
NextMatch 429
Result 429
Success 427
Synchronized 430
ToString 427
using 427
Value 427
match pointer Java 374, 383, 398, 400
Match (Regex object method) 421
“match rejected by optimizer” 363
match results Java 376
MatchCollection 422
Matcher
appendReplacement 380
appendTail 381
end 377
find 375
group 377
groupCount 377
hasAnchoringBounds 388
hasTransparentBounds 387
hitEnd 389-392
lookingAt 376
matches 376
pattern 393
quoteReplacement 379
region 384-389
region 386
regionEnd 386
regionStart 386
replaceAll 378
replaceFirst 379
replacement argument 380
requireEnd 389-392
reset 392-393
start 377
text 394
toMatchResult 377
toString 393
useAnchoringBounds 388
useTransparentBounds 387
Matcher object 373
reusing 392-393
$matches 450
vs. $all_matches 454
matches
unexpected 194-195
viewing all 332
Matches (Regex object method) 422
MatchEvaluator 423-424
matching
delimited text 196-198
HTML tag 200
longest-leftmost 177-179
matching comments Java 272-276
MatchObject object (.NET) creating 422
\p{Math_Symbol} 123
mb_ereg suite 439
MBOL 362
\p{Mc} 123
McCloskey, Mike xxiv
McCulloch, Warren 85
\p{Me} 123
mechanics viewing 331-332
metacharacter
conflicting 44-46
differing contexts 10
introduced 5
vs. metasequence 27
metasequence defined 27
method chaining 389
Java 389
method index Java 366
mimic
$’ 357
$` 357
atomic grouping 174
class set operations 126
conditional with lookaround 140
initial-character discrimination optimization 258-259
named capture 344-345
POSIX matching 335
possessive quantifiers 343-344
variable interpolation 321
word boundaries 66, 134, 341-342
minlen length 362
minus in character class 9
MSIL .NET 410
“missing” functions PHP 471
\p{Mn} 123
mode-modified span 110, 135-136, 367, 392, 407, 446
modes introduced with egrep 14-15
\p{Modifier_Letter} 123
modifiers 372
(see also match, modes)
combining 69
example with five 316
/g 51
/i 47
“locking in” 304-305
notation 99
/osmosis 293
Perl 292-293
Perl core 292-293
with regex object 304-305
unknown 448
\p{Modifier_Symbol} 123
Morse, Ian xxiv
motto Perl 348
-Mre=debug (see use re 'debug')
multi-character quotes 165-166
Multiline (.NET) 408, 419-420, 427
multiple-byte character encoding 29
MungeRegexLiteral 342-344, 346
Mustang Java 401
my
binding 339
in embedded code 338-339
vs. local 297
MySQL
after-match data 138
DBIx::DWIW 258
version covered 91
word boundaries 134
\n
introduced 44
machine-dependency 115
(?n) 408
named capture 138
mimicking 344-345
.NET 408-409
numeric names 451
with unnamed capture 409
naughty variables 356
OK for debugging 331
negated class
introduced 10-11
and lazy quantifiers 167
Tcl 112
negative lookahead (see lookahead, negative)
negative lookbehind (see lookbehind, negative)
nervous system 85
nested constructs
.NET 436
$+ 202
after-match data 138
benchmarking 237
character-class subtraction 406
code example 219
flavor overview 92
JIT 410
line anchors 130
literal-text mode 136
MSIL 410
object model 417
\p{···} 125
regex approach 96-97
regex flavor 407
search and replace 414, 423-424
URL example 204
version covered 405
word boundaries 134
(see also VB.NET)
neurophysiologists early regex study 85
neverending match 222-228, 330, 340
avoiding 264-266
discovery 226-228
explanation 226-228
non-determinism 264
short-circuiting 250
solving with atomic grouping 268
solving with possessive quantifiers 268
newline and HTTP 115
NextMatch (Match object method) 429
NFA
acronym spelled out 156
and alternation 174-175
compared with DFA 156-157, 180-183
control benefits 155
efficiency 179
essence (see backtracking)
first introduced 145
freeflowing regex 277-281
and greediness 162
implementation ease 183
introduction 153
nondeterminism 265
checkpoint 264-265
POSIX efficiency 179
testing for 146-147
theory 180
\p{Nl} 123
\N{LATIN SMALL LETTER SHARP S} 290
\N{name} 290
(see also pragma)
inhibiting 292
\p{No} 123
No Dashes Hall Of Shame 458
no re 'debug' 361
no_match_vars 357
nomenclature 27
non-capturing parentheses 45, 137-138
(see also parentheses)
Nondeterministic Finite Automaton (see NFA)
non-greedy (see lazy)
nonillion 226
nonparticipation parentheses 450, 453-454, 469
nonregular sets 180
\p{Non_Spacing_Mark} 123
non-word boundaries (see word boundaries)
“normal” 263-266
NUL 117
with dot 119
NULL 454
\p{Number} 122
/o 352-353
with regex object 354
Obfuscated Perl Contest 320
object model
Java 371-372
.NET 416-417
Object Oriented Perl 339
object-oriented handling 95-97
compile caching 244
vs. backreference 412-413
Perl 286
offset preg_match 453
on-demand recompilation 351
\p{Open_Punctuation} 123
operators Perl list 285
optimization 240-252
(see also: atomic grouping; possessive quantifiers; efficiency)
automatic possessification 251
BLTN 236
with bump-along 255
end-of-string anchor 246
excessive backtrack 249-250
hand tweaking 252-261
implicit line anchor 191
initial character discrimination 245-248, 252, 257-259, 332, 361
lazy evaluation 181
leading ⌈.*⌋ 246
literal-string concatenation 247
need cognizance 252
needless class elimination 248
needless parentheses 248
pre-check of required character 245-248, 252, 257-259, 332, 361
simple repetition
discussed 247-248
small quantifier equivalence 251-252
state suppression 250-251
super-linear short-circuiting 250
option
-0 36
-c 361
-Dr 363
-i 53
-M 361
-Mre=debug 363
-n 36
-p 53
Option (.NET) 415
optional (see also quantifier)
whitespace 18
Options (Regex object method) 427
OR class set operations 125-126
Oram, Andy 5
ordered alternation 175-177
(see also alternation, ordered)
pitfalls 176
osmosis 293
/osmosis 293
\p{Other} 122
\p{Other_Letter} 123
\p{Other_Number} 123
\p{Other_Punctuation} 123
\p{Other_Symbol} 123
overload pragma 342
\p{···}
Java 125
.NET 125
PHP 125
\p{P} 122
\p{^···} 288
\p{All} 125
Perl 288
\p{all} 369
panic: top_env 332
Perl 288
Papen, Jeffrey xxiv
PARAGRAPH SEPARATOR 109, 123, 370
\p{Paragraph_Separator} 123
parentheses
as \(···\) 86
and alternation 13
balanced 328-331, 340-341, 436, 475-478, 481
difficulty 193-194
introduced with egrep 20-22
mechanics 149
Perl 41
capturing only 152
counting 21
elimination optimization 248
grouping-only (see non-capturing parentheses)
limiting scope 18
named capture 138, 344-345, 408-409, 450-452, 457, 476-477
nested 328-331, 340-341, 436, 475-477, 481
non-participating 300
nonparticipation 450, 453-454, 469
with split
Perl 326
\p{Arrows} 124
parsing regex 410
participate in match 140
matching comments of 265
\p{Assigned} 125-126
Perl 288
patch 88
path (see backtracking)
pathname example 190-192
Pattern
CASE_INSENSITIVE 95, 110, 368, 372
CASE_INSENSITIVE bug 392
compile 372
flags 394
matcher 373
matches 395
MULTILINE bug 387
pattern 394
quote 395
split 395-396
toString 394
pattern argument 472
pattern arguments PHP 444, 448
pattern method 393-394
pattern modifier
A 447
m 442
U 447
unknown errors 448
X 447
pattern modifiers PHP 446-448
PatternSyntaxException 371, 373
\p{Basic_Latin} 124
\p{Box_Drawing} 124
\p{C} 122
Java 369
\p{Cc} 123
\p{Cf} 123
\p{Cherokee} 122
\p{Close_Punctuation} 123
Java 369
\p{Co} 123
\p{Connector_Punctuation} 123
\p{Control} 123
PCRE (see also PHP)
“extra stuff” 447
flavor overview 441
lookbehind 134
recursive matching 475-478
study 447
version covered 440
\w 120
web site 91
X pattern modifier 447
pcre_study 259
\p{Currency} 124
\p{Currency_Symbol} 123
\p{Pd} 123
\p{Dash_Punctuation} 123
\p{Decimal_Digit_Number} 123
\p{Dingbats} 124
\p{Pe} 123
PeakWebhosting.com xxiv
\p{Enclosing_Mark} 123
people
Barwise, J. 85
Byington, Ryan xxiv
Click, Cliff xxiv
Constable, Robert 85
Conway, Damian 339
Cruise, Tom 51
Filo, David 397
Fite, Liz 33
Friedl, Alfred 176
Friedl, brothers 33
birthday 11-12
Friedl, Jeffrey xxiii
George, Kit xxiv
Gill, Stuart xxiv
Gosling, James 89
Greant, Zak xxiv
Gutierrez, David xxiv
Keisler, H.J. 85
Kleene, Stephen 85
Kunen, K. 85
Lord, Tom 183
McCloskey, Mike xxiv
McCulloch, Warren 85
Morse, Ian xxiv
Oram, Andy 5
Papen, Jeffrey xxiv
Perl Porters 90
Pinyan, Jeff 246
Pitts, Walter 85
Reinhold, Mark xxiv
Sethi, Ravi 180
Spencer, Henry 88, 182-183, 243
Tubby 265
Ullman, Jeffrey 180
Zawodny, Jeremy 258
Perl
\p{···} 125
$/ 35
context (see also match, context)
contorting 294
efficiency 347-363
greatest weakness 286
introduction 37-38
line anchors 130
modifiers 292-293
motto 348
option
-0 36
-c 361
-Dr 363
-i 53
-M 361
-Mre=debug 363
-n 36
-p 53
regex operators 285
search and replace 318-321
Σ 110
Unicode 288
version covered 283
warnings 38
($^W variable) 297
Perl Porters 90
perladmin 299
\p{Pf} 123
Java 369
\p{Final_Punctuation} 123
\p{Format} 123
\p{Gujarati} 122
\p{Han} 122
\p{Hangul_Jamo} 124
\p{Hiragana} 122
PHP 439-484
after-match data 138
benchmarking 234-235
CSV parsing example 480
efficiency 478-480
flavor overview 441
history 440
line anchors 130
“missing” functions 471
\p{···} 125
recursive matching 475-478
search and replace 458-465
single-quoted string 444
strings 103-104
str_replace 458
study 447
version covered 440
\w 120
word boundaries 134
\p{Pi} 123
Java 369
\p{InArrows} 124
\p{InBasic_Latin} 124
\p{InBox_Drawing} 124
\p{InCurrency} 124
\p{InCyrillic} 124
\p{InDingbats} 124
\p{InHangul_Jamo} 124
\p{InHebrew} 124
\p{Inherited} 122
\p{Initial_Punctuation} 123
\p{InKatakana} 124
\p{InTamil} 124
\p{InTibetan} 124
Pinyan, Jeff 246
\p{IsCherokee} 122
\p{IsCommon} 122
\p{IsCyrillic} 122
\p{IsGujarati} 122
\p{IsHan} 122
\p{IsHebrew} 122
\p{IsHiragana} 122
\p{IsKatakana} 122
\p{IsLatin} 122
\p{IsThai} 122
\p{IsTibetan} 124
Pitts, Walter 85
\p{javaJavaIdentifierStart} 369
Java 369
Perl 288
\pL PHP 442
\p{Latin} 122
(?P<name>···) (see named capture)
\p{Letter_Number} 123
\p{Line_Separator} 123
\p{Lowercase_Letter} 123
plus
as \+ 141
backtracking 162
introduced 18-20
lazy 141
possessive 142
\p{Mark} 122
\p{Math_Symbol} 123
\p{Mc} 123
\p{Me} 123
\p{Mn} 123
\p{Modifier_Letter} 123
\p{Modifier_Symbol} 123
\pN PHP 442
(?P=name) (see named capture)
\p{Nl} 123
\p{No} 123
\p{Non_Spacing_Mark} 123
\p{Number} 122
\p{Po} 123
\p{Open_Punctuation} 123
population example 59
pos (see also \G)
positive lookahead (see lookahead, positive)
positive lookbehind (see lookbehind, positive)
POSIX
[.···.] 128
[:···:] 127
Basic Regular Expressions 87-88
bracket expressions 127
character class 127
character class and locale 127
character equivalent 128
collating sequences 128
dot 119
empty alternatives 140
Extended Regular Expressions 87-88
superficial flavor chart 88
locale 127
overview 87
longest-leftmost rule 177-179, 335
POSIX NFA
backtracking example 229
testing for 146-147
possessive quantifier 477, 483
possessive quantifiers 142, 172-173, 477, 483
(see also atomic grouping)
automatic 251
for efficiency 259-260, 268-270, 482
mimicking 343-344
optimization 250-251
possessive quantifiers example 198, 201
postal code example 209-212
\p{Other} 122
\p{Other_Letter} 123
\p{Other_Number} 123
\p{Other_Punctuation} 123
\p{Other_Symbol} 123
£ 124
\p{P} 122
\p{Paragraph_Separator} 123
\p{Pd} 123
\p{Pe} 123
\p{Pf} 123
Java 369
\p{Pi} 123
Java 369
\p{Po} 123
\p{Private_Use} 123
\p{Ps} 123
\p{Punctuation} 122
pragma
charnames 290
(see also \N{name})
overload 342
pre-check of required character 245-248, 252, 257-259, 361
mimic 258-259
viewing 332
preg function interface 443-448
preg suite 439
“missing” functions 471
preg_grep 469-470
PREG_GREP_INVERT 470
preg_match 449-453
offset 453
preg_match_all 453-457
PREG_OFFSET_CAPTURE 452, 454, 456
preg_pattern_error 474
PREG_PATTERN_ORDER 455
preg_regex_error 475
preg_regex_to_pattern 472-474
preg_replace 458-464
preg_replace_callback 463-465
PREG_SET_ORDER 456
preg_split 465-469
PREG_SPLIT_DELIM_CAPTURE 468-469
split limit 469
PREG_SPLIT_NO_EMPTY 468
PREG_SPLIT_OFFSET_CAPTURE 468
pre-match copy 355
prepending filename to line 79
price rounding example 51-52, 167-168
with alternation 175
with atomic grouping 170
with possessive quantifier 169
Principles of Compiler Design 180
printf 40
private vs. global Perl variables 295
\p{Private_Use} 123
procedural handling 95-97
compile caching 244
processing instructions 483
procmail 94
version covered 91
Programming Perl 283, 286, 339
promote 294-295
properties 121-123, 125-126, 288, 368-369, 442
\p{S} 122
\p{Ps} 123
\p{Sc} 123-124
\p{Separator} 122
\p{Sk} 123
\p{Sm} 123
\p{So} 123
\p{Space_Separator} 123
\p{Spacing_Combining_Mark} 123
\p{Symbol} 122
\p{Tamil} 124
\p{Thai} 122
\p{Tibetan} 124
\p{Titlecase_Letter} 123
publication
Bulletin of Math. Biophysics 85
CJKV Information Processing 29
Communications of the ACM 85
Compilers — Principles, Techniques, and Tools 180
Embodiments of Mind 85
The Kleene Symposium 85
“A logical calculus of the ideas immanent in nervous activity” 85
Object Oriented Perl 339
Principles of Compiler Design 180
Programming Perl 283, 286, 339
Regular Expression Search Algorithm 85
“The Role of Finite Automata in the Development of Modern Computing Theory” 85
Perl 288
\p{Punctuation} 122
\p{Uppercase_Letter} 123
Python
after-match data 138
benchmarking 238-239
line anchors 130
mode modifiers 135
regex approach 97
strings 104
version covered 91
word boundaries 134
\Z 112
\pZ PHP 442
\p{Zl} 123
\p{Zp} 123
\p{Zs} 123
Qantas 11
\Q···\E 290
inhibiting 292
qed 85
qr/···/ (see also regex objects)
introduced 76
quantifier (see also: plus; star; question mark; interval; lazy; greedy; possessive quantifiers)
and backtracking 162
factor out 255
grouping for 18
multiple levels 266
optimization 247-248
and parentheses 18
possessive quantifiers 142, 172-173, 477, 483
for efficiency 259-260, 268-270, 482
automatic
optimization
mimicking
question mark
as \? 141
backtracking 160
introduced 17-18
lazy 141
possessive 142
smallest preceding subexpression 29
question mark
as \? 141
backtracking 160
introduced 17-18
lazy 141
possessive 142
quoted string (see double-quoted string example)
quoteReplacement method 379
quotes multi-character 165-166
r"···" 104
\r
machine-dependency 115
(?R) 475
PCRE 475
PHP 475
reality check 226-228
recursive matching (see also dynamic regex)
Java 402
.NET 436
PCRE 475-478
red dragon 180
Reflection 435
regex
balancing needs 186
cache 242-245, 350-352, 432, 478
default 308
delimiters 291-292
DFA (see DFA)
encapsulation (see regex objects)
engine analogy 143-147
vs. English 275
error checking 474
frame of mind 6
freeflowing design 277-281
history 85-91
longest-leftmost match 177-179
shortest-leftmost 182
mechanics 241-242
NFA (see NFA)
nomenclature 27
operands 288-292
inhibiting 292
problems 344
subexpression
defined 29
subroutines 476
regex approach .NET 96-97
regex flavor
Java 366-370
.NET 407
regex literal
inhibiting processing 292
locking in 352
parsing of 292
processing 350
regex objects 354
Regex (.NET)
creating
options 419-421
Escape 432
GetGroupNames 427-428
GetGroupNumbers 427-428
GroupNameFromNumber 427-428
GroupNumberFromName 427-428
object
exceptions 419
Options 427
RightToLeft 427
ToString 427
Unescape 433
regex objects 303-306
(see also qr/···/)
efficiency 353-354
/g 354
match modes 304-305
/o 354
in regex literal 354
viewing 305-306
regex operators Perl 285
regex overloading 292
(see also use overload)
regex overloading example 341-345
http://regex.info/ xxiv, 7, 345, 358, 451
RegexCompilationInfo 435
regex-directed matching 153
(see also NFA)
and backreferences 303
and greediness 162
Regex.Escape 136
RegexOptions
Compiled 237, 408, 410, 420, 427-428, 435
ECMAScript 406, 408, 412-413, 421, 427
IgnoreCase 96, 99, 408, 419, 427
IgnorePatternWhitespace 99, 408, 419, 427
RightToLeft 408, 411-412, 420, 426-427, 429-430
region
additional example 398
anchoring bounds 388
hitEnd 390
Java 384-389
methods that reset 385
requireEnd 390
resetting 392-393
setting one edge 386
transparent bounds 387
region method 386
regionEnd method 386
regionStart method 386
reg_match 454
regsub 100
regular expression origin of term 85
Regular Expression Search Algorithm 85
regular sets 85
Reinhold, Mark xxiv
removing whitespace 199-200
Replace (Regex object method) 423-424
replaceAll method 378
replaceFirst method 379
replacement argument 460
Java 380
PHP 459
reproductive organs 5
required character pre-check 245-248, 252, 257-259, 332, 361
requireEnd method 389-392
re-search-forward 100-101
Result (Match object method) 429
RightToLeft (Regex property) 427-428
RightToLeft (.NET) 408, 411-412, 420, 426-427, 429-430
“The Role of Finite Automata in the Development of Modern Computing Theory” 85
Ruby
$ and ^ 112
after-match data 138
benchmarking 238
line anchors 130
mode modifiers 135
version covered 91
word boundaries 134
rule
earliest match wins 148-149
standard quantifiers are greedy 151-153
rx 183
\p{S} 122
Emacs 128
introduction 47
Perl 288
PHP 442
(?s) (see: dot-matches-all mode; mode modifier)
/s 135
(see also: dot-matches-all mode; mode modifier)
saved states (see backtracking, saved states)
SawAmpersand 358
SBOL 362
\p{Sc} 123-124
scalar context 294, 310, 312-316
forcing 310
schaffkopf 33
scope lexical vs. dynamic 299
search and replace xvii
awk 100
Java 378-383
Perl 318-321
PHP 458-465
Tcl 100
(see also substitution)
sed
after-match data 138
dot 111
history 87
version covered 91
word boundaries 134
self-closing tag 481
\p{Separator} 122
server VM 236
set operations (see class, set operations)
Sethi, Ravi 180
shell 7
Σ 110
Java 110
Perl 110
simple quantifier optimization 247-248
single quotes delimiter 292, 319
Singleline (.NET) 408, 420, 427
single-quoted string PHP 444
\p{Sk} 123
\p{Sm} 123
small quantifier equivalence 251-252
\p{So} 123
\p{Space_Separator} 123
\p{Spacing_Combining_Mark} 123
span (see: mode-modified span; literal-text mode)
“special” 263-266
Spencer, Henry 88, 182-183, 243
split
with capturing parentheses
Perl 326
PHP 468
chunk limit
Java 396
Perl 323
PHP 466
into characters 322
Java 395-396
limit 466-467
Java 396
Perl 323
PHP 466
Perl 321-326
PHP 465-469
whitespace 325
split method 395-396
Split (Regex object method) 425-426
stacked data 456
standard formula for matching delimited text 196
star
backtracking 162
introduced 18-20
lazy 141
possessive 142
start method 377
start of match (see \G)
start of word (see word boundaries)
start-of-line/string (see anchor, caret)
start-of-string anchor optimization 246, 255-256, 315
states (see also backtracking, saved states)
flushing (see: atomic grouping; look-around; possessive quantifiers)
stclass `list’ 362
stock pricing example 51-52, 167-168
with alternation 175
with atomic grouping 170
with possessive quantifier 169
Strict (Option) 415
String
matches 376
replaceAll 378
replaceFirst 379
split 395
string (see also line)
double-quoted (see double-quoted string example)
initial string discrimination 245-248, 252, 257-259, 332, 361
vs. line 55
match position (see pos)
pos (see pos)
StringBuffer 373, 380, 382, 397
strings
C# 103
Emacs 101
Java 102
PHP 103-104
Python 104
Tcl 104
VB.NET 103
stripping whitespace 199-200
str_replace 458
PHP 458
study PHP 447
study 359-360
when not to use 359
subexpression defined 29
subroutines regex 476
substitution xvii
delimiter 319
(see also search and replace)
substring initial substring discrimination 245-248, 252, 257-259, 332, 361
subtraction
character class 406
class (set) 126
class (simple) 125
Success
Group object method 430
Match object method 427
Sun’s regex package (see java.util.regex)
super-linear (see neverending match)
super-linear short-circuiting 250
\p{Symbol} 122
Synchronized Match object method 430
syntax class Emacs 128
System.currentTimeMillis() 236
System.Reflection 435
System.Text.RegularExpressions 413, 415
introduced 44
tag
matching 200-201
XML 481
\p{Tamil} 124
Tcl
[:<:] 91
benchmarking 239
flavor overview 92
mode modifiers 135
regex implementation 183
regsub 100
search and replace 100
strings 104
version covered 91
word boundaries 134
temperature conversion example
Java 382
.NET 425
PHP 444
terminators (see line terminators)
testing engine type 146-147
text method 394
text-directed matching 153
(see also DFA)
regex appearance 162
text-to-HTML example 67-77
\p{Thai} 122
then (see conditional)
theory of an NFA 180
There’s more than one way to do it 349
this|that example 133, 139, 243, 245-247, 252, 255, 260-261
thread scheduling Java benchmarking 236
\p{Tibetan} 124
tied variables 299
time() 232
time of day 26
Time.new 238
Timer() 237
timezone PHP 235
title case 110
\p{Titlecase_Letter} 123
TiVo 3
building 315
toMatchResult method 377
toothpicks scattered 101
tortilla 128
ToString
Group object method 430
Match object method 427
Regex object method 427
toString method 393-394
Traditional NFA testing for 146-147
trailing context 182
transmission (see also \G)
optimizations 246-247
transparent bounds 387
Java 387
Tubby 265
typographical conventions xxi
\U 117
\U···\E 290
inhibiting 292
uc 290
U+C0B5 107
ucfirst 290
UCS-2 encoding 107
UCS-4 encoding 107
Ullman, Jeffrey 180
Perl 288
unconditional caching 350
underscore in \w history 89
Unescape 433
Unicode
block 124
.NET 407
Perl 288
categories (see Unicode, properties)
character
code point
beyond U+FFFF 109
introduced 107
multiple 108
unassigned in block 124
combining character 107, 120, 122
Java 370
loose matching (see case-insensitive mode)
.NET 407
official web site 127
overview 106-110
Perl 288
(see also \p{···})
Java 368
list 122-123
Perl 288
PHP 442
script 122
Perl 288
PHP 442
\w 120
whitespace and /x 288
UnicodeData.txt 290
unicore 290
.* 165
atomic grouping 171
unrolling the loop 261-276
general pattern 264
\p{Uppercase_Letter} 123
URL encoding 320
URL example 74-77, 201-204, 208, 260, 303-304, 306, 320, 450-451
egrep 25
Java 209
.NET 204
plucking 206-208
use charnames 290
use English 357
use overload 342
(see also regex overloading)
use re 'eval' 337
useAnchoringBounds method 388
plucking from text 71-73
in URL 74-77
useTransparentBounds method 387
using System.Text.RegularExpressions 416
UTF-16 encoding 107
\V 364
Value
Group object method 430
Match object method 427
variable names example 24
variables
after match
pre-match copy 355
binding 339
fully qualified 295
interpolation 344
naughty 356
tied 299
VB.NET xvii
comments 99
regex approach 96-97
strings 103
(see also .NET)
verbatim strings 103
Version 7 regex 183
Version 8 regex 183
version covered
Java 365
.NET 405
Perl 283
PHP 440
others 91
version history Java 365, 368-369, 392, 401
Perl \s 288
vi after-match data 138
Vietnamese text processing 29
virtual machine 236
Visual Basic xvii
(see also VB.NET)
(see also .NET)
Visual Studio .NET 434
VM 236
Java 236
warming up 236
void context 294
$^W 297
Emacs 129
Java 368
many different interpretations 93
Perl 288
warming up Java VM 236
warnings 296
($^W variable)
Perl 297
Perl 38
temporarily turning off 297
use warnings
while vs. foreach vs. if 320
whitespace
allowing optional 18
removing 199-200
width attribute Java example 397
wildcards filename 4
word anchor mechanics of matching 150
word boundaries 133
\<···\>
egrep 15
introduced 15
Java 134
many programs 134
.NET 134
Perl 288
PHP 134
www.cpan.org 358
www.PeakWebhosting.com xxiv
www.regex.info 358
(see also: comments and free-spacing mode; mode modifier)
history 90
introduced 72
(?x) (see: comments and free-spacing mode; mode modifier)
Perl 286
XML 483
CDATA 483
XML example 481-484
-y old grep 86
¥ 124
Yahoo! xxiv, 74, 132, 190, 206-207, 258, 314, 397
(see also enhanced line-anchor mode)
Java 370
optimization 246
(see also enhanced line-anchor mode)
optimization 246
PHP 442
Zawodny, Jeremy 258
zero-width assertions (see: anchor; lookahead; lookbehind)
ZIP code example 209-212
\p{Zl} 123
\p{Zp} 123
\p{Zs} 123