Regular Expressions Cookbook, 2nd Edition
by Steven Levithan
Published by
O'Reilly Media, Inc., 2012
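Find any decimal or octal integer with optional underscores in a larger body of text:
\b[0-9]+(_+[0-9]+)*\b
| Regex options: None |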
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Find any hexadecimal integer with optional underscores in a larger body of text:
\b0x[0-9A-F]+(_+[0-9A-F]+)*\b
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Find any binary integer with optional underscores in a larger body of text:
\b0b[01]+(_+[01]+)*\b
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Find any decimal, octal, hexadecimal, or binary integer with optional underscores in a larger body of text:
\b([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)\b
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
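As a rough illustration of the combined search pattern, here is a minimal Python sketch. The pattern is copied from the recipe; the names `INTEGER`, `find_integers`, and `sample` are hypothetical and exist only for this example.

```python
import re

# The combined pattern from the recipe, copied verbatim. re.IGNORECASE
# stands in for the "Case insensitive" regex option.
INTEGER = re.compile(
    r"\b([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)\b",
    re.IGNORECASE,
)

def find_integers(text):
    """Return every integer literal found in text, in order of appearance."""
    # finditer() is used rather than findall() because the pattern contains
    # capturing groups, which would make findall() return tuples of groups.
    return [m.group(0) for m in INTEGER.finditer(text)]

# Hypothetical sample text: three valid literals and two invalid ones.
sample = "mask 0xFF_FF, flags 0b1010_0101, count 1_000_000, bad 12_ and _7"
print(find_integers(sample))
# prints ['0xFF_FF', '0b1010_0101', '1_000_000']
```

The candidates `12_` and `_7` are skipped because the underscore counts as a word character, so the word boundary cannot match between a digit and an adjacent underscore.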
Check whether a text string holds just a decimal, octal, hexadecimal, or binary integer with optional underscores:
\A([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)\Z
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
^([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)$
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python |
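The whole-string check could be wrapped in a small helper along these lines. This is a sketch in Python using the ‹\A›…‹\Z› variant from the recipe; `WHOLE_INTEGER` and `is_integer` are hypothetical names chosen for this example.

```python
import re

# The anchored pattern from the recipe, copied verbatim. In Python, \Z
# matches only at the very end of the string.
WHOLE_INTEGER = re.compile(
    r"\A([0-9]+(_+[0-9]+)*|0x[0-9A-F]+(_+[0-9A-F]+)*|0b[01]+(_+[01]+)*)\Z",
    re.IGNORECASE,
)

def is_integer(s):
    """Return True if s consists of exactly one integer literal."""
    return WHOLE_INTEGER.search(s) is not None

print(is_integer("1_000_000"))    # True
print(is_integer("0xDEAD_beef"))  # True
print(is_integer("0b1010__01"))   # True: sequential underscores are allowed
print(is_integer("1_"))           # False: trailing underscore
print(is_integer("0x"))           # False: prefix without digits
```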
Recipes 6.1, 6.2, and 6.3 explain in detail how to match
integer numbers. These recipes do not allow underscores in the numbers.
Their regular expressions can easily use ‹[0-9]+›, ‹[0-9A-F]+›, and ‹[01]+› to match decimal, hexadecimal, and binary
numbers.
If we wanted to allow underscores anywhere, we could just add the
underscore to these three character classes. But we do not want to allow
underscores at the start or the end. The first and last characters in
the number must be digits. You might think of ‹[0-9][0-9_]+[0-9]› as an easy solution. But this
requires at least three characters, so it fails to match one- and two-digit numbers. We need a slightly more complex
solution.
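To see where the tempting pattern falls short, the following Python sketch compares it with the pattern the recipe adopts. The names `NAIVE`, `SOLUTION`, and `compare` are hypothetical, and `fullmatch()` is used here only to test each candidate string as a whole.

```python
import re

# The tempting pattern: one digit, one or more digits or underscores,
# one digit. It therefore needs at least three characters.
NAIVE = re.compile(r"[0-9][0-9_]+[0-9]")

# The pattern the recipe settles on.
SOLUTION = re.compile(r"[0-9]+(_+[0-9]+)*")

def compare(candidate):
    """Return (naive_matches, solution_matches) for the whole candidate."""
    return (
        NAIVE.fullmatch(candidate) is not None,
        SOLUTION.fullmatch(candidate) is not None,
    )

for candidate in ["7", "42", "123", "1_000"]:
    print(candidate, compare(candidate))
# 7 (False, True)
# 42 (False, True)
# 123 (True, True)
# 1_000 (True, True)
```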
Our solution ‹[0-9]+(_+[0-9]+)*› uses ‹[0-9]+› to match the initial digit or digits as
before. We add ‹(_+[0-9]+)*› to allow the digits to be followed by
one or more underscores, as long as those underscores are followed by
more digits. ‹_+› allows
any number of sequential underscores. ‹[0-9]+› allows any number of digits after the
underscores. We put those two inside a group that we repeat zero or more
times with an asterisk. This allows any number of nonsequential
underscores with digits in between them and after them, while also
allowing numbers with no underscores at all.
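The behavior described above can be checked with a short Python sketch. The helper name `accepts` and the list of test strings are assumptions made for this example; the pattern itself is the one from the recipe.

```python
import re

# Leading digits, then zero or more groups of "underscores followed by digits".
DIGITS_WITH_UNDERSCORES = re.compile(r"[0-9]+(_+[0-9]+)*")

def accepts(s):
    """Return True if the whole string s matches the pattern."""
    return DIGITS_WITH_UNDERSCORES.fullmatch(s) is not None

for s in ["1", "1_2", "1__2", "1_2_3", "_1", "1_", "1_2_"]:
    print(f"{s!r}: {accepts(s)}")
# '1': True      (no underscores at all)
# '1_2': True    (one group)
# '1__2': True   (sequential underscores)
# '1_2_3': True  (the group repeats)
# '_1': False    (leading underscore)
# '1_': False    (trailing underscore)
# '1_2_': False  (trailing underscore after a valid group)
```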
Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.6 explains word boundaries. Recipe 2.8 explains alternation. Recipe 2.9 explains grouping. Recipe 2.12 explains repetition.