Regular Expressions Cookbook, 2nd Edition
by Steven Levithan
Published by
O'Reilly Media, Inc., 2012
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
We use a capturing group to separate a number from its leading zeros. Before the group, ‹0*› matches the leading zeros, if any. Within the group, ‹[1-9][0-9]*› matches a number that consists of one or more digits, with the first digit being nonzero. The group's other alternative covers the one exception: a number can begin with a zero only if the number is zero itself. The word boundaries make sure we don’t match partial numbers, as explained in Recipe 6.1.
To get a list of all numbers in the subject text without leading zeros, iterate over the regex matches as explained in Recipe 3.11. Inside the loop, retrieve the text matched by the first (and only) capturing group, as explained in Recipe 3.9. The solution for this recipe shows how to do this in Perl.
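The same loop can be sketched in Python. This is only an illustration, not the book's own solution: the pattern `\b0*([1-9][0-9]*|0)\b` is an assumption reconstructed from the description above, and the names `LEADING_ZEROS` and `numbers_without_leading_zeros` are hypothetical.

```python
import re

# Assumed pattern, reconstructed from the discussion: optional leading
# zeros, then a capturing group holding either a number that starts with
# a nonzero digit or a single zero, all between word boundaries.
LEADING_ZEROS = re.compile(r"\b0*([1-9][0-9]*|0)\b")


def numbers_without_leading_zeros(subject):
    """Return each integer found in subject, minus its leading zeros."""
    # Iterate over all matches and keep only the text of group 1.
    return [match.group(1) for match in LEADING_ZEROS.finditer(subject)]


print(numbers_without_leading_zeros("007, 0, 000 and 42"))
# prints ['7', '0', '0', '42']
```

Digits embedded in a word, such as the `007` in `a007`, are skipped because no word boundary precedes them.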
Stripping the leading zeros is easy with a search-and-replace. Our regex has a capturing group that separates the number from its leading zeros. If we replace the overall regex match (the number including the leading zeros) with the text matched by the first capturing group, we’ve effectively stripped out the leading zeros. The solution shows how to do this in PHP. Recipe 3.15 shows how to do it in other programming languages.
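A minimal Python sketch of that search-and-replace follows. It assumes the pattern `\b0*([1-9][0-9]*|0)\b`, reconstructed from the discussion rather than quoted from the solution, and the function name `strip_leading_zeros` is hypothetical.

```python
import re

# Assumed pattern: leading zeros outside the group, the number inside it.
LEADING_ZEROS = re.compile(r"\b0*([1-9][0-9]*|0)\b")


def strip_leading_zeros(subject):
    """Replace each number in subject with the text of capturing group 1."""
    # r"\1" inserts the first capturing group, so the zeros matched
    # before the group are dropped from the result.
    return LEADING_ZEROS.sub(r"\1", subject)


print(strip_leading_zeros("Room 0042, floor 007, unit 000"))
# prints Room 42, floor 7, unit 0
```

Because the replacement text is just the group, everything outside the matches is left untouched, and a run of zeros such as `000` collapses to a single `0` instead of disappearing.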
All the other recipes in this chapter show more ways of matching different kinds of numbers with a regular expression.
Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.6 explains word boundaries. Recipe 2.8 explains alternation. Recipe 2.9 explains grouping. Recipe 2.12 explains repetition.