Regular Expressions Cookbook, 2nd Edition
by Steven Levithan
Published by
O'Reilly Media, Inc., 2012
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Accurate regex to check for an IP address, allowing leading zeros:
^(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}↵
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Accurate regex to check for an IP address, disallowing leading zeros:
^(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}↵
(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Simple regex to extract IP addresses from longer text:
\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Accurate regex to extract IP addresses from longer text, allowing leading zeros:
\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}↵
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Accurate regex to extract IP addresses from longer text, disallowing leading zeros:
\b(?:(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.){3}↵
(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\b
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Simple regex that captures the four parts of the IP address:
^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Accurate regex that captures the four parts of the IP address, allowing leading zeros:
^(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.↵
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.↵
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.↵
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
Accurate regex that captures the four parts of the IP address, disallowing leading zeros:
^(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.↵
(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.↵
(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])\.↵
(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
A version 4 IP address is usually written in the form 255.255.255.255, where each of the four numbers must be between 0 and 255. Matching such IP addresses with a regular expression is very straightforward.
In the solution, we present regular expressions in two styles. Some of them are billed as “simple,” while the others are marked “accurate.”
The simple regexes use ‹[0-9]{1,3}› to match each of the four blocks of
digits in the IP address. These actually allow numbers from 0 to 999
rather than 0 to 255. The simple regexes are more efficient when you
already know your input will contain only valid IP addresses, and you
only need to separate the IP addresses from the other stuff.
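To make the trade-off concrete, here is a minimal sketch in Python (one of the flavors listed above) that uses the simple extraction regex from the solution. The function name `extract_candidates` and the sample text are our own illustrative assumptions.

```python
import re

# Simple extraction regex from the solution: four blocks of 1 to 3 digits.
SIMPLE_IP = re.compile(r"\b(?:[0-9]{1,3}\.){3}[0-9]{1,3}\b")

def extract_candidates(text):
    """Return every substring shaped like an IP address (range is NOT checked)."""
    return SIMPLE_IP.findall(text)

sample = "ping 10.0.0.1 then 999.1.1.1"
print(extract_candidates(sample))  # ['10.0.0.1', '999.1.1.1']
```

The second hit is not a valid address, which is acceptable only when the input is already known to contain valid addresses.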
The accurate regexes use ‹25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?› to match
each of the four numbers in the IP address. This regex accurately
matches a number in the range 0 to 255, with one optional leading zero
for numbers between 10 and 99, and two optional leading zeros for
numbers between 0 and 9. ‹25[0-5]› matches 250 through 255, ‹2[0-4][0-9]› matches 200 to 249,
and ‹[01]?[0-9][0-9]?›
takes care of 0 to 199, including the optional leading zeros. Recipe 6.7 explains in detail how to match numeric
ranges with a regular expression.
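One way to convince yourself that this alternation covers exactly 0 to 255 is to test it exhaustively. The following Python sketch anchors the sub-pattern on its own; the helper name `is_octet` is a hypothetical name chosen for illustration.

```python
import re

# The "accurate" sub-pattern, anchored so it must match the whole string.
OCTET = re.compile(r"^(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)$")

def is_octet(text):
    """Return True if text is a number from 0 to 255, leading zeros allowed."""
    return OCTET.search(text) is not None

# Every value from 0 to 255 matches, both plain and zero-padded to three digits.
all_plain = all(is_octet(str(n)) for n in range(256))
all_padded = all(is_octet(f"{n:03d}") for n in range(256))
print(all_plain, all_padded)                                # True True
print(is_octet("256"), is_octet("007"), is_octet("1000"))   # False True False
```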
While many applications accept IP addresses with leading zeros,
strictly speaking leading zeros are not allowed in IPv4 addresses. We
can enhance the regexes to use ‹25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9]› to
match a number in the range 0 to 255, without leading zeros. The numbers
200 to 255 are matched in the same way. Instead of using just ‹[01]?[0-9][0-9]?› to match the
range 0 to 199, we now use ‹1[0-9][0-9]|[1-9]?[0-9]› with two separate
alternatives. ‹1[0-9][0-9]› matches the range 100 to 199.
‹[1-9]?[0-9]› matches the
range 0 to 99. By making the leading digit optional, we can use a single
alternative to match both the single digit and double digit
ranges.
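The same kind of exhaustive check works for the stricter sub-pattern. This is a sketch in Python; `is_strict_octet` is an illustrative name, not something defined elsewhere in this recipe.

```python
import re

# Sub-pattern that rejects leading zeros, anchored to the whole string.
STRICT_OCTET = re.compile(r"^(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])$")

def is_strict_octet(text):
    """Return True if text is 0 to 255 written without leading zeros."""
    return STRICT_OCTET.search(text) is not None

# All canonical spellings of 0 to 255 are accepted.
all_canonical = all(is_strict_octet(str(n)) for n in range(256))
print(all_canonical)                                                   # True
print(is_strict_octet("0"), is_strict_octet("07"), is_strict_octet("010"))  # True False False
```

A lone `0` still matches because the optional `[1-9]` is simply skipped.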
If you want to check whether a string is a valid IP address in its
entirety, use one of the regexes that begin with a caret and end with a
dollar. These are the start-of-string and end-of-string anchors,
explained in Recipe 2.5. If you want to find IP
addresses within longer text, use one of the regexes that begin and end
with the word boundaries ‹\b› (Recipe 2.6).
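The difference between the two kinds of anchoring can be sketched in Python as follows. We assemble both regexes from one shared sub-pattern; the variable names and the sample log line are assumptions made for this example.

```python
import re

# Number from 0 to 255 without leading zeros, wrapped in a noncapturing group.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"

# Anchored with ^ and $: the whole string must be an IP address.
VALIDATE = re.compile(r"^(?:" + OCTET + r"\.){3}" + OCTET + r"$")
# Wrapped in \b: finds IP addresses anywhere inside longer text.
EXTRACT = re.compile(r"\b(?:" + OCTET + r"\.){3}" + OCTET + r"\b")

log_line = "Accepted connection from 192.168.1.20 via gateway 10.0.0.1"
print(VALIDATE.search(log_line))                # None
print(EXTRACT.findall(log_line))                # ['192.168.1.20', '10.0.0.1']
print(VALIDATE.search("10.0.0.1") is not None)  # True
```

In Python specifically, `$` also matches just before a newline at the very end of the string, so `\Z` may be a better fit if a trailing line break must be rejected.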
The regexes for checking and extracting IP addresses use the form ‹(?:number\.){3}number›. The first
three numbers in the IP address are matched by a noncapturing group
(Recipe 2.9) that is repeated three times (Recipe 2.12). The group matches a number and a literal
dot, of which there are three in an IP address. The last part of the
regex matches the final number in the IP address. Using the noncapturing
group and repeating it three times makes our regular expression shorter
and more efficient.
To convert the textual representation of the IP address into an integer, we need to capture the four numbers separately. The capturing regexes at the end of the solution do this. Instead of using the trick of repeating a group three times, they have four capturing groups, one for each number. Spelling things out this way is the only way we can separately capture all four numbers in the IP address.
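To see why the repeated group cannot be reused for capturing, consider this Python sketch. The `REPEATED` pattern is our own hypothetical variant, written only to show the problem; `SPELLED_OUT` is the simple capturing regex from the solution. In Python's `re` module, a capturing group inside a repeated group keeps only the text from its last iteration.

```python
import re

# Hypothetical variant: one capturing group reused for the first three numbers.
REPEATED = re.compile(r"^(?:([0-9]{1,3})\.){3}([0-9]{1,3})$")

# Four separate capturing groups, as in the solution.
SPELLED_OUT = re.compile(
    r"^([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})\.([0-9]{1,3})$"
)

address = "192.168.0.1"
repeated_groups = REPEATED.search(address).groups()
spelled_groups = SPELLED_OUT.search(address).groups()
print(repeated_groups)  # ('0', '1'): the first two numbers are lost
print(spelled_groups)   # ('192', '168', '0', '1')
```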
Once we’ve captured the numbers, combining them into a
32-bit number is easy. In Perl, the special variables $1, $2,
$3, and $4 hold the text matched by the four capturing
groups in the regular expression. Recipe 3.9 explains how to retrieve capturing
groups in other programming languages. In Perl, the string variables for
the capturing groups are automatically coerced into numbers when we
apply the bitwise left shift operator (<<) to them. In other languages, you may
have to call String.toInteger() or
something similar before you can shift the numbers and combine them with
a bitwise or.
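In Python, for instance, the conversion might look like the sketch below. It uses the accurate capturing regex that disallows leading zeros; the function name `ip_to_int` and the choice to raise `ValueError` on bad input are assumptions made for this example.

```python
import re

# Accurate capturing regex from the solution, disallowing leading zeros.
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
IP_PARTS = re.compile(r"^" + r"\.".join([OCTET] * 4) + r"$")

def ip_to_int(text):
    """Convert a dotted IPv4 address into a 32-bit integer."""
    match = IP_PARTS.search(text)
    if match is None:
        raise ValueError("not a valid IPv4 address: %r" % text)
    # Python does not coerce strings to numbers, so convert explicitly.
    a, b, c, d = (int(part) for part in match.groups())
    # Shift each number into its byte position and combine with bitwise or.
    return (a << 24) | (b << 16) | (c << 8) | d

print(ip_to_int("192.168.0.1"))      # 3232235521
print(ip_to_int("255.255.255.255"))  # 4294967295
```

The first number ends up in the most significant byte, which matches the usual network byte order for IPv4 addresses.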
Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.1 explains which special characters need to be escaped. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.6 explains word boundaries. Recipe 2.9 explains grouping. Recipe 2.8 explains alternation. Recipe 2.12 explains repetition.