Regular Expressions Cookbook, 2nd Edition
by Steven Levithan
Published by
O'Reilly Media, Inc., 2012
\A (?<drive>[a-z]:)\\ (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?<file>[^\\/:*?"<>|\r\n]*) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9 |
\A (?P<drive>[a-z]:)\\ (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?P<file>[^\\/:*?"<>|\r\n]*) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: PCRE 4 and later, Perl 5.10, Python |
\A ([a-z]:)\\ ((?:[^\\/:*?"<>|\r\n]+\\)*) ([^\\/:*?"<>|\r\n]*) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
^([a-z]:)\\((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)$
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python |
\A (?<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?<file>[^\\/:*?"<>|\r\n]*) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9 |
\A (?P<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?P<file>[^\\/:*?"<>|\r\n]*) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: PCRE 4 and later, Perl 5.10, Python |
\A ([a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ ((?:[^\\/:*?"<>|\r\n]+\\)*) ([^\\/:*?"<>|\r\n]*) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
^([a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)$
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python |
These regular expressions can match the empty string. See the discussion later in this recipe for more details and an alternative solution.
\A (?<drive>[a-z]:\\|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\|\\?) (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?<file>[^\\/:*?"<>|\r\n]*) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9 |
\A (?P<drive>[a-z]:\\|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\|\\?) (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?P<file>[^\\/:*?"<>|\r\n]*) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: PCRE 4 and later, Perl 5.10, Python |
\A ([a-z]:\\|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\|\\?) ((?:[^\\/:*?"<>|\r\n]+\\)*) ([^\\/:*?"<>|\r\n]*) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
^([a-z]:\\|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\|\\?)((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)$
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python |
The regular expressions in this recipe are very similar to the ones in the previous recipe. This discussion assumes you’ve already read and understood the discussion of the previous recipe.
We’ve made only one change to the regular expressions for drive letter paths, compared to the ones in the previous recipe. We’ve added three capturing groups that you can use to retrieve the various parts of the path: ‹drive›, ‹folder›, and ‹file›. You can use these names if your regex flavor supports named capture (Recipe 2.11). If not, you’ll have to reference the capturing groups by their numbers: 1, 2, and 3. See Recipe 3.9 to learn how to get the text matched by named and/or numbered groups in your favorite programming language.
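As a minimal sketch of retrieving the three parts, the following Python snippet uses the `(?P<name>...)` variant of the drive letter regex. The names `DRIVE_PATH` and `split_drive_path` are our own illustrative choices, not part of the recipe; `re.VERBOSE` and `re.IGNORECASE` are Python's equivalents of the free-spacing and case insensitive options.

```python
import re

# Drive letter regex from this recipe, using Python's named-capture syntax.
# re.VERBOSE enables free-spacing; re.IGNORECASE makes it case insensitive.
DRIVE_PATH = re.compile(r"""
    \A
    (?P<drive>[a-z]:)\\
    (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
    (?P<file>[^\\/:*?"<>|\r\n]*)
    \Z
""", re.VERBOSE | re.IGNORECASE)


def split_drive_path(path):
    """Return (drive, folder, file), or None if the path does not match."""
    m = DRIVE_PATH.match(path)
    if m is None:
        return None
    # The same groups are also available by number: m.group(1), (2), (3).
    return m.group("drive"), m.group("folder"), m.group("file")


parts = split_drive_path(r"c:\folder\sub\file.txt")
```

Named and numbered access are interchangeable here: `m.group("drive")` and `m.group(1)` return the same text.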
Things get a bit more complicated if we also want to allow relative paths. In the previous recipe, we could just add a third alternative to the drive part of the regex to match the start of the relative path. We can’t do that here. In the case of a relative path, the capturing group for the drive should remain empty.
Instead, the literal backslash that was after the capturing group for the drives in the regex in the “drive letter and UNC paths” section is now moved into that capturing group. We add it to the end of the alternatives for the drive letter and the network share. We add a third alternative with an optional backslash for relative paths that may or may not begin with a backslash. Because the third alternative is optional, the whole group for the drive is essentially optional.
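The effect of moving the backslash into the drive group can be seen in a short Python sketch. It assumes that the ● in the printed regexes stands for a literal space, which Python keeps inside a character class even in free-spacing mode. `PATH` and `parse_path` are hypothetical names chosen for this illustration.

```python
import re

# Drive letter, UNC, and relative paths. The trailing backslash now sits
# inside the drive group; the third alternative (\\?) may match nothing.
PATH = re.compile(r"""
    \A
    (?P<drive>[a-z]:\\|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+\\|\\?)
    (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
    (?P<file>[^\\/:*?"<>|\r\n]*)
    \Z
""", re.VERBOSE | re.IGNORECASE)


def parse_path(path):
    """Return a dict with 'drive', 'folder', and 'file', or None on no match."""
    m = PATH.match(path)
    return m.groupdict() if m else None


absolute = parse_path(r"c:\folder\file.txt")
unc = parse_path(r"\\server\share\folder\file.txt")
relative = parse_path(r"folder\file.txt")
rooted = parse_path(r"\folder\file.txt")
```

For the drive letter and UNC inputs, the drive group now includes the trailing backslash. For `folder\file.txt` it is the empty string, and for `\folder\file.txt` it holds only the leading backslash.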
The resulting regular expression correctly matches all Windows paths. The problem is that by making the drive part optional, we now have a regex in which everything is optional. The folder and file parts were already optional in the regexes that support absolute paths only. In other words: our regular expression will match the empty string.
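A quick way to confirm this, and to work around it with an extra check rather than a more complex regex, is sketched below in Python using the numbered-group variant. The helper `match_nonempty_path` is a hypothetical name, and the sketch again assumes that ● denotes a literal space.

```python
import re

# Numbered-group variant that allows drive letter, UNC, and relative paths.
ANY_PATH = re.compile(r"""
    \A
    ([a-z]:\\|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+\\|\\?)
    ((?:[^\\/:*?"<>|\r\n]+\\)*)
    ([^\\/:*?"<>|\r\n]*)
    \Z
""", re.VERBOSE | re.IGNORECASE)


def match_nonempty_path(text):
    """Reject the empty string up front, then apply the regex."""
    if not text:
        return None
    return ANY_PATH.match(text)


# Every part is optional, so the empty string matches with three empty groups.
empty = ANY_PATH.match("")
```

Here `empty` is a match object whose three groups are all empty strings, while `match_nonempty_path("")` returns `None`.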
If we want to make sure the regex doesn’t match empty strings, we have to add alternatives to deal with relative paths that specify a folder (in which case the filename is optional), and relative paths that don’t specify a folder (in which case the filename is mandatory):
\A (?: (?<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?<file>[^\\/:*?"<>|\r\n]*) | (?<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+) (?<file2>[^\\/:*?"<>|\r\n]*) | (?<relativefile>[^\\/:*?"<>|\r\n]+) ) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9 |
\A (?: (?P<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?P<file>[^\\/:*?"<>|\r\n]*) | (?P<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+) (?P<file2>[^\\/:*?"<>|\r\n]*) | (?P<relativefile>[^\\/:*?"<>|\r\n]+) ) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: PCRE 4 and later, Perl 5.10, Python |
\A (?: ([a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ ((?:[^\\/:*?"<>|\r\n]+\\)*) ([^\\/:*?"<>|\r\n]*) | (\\?(?:[^\\/:*?"<>|\r\n]+\\)+) ([^\\/:*?"<>|\r\n]*) | ([^\\/:*?"<>|\r\n]+) ) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby |
^(?:([a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)|(\\?(?:[^\\/:*?"<>|\r\n]+\\)+)([^\\/:*?"<>|\r\n]*)|([^\\/:*?"<>|\r\n]+))$
| Regex options: Case insensitive |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python |
The price we pay for excluding zero-length strings is that we now have six capturing groups to capture the three different parts of the path. You’ll have to look at the scenario in which you want to use these regular expressions to determine whether it’s easier to do an extra check for empty strings before using the regex or to spend more effort in dealing with multiple capturing groups after a match has been found.
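If you choose the six-group regex, the extra effort after a match amounts to picking whichever group of each kind actually participated. The Python sketch below shows one way to do that; `first_group` and `split_nonempty_path` are hypothetical helpers, and ● is again assumed to be a literal space. In Python, a group that did not participate in the match returns `None`.

```python
import re

# Six-group regex that does not match the empty string.
NONEMPTY_PATH = re.compile(r"""
    \A
    (?:
        (?P<drive>[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\
        (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
        (?P<file>[^\\/:*?"<>|\r\n]*)
      | (?P<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
        (?P<file2>[^\\/:*?"<>|\r\n]*)
      | (?P<relativefile>[^\\/:*?"<>|\r\n]+)
    )
    \Z
""", re.VERBOSE | re.IGNORECASE)


def first_group(match, *names):
    """Return the text of the first named group that participated, else ''."""
    for name in names:
        value = match.group(name)
        if value is not None:
            return value
    return ""


def split_nonempty_path(path):
    """Return (drive, folder, file), or None if the path does not match."""
    m = NONEMPTY_PATH.match(path)
    if m is None:
        return None
    return (
        m.group("drive") or "",
        first_group(m, "folder", "relativefolder"),
        first_group(m, "file", "file2", "relativefile"),
    )
```

The caller sees a single three-part result regardless of which alternative matched, at the cost of a small helper function.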
When using Perl 5.10, Ruby 1.9, or .NET, we can give multiple named groups the same name. See the section “Groups with the same name” in Recipe 2.11 for details. This way we can simply get the match of the folder or file group, without worrying about which of the two folder groups or three file groups actually participated in the regex match:
\A (?: (?<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?<file>[^\\/:*?"<>|\r\n]*) | (?<folder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+) (?<file>[^\\/:*?"<>|\r\n]*) | (?<file>[^\\/:*?"<>|\r\n]+) ) \Z
| Regex options: Free-spacing, case insensitive |
| Regex flavors: .NET, Perl 5.10, Ruby 1.9 |
Recipe 8.18 validates a Windows path using simpler regular expressions without separate capturing groups for the drive, folder, and file.
Recipe 3.9 shows code to get the text matched by a particular part (capturing group) of a regex. Use this to get the parts of the path you’re interested in.
Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.1 explains which special characters need to be escaped. Recipe 2.2 explains how to match nonprinting characters. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.8 explains alternation. Recipe 2.9 explains grouping. Recipe 2.11 explains named capturing groups. Recipe 2.12 explains repetition.