Table of Contents for
Regular Expressions Cookbook, 2nd Edition

Regular Expressions Cookbook, 2nd Edition by Steven Levithan Published by O'Reilly Media, Inc., 2012

Drive letter and UNC paths

\A (?<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?<file>[^\\/:*?"<>|\r\n]*) \Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9
\A (?P<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?P<file>[^\\/:*?"<>|\r\n]*) \Z
Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4 and later, Perl 5.10, Python
\A ([a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ ((?:[^\\/:*?"<>|\r\n]+\\)*) ([^\\/:*?"<>|\r\n]*) \Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^([a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

Drive letter, UNC, and relative paths

Warning

These regular expressions can match the empty string. See the Discussion section for more details and an alternative solution.

\A (?<drive>[a-z]:\\|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\|\\?) (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?<file>[^\\/:*?"<>|\r\n]*) \Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9
\A (?P<drive>[a-z]:\\|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\|\\?) (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?P<file>[^\\/:*?"<>|\r\n]*) \Z
Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4 and later, Perl 5.10, Python
\A ([a-z]:\\|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\|\\?) ((?:[^\\/:*?"<>|\r\n]+\\)*) ([^\\/:*?"<>|\r\n]*) \Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^([a-z]:\\|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+\\|\\?)((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

Discussion

The regular expressions in this recipe are very similar to the ones in the previous recipe. This discussion assumes you’ve already read and understood the discussion of the previous recipe.

Drive letter paths

We’ve made only one change to the regular expressions for drive letter paths, compared to the ones in the previous recipe. We’ve added three capturing groups that you can use to retrieve the various parts of the path: ‹drive›, ‹folder›, and ‹file›. You can use these names if your regex flavor supports named capture (Recipe 2.11). If not, you’ll have to reference the capturing groups by their numbers: 1, 2, and 3. See Recipe 3.9 to learn how to get the text matched by named and/or numbered groups in your favorite programming language.

Drive letter and UNC paths

We’ve added the same three capturing groups to the regexes for UNC paths.
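
As a rough illustration of retrieving the three parts by name, the Python-flavor regex from the Solution can be used as follows. This is only a sketch: the function name `split_path` is hypothetical, and it assumes the ● in the listings stands for a literal space character (which stays significant inside a character class under `re.VERBOSE`).

```python
import re

# Named-capture regex for drive letter and UNC paths (Python flavor).
# Assumption: the "●" in the printed regex is a literal space.
WINDOWS_PATH = re.compile(
    r"""\A
    (?P<drive>[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\
    (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
    (?P<file>[^\\/:*?"<>|\r\n]*)
    \Z""",
    re.VERBOSE | re.IGNORECASE,
)


def split_path(path):
    """Return (drive, folder, file), or None if the path does not match."""
    match = WINDOWS_PATH.match(path)
    if match is None:
        return None
    return match.group("drive"), match.group("folder"), match.group("file")


print(split_path(r"c:\folder\sub\file.txt"))
print(split_path(r"\\server\share\docs\readme.txt"))
```

With this regex the backslash that follows the drive or network share is not part of the `drive` group, while each folder keeps its trailing backslash.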

Drive letter, UNC, and relative paths

Things get a bit more complicated if we also want to allow relative paths. In the previous recipe, we could just add a third alternative to the drive part of the regex to match the start of the relative path. We can’t do that here. In case of a relative path, the capturing group for the drive should remain empty.

Instead, the literal backslash that was after the capturing group for the drives in the regex in the “drive letter and UNC paths” section is now moved into that capturing group. We add it to the end of the alternatives for the drive letter and the network share. We add a third alternative with an optional backslash for relative paths that may or may not begin with a backslash. Because the third alternative is optional, the whole group for the drive is essentially optional.

The resulting regular expression correctly matches all Windows paths. The problem is that by making the drive part optional, we now have a regex in which everything is optional. The folder and file parts were already optional in the regexes that support absolute paths only. In other words: our regular expression will match the empty string.
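
The empty-string match is easy to reproduce. The following sketch uses the Python-flavor regex for drive letter, UNC, and relative paths; the helper name `parts` is hypothetical, and the ● in the listings is again assumed to be a literal space.

```python
import re

# Regex that also accepts relative paths. Every part is optional,
# so the pattern as a whole can match a zero-length string.
ANY_PATH = re.compile(
    r"""\A
    (?P<drive>[a-z]:\\|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+\\|\\?)
    (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
    (?P<file>[^\\/:*?"<>|\r\n]*)
    \Z""",
    re.VERBOSE | re.IGNORECASE,
)


def parts(path):
    """Return a dict of the named groups, or None if there is no match."""
    match = ANY_PATH.match(path)
    return match.groupdict() if match else None


print(parts(r"c:\docs\readme.txt"))  # drive includes the trailing backslash
print(parts(r"docs\readme.txt"))     # relative path: drive group is empty
print(parts(""))                     # the empty string also matches
```

Note that in this variant the `drive` group includes the trailing backslash, because the backslash was moved inside the group.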

If we want to make sure the regex doesn’t match empty strings, we have to add alternatives that deal separately with relative paths that specify a folder (in which case the filename is optional) and relative paths that don’t specify a folder (in which case the filename is mandatory):

\A (?: (?<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?<file>[^\\/:*?"<>|\r\n]*) | (?<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+) (?<file2>[^\\/:*?"<>|\r\n]*) | (?<relativefile>[^\\/:*?"<>|\r\n]+) ) \Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java 7, PCRE 7, Perl 5.10, Ruby 1.9
\A (?: (?P<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?P<file>[^\\/:*?"<>|\r\n]*) | (?P<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+) (?P<file2>[^\\/:*?"<>|\r\n]*) | (?P<relativefile>[^\\/:*?"<>|\r\n]+) ) \Z
Regex options: Free-spacing, case insensitive
Regex flavors: PCRE 4 and later, Perl 5.10, Python
\A (?: ([a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ ((?:[^\\/:*?"<>|\r\n]+\\)*) ([^\\/:*?"<>|\r\n]*) | (\\?(?:[^\\/:*?"<>|\r\n]+\\)+) ([^\\/:*?"<>|\r\n]*) | ([^\\/:*?"<>|\r\n]+) ) \Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Java, PCRE, Perl, Python, Ruby
^(?:([a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\((?:[^\\/:*?"<>|\r\n]+\\)*)([^\\/:*?"<>|\r\n]*)|(\\?(?:[^\\/:*?"<>|\r\n]+\\)+)([^\\/:*?"<>|\r\n]*)|([^\\/:*?"<>|\r\n]+))$
Regex options: Case insensitive
Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python

The price we pay for excluding zero-length strings is that we now have six capturing groups to capture the three different parts of the path. You’ll have to look at the scenario in which you want to use these regular expressions to determine whether it’s easier to do an extra check for empty strings before using the regex or to spend more effort in dealing with multiple capturing groups after a match has been found.

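
One way to deal with the six groups after a match, sketched here in Python with the named-group version of the regex, is to take the first group in each category that actually participated. The names `split_path` and `first` are hypothetical, and the ● in the listings is assumed to be a literal space.

```python
import re

# Six-group regex that rejects the empty string (Python flavor).
NON_EMPTY_PATH = re.compile(
    r"""\A
    (?:
        (?P<drive>[a-z]:|\\\\[a-z0-9_.$ -]+\\[a-z0-9_.$ -]+)\\
        (?P<folder>(?:[^\\/:*?"<>|\r\n]+\\)*)
        (?P<file>[^\\/:*?"<>|\r\n]*)
      | (?P<relativefolder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+)
        (?P<file2>[^\\/:*?"<>|\r\n]*)
      | (?P<relativefile>[^\\/:*?"<>|\r\n]+)
    )
    \Z""",
    re.VERBOSE | re.IGNORECASE,
)


def split_path(path):
    """Return (drive, folder, file), or None if the path does not match."""
    match = NON_EMPTY_PATH.match(path)
    if match is None:
        return None
    groups = match.groupdict()

    def first(*names):
        # Groups in alternatives that did not take part in the match are None.
        for name in names:
            if groups[name] is not None:
                return groups[name]
        return ""

    return (
        first("drive"),
        first("folder", "relativefolder"),
        first("file", "file2", "relativefile"),
    )


print(split_path(r"c:\docs\readme.txt"))
print(split_path(r"docs\readme.txt"))
print(split_path("readme.txt"))
print(split_path(""))  # no match, so None
```

The check against `None` rather than truthiness matters here, because a group such as `file` can participate in the match while capturing an empty string.
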
When using Perl 5.10, Ruby 1.9, or .NET, we can give multiple named groups the same name. See the section “Groups with the same name” in Recipe 2.11 for details. This way we can simply get the match of the folder or file group, without worrying about which of the two folder groups or three file groups actually participated in the regex match:

\A (?: (?<drive>[a-z]:|\\\\[a-z0-9_.$●-]+\\[a-z0-9_.$●-]+)\\ (?<folder>(?:[^\\/:*?"<>|\r\n]+\\)*) (?<file>[^\\/:*?"<>|\r\n]*) | (?<folder>\\?(?:[^\\/:*?"<>|\r\n]+\\)+) (?<file>[^\\/:*?"<>|\r\n]*) | (?<file>[^\\/:*?"<>|\r\n]+) ) \Z
Regex options: Free-spacing, case insensitive
Regex flavors: .NET, Perl 5.10, Ruby 1.9

8.19. Split Windows Paths into Their Parts

Problem

Solution

Drive letter paths

Drive letter and UNC paths

Drive letter, UNC, and relative paths

Warning

Discussion

Drive letter paths

Drive letter and UNC paths

Drive letter, UNC, and relative paths

See Also