Regular Expressions Cookbook, 2nd Edition
by Steven Levithan
Published by O'Reilly Media, Inc., 2012
^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
import re
import sys

# Validate the first command-line argument against the SSN pattern.
if re.match(r"^(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}$",
            sys.argv[1]):
    print("SSN is valid")
else:
    print("SSN is invalid")

See Recipe 3.6 for help with implementing this regular expression with other programming languages.
United States Social Security numbers are nine-digit numbers in the format AAA-GG-SSSS:
The first three digits were historically (prior to mid-2011) assigned by geographical region, and are thus called the area number. The area number cannot be 000, 666, or between 900 and 999.
Digits four and five are called the group number and range from 01 to 99.
The last four digits are serial numbers from 0001 to 9999.
This recipe follows all of the rules just listed. Here’s the regular expression again, this time explained piece by piece:
^            # Assert position at the beginning of the string.
(?!000|666)  # Assert that neither "000" nor "666" can be matched here.
[0-8]        # Match a digit between 0 and 8.
[0-9]{2}     # Match a digit, exactly two times.
-            # Match a literal "-".
(?!00)       # Assert that "00" cannot be matched here.
[0-9]{2}     # Match a digit, exactly two times.
-            # Match a literal "-".
(?!0000)     # Assert that "0000" cannot be matched here.
[0-9]{4}     # Match a digit, exactly four times.
$            # Assert position at the end of the string.
| Regex options: Free-spacing |
| Regex flavors: .NET, Java, XRegExp, PCRE, Perl, Python, Ruby |
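As a rough illustration of how the free-spacing version might be used from Python, the sketch below compiles it with re.VERBOSE and runs it against a few inputs. The helper name is_valid_ssn and the sample values are assumptions made for this example (the samples are made up to exercise each rule and are not real numbers).

```python
import re

# The recipe's regex in free-spacing form; re.VERBOSE makes Python ignore
# the whitespace and treat "#" as the start of a comment.
SSN_PATTERN = re.compile(r"""
    ^                 # Beginning of the string.
    (?!000|666)       # Area number cannot be 000 or 666.
    [0-8][0-9]{2}     # Area number from 000 to 899.
    -                 # Literal hyphen.
    (?!00)[0-9]{2}    # Group number from 01 to 99.
    -                 # Literal hyphen.
    (?!0000)[0-9]{4}  # Serial number from 0001 to 9999.
    $                 # End of the string.
""", re.VERBOSE)


def is_valid_ssn(text):
    """Return True if text looks like AAA-GG-SSSS and passes the range rules."""
    return SSN_PATTERN.match(text) is not None


# Hypothetical sample inputs, one per rule.
samples = [
    "123-45-6789",  # well formed
    "000-45-6789",  # area number 000
    "666-45-6789",  # area number 666
    "900-45-6789",  # area number in the 900-999 range
    "123-00-6789",  # group number 00
    "123-45-0000",  # serial number 0000
]
for sample in samples:
    print(sample, is_valid_ssn(sample))
```

Only the first sample should be reported as valid. Note that in Python, ‹$› also matches just before a line break at the very end of the string, so input that may carry a trailing newline should be stripped first.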
Apart from the ‹^› and ‹$› tokens that assert position at the beginning and end of the string, this regex can be broken into three sets of digits separated by hyphens. The first set allows any number from 000 to 899, but uses the preceding negative lookahead ‹(?!000|666)› to rule out the specific values 000 and 666. This kind of restriction can be pulled off without lookahead, but having this tool in our arsenal dramatically simplifies the regex. If you wanted to remove 000 and 666 from the range of valid area numbers without using any sort of lookaround, you’d need to restructure ‹(?!000|666)[0-8][0-9]{2}› as ‹(?:00[1-9]|0[1-9][0-9]|[1-578][0-9]{2}|6[0-57-9][0-9]|66[0-57-9])›. This far less readable approach uses a series of numeric ranges, which you can read all about in Recipe 6.7.
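One way to convince yourself that the two area-number patterns really are interchangeable is to test every three-digit string against both. The following is a minimal sketch of such a check; the names WITH_LOOKAHEAD, WITHOUT_LOOKAROUND, and accepted are assumptions introduced for this example.

```python
import re

# The two area-number patterns discussed above.
WITH_LOOKAHEAD = re.compile(r"(?!000|666)[0-8][0-9]{2}")
WITHOUT_LOOKAROUND = re.compile(
    r"(?:00[1-9]|0[1-9][0-9]|[1-578][0-9]{2}|6[0-57-9][0-9]|66[0-57-9])"
)


def accepted(pattern):
    """Return the set of three-digit strings that the pattern matches in full."""
    candidates = ("{:03d}".format(n) for n in range(1000))
    return {c for c in candidates if pattern.fullmatch(c)}


by_lookahead = accepted(WITH_LOOKAHEAD)
by_ranges = accepted(WITHOUT_LOOKAROUND)

print(by_lookahead == by_ranges)  # True
print(len(by_lookahead))          # 898, i.e. 000-899 minus 000 and 666
```

Both patterns accept the same 898 area numbers, so the choice between them comes down to readability.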
The second and third sets of digits in this pattern simply match any two- or four-digit number, respectively, but use a preceding negative lookahead to rule out the possibility of matching all zeros.
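If you’re searching for Social Security numbers in a larger document or input string, replace the ‹^› and ‹$› anchors with word boundaries. Regular expression engines consider all alphanumeric characters and the underscore to be word characters, so the word boundaries prevent a match when the number is directly preceded or followed by a letter, digit, or underscore.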
\b(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b
| Regex options: None |
| Regex flavors: .NET, Java, JavaScript, PCRE, Perl, Python, Ruby |
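The word-boundary variant could be applied to a larger string along the following lines. This is only a sketch: the function name find_ssns and the sample text are hypothetical, and the numbers in the sample are invented for illustration.

```python
import re

# The word-boundary version of the regex, for searching inside longer text.
SSN_IN_TEXT = re.compile(
    r"\b(?!000|666)[0-8][0-9]{2}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}\b"
)


def find_ssns(text):
    """Return every substring of text that matches the SSN pattern."""
    # The pattern has no capturing groups, so findall returns whole matches.
    return SSN_IN_TEXT.findall(text)


# Hypothetical input: two well-formed numbers, one with a disallowed area
# number, and one glued to a preceding letter.
sample = "IDs: 123-45-6789, 666-12-3456, x123-45-6789, and 001-01-0001."
print(find_ssns(sample))  # ['123-45-6789', '001-01-0001']
```

The number after "x" is skipped because there is no word boundary between a letter and a digit, and 666-12-3456 is skipped because the lookahead rejects its area number.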
The Social Security Number Verification Service (SSNVS) at http://www.socialsecurity.gov/employer/ssnv.htm offers two ways to verify over the Internet that names and Social Security numbers match the Social Security Administration’s records.
Techniques used in the regular expressions in this recipe are discussed in Chapter 2. Recipe 2.3 explains character classes. Recipe 2.5 explains anchors. Recipe 2.6 explains word boundaries. Recipe 2.8 explains alternation. Recipe 2.9 explains grouping. Recipe 2.12 explains repetition. Recipe 2.16 explains lookaround.