We've now seen many examples of how to use regular expressions. While most things are pretty intuitive, we have also seen that to match both uppercase and lowercase letters, we either have to specify the -i option for grep, or change the search pattern from [a-z] to [a-zA-Z]. For digits, we need [0-9].
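Some might find this fine to work with, but others might disagree. For the latter group, there is an alternative notation: [[:pattern:]]. The inner [:pattern:] names a character class, and the outer brackets are the ordinary bracket expression that the class has to appear in.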
The next example uses both this new double bracket notation, and the old single bracket one:
reader@ubuntu:~/scripts/chapter_10$ grep '[[:digit:]]' character-class.txt
e2e
a2a
reader@ubuntu:~/scripts/chapter_10$ grep '[0-9]' character-class.txt
e2e
a2a
As you can see, both patterns result in the same lines: those with a digit. The same can be done with uppercase characters:
reader@ubuntu:~/scripts/chapter_10$ grep '[[:upper:]]' grep-file.txt
We can use this regular file for testing grep.
Regular expressions are pretty cool
Did you ever realise that in the UK they say colour,
but in the USA they use color (and realize)!
Also, New Zealand is pretty far away.
reader@ubuntu:~/scripts/chapter_10$ grep '[A-Z]' grep-file.txt
We can use this regular file for testing grep.
Regular expressions are pretty cool
Did you ever realise that in the UK they say colour,
but in the USA they use color (and realize)!
Also, New Zealand is pretty far away.
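The equivalence of the two notations can also be checked without the sample files used above. The following is a minimal sketch, assuming a throwaway file created with mktemp whose contents are made up for illustration. The patterns are quoted so that the shell cannot expand the brackets as a filename glob, and LC_ALL=C pins the meaning of ranges such as [A-Z], which can differ between locales:

```shell
#!/bin/bash
# Hypothetical sample data, not the character-class.txt used in the text.
sample=$(mktemp)
printf '%s\n' 'eee' 'e2e' 'Aaa' 'a2a' > "$sample"

# Quote each pattern so the shell hands the brackets to grep untouched.
posix_digits=$(LC_ALL=C grep '[[:digit:]]' "$sample")
range_digits=$(LC_ALL=C grep '[0-9]' "$sample")

posix_upper=$(LC_ALL=C grep '[[:upper:]]' "$sample")
range_upper=$(LC_ALL=C grep '[A-Z]' "$sample")

# Compare the matched lines of both notations.
if [ "$posix_digits" = "$range_digits" ] && [ "$posix_upper" = "$range_upper" ]; then
    echo "Both notations select the same lines"
fi

rm -f "$sample"
```

With this input, the digit patterns select e2e and a2a, and the uppercase patterns select Aaa.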
At the end of the day, it is a matter of preference which notation you use. There is one thing to be said for the double bracket notation, though: its named classes are much closer to the shorthand found in the regular expression implementations of other scripting/programming languages. For example, most of those implementations use \w (word) to match letters, digits, and the underscore, and \d (digit) to match digits. The uppercase variant negates the class: \W matches any character that is not a word character.
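To see how the shorthand relates to the bracket notation, here is a small sketch. It assumes GNU grep, where \w and \W are available as extensions (other grep implementations may not support them), and it uses a made-up input string. It leaves out \d, which GNU grep only understands in its Perl-compatible mode (-P):

```shell
#!/bin/bash
# Hypothetical input: two letters, an underscore, a digit, a space, punctuation.
text='ab_1 !?'

# grep -o prints every match on its own line; tr joins them into one string.
word_chars=$(printf '%s\n' "$text" | LC_ALL=C grep -o '\w' | tr -d '\n')
posix_word=$(printf '%s\n' "$text" | LC_ALL=C grep -o '[[:alnum:]_]' | tr -d '\n')
non_word=$(printf '%s\n' "$text" | LC_ALL=C grep -o '\W' | tr -d '\n')

echo "matched by \\w:            '$word_chars'"
echo "matched by [[:alnum:]_]:  '$posix_word'"
echo "matched by \\W:            '$non_word'"
```

The first two patterns pick out the same characters, while \W picks out everything else on the line: the space and the punctuation.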
For your convenience, here is a table with the most common POSIX double-bracket character classes:
| Notation | Description | Single bracket equivalent |
| --- | --- | --- |
| [[:alnum:]] | Matches lowercase and uppercase letters or digits | [a-zA-Z0-9] |
| [[:alpha:]] | Matches lowercase and uppercase letters | [a-zA-Z] |
| [[:digit:]] | Matches digits | [0-9] |
| [[:lower:]] | Matches lowercase letters | [a-z] |
| [[:upper:]] | Matches uppercase letters | [A-Z] |
| [[:blank:]] | Matches spaces and tabs | [ \t] |
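The rows of the table can be verified with a short script. This is only a sketch: the input characters and the count_matches helper are made up for illustration. Because grep treats a backslash inside brackets as a literal character, the single bracket equivalent of [[:blank:]] is built here with a real tab character rather than the two characters \t:

```shell
#!/bin/bash
# A real tab character, produced by printf.
tab=$(printf '\t')

# Hypothetical input, one character per line: a, Z, 5, _, a space, a tab.
chars=$(printf '%s\n' a Z 5 _ ' ' "$tab")

# count_matches is a helper invented for this sketch: it prints how many
# input lines contain a match for the pattern passed as $1.
count_matches() {
    printf '%s\n' "$chars" | LC_ALL=C grep -c "$1"
}

echo "alnum: $(count_matches '[[:alnum:]]') vs $(count_matches '[a-zA-Z0-9]')"
echo "alpha: $(count_matches '[[:alpha:]]') vs $(count_matches '[a-zA-Z]')"
echo "digit: $(count_matches '[[:digit:]]') vs $(count_matches '[0-9]')"
echo "lower: $(count_matches '[[:lower:]]') vs $(count_matches '[a-z]')"
echo "upper: $(count_matches '[[:upper:]]') vs $(count_matches '[A-Z]')"
echo "blank: $(count_matches '[[:blank:]]') vs $(count_matches "[ $tab]")"
```

Each line prints the same count twice, which shows that the two notations agree on this input. Note that the underscore is not counted by any of the classes in the table.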