Matching One of Several Characters
As you have learned previously, . matches any one character (as does any literal character). But what if there were a file (containing Canadian sales data) named ca1.xls alongside na1.xls and sa1.xls, and you wanted to match only na and sa? The . would also match c, and so that filename would also be matched. To find n or s you would not want to match any character; you would want to match just those two characters. In regular expressions, a set of characters is defined using the metacharacters [ and ]. Together, [ and ] define a character set: everything between them is part of the set, and any one of the set members must match (but not all). For example, [ns]a matches na and sa but not ca.
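The difference between . and a character set can be checked with a short script. This is a minimal sketch using Python's standard re module; the variable names and the three-file list are chosen here for illustration only.

```python
import re

# The three filenames discussed in the text.
filenames = ["na1.xls", "sa1.xls", "ca1.xls"]

# "." accepts any character in the first position, so ca1.xls also matches.
dot_pattern = re.compile(r".a.\.xls")
# "[ns]" accepts only n or s in the first position.
set_pattern = re.compile(r"[ns]a.\.xls")

dot_matches = [name for name in filenames if dot_pattern.search(name)]
set_matches = [name for name in filenames if set_pattern.search(name)]

print(dot_matches)  # ['na1.xls', 'sa1.xls', 'ca1.xls']
print(set_matches)  # ['na1.xls', 'sa1.xls']
```

Only the first position differs between the two patterns, which is why ca1.xls drops out of the second list.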
Character sets are frequently used to make searches (or specific parts thereof) not case sensitive. For example:
TEXT
The phrase “regular expression” is often abbreviated as RegEx or regex
REGEX
[Rr]eg[Ee]x
RESULT
The phrase “regular expression” is often
abbreviated as RegEx or regex
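To see exactly which parts of the text this pattern matches, the example can be reproduced in a few lines. This is a sketch using Python's re module; the variable names are illustrative.

```python
import re

text = ("The phrase “regular expression” is often "
        "abbreviated as RegEx or regex")

# [Rr] matches either R or r; [Ee] matches either E or e.
pattern = re.compile(r"[Rr]eg[Ee]x")

matches = pattern.findall(text)
print(matches)  # ['RegEx', 'regex']
```

The word "regular" is not matched, because the character after "reg" is u, which is not in the set [Ee].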
Using Character Set Ranges
Character sets often contain ranges, such as [a-z], [A-Z], and [0-9]; for example, [0-9] is equivalent to [0123456789]. One use for ranges is finding RGB values (colors specified in hexadecimal notation representing the amount of red, green, and blue used to create the color). In Web pages, RGB values are specified as #000000 (black), #FFFFFF (white), #FF0000 (red), and so on. RGB values may be specified in uppercase or lowercase, and so #FF00ff (magenta) is legal, too. Here’s the example:
TEXT
<BODY BGCOLOR="#336633" TEXT="#FFFFFF" MARGINWIDTH="0" MARGINHEIGHT="0" TOPMARGIN="0" LEFTMARGIN="0">
REGEX
#[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]
RESULT
<BODY BGCOLOR="#336633" TEXT="#FFFFFF"
MARGINWIDTH="0" MARGINHEIGHT="0"
TOPMARGIN="0" LEFTMARGIN="0">
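The matches in this example are the two color values, not the whole tag. A minimal sketch in Python's re module shows this; the variable names are illustrative, and the tag is written with straight quotes.

```python
import re

html = ('<BODY BGCOLOR="#336633" TEXT="#FFFFFF" MARGINWIDTH="0" '
        'MARGINHEIGHT="0" TOPMARGIN="0" LEFTMARGIN="0">')

# One hexadecimal digit: 0-9, A-F, or a-f.
hex_digit = "[0-9A-Fa-f]"
# A "#" followed by exactly six hexadecimal digits.
pattern = re.compile("#" + hex_digit * 6)

colors = pattern.findall(html)
print(colors)  # ['#336633', '#FFFFFF']
```

Building the pattern by repeating the set six times yields the same expression as the one written out in full above.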
“Anything But” Matching
Character sets are usually used to specify a list of characters, any one of which must match. But occasionally, you’ll want the reverse: a list of characters that you don’t want to match. In other words, anything but the list specified here. Rather than having to enumerate every character you want (which could get rather lengthy if you want all but a few), character sets can be negated using the ^ metacharacter. Here’s an example:
TEXT
sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
sam.xls
na1.xls
na2.xls
sa1.xls
ca1.xls
REGEX
[ns]a[^0-9]\.xls
RESULT
sales1.xls
orders3.xls
sales2.xls
sales3.xls
apac1.xls
europe2.xls
sam.xls
na1.xls
na2.xls
sa1.xls
ca1.xls
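Of the filenames listed, only sam.xls satisfies this pattern, because [^0-9] requires a character that is not a digit in the third position. The following Python sketch (variable names are illustrative) confirms this.

```python
import re

filenames = [
    "sales1.xls", "orders3.xls", "sales2.xls", "sales3.xls",
    "apac1.xls", "europe2.xls", "sam.xls",
    "na1.xls", "na2.xls", "sa1.xls", "ca1.xls",
]

# n or s, then a, then any single character that is NOT a digit, then ".xls".
pattern = re.compile(r"[ns]a[^0-9]\.xls")

matches = [name for name in filenames if pattern.search(name)]
print(matches)  # ['sam.xls']
```

Note that [^0-9] still has to match a character; it does not match an empty position.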
Summary
Metacharacters [ and ] are used to define sets of characters, any one of which must match (OR in contrast to AND). Character sets may be enumerated explicitly or specified as ranges using the - metacharacter. Character sets may be negated using ^; this forces a match of anything but the specified characters.