Regular Expression Patterns
regular characters
All characters except ., |, (, ), [, \, ^, {, +, $, *, and ? match themselves. To match one of these characters, precede it with a backslash.
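As a minimal sketch of the escaping rule, the following compares an unescaped period with an escaped one. The variable names are illustrative only; Regexp.escape is the standard library helper that inserts the backslashes for you.

```ruby
# Unescaped, "." matches any character; escaped, it matches only a period.
loose  = /3.14/
strict = /3\.14/

loose_hit  = "3x14" =~ loose    # 0   (the "." matched "x")
strict_hit = "3x14" =~ strict   # nil (a literal period is required)
exact_hit  = "3.14" =~ strict   # 0

# Regexp.escape backslash-escapes every special character in a string.
escaped = Regexp.escape("a+b?") # the six characters a\+b\?
```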
^
Matches the beginning of a line.
$
Matches the end of a line.
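A short illustration of the two line anchors, assuming a sample two-line string (the variable names are made up for this sketch):

```ruby
text = "first line\nsecond line\n"

# ^ anchors at the start of every line, not only the start of the string.
line_starts = text.scan(/^\w+/)   # ["first", "second"]

# $ anchors at the end of every line, just before each "\n".
line_ends = text.scan(/\w+$/)     # ["line", "line"]
```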
\A
Matches the beginning of the string.
\z
Matches the end of the string.
\Z
Matches the end of the string unless the string ends with a "\n", in which case it matches just before the "\n".
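The difference between the string anchors is easiest to see on a string with a trailing newline. This is only a sketch with an invented sample string:

```ruby
text = "one\ntwo\n"

# \A matches only at the very start of the string.
string_start = text.scan(/\A\w+/)   # ["one"]

# \Z tolerates a single trailing "\n"; \z does not.
big_z     = text =~ /two\Z/         # 4
small_z   = text =~ /two\z/         # nil
newline_z = text =~ /two\n\z/       # 4
```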
\b, \B
Match word boundaries and nonword boundaries respectively.
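For instance, searching for "cat" in a sample phrase (chosen here purely for illustration) gives different results depending on which boundary assertions are used:

```ruby
text = "cat concat catalog"

# \b on both sides: only the standalone word matches.
whole_word = text.scan(/\bcat\b/)   # ["cat"]

# \b on the left only: "cat" and the start of "catalog".
word_start = text.scan(/\bcat/)     # ["cat", "cat"]

# \B on the left: only the "cat" buried inside "concat".
inside = text.scan(/\Bcat/)         # ["cat"]
```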
[ characters ]
A character class matches any single character between the brackets. The characters |, (, ), [, ^, $, *, and ?, which have special meanings elsewhere in patterns, lose their special significance between brackets. The sequences \nnn, \xnn, \cx, \C-x, \M-x, and \M-\C-x have the meanings shown in Table 18.2 on page 203. The sequences \d, \D, \s, \S, \w, and \W are abbreviations for groups of characters, as shown in Table 5.1 on page 59. The sequence c1-c2 represents all the characters between c1 and c2, inclusive. Literal ] or - characters must appear immediately after the opening bracket. A caret (^) immediately following the opening bracket negates the sense of the match: the pattern matches any character that isn't in the character class.
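The following sketch exercises the main character class forms on a few invented strings: a plain set, a range, a negated set, special characters used literally, and a literal hyphen placed right after the opening bracket.

```ruby
vowels     = "banana".scan(/[aeiou]/)   # ["a", "a", "a"]
lowercase  = "x7Q".scan(/[a-z]/)        # ["x"]
non_digits = "a1b2".scan(/[^0-9]/)      # ["a", "b"]

# Inside brackets, * and ? are ordinary characters.
literal = "what?*".scan(/[*?]/)         # ["?", "*"]

# A literal "-" goes immediately after the opening bracket.
dashes = "a-b_c".scan(/[-_]/)           # ["-", "_"]
```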
\d, \s, \w
Are abbreviations for character classes that match digits, whitespace, and word characters, respectively. \D, \S, and \W match characters that are not digits, whitespace, or word characters. These abbreviations are summarized in Table 5.1 on page 59.
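A brief example of the abbreviations applied to a sample string (the string and variable names are assumptions made for this sketch):

```ruby
entry = "room 42, floor 7"

digits   = entry.scan(/\d+/)       # ["42", "7"]
words    = entry.scan(/\w+/)       # ["room", "42", "floor", "7"]
squeezed = entry.gsub(/\s/, "")    # "room42,floor7"

# \W is the negation of \w: everything that is not a word character.
non_word = entry.scan(/\W/)        # [" ", ",", " ", " "]
```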
. (period)
Appearing outside brackets, matches any character except a newline. (With the /m option, it matches newline, too.)
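A two-line sketch of the effect of the /m option on the period:

```ruby
text = "a\nb"

plain     = text =~ /a.b/    # nil: "." will not match the "\n"
multiline = text =~ /a.b/m   # 0:   with /m, "." matches "\n" too
```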
re*
Matches zero or more occurrences of re.
re+
Matches one or more occurrences of re.
re{m,n}
Matches at least m and at most n occurrences of re.
re?
Matches zero or one occurrence of re. The *, +, and {m,n} modifiers are greedy by default: they match as much of the string as they can. Append a question mark to make them minimal, so that they match as little as possible.
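To make the repetition modifiers concrete, here is a small sketch contrasting greedy and minimal matching. The sample strings are invented for illustration.

```ruby
html = "<b>bold</b>"

# Greedy: .+ runs to the last ">" it can reach.
greedy = html[/<.+>/]              # "<b>bold</b>"

# Minimal: .+? stops at the first ">" that lets the match succeed.
minimal = html[/<.+?>/]            # "<b>"

# ? makes the preceding "u" optional.
optional = "color colour".scan(/colou?r/)   # ["color", "colour"]

# {m,n} is greedy as well; {m,n}? takes the fewest repetitions allowed.
bounded         = "aaaaa"[/a{2,3}/]    # "aaa"
bounded_minimal = "aaaaa"[/a{2,3}?/]   # "aa"

# * is satisfied by zero occurrences, so it can match the empty string.
empty = "b"[/a*/]                  # ""
```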
re1 | re2
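Matches either re1 or re2. The | operator has low precedence: it binds less tightly than concatenation, so /ab|cd/ matches "ab" or "cd". Use parentheses to limit the scope of the alternation.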
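The effect of the low precedence can be sketched by comparing an ungrouped alternation with a grouped one (sample phrases are illustrative):

```ruby
# Without grouping, the alternatives are "red ball" and "angry sky".
whole_left  = "red ball"[/red ball|angry sky/]        # "red ball"
whole_right = "red angry sky"[/red ball|angry sky/]   # "angry sky"

# With grouping, only "ball" and "angry" alternate.
grouped_hit  = "red angry sky"[/red (ball|angry) sky/]   # "red angry sky"
grouped_miss = "red ball"[/red (ball|angry) sky/]        # nil
```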
(...)
Parentheses are used to group regular expressions. For example, the pattern /abc+/ matches a string containing an "a", a "b", and one or more "c"s. /(abc)+/ matches one or more sequences of "abc". Parentheses are also used to collect the results of pattern matching. For each opening parenthesis, Ruby stores the result of the partial match between it and the corresponding closing parenthesis as successive groups. Within the same pattern, \1 refers to the match of the first group, \2 the second group, and so on. Outside the pattern, the special variables $1, $2, and so on, serve the same purpose.
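The grouping and capturing behavior described above can be sketched as follows. The time string and variable names are assumptions chosen for the example.

```ruby
# Grouping changes what the + applies to.
plus_on_c     = "abccc"[/abc+/]        # "abccc"
plus_on_group = "abcabcab"[/(abc)+/]   # "abcabc"

# After a successful match, $1, $2, ... hold the text of each group.
time = "12:50am"
if time =~ /(\d\d):(\d\d)(..)/
  hour, minute, suffix = $1, $2, $3    # "12", "50", "am"
end

# Inside the pattern, \1 refers back to what the first group matched.
doubled = "Mississippi"[/(\w)\1/]      # "ss"
```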