Regular expressions are a powerful but complex method for describing patterns of text. Regular expressions (also known as regexes, grep expressions or grep patterns) use a set of special characters to describe a template that can match a variety of text patterns. This lets you do "wildcard" matches: for example, you can set a trigger that fires on all text paged from you or to you by looking for the common bits in the responses from your muck's page function.
A full overview of regular expressions is beyond what I can do here, but here's a basic list of the main building blocks, and some examples.
Most characters simply match themselves. So to look for the text "Fred", use
Fred
as your regular expression.
Several characters are used by regular expressions to mean special things; this includes all of the ones listed below. If you actually need to look for one of these characters in your search pattern, you can use a backslash (\) preceding the character to specify that character. For example, to look for the text "stop.", including the period, use the regular expression "stop\."; to match a backslash itself, use "\\".
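The escaping rule can be tried outside Cantrip. The following is a minimal sketch using Python's re module, which is an assumption made for illustration: it is a different regex dialect from the one Cantrip uses, although the backslash behaves the same way for these simple cases.

```python
import re

# "stop\." matches the literal text "stop." because the backslash
# removes the period's special meaning.
escaped = re.compile(r"stop\.")
# Without the backslash, the period matches any single character.
unescaped = re.compile(r"stop.")

print(escaped.search("Please stop.") is not None)    # True
print(escaped.search("Please stops") is not None)    # False
print(unescaped.search("Please stops") is not None)  # True: "." matched "s"

# "\\" in the regular expression matches one literal backslash.
print(re.search(r"\\", "back\\slash") is not None)   # True

# re.escape() adds the needed backslashes for you.
print(re.escape("stop."))                            # stop\.
```

If you are unsure whether a character is special, escaping it with a backslash is the safe choice when it is punctuation.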
.
: The period will match any single character. So to match either "too" or "two", you can use the regular expression "t.o".
^
: The caret will match the beginning of a line of text. So to catch actions from a puppet, where the muck uses the name of the puppet at the beginning of each line from the puppet ("Windir", for example), you can use the regular expression "^Windir".
$
: The dollar sign will match the end of a line.
[]
: A list of characters enclosed by square brackets counts as a single item that will match any single character in that list. For example, [123abc] will match a single "1", "2", "3", "a", "b" or "c". Putting a caret (^) at the start of the list will match any single character that is not in the list; [^abc] will match any single character except "a", "b" or "c". To match a range of ASCII characters, use a dash; thus [a-z] will match any lowercase letter, [0-9] will match any digit, and [a-zA-Z0-9] will match any alphanumeric character.
()
: A portion of a regular expression enclosed in parentheses will be counted as a single item, or subexpression, for use both with repetition modifiers (see below) and with the | operator. You can also refer to subexpressions when doing substitution (see below).
|
: The vertical bar joins two items and will match either one of them. Thus [1-4]|[7-9] will match the digits 1 through 4 or the digits 7 through 9.
Matching a single character can sometimes be useful, but matching a repeated series of characters is more so.
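The single-item building blocks described so far (period, caret, dollar sign, square brackets and vertical bar) can be checked with a short script. This is only a sketch: it uses Python's re module as a stand-in for Cantrip's regex engine, and the helper name matches and the sample lines are made up for illustration.

```python
import re

def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None

# (pattern, sample text, expected result)
checks = [
    (r"t.o", "two", True),                      # "." matches the "w"
    (r"t.o", "to", False),                      # nothing for "." to match
    (r"^Windir", "Windir waves.", True),        # name at start of line
    (r"^Windir", "You wave to Windir.", False), # name is not at the start
    (r"done$", "All done", True),               # text at end of line
    (r"done$", "done yet?", False),
    (r"[123abc]", "x2y", True),                 # "2" is in the list
    (r"[^abc]", "abc", False),                  # every character is excluded
    (r"[^abc]", "abcd", True),                  # "d" is not in the list
    (r"[1-4]|[7-9]", "5 and 6", False),         # neither range matches
    (r"[1-4]|[7-9]", "room 8", True),           # "8" is in the second range
]

for pattern, text, expected in checks:
    result = matches(pattern, text)
    print(f"{pattern!r} on {text!r}: {result} (expected {expected})")
```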
*
: The asterisk will match zero or more examples of the preceding item. So " *" (space followed by an asterisk) will match any number of spaces - useful, for example, if you're trying to pick out the first item of an indented table but don't know how many spaces it will be indented.
+
: The plus sign will match one or more examples of the preceding item. Works just like *, except that there must be at least one example to be matched. Here's an example:
"^ *Windir" will match "Windir", " Windir", or "  Windir" at the beginning of a line.
"^ +Windir" will match " Windir" or "  Windir" at the beginning of a line, but will not match "Windir", because the plus sign requires at least one space before "Windir"; the asterisk does not.
{number} after an item will match exactly <number> examples of that item. So [a-zA-Z0-9]{5} will match exactly five alphanumeric characters. {X,Y} will match anywhere from X to Y examples of that item; so T{5,7} will match 5, 6 or 7 "T"s in a row.
?
: The question mark matches zero or one example of the preceding item, but no more. So colou?r will match "color" or "colour".
Cantrip triggers allow you to substitute your own text for the incoming text that tripped the trigger. By using parentheses to group parts of your regular expression into subexpressions, you can use them as part of your replacement text; use a double percent sign (%%) plus the number of the subexpression you want to use in your substituted text. This is called a backreference. For example, take the following regular expression:
^.*> .*$
...which matches an entire line of text from a puppet. (The first .* catches the characters between the start of the line and the > that marks the end of the puppet's name; the second .* catches the rest of the text to the end of the line.) If you then break up the regex into subexpressions with parentheses, as follows:
(^)(.*)(> )(.*)($)
you can use %%2 to backreference the second subexpression in the expression - the name of the puppet - in your substituted text. %%4 will give you the text the puppet received. So for the following line:
Frito> Akemi says, "That would be telling."
using this as your substituted text:
Your puppet %%2 reported, -= %%4 =-
will give you
Your puppet Frito reported, -= Akemi says, "That would be telling." =-
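The same substitution can be reproduced outside Cantrip as a quick check. The sketch below assumes Python's re module, which writes backreferences as \2 and \4 where Cantrip uses %%2 and %%4; it is not how Cantrip itself performs the replacement.

```python
import re

# The regex from above, split into five subexpressions.
pattern = re.compile(r"(^)(.*)(> )(.*)($)")
line = 'Frito> Akemi says, "That would be telling."'

# Python's equivalent of "Your puppet %%2 reported, -= %%4 =-".
replacement = r"Your puppet \2 reported, -= \4 =-"

result = pattern.sub(replacement, line)
print(result)
# Your puppet Frito reported, -= Akemi says, "That would be telling." =-

# The subexpressions can also be inspected one at a time.
found = pattern.search(line)
print(found.group(2))  # Frito
print(found.group(4))  # Akemi says, "That would be telling."
```

Note that .* is greedy: if the rest of the line also contained "> ", the second subexpression would extend up to the last occurrence rather than stopping at the puppet's name.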
There are far more things you can do with regular expressions, but this should be enough to cover most uses in Cantrip.
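For experimenting with the repetition modifiers described earlier (*, +, {number}, {X,Y} and ?), a small script can show exactly which text a pattern picks up. This is a hedged sketch using Python's re module rather than Cantrip's own engine, and matched_text is a hypothetical helper name.

```python
import re

def matched_text(pattern, text):
    """Return the text the pattern matched, or None if there was no match."""
    found = re.search(pattern, text)
    return found.group(0) if found else None

# "*" allows zero spaces; "+" requires at least one.
print(repr(matched_text(r"^ *Windir", "Windir")))       # 'Windir'
print(repr(matched_text(r"^ +Windir", "Windir")))       # None
print(repr(matched_text(r"^ +Windir", "  Windir")))     # '  Windir'

# {5} takes exactly five items; {5,7} takes between five and seven.
print(repr(matched_text(r"[a-zA-Z0-9]{5}", "abc123")))  # 'abc12'
print(repr(matched_text(r"T{5,7}", "TTTTTTTTT")))       # 'TTTTTTT'
print(repr(matched_text(r"T{5,7}", "TTTT")))            # None

# "?" makes the preceding item optional.
print(repr(matched_text(r"colou?r", "color")))          # 'color'
print(repr(matched_text(r"colou?r", "colour")))         # 'colour'
```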
^Windir.*$
: Matches an entire line of text that begins with "Windir".
(^You page, )|(^.*pages, )
: Matches either the start of a line you page (the first parenthetical expression) or the start of a line paged to you (the second parenthetical expression) on a muck I frequent. [Editor's note: This seems logical, but didn't actually work when I tried using it as the Trigger On text.]
^.*>.*$
: Matches the entire line of text from any puppet, when the muck uses the puppet name followed by a ">" at the start of every line from a puppet.
^Frito>.*$
: Matches only entire lines from the puppet named Frito; you can use this as the Trigger On text for a trigger to highlight the entire line from Frito.
(Note that not all of these references work with the specific regular expression dialect that Apple's regex library uses. I am still looking for documentation on Apple's dialect, and will list the differences below when I find them.)
grep
Command In Mac OS X, have a good regular expression tutorial.
The man re_format manual entry will give you the BSD Unix dialect for regular expressions.
Most other regular expression dialects use "\1", "\2", "\3", and so forth to make backreferences instead of "%%1", "%%2", and "%%3".