This article gives a short overview of regular expressions, their use in DropStream, and a convenient "cheat sheet".
Regular expressions are text patterns in which a whole class of characters is represented by a single symbol, or "metacharacter". A sequence of such symbols can stand for many actual text strings, which speeds up text search, comparison, replacement and deletion because you do not have to spell out every character string individually.
For example, using the symbol \d to represent any single digit, we can find all orders where the phone number is in the US 212 area code with the regular expression 212-\d\d\d\d\d\d\d, without having to type in all possible combinations.
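As a rough illustration outside DropStream, the same pattern can be tried with Python's standard re module. The phone numbers below are invented for the sketch, and Python's regex dialect is only assumed to behave like DropStream's for this simple pattern.

```python
import re

# Pattern from the text: the literal "212-" followed by seven digits.
pattern = re.compile(r"212-\d\d\d\d\d\d\d")

# Hypothetical phone numbers, made up for this sketch.
phones = ["212-5551234", "646-5551234", "212-555123", "1-212-0000000"]

# Keep only the numbers that contain a match anywhere in the string.
matches = [p for p in phones if pattern.search(p)]
print(matches)
```

Note that the pattern matches anywhere in the string, so a number with a leading country code is also found, while a number with only six digits after the area code is not.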
Many different symbolic representations of text strings can be devised, but regular expressions have become a popular formal standard that has been in use since the 1950s and is available, with slight variations, in all major programming languages.
In the DropStream Advanced rule editor, regular expressions are placed between forward slashes (/.../), e.g. /212-\d\d\d\d\d\d\d/. In the GUI Editor, forward slashes are added automatically, so you should not include them.
Characters that serve as symbols for other characters (see the table below) must be preceded by a backslash (\) as an "escape character" when you mean them literally rather than as symbols, so that the processing engine knows which sense you have in mind. For example, to find strings that contain a question mark, represent the question mark as \?.
Forward slashes (/) cannot be used inside DropStream regular expressions, even with an escape character; including them will trigger an error.
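The escaping rule for literal characters can be sketched in Python's re module, which is assumed here to treat the backslash the same way DropStream does. The order notes are made up, and re.escape is a Python helper rather than a DropStream feature.

```python
import re

# An unescaped "?" is a quantifier; "\?" matches a literal question mark.
literal_question = re.compile(r"\?")

# Hypothetical order notes, made up for this sketch.
notes = ["Gift wrap?", "Leave at door", "Call first? Yes"]
with_question = [n for n in notes if literal_question.search(n)]
print(with_question)

# re.escape() inserts the backslash for special characters automatically.
escaped = re.escape("why?")
print(escaped)
```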
Regex cheat sheet
The table below includes symbols that are used in regular expressions most frequently. All symbols can be combined with other symbols to create more complex strings.
| Symbol | Description | Example regex | Matched string |
|---|---|---|---|
| \d | One digit from 0 to 9 | | |
| \D | One character that is not a digit | | |
| \w | One letter, digit or underscore | | |
| \W | One character that is not a letter, digit or underscore | | |
| \s | One whitespace character (space, tab, newline) | | |
| \S | Any character, except whitespace | | |
| . | Any character, except line break | | |
| + | The preceding symbol 1 or more times | | |
| ? | The preceding symbol 0 or 1 time | | |
| * | The preceding symbol 0 or more times | | |
| {m,n} | The preceding symbol occurring a minimum of m and a maximum of n times (m can be 0; n can be omitted) | | |
| \| | Separates alternatives (OR) | | |
| (...) | A string of symbols treated as a single element. The symbols in parentheses can be referenced later by positional variables | | |
| [...] | A set of possible characters. A dash (-) between two characters denotes a range | | |
| ^ | The beginning of a string or a line | | |
| $ | The end of a string or a line | | This is the end |
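The symbols in the cheat sheet can be tried out with a short Python sketch. The patterns and sample strings below are illustrative only and are not taken from DropStream's own examples; Python's re module is assumed to be close enough to DropStream's engine for these basics. One known difference: in Python, ^ and $ match at line boundaries only when the re.MULTILINE flag is set.

```python
import re

# (pattern, sample text) pairs; every sample is made up for illustration.
examples = [
    (r"\d", "Order 7"),             # one digit
    (r"\D", "42a"),                 # one non-digit
    (r"\w+", "SKU_12!"),            # letters, digits, underscore
    (r"\W", "a-b"),                 # one non-word character
    (r"a\sb", "a b"),               # whitespace between a and b
    (r"\S+", "  hi there"),         # run of non-whitespace
    (r"c.t", "cut"),                # any single character
    (r"\d+", "id 2024 x"),          # 1 or more digits
    (r"colou?r", "color"),          # optional "u"
    (r"ab*", "abbbc"),              # "a" then 0 or more "b"
    (r"\d{2,3}", "12345"),          # 2 to 3 digits
    (r"cat|dog", "hot dog"),        # alternatives
    (r"(ab)+", "ababab!"),          # group repeated
    (r"[a-c]+", "xxabcabz"),        # character set with a range
    (r"^This", "This is the end"),  # start of string
    (r"end$", "This is the end"),   # end of string
]

# First match of each pattern in its sample text.
found = [re.search(p, t).group(0) for p, t in examples]
for (p, t), m in zip(examples, found):
    print(f"{p!r} in {t!r} -> {m!r}")

# Groups can be referenced by position; Python writes them as \1, \2.
# The syntax for positional variables in DropStream may differ.
reformatted = re.sub(r"(\d{3})-(\d{7})", r"(\1) \2", "212-5551234")
print(reformatted)
```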
For more examples of regular expressions and for a more in-depth discussion, see the Wikipedia article.