Regular Expressions in Order Rules

This article gives a short overview of regular expressions, explains how they are used in DropStream, and provides a convenient "cheat sheet".

Overview

A regular expression is a text pattern in which characters of the same type are represented by a special symbol, or "metacharacter". A single pattern built from such symbols can stand for many actual text strings, which speeds up text search, comparison, replacement and deletion because you do not have to spell out every string individually.

For example, using the symbol \d to represent any single digit, we can find all orders where the phone number is in the US 212 area code with the regular expression 212-\d\d\d\d\d\d\d, without having to type in every possible combination:

  • 212-1111111
  • 212-1111112
  • 212-1111113
  • ... (omitted here for brevity)
  • 212-9999999
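
The same pattern can be tried outside DropStream. The sketch below uses Python's standard re module as a stand-in; DropStream's own regex engine may differ in details. The function name is_212_number and the sample phone numbers are made up for illustration.

```python
import re

# Pattern from the text: "212-" followed by exactly seven digits.
PATTERN = re.compile(r"212-\d\d\d\d\d\d\d")


def is_212_number(phone: str) -> bool:
    """Return True if the whole string has the form 212-NNNNNNN."""
    return PATTERN.fullmatch(phone) is not None


if __name__ == "__main__":
    # Hypothetical sample values, not real order data.
    for phone in ["212-1111111", "212-9999999", "646-1111111", "212-12345"]:
        print(phone, is_212_number(phone))
```

One pattern covers every seven-digit combination, so no list of individual numbers is needed.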

Many different symbolic representations of text strings can be devised, but regular expressions have become a popular formal standard that has been in use since the 1950s and is included, with slight variations, in all major programming languages.

DropStream conventions

In DropStream, regular expressions are used in order rules with the matches comparison operator to compare attribute values, and to assign attribute values to textual attributes.

In the DropStream Advanced rule editor, regular expressions are placed between forward slashes (/.../), e.g. /212-\d\d\d\d\d\d\d/. In the GUI Editor, forward slashes are added automatically, so you should not include them.

Characters that are used as symbols for other characters (see the table below) need to be preceded by a backslash (\) as an "escape character" when they are meant as literal characters rather than as symbols, so that the processing engine knows which sense you have in mind. For example, to find strings that contain a question mark, write the question mark as \?.
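
As a rough illustration of escaping, the sketch below again uses Python's re module as a stand-in for the DropStream engine. The function name contains_question_mark and the sample strings are assumptions made for this example.

```python
import re


def contains_question_mark(text: str) -> bool:
    # \? matches a literal question mark.
    # An unescaped ? would be read as the "0 or 1 time" quantifier instead.
    return re.search(r"\?", text) is not None


# re.escape adds the needed backslashes when a whole string is meant literally.
escaped = re.escape("Ready?")

if __name__ == "__main__":
    print(contains_question_mark("Ship today?"))
    print(contains_question_mark("Ship today"))
    print(escaped)
```

The same rule applies to the other symbols in the cheat sheet, such as the period, plus sign, asterisk, parentheses and square brackets.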

Forward slashes (/) cannot be used inside DropStream regular expressions, even when escaped; including them will trigger an error.
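
The two conventions above (slash delimiters in the Advanced rule editor, and no forward slashes inside the pattern) can be sketched as a small pre-check. The helper to_advanced_editor is hypothetical and not part of DropStream; it only shows how a pattern could be validated and wrapped before being pasted into a rule.

```python
def to_advanced_editor(pattern: str) -> str:
    """Wrap a bare pattern in /.../ for the Advanced rule editor.

    Hypothetical helper: rejects patterns that contain a forward slash,
    since the text says these trigger an error even when escaped.
    """
    if "/" in pattern:
        raise ValueError("forward slashes are not allowed inside the pattern")
    return "/" + pattern + "/"


if __name__ == "__main__":
    print(to_advanced_editor(r"212-\d\d\d\d\d\d\d"))
```

For the GUI Editor, the bare pattern would be used as-is, without the wrapping step.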

Regex cheat sheet

The table below includes symbols that are used in regular expressions most frequently. All symbols can be combined with other symbols to create more complex strings. 

Symbol  Description                                              Example regex       Matched string example
\d      One digit from 0 to 9                                    \d\d\d              212
\D      One character that is not a digit                        \D\D\D              aBc
\w      One letter, digit or underscore                          \w\w\w              a_1
\W      One character that is not a letter, digit or underscore  \W\W\W              +-)
\s      One whitespace character (space, tab, newline)           Hello\sworld        Hello world
\S      One character that is not whitespace                     \S\S\S\S\S\S        Hello!
.       Any character except a line break                        a.c                 abc
+       The preceding symbol 1 or more times                     \d+                 1234
?       The preceding symbol 0 or 1 time                         Ahaa?               Aha
*       The preceding symbol 0 or more times                     Oo.*                Oooh
{m,n}   The preceding symbol at least m and at most n times      \w{2,4}             abcd
{m}     Exactly m times                                          \D{3}               ABC
{m,}    At least m times (n omitted)                             a{2,}h              aaah
{0,n}   At most n times (m is 0)                                 a{0,2}h             ah
|       Separates alternatives (OR)                              2|3|4               4
()      A string of symbols as a single element                  A(nt|pple)          Apple
\1, \2  Text matched by the 1st, 2nd, etc. pair of parentheses   He(l)\1(o) w\2r\1d  Hello world
[…]     A set of possible characters                             [aeiou]             One lowercase vowel
-       Inside […]: a range of characters                        [A-Z]               One uppercase letter
^       Inside […]: negation, any character except those listed  [^AEIOU]            Any one character except an uppercase vowel
^       The beginning of a string or a line                      ^@[a-z]+            @jane
$       The end of a string or a line                            .*? the end$        This is the end
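
A selection of the cheat sheet pairs can be checked locally with a short script. This is a minimal sketch that assumes Python's re module behaves like the DropStream engine for these basic symbols, which is not guaranteed. The function name check_examples is made up for this example, and the single characters e, Q and B are sample inputs chosen here for the three bracket expressions.

```python
import re

# (regex, example string) pairs taken from the cheat sheet.
EXAMPLES = [
    (r"\d\d\d", "212"),
    (r"\D\D\D", "aBc"),
    (r"\w\w\w", "a_1"),
    (r"\W\W\W", "+-)"),
    (r"Hello\sworld", "Hello world"),
    (r"a.c", "abc"),
    (r"\d+", "1234"),
    (r"Oo.*", "Oooh"),
    (r"\w{2,4}", "abcd"),
    (r"\D{3}", "ABC"),
    (r"a{2,}h", "aaah"),
    (r"a{0,2}h", "ah"),
    (r"2|3|4", "4"),
    (r"A(nt|pple)", "Apple"),
    (r"He(l)\1(o) w\2r\1d", "Hello world"),
    (r"[aeiou]", "e"),      # sample lowercase vowel
    (r"[A-Z]", "Q"),        # sample uppercase letter
    (r"[^AEIOU]", "B"),     # sample character that is not an uppercase vowel
    (r"^@[a-z]+", "@jane"),
]


def check_examples() -> list:
    """Return the (regex, text) pairs where the regex does not match the whole text."""
    return [(rx, text) for rx, text in EXAMPLES if re.fullmatch(rx, text) is None]


if __name__ == "__main__":
    failures = check_examples()
    print("all examples match" if not failures else failures)
```

The script uses a full match, so each example string must be covered by its pattern from start to end, not just contain a matching fragment.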

For more examples of regular expressions and for a more in-depth discussion, see the Wikipedia article.

To test out your regular expressions, use one of the online regex testers, e.g. Rubular or Regex101.
