A large number of useful variables, for tasks such as constructing filenames, are provided by default. For even greater flexibility, a variable’s value can be modified using one or more regular expression substitution patterns. Regular expressions are widely used in Unix and other scripting environments. They are very flexible, at the cost of a rather arcane syntax.

We recommend that the reader gain some familiarity with regular expressions from one of the excellent online resources (for example RegexOne) before using them in CatDV Worker.


Regex Substitution

To use a regular expression substitution, follow the variable with a command in curly braces. The syntax is:

$variable ‘{’ ‘s’ ‘/’ search-pattern ‘/’ replacement-pattern ‘/’ ‘}’

For example, if $x has the value “abcde”, then $x{s/b/x} evaluates to “axcde”.
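As a rough point of comparison, here is the same substitution written against java.util.regex, the Java library whose Pattern documentation is recommended for syntax details. This is a sketch under the assumption that an ‘s’ command without flags behaves like Matcher.replaceFirst; the class and method names are hypothetical and not part of CatDV.

```java
import java.util.regex.Pattern;

// Hypothetical helper approximating $x{s/b/x} with java.util.regex.
// Assumption: an 's' command without the 'g' flag behaves like
// Matcher.replaceFirst, i.e. only the leftmost match is replaced.
class FirstMatchSubstitution {

    static String substituteFirst(String value, String search, String replacement) {
        // 'search' is compiled as a regular expression, not a literal string
        return Pattern.compile(search).matcher(value).replaceFirst(replacement);
    }

    public static void main(String[] args) {
        System.out.println(substituteFirst("abcde", "b", "x"));  // axcde
        System.out.println(substituteFirst("abcabc", "b", "x")); // axcabc: second 'b' untouched
    }
}
```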


Multiple Substitutions

The regular expression features provided are similar to those provided by Perl. Only the ‘s’ command is supported, but multiple commands can be included within the braces. For example, ${U1}{s/find1/replace1/s/find2/replace2/} replaces find1 with replace1 and find2 with replace2 in user field 1.
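The sketch below mimics two commands in one set of braces. It assumes, without confirmation from this page, that the commands are applied left to right, each to the output of the previous one; ChainedSubstitution, applyAll and the sample value are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of ${U1}{s/find1/replace1/s/find2/replace2/}.
// Assumption: commands run left to right and each one operates on the
// output of the previous command.
class ChainedSubstitution {

    static String applyAll(String value, String[][] commands) {
        String result = value;
        for (String[] command : commands) {
            // command[0] = search pattern, command[1] = replacement pattern
            result = Pattern.compile(command[0]).matcher(result).replaceFirst(command[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String userField1 = "find1 and find2"; // made-up sample value for U1
        String[][] commands = { {"find1", "replace1"}, {"find2", "replace2"} };
        System.out.println(applyAll(userField1, commands)); // replace1 and replace2
    }
}
```

Under that assumption the order matters: s/a/b/ followed by s/b/c/ turns a value of “a” into “c”.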

Any delimiter can be used after the ‘s’ instead of ‘/’, and the final delimiter before the ‘}’ may be omitted. The command may be followed by ‘g’ to apply the substitution globally to all occurrences (by default only the first occurrence is replaced) or ‘i’ to perform a case-insensitive match (by default it is case sensitive).
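To make the delimiter and flag rules concrete, here is a small parser for the text between the braces. It is a hypothetical sketch, not CatDV’s implementation: it assumes ‘g’ maps to Matcher.replaceAll and ‘i’ to Pattern.CASE_INSENSITIVE, applies commands in order, and does not handle a delimiter character escaped inside a pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the text between the braces, e.g. "s/b/x/",
// "s,a,o,g" or "s/find1/replace1/s/find2/replace2/". Not CatDV's code.
class SubstitutionCommands {

    static String apply(String value, String commands) {
        String result = value;
        int pos = 0;
        int n = commands.length();
        while (pos < n) {
            if (commands.charAt(pos) != 's' || pos + 1 >= n) {
                throw new IllegalArgumentException("expected 's' command at index " + pos);
            }
            char delim = commands.charAt(pos + 1); // whatever character follows 's'
            int searchStart = pos + 2;
            int searchEnd = commands.indexOf(delim, searchStart);
            if (searchEnd < 0) {
                throw new IllegalArgumentException("missing delimiter after search pattern");
            }
            int replStart = searchEnd + 1;
            int replEnd = commands.indexOf(delim, replStart);
            if (replEnd < 0) {
                replEnd = n; // final delimiter omitted
            }
            String search = commands.substring(searchStart, searchEnd);
            String replacement = commands.substring(replStart, replEnd);
            pos = Math.min(replEnd + 1, n);

            // Optional flags directly after the closing delimiter
            boolean global = false;
            int flags = 0;
            while (pos < n && (commands.charAt(pos) == 'g' || commands.charAt(pos) == 'i')) {
                if (commands.charAt(pos) == 'g') {
                    global = true;
                } else {
                    flags |= Pattern.CASE_INSENSITIVE;
                }
                pos++;
            }

            Matcher m = Pattern.compile(search, flags).matcher(result);
            result = global ? m.replaceAll(replacement) : m.replaceFirst(replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(apply("abcde", "s/b/x"));         // axcde
        System.out.println(apply("banana", "s/a/o/g"));      // bonono
        System.out.println(apply("Hello", "s/hello/Bye/i")); // Bye
        System.out.println(apply("a/b/c", "s,/,-,g"));       // a-b-c
    }
}
```

A delimiter other than ‘/’, such as ‘,’, is handy when the pattern itself contains forward slashes, as file paths do.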


Capturing and reusing parts of the search

Within the replacement pattern, any part of the search pattern enclosed in (round) parentheses can be ‘captured’ and reused as ‘$1’ for the first group, ‘$2’ for the second, and so on. For example, $a{s/\D*(\d+).*/$1/} extracts the first group of digits from $a. Other worker node variables can also be included in the regular expression command, in either the search or the replacement pattern. For example, $f{s/$_y//} gives the value of $f with the first occurrence of the value of $_y removed (add ‘g’ to remove every occurrence).
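Both ideas can be sketched with java.util.regex, whose replacement strings also use $1, $2 for captured groups. The digit-extraction pattern used here is \D*(\d+).*; the helper names and sample values are hypothetical, and quoting the other variable’s value with Pattern.quote is an assumption made for the sketch, since this page does not say whether an inserted variable value is treated literally or as pattern syntax.

```java
import java.util.regex.Pattern;

class CaptureExamples {

    // \D* skips any leading non-digits, (\d+) captures the first run of
    // digits as group 1, and .* consumes the rest, so the whole match is
    // replaced by the captured digits. Values without digits do not match
    // and are returned unchanged.
    static String firstDigits(String value) {
        return Pattern.compile("\\D*(\\d+).*").matcher(value).replaceFirst("$1");
    }

    // Hypothetical stand-in for $f{s/$_y//}: removes the first occurrence of
    // another variable's value. Pattern.quote makes that value match literally
    // (an assumption; CatDV's own handling is not documented here).
    static String removeFirst(String value, String otherVariable) {
        return Pattern.compile(Pattern.quote(otherVariable)).matcher(value).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(firstDigits("clip0042_take3"));                 // 0042
        System.out.println(removeFirst("Interview_2023_final", "_final")); // Interview_2023
    }
}
```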

Worked Example

As a more complete example, let’s say we want to pick the first part of the parent folder name $p. Consider how the following statement, which looks as if someone sneezed on the screen, is processed:

$p{s,^.*/(\S+)[^/]*$,$1,}

The curly braces mean take $p and modify it. What modification? The character after the s is a comma, so that is the delimiter, and we’re replacing XXX with YYY, whatever XXX and YYY may be: $p{s,XXX,YYY,}

XXX, the pattern we’re searching for, is ^.*/(\S+)[^/]*$, which can be broken down as follows:

^ = start of string/start of line

$ = end of string/end of line

. = any character

* = previous pattern repeated any number (0 or more) times

/ = literal forward slash, the path separator on a Mac (on Windows, where the separator is a backslash, use \\ instead, because a single \ has a special meaning in regular expressions)

\S = any non-whitespace character (\s = any whitespace character)

+ = previous pattern should occur 1 or more times

[..] = a list or range of characters to match

[^..] = negate the set of characters, i.e. match any character not in the list

(..) = parentheses capture the part we’re interested in, so it can be reused in the replacement

[^/]* = any combination of characters other than forward slash.
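Individual tokens from the list can be tried on their own. This sketch uses Java’s Pattern.matches, which succeeds only when the pattern matches the entire input; the helper name and sample strings are made up. Inside a Java string literal every regex backslash is doubled, so \S is written "\\S"; that doubling belongs to Java source code, not to the regular expression itself.

```java
import java.util.regex.Pattern;

// Quick checks of individual tokens from the list above. Pattern.matches
// succeeds only if the pattern matches the ENTIRE input string.
class TokenChecks {

    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("\\S+", "Project_A"));      // true: no whitespace
        System.out.println(fullMatch("\\S+", "Project A"));      // false: contains a space
        System.out.println(fullMatch("[^/]*", "Project A"));     // true: no forward slash
        System.out.println(fullMatch("[^/]*", "Media/Project")); // false
        System.out.println(fullMatch("^abc$", "abc"));           // true: anchored at both ends
        // Windows separator: the regex \\ (a literal backslash) is written
        // "\\\\" in a Java string literal because Java doubles it again.
        System.out.println(fullMatch("[^\\\\]*", "C:\\Media"));  // false: contains a backslash
    }
}
```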

Combining all the above, this means: take the variable $p, then match anything from the start (^.*), followed by a slash (/), followed by one or more non-space characters (\S+), followed by any number of characters other than / ([^/]*), all the way up to the end of the string ($). The whole match is then replaced by $1.
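The whole worked example can be reproduced with java.util.regex as a sanity check. This is a sketch that assumes $p holds a macOS-style path to the parent folder; the sample paths and the class name are hypothetical.

```java
import java.util.regex.Pattern;

// Worked example: $p{s,^.*/(\S+)[^/]*$,$1,} expressed with java.util.regex.
// Assumes $p holds a macOS-style path; the sample paths are made up.
class ParentFolderExample {

    // Java literal for the regex ^.*/(\S+)[^/]*$ (backslash doubled)
    static final Pattern FIRST_WORD_OF_LAST_FOLDER = Pattern.compile("^.*/(\\S+)[^/]*$");

    static String firstWord(String parentPath) {
        // The whole path matches, so it is replaced by group 1 alone
        return FIRST_WORD_OF_LAST_FOLDER.matcher(parentPath).replaceFirst("$1");
    }

    public static void main(String[] args) {
        System.out.println(firstWord("/Volumes/Media/Project_A 2023")); // Project_A
        System.out.println(firstWord("/Users/editor/Footage"));         // Footage
    }
}
```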

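We also have to match the remaining non-/ characters up to the end of the string ([^/]*$). A substitution replaces only the text that matched, so without this part anything after the captured word (such as the trailing “ 2023” of a folder named “Project_A 2023”) would be left in the result. It is the greedy .* at the start that makes the match begin at the last forward slash rather than the first one: it grabs as much as possible and then backs off only as far as the final /.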
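The role of the trailing [^/]*$ can be checked by running the pattern with and without it. In this sketch (hypothetical names, made-up path), the shorter pattern still captures the same word, but the text after it survives the substitution.

```java
import java.util.regex.Pattern;

// Compares the worked-example regex with and without the trailing [^/]*$.
class TailComparison {

    static String withTail(String path) {
        return Pattern.compile("^.*/(\\S+)[^/]*$").matcher(path).replaceFirst("$1");
    }

    static String withoutTail(String path) {
        // The match stops after the captured word, so anything following it
        // is not part of the match and is kept in the result
        return Pattern.compile("^.*/(\\S+)").matcher(path).replaceFirst("$1");
    }

    public static void main(String[] args) {
        String path = "/Volumes/Media/Project_A 2023";
        System.out.println(withTail(path));    // Project_A
        System.out.println(withoutTail(path)); // Project_A 2023
    }
}
```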

The parentheses around \S+ mean that we can refer to the characters matched by that expression later. That is what the $1 in the replacement pattern refers to, i.e. we match the whole string and replace it with just the non-space characters that follow the final /.
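To see exactly what $1 stands for, the match can be inspected directly: group 0 is the whole matched text and group 1 is the part inside the parentheses. This is a sketch using java.util.regex.Matcher; the class name and sample path are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Shows what the worked-example regex captures: group 0 is the whole match
// (here the entire path) and group 1 is the text inside the parentheses,
// which is what $1 inserts.
class CapturedGroups {

    // Returns {group 0, group 1}, or null if the path does not match.
    static String[] groups(String path) {
        Matcher m = Pattern.compile("^.*/(\\S+)[^/]*$").matcher(path);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("/Volumes/Media/Project_A 2023");
        System.out.println(g[0]); // /Volumes/Media/Project_A 2023
        System.out.println(g[1]); // Project_A
    }
}
```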

More Information

You can look up documentation for the ‘java.util.regex.Pattern’ class online for full details of the regular expression syntax that is supported, or press the ‘Help’ button in the watch item editor to see some examples. There are also a number of online tutorials and guides that describe regular expressions.