TCL-based Regular Expressions
This section provides basic help for creating Regular Expressions based on the TCL "engine".
A Regular Expression is a pattern description written in a "meta language", a language that you use to describe particular patterns of interest. The characters used in this "meta language" are part of the standard ASCII character set used in UNIX and MS-DOS, which can sometimes lead to confusion because the same character may be meant literally or as an operator. The characters that form regular expressions are:
. | Matches any single character. |
[], ^, - | A character class which matches any character within the brackets. If the first character is a circumflex (" ^ ") it changes the meaning to match any character except those within the brackets. A dash (" - ") inside the square brackets indicates a character range, e.g., " [0-9] " means the same thing as " [0123456789] ", " [^A-Z] " matches any single character except A to Z upper case letters. |
\ | Except inside character classes (" [...] "), this makes the next character lose its special meaning, e.g., " \* " is a literal asterisk. When it appears just before the letter r, n or t, the sequences \r, \n and \t match a carriage return, a line feed and a horizontal tab respectively. |
[\r\n]+ | Only in Universal Analyzer description files (named xxxLanguagePattern.xml in folder $CAST_INSTALL\configuration\universal\xxx): use the expression [\r\n]+ to match one or more carriage returns (\r) and/or line feeds (\n). |
+ | Matches one or more occurrences of the preceding regular expression. For example: [0-9]+ matches " 1 ", " 111 ", or " 123456 " but not an empty string (if the plus sign were an asterisk, it would also match the empty string). |
* | Matches zero or more occurrences of the preceding regular expression, so it also matches an empty string. For example: [0-9]* matches " 1 ", " 111 ", " 123456 " or an empty string. |
? | Matches zero or one occurrence of the preceding regular expression. For example: -?[0-9]+ matches a signed number including an optional leading minus. |
| | Matches either the preceding regular expression or the following regular expression. For example: Cow|pig|sheep matches any of the three words. Note: Empty alternatives are disallowed. |
() | Groups a series of regular expressions together into a new regular expression. For example: (01) represents the character sequence 01. Parentheses are useful when building up complex patterns with *, +, ?, and |. |
Note that some of these operators operate on single characters (e.g., []) while others operate on regular expressions. Usually, complex regular expressions are built up from simple regular expressions.
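The operators in the table above can be tried out with a short script. The following is only a sketch: it uses Python's re module as a stand-in for the TCL engine (an assumption, chosen because these basic operators behave the same way in both), and the helper name matches is hypothetical. Each pattern is tested against the whole string.

```python
import re

# Python's re module is used here only as a stand-in for the TCL engine.
def matches(pattern: str, text: str) -> bool:
    """Return True when the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# Patterns taken from the table, each with a few sample strings.
examples = {
    "[^A-Z]": ["a", "A"],
    "[0-9]+": ["1", "123456", ""],
    "[0-9]*": ["1", "123456", ""],
    "-?[0-9]+": ["-42", "42", "--42"],
    "Cow|pig|sheep": ["pig", "goat"],
    "(01)+": ["0101", "011"],
    r"\*": ["*", "a"],
}

results = {
    pattern: {text: matches(pattern, text) for text in texts}
    for pattern, texts in examples.items()
}

for pattern, outcome in results.items():
    print(pattern, outcome)
```

The only difference between the [0-9]+ and [0-9]* rows of the output is the empty string, which is rejected by + and accepted by *.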
Examples of how to use Regular Expressions follow:
First, the regular expression for a " digit " is:
[0-9]
This can be used to build a regular expression for an integer:
[0-9]+
This requires at least one digit (the expression [0-9]* would also have allowed no digits at all).
Let's add an optional unary minus:
-?[0-9]+
This can then be expanded to allow decimal numbers. First, a decimal number can be specified (for the time being the last character will always be a digit):
[0-9]*\.[0-9]+
Note that the " \ " before the period makes it a literal period rather than a wild card character. This pattern matches " 0.0 ", " 4.5 ", or " .31415 ". However, it does not match " 0 " or " 2 ". To match these as well, combine the integer and decimal definitions, leaving out the unary minus for now:
([0-9]+)|([0-9]*\.[0-9]+)
In this example, the grouping symbols " () " are used to specify which regular expressions the " | " operation applies to. Adding the unary minus back gives:
-?(([0-9]+)|([0-9]*\.[0-9]+))
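The build-up from a single digit to a signed number can be sketched as follows. This is an illustration only: Python's re module stands in for the TCL engine, and the names DIGIT, INTEGER, DECIMAL, SIGNED_NUMBER and is_number are hypothetical, introduced here to show how the smaller expressions compose.

```python
import re

# Building blocks, combined exactly as in the text above.
# Python's re module is a stand-in for the TCL engine (assumption).
DIGIT = r"[0-9]"
INTEGER = DIGIT + r"+"                    # [0-9]+
DECIMAL = DIGIT + r"*\." + DIGIT + r"+"   # [0-9]*\.[0-9]+
SIGNED_NUMBER = r"-?((" + INTEGER + r")|(" + DECIMAL + r"))"

def is_number(text: str) -> bool:
    """Return True when the whole of text is an integer or a decimal."""
    return re.fullmatch(SIGNED_NUMBER, text) is not None

print(SIGNED_NUMBER)
for sample in ["0", "2", "-4.5", ".31415", "4.", "-", ""]:
    print(repr(sample), is_number(sample))
```

In this sketch " 4. " is rejected because the decimal pattern still requires at least one digit after the period.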
This can be extended further by allowing a float-style exponent to be specified as well. First, here is a regular expression for an exponent:
[eE][-+]?[0-9]+
This matches an uppercase or lowercase letter E, then an optional plus or minus sign, then a string of digits. For instance, this will match " e12 " or " E-3 ". This expression can then be used to build our final expression, one that specifies a real number:
-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)
Example of a valid number: .65e12
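The final expression can be checked against a few sample strings. The sketch below again uses Python's re module as a stand-in for the TCL engine (an assumption) and requires the whole string to match; the names REAL_NUMBER and is_real_number are hypothetical.

```python
import re

# Final expression copied from the text.
# Python's re module is a stand-in for the TCL engine (assumption).
REAL_NUMBER = r"-?(([0-9]+)|([0-9]*\.[0-9]+)([eE][-+]?[0-9]+)?)"

def is_real_number(text: str) -> bool:
    """Return True when the whole of text matches REAL_NUMBER."""
    return re.fullmatch(REAL_NUMBER, text) is not None

samples = ["42", "-4.5", ".65e12", "0.5E-3", "2e5", ".65ea12"]
checked = {sample: is_real_number(sample) for sample in samples}

for sample, ok in checked.items():
    print(sample, ok)
```

As written, the optional exponent is attached only to the decimal alternative, because concatenation binds more tightly than " | ". A string such as " 2e5 " is therefore not matched as a whole, while " .65e12 " is.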
Specific example for the CAST Snapshot Preparation Assistant when creating a Module
When using the Snapshot Generation Assistant, it is possible to use Regular Expressions to create your Modules automatically (using the Match option against a Regular Expression).
In some cases it can be particularly useful to exclude certain objects from the Module. To do so using a Regular Expression, you can use the following syntax (this is simply an example):
To include all objects in the Module except those that match "Stoc<something>", use the following Regular Expression:
([^S]*)|(S[^t]*)|(St[^o]*)|(Sto[^c]*)
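The effect of this expression can be explored with a small script. This is a sketch under stated assumptions: Python's re module stands in for the TCL engine, the whole object name is assumed to have to match the expression, and the object names and the helper in_module are hypothetical, made up for illustration.

```python
import re

# Exclusion expression copied from the text.
# Python's re module is a stand-in for the TCL engine (assumption).
MODULE_FILTER = r"([^S]*)|(S[^t]*)|(St[^o]*)|(Sto[^c]*)"

def in_module(name: str) -> bool:
    """Return True when the whole object name matches MODULE_FILTER."""
    return re.fullmatch(MODULE_FILTER, name) is not None

# Hypothetical object names, for illustration only.
names = ["Customer", "Sales", "Stack", "Store", "Stock", "StockItem", "Station"]
included = [name for name in names if in_module(name)]

print(included)
```

Under these assumptions " Stock " and " StockItem " are left out, as intended. " Station " is left out too, because the alternative St[^o]* does not allow an " o " anywhere after " St ", so it is worth testing such an expression against representative object names before relying on it.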