The parse instruction has the following syntax.
parse [ upper ] valueToParse [ template ]
Where valueToParse is one of the following:
    arg
    linein
    pull
    source
    value [ expression ] with
    var variableName
    version
When the upper keyword is specified, the valueToParse is converted to upper case before it is parsed. When this keyword is absent, the characters within the valueToParse retain their original case.
With arg, the valueToParse is one or more strings that were passed to the current procedure as arguments. Parsing begins with the first argument string. When a comma is encountered within the template, parsing proceeds with the next argument string, if one is available. If an argument is omitted, or unavailable, an empty string is parsed instead. The commas act as template separators. Between each comma, any arbitrary template can be specified.
The following is a simple example that shows how three procedure arguments can be obtained. Each template extracts two characters from each argument string.
    /* main program */
    call SUB 'abra', 'ca', 'dabra'
    say result /* shows: ab ca da */
    return 0

    sub: procedure
      parse arg magic1 +2 , magic2 +2 , magic3 +2
      return magic1 magic2 magic3
With linein, the valueToParse is the next line from the default input stream.
The valueToParse can obtain the next line from a specific stream, named inputFile, as follows:
    parse value linein( 'inputFile' ) with ...
With pull, the valueToParse is obtained from the external data queue. If the data queue is empty, the next line from the default input stream is obtained instead.
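As a minimal sketch of the queue behavior described above, the following places one line on the external data queue and then parses it. The variable names first and rest are illustrative assumptions, not names used elsewhere in this text.

```rexx
/* sketch: parse a line taken from the external data queue */
queue 'alpha beta gamma'   /* place one line on the queue */
parse pull first rest      /* remove that line and split it */
say first                  /* shows: alpha */
say rest                   /* shows: beta gamma */
```

Because the upper keyword is absent, the character case of the queued line is preserved.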
With source, the valueToParse is a string that summarizes source information regarding the current procedure. The content of the source information is implementation-dependent. The first word identifies the system or implementation. The second word identifies how the procedure was invoked, which is one of the following: COMMAND, FUNCTION, or SUBROUTINE.
The remainder of the words in the source information are completely implementation-dependent.
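A short sketch of how the source information might be split into its parts follows. The variable names system, invocation, and remainder are assumptions chosen for illustration, and the values shown depend entirely on the implementation in use.

```rexx
/* sketch: split the source information into its leading words */
parse source system invocation remainder
say 'System or implementation:' system
say 'Invoked as:' invocation
say 'Implementation-dependent remainder:' remainder
```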
With value [ expression ] with, the valueToParse is a string that is the resulting value of the optional expression. When the expression is absent, the value to parse is the empty string. In this context, the with keyword is reserved. Consequently, the source text of the expression cannot contain the word with.
The value [ expression ] with is commonly used. Here is an example that shows how 3 sub-values can be extracted from the date built-in function.
    parse value date( 'Standard' ) with year +4 month +2 dayOfMonth
With var variableName, the valueToParse is the value of the variable named variableName.
With version, the valueToParse is a string that summarizes the language processor. This value contains five words separated by spaces.
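The following sketch illustrates parsing the value of a variable, with and without the upper keyword. The variable fullName and its value are hypothetical, chosen only for illustration.

```rexx
/* sketch: parse the value of a variable, with and without upper */
fullName = 'Ada Lovelace'
parse var fullName first last
say first last             /* shows: Ada Lovelace */
parse upper var fullName first last
say first last             /* shows: ADA LOVELACE */
say fullName               /* shows: Ada Lovelace */
```

The final line shows that the variable being parsed is itself left unchanged; only the variables named in the template are assigned.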
The 1st word describes the language and implementation. It begins with the four letters REXX. The remainder of the 1st word describes the implementation. The remainder is not permitted to contain any periods.
The 2nd word describes the language level -- for example: 4.00
Three additional words describe the implementation's build date. This value is identical in format to the default value of the date built-in function -- for example: 17 Jul 2002
No additional words are permitted.
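The five words described above could be obtained with a template of five variables, as in this sketch. The variable names are assumptions for illustration, and the values shown depend on the language processor in use.

```rexx
/* sketch: split the version string into its five words */
parse version language level buildDay buildMonth buildYear
say 'Language and implementation:' language
say 'Language level:' level
say 'Build date:' buildDay buildMonth buildYear
```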
The template specifies how to partition the valueToParse into values which are assigned to variables. The template can be omitted. When the template is absent, the source string to parse is still prepared. This preparation may remove a line from the external data queue (PULL), perform a file input operation (LINEIN), or invoke procedures and functions (VALUE getData() WITH).
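As a sketch of the side effect of an omitted template, the following queues two lines and uses a parse pull without a template to discard the first one. The queued text and the variable name kept are illustrative assumptions.

```rexx
/* sketch: a parse instruction without a template still consumes input */
queue 'discard me'
queue 'keep me'
parse pull                 /* no template: the line is removed, nothing is assigned */
parse pull kept
say kept                   /* shows: keep me */
```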
The parsing template has the following general form:
    variableToAssign1 division_specifier variableToAssign2 division_specifier ...etc.
Numerous variations of this general form are possible. Multiple division_specifiers can appear consecutively. The template can begin, or end, with division_specifiers.
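The first character of each parsing template element is sufficient to distinguish whether it is a variable name to assign, or a division specifier. An element is a variable to assign when the first character is an eligible symbol name character, which is either a letter or one of the characters: ! ? and _.
The following kinds of division_specifier are available: space delimiters, period placeholders, literal patterns, variable patterns, absolute column offsets, relative column delimiters, and the commas that separate parse arg sub-templates.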
Space delimiters are the spaces between a series of variables to assign in the template. Spaces that are adjacent to another division specifier are ignored.
Template example # 1:
    parse value 'Sam likes peaches and cream' with , /* continue... */
      subject verb object
Note: the comma that follows the with keyword on the first line of the parse instruction above is a line continuation character. The associated template is on the second line.
The above template consists of three variables to assign, separated by spaces. The two spaces between the three words are template space delimiters. The first word in the value being parsed is assigned to the subject variable, the second to the verb variable, and remaining words are assigned to the object variable. Any extraneous spaces around, and between, the remaining words are also included in the value assigned to the object variable. Leading spaces are not present in the subject value. Intervening spaces between the subject value and the verb value are also discarded.
A period placeholder delimiter is a period that appears in the template. It acts as though a variable were specified at that position; however, no variable is assigned. You can see the value that would be assigned at that position by activating trace intermediates. The value that would be assigned to a placeholder is preceded by a ">.>" prefix.
Template example # 2:
    parse value 'Sam likes peaches and cream' with , /* continue... */
      subject . object
Suppose you would like to extract the word 'peaches' from the above example. This is done with the period placeholder as follows:
Template example # 3:
    parse value 'Sam likes peaches and cream' with , /* continue... */
      . . objectWord1 .
A literal pattern delimiter is a character string enclosed in single or double quotes.
Template example # 4:
    parse value 'I think, therefore I am (I think)' with , /* continue... */
      precondition ',' consequence '(' qualifier ')'
Note: the consequence variable value has an invisible leading and trailing space.
The value being parsed is searched, from the current value position, until an exact match of the literal pattern is found. If the pattern is found, the prior variable is assigned all characters, including spaces, from the current value position up to the position where the literal pattern occurred. The current value position is then advanced to the end of the text that matched the literal pattern, so the next character that would be assigned to a variable is the character immediately following the literal pattern in the source value. A subsequent relative column delimiter, however, moves that many characters forward, or backward, from the exact position where the literal pattern matched the source value.
Template example # 5:
    parse value '16:51:42' with , /* this could have been returned by the time() function */
      hour ':' minute ':' second
Variable patterns are similar to literal patterns. The difference is that the pattern to match is the value of a parenthesized variable name.
The behavior of a variable pattern delimiter is identical in every way to a literal pattern delimiter.
Template example # 6:
    delim = ':'
    parse value '16:51:42' with , /* this could have been returned by the time() function */
      hour (delim) minute (delim) second
You can also use variables that were assigned earlier in the template, as variable pattern delimiters.
Template example # 7:
    parse value '02/07/18' with , /* this could have been returned by the date( 'Ordered' ) function */
      year +2 delim +1 month (delim) day
Notice that the variable delim was assigned the character at the 3rd position in the value string. This variable was referenced as a variable pattern subsequently. If another separating character were used in the value string the parsing would still function correctly. Consider the following alternative date values:
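As a sketch of this behavior, the loop below applies the same kind of template to three date strings that differ only in their separating character. The date values and the variable names are hypothetical, chosen for illustration.

```rexx
/* sketch: one template handles several separating characters */
dates = '02/07/18 02-07-18 02.07.18'   /* illustrative values only */
do i = 1 to words( dates )
  parse value word( dates, i ) with year +2 delim +1 month (delim) day
  say year month day 'separator:' delim
end
/* shows:
   02 07 18 separator: /
   02 07 18 separator: -
   02 07 18 separator: .
*/
```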
An absolute column offset is either a whole number that is greater than or equal to 0, or a similar number preceded by an equal sign, or a variable reference in parentheses that is preceded by an equal sign. Absolute column 1 is the first character in the source value. Similarly, absolute column 0 anticipates the first character in the source value. Column specifications that exceed the length of the source string are reduced to refer to the end of the string. Thus, 9999999 is the end of the source string.
A pending variable to assign receives all characters from the current position up to the absolute character position requested.
Template example # 8:
    parse value '16:51:42' with , /* this could have been returned by the time() function */
      hour 3 4 minute 6 7 second 9
Template example # 9:
    parse value '16:51:42' with , /* this could have been returned by the time() function */
      hour =3 =4 minute =6 =7 second =9
Template example # 10:
    pos = 3
    parse value '16:51' with , /* this could have been returned by the time() function */
      hour =(pos) +1 minute
Template example # 11:
    parse value '16:51' with , /* this could have been returned by the time() function */
      99999 -2 minute
Warning: a computed absolute offset of 0 or 1 can have unexpected consequences.
First, consider the following example.
    str = 'abc.def'
    pos = pos( '.', str ) /* pos is 4 */
    parse var str before =(pos) +1 after
In the above, before is set to abc and after is set to def.
Now consider.
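    str = '.abcdef'
    pos = pos( '.', str ) /* pos is 1 ! */
    parse var str before =(pos) +1 after
In this case, before is set to .abcdef (the entire source value) and after is set to abcdef.
The unusual outcome arises because absolute column 1 references the first character position in the source value, which is also the current position when parsing begins. Since the requested column is not to the right of the current position, the pending variable before receives the entire remainder of the source value.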
A relative column delimiter is a number preceded by a plus or minus sign, or a variable reference in parentheses where the left parenthesis is preceded by a plus or minus sign.
When a positive relative column is specified, a pending variable to assign receives all characters from the current position forward for the relative number of character positions requested.
After the variable is assigned, the current position is moved forward or backward by the requested number of positions.
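A negative relative column can be used to step backward and read characters a second time. The following is a minimal sketch; the source value and the variable names first and overlap are assumptions for illustration.

```rexx
/* sketch: step backward with a negative relative column */
parse value 'abcdef' with first +3 -1 overlap +2
say first                  /* shows: abc */
say overlap                /* shows: cd */
```

Here +3 assigns the first three characters and leaves the current position at column 4. The -1 then moves back to column 3, so overlap re-reads the character c before continuing with d.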
Template example # 12:
    parse value '16:51' with , /* this could have been returned by the time() function */
      hour +2 +1 minute +2 /* the +1 skips the colon */
Template example # 13:
    width = 2
    parse value '16:51' with , /* this could have been returned by the time() function */
      hour +(width) +1 minute /* the +1 skips the colon */
Within the context of a parse arg operation, multiple argument strings can be processed by the template. The template consists of multiple sub-templates separated by commas.
    parse arg sub-template1 , sub-template2 , sub-template3 , ...etc.
Parsing begins with the first argument string, which is processed by the first sub-template. When a comma is encountered within the template, parsing proceeds with the next argument string, which is processed by the second sub-template. If an argument is omitted, or unavailable, an empty string is parsed instead. Subsequent argument strings are processed similarly until the end of the entire parse instruction template is reached.
The following is a simple example that shows how three procedure arguments can be obtained. Each template extracts two characters from each argument string.
Template example # 14:
    /* main program */
    call SUB 'abra', 'ca', 'dabra'
    say result /* shows: ab ca da */
    return 0

    sub: procedure
      parse arg magic1 +2 , magic2 +2 , magic3 +2
      return magic1 magic2 magic3