This chapter deals with the parsing function of REXX. You probably already know that it exists, but parsing is such a powerful feature of REXX, with so many possibilities, that it is worth spending more time on the subject.
The parsing instructions are ARG, PARSE, and PULL.
The data to parse is a source string. Parsing splits up
the source data and assigns pieces of it to the
variables named in a template. A template is a
model that specifies how the source string should be split.
This leads to a general form:
PARSE [UPPER] source template
or in more detail:
Parse instruction format:

    >>---PARSE-+-------+-+-ARG-----------------------+-+----------+-----><
               +-UPPER-+ +-EXTERNAL------------------+ +-template-+
                         +-LINEIN--------------------+
                         +-NUMERIC-------------------+
                         +-PULL----------------------+
                         +-SOURCE--------------------+
                         +-VAR--name-----------------+
                         +-VALUE-+------------+-WITH-+
                         !       +-expression-+      !
                         +-VERSION-------------------+

Where:
UPPER | is an optional keyword instructing REXX to translate the source string to uppercase before parsing it. |
ARG | the source consists of the parameters passed to the procedure or subroutine |
EXTERNAL | here REXX reads from terminal input buffer or keyboard |
LINEIN | identical to EXTERNAL. Preferred keyword to conform to REXX SAA Level 2. |
PULL | input source comes from the CMS stack buffer |
VAR varname | the source is the contents of the variable varname; this is a very good way to analyze a variable |
VALUE expression WITH | the source is then the result of evaluating the expression |
SOURCE | can be used to know the name of the exec, the environment in which it executes and how it was called |
VERSION | gives access to the level of REXX that is running (see Appendix D. REXX Versions. for more details) |
NUMERIC | returns information about the setting of NUMERIC. Not available on OS/2. |
template | Templates are probably the most difficult part of the PARSE instruction, but as we will see, they are one of the most powerful REXX features. Most of this document is about templates. |
The simplest form of a template consists of only a list of variable names. Here is an example:
variable1 variable2 variable3
This form of template parses the source string into blank-delimited words.
| The source does not at all influence the way REXX uses the template. That's why we mostly use a PARSE VALUE string WITH template construct in our examples as it is then 100% clear what the source is. |
| The ARG instruction is just a shorter form for PARSE UPPER ARG while the PULL instruction is the short form of PARSE UPPER PULL, so we don't see the need to discuss these short forms in more detail. |
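As a minimal sketch of the equivalence stated in the note above, the following program passes a mixed-case string to an internal routine and reads it once with ARG and once with PARSE ARG. The routine name demo and the sample string are only illustrative.

```rexx
/* ARG is shorthand for PARSE UPPER ARG (routine name "demo" is illustrative) */
call demo 'Cherry and Martini'
exit

demo:
  arg shouted                 /* same as: parse upper arg shouted */
  parse arg asTyped           /* no case translation              */
  say shouted                 /* CHERRY AND MARTINI               */
  say asTyped                 /* Cherry and Martini               */
  return
```

The same argument string can be parsed as often as needed; each ARG or PARSE ARG starts again from the original data.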
ARG
Lets you analyze the parameters passed to your
routine. By routine we mean
either the procedure itself or a subroutine within the procedure.
An example to have a smooth takeoff:
    /* APERITIF EXEC , for starters */
    parse arg parameters                  /* get what user gave us */
    Say 'Hello, you asked for' parameters /* inform user */
    call subrout1 'Cookies'               /* call subroutine */
    exit

    Subrout1:
    parse arg parms        /* get parameters passed to subroutine */
    if parms=parameters
       then say 'Your' parameters 'will arrive.'
       else Say 'Sorry, not available anymore.'
    return
Execution may result in:
    » aperitif cherry and martini
    Hello, you asked for cherry and martini
    Sorry, not available anymore.
    Ready;
    » aperitif Cookies
    Hello, you asked for Cookies
    Your Cookies will arrive.
    Ready;
PULL
In this case, REXX
tries to read from the CMS program stack (i.e. data stacked
by programs). If there is no data stacked, then
REXX checks whether the terminal input buffer contains any
data you entered while the program was running.
If there is no data there either, then the terminal enters
the VM READ state.
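The lookup order can be observed with a small sketch that stacks two lines with the QUEUE instruction and then reads them back. It assumes the program stack is empty when the program starts; otherwise older stacked lines would be read first and the final count would differ.

```rexx
/* Stack two lines, then read them back; nothing is read from the terminal */
queue 'first stacked line'
queue 'Second Stacked Line'
pull line1                    /* PULL = PARSE UPPER PULL           */
parse pull line2              /* PARSE PULL keeps the case         */
say line1                     /* FIRST STACKED LINE                */
say line2                     /* Second Stacked Line               */
say queued()                  /* 0: the stack is empty again       */
```

A third PULL at this point would find the stack empty and fall through to the terminal, as described above.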
LINEIN
Comparable with PULL, except that REXX doesn't
first look at the program stack, but immediately looks for data in
the terminal input buffer.
EXTERNAL
Identical to LINEIN. LINEIN is to be
preferred, as it conforms to REXX SAA Level 2 and as such
is available on OS/2 too, while EXTERNAL is not.
REXX has conformed to SAA Level 2 since VM/ESA Version 1, Release 2.1.
VAR name
parses the variable specified as name.
VALUE expression WITH
parses the result of evaluating expression. Many
programmers confuse the VAR and VALUE options
with each other. While VAR parses a REXX variable (whose name
must therefore conform to the naming rules for a variable;
for example, it cannot start
with a digit or period), VALUE lets you parse the result
of an expression, which can be virtually anything.
The simplest
expression is a quoted string, while more complex expressions
typically contain function calls.
Note also that the keyword WITH is mandatory here to separate the expression from the template.
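To make the difference concrete, here is a small sketch (the variable names and sample data are ours). PARSE VAR is given the name of a variable, while PARSE VALUE is given an expression that is evaluated first, here a function call concatenated with a literal and a variable.

```rexx
/* PARSE VAR takes a variable name */
stars = 'Sirius Polaris'
parse var stars first second
say first                     /* Sirius  */
say second                    /* Polaris */

/* PARSE VALUE takes an expression; this one evaluates to */
/* '6 characters in Sirius' before the template is applied */
parse value length(first) 'characters in' first with count rest
say count                     /* 6                    */
say rest                      /* characters in Sirius */
```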
SOURCE
Returns the following tokens:
PARSE SOURCE is thus typically used to determine the fileid of the procedure (e.g. to identify the procedure in its error messages). The invoked-as information may, for example, be useful in XEDIT macros, where the macro can react differently depending on the synonym used to invoke it. This can be achieved by defining appropriate synonyms. For example:
    'COMMAND SET SYNONYM CMDA MYMACRO'
    'COMMAND SET SYNONYM CMDB MYMACRO'
    'COMMAND SET PREFIX SYNONYM CMDP MYMACRO'
Now, macro MYMACRO can analyze the invoked-as name (the synonym by which it was called) and execute the proper piece of code. If the user enters CMDA, one specific routine can be executed. If the user enters CMDP in the prefix area, another routine may be executed. For another good example, have a look at the QUERY XEDIT macro on the goodies.
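A possible skeleton for such a macro is sketched below. It assumes the CMS token order for PARSE SOURCE (system, call type, filename, filetype, filemode, invoked-as name, default address); on other platforms the tokens differ, and the SAY clauses merely stand in for the real routines.

```rexx
/* MYMACRO XEDIT (sketch): branch on the synonym used to invoke the macro. */
/* Assumes the CMS token order: system, call type, fn, ft, fm, invoked-as. */
parse source . . fn ft fm invokedAs .
select
  when invokedAs = 'CMDA' then say 'Running the CMDA routine of' fn ft fm
  when invokedAs = 'CMDB' then say 'Running the CMDB routine of' fn ft fm
  when invokedAs = 'CMDP' then say 'Running the prefix routine of' fn ft fm
  otherwise say fn ft fm 'was invoked as' invokedAs
end
```

The periods in the template skip the tokens the macro does not need.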
This topic is extracted from the REXX Reference Guide. The chapter in the book has been reworked recently and we find they did an excellent job.
As seen in the introduction, the simplest template is a list of variable names. More complicated templates contain patterns in addition to variable names.
String patterns | Match characters in the source string to tell where to split it. (See "Templates Containing String Patterns" for details.) |
Positional patterns | Indicate the character positions at which to split the source string. These will be covered later in the course. |
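Parsing is essentially a two-step process: first, any patterns in the template split the source string into substrings; then each substring is parsed into words.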
Here is a parsing instruction:
parse value 'time and tide' with var1 var2 var3
The template in this instruction is: var1 var2 var3. The data to be parsed is between the keywords PARSE VALUE and the keyword WITH, namely the source string time and tide. Parsing divides the source string into blank-delimited words and assigns them to the variables named in the template as follows:
    var1='time'
    var2='and'
    var3='tide'
In this example, the source string to parse is a literal string, time and tide. In the next example, the source string is a variable.
    string='time and tide'
    parse value string with var1 var2 var3 /* same results */
The PARSE VALUE does not convert alphabetic characters in the source string to uppercase (lowercase a-z to uppercase A-Z). If you want to convert characters to uppercase, use PARSE UPPER VALUE.
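A short sketch of the difference, reusing the source string from the examples above:

```rexx
/* Without UPPER the case of the source string is preserved */
parse value 'time and tide' with var1 var2 var3
say var1 var2 var3            /* time and tide */

/* With UPPER the source string is translated before parsing */
parse upper value 'time and tide' with var1 var2 var3
say var1 var2 var3            /* TIME AND TIDE */
```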
The PARSE VAR instruction is similar to PARSE VALUE except that the source string to parse is always a variable. In PARSE VAR, the name of the variable containing the source string follows the keywords PARSE VAR. In the next example, the variable stars contains the source string. The template is star1 star2 star3.
    stars='Sirius Polaris Rigil'
    parse var stars star1 star2 star3
    /* star1='Sirius'  */
    /* star2='Polaris' */
    /* star3='Rigil'   */
All variables in a template receive new values. If there are more variables in the template than words in the source string, the leftover variables receive null (empty) values. This is true for all parsing, for parsing into words with simple templates and for parsing with templates containing patterns. Here is an example using parsing into words.
    satellite='moon'
    parse var satellite Earth Mercury
    /* Earth='moon' */
    /* Mercury=''   */
If there are more words in the source string than variables in the template, the last variable in the template receives all leftover data.
Here is an example of this second case:
    satellites='moon Io Europa Callisto...'
    parse var satellites Earth Jupiter
    /* Earth='moon'                   */
    /* Jupiter='Io Europa Callisto...'*/
Parsing into words removes leading and trailing blanks from each word before it is assigned to a variable. The exception to this is the word or group of words assigned to the last variable. The last variable in a template receives leftover data, preserving extra leading and trailing blanks. Here is an example:
    solar5='Mercury Venus  Earth   Mars     Jupiter  '
    parse var solar5 var1 var2 var3 var4
    /* var1 ='Mercury'              */
    /* var2 ='Venus'                */
    /* var3 ='Earth'                */
    /* var4 ='  Mars     Jupiter  ' */
In the source string, Earth has two leading blanks. Parsing removes both of them (the word-separator blank and the extra blank) before assigning Earth to var3. Mars has three leading blanks. Parsing removes one word-separator blank and keeps the other two leading blanks. It also keeps all five blanks between Mars and Jupiter and both trailing blanks after Jupiter.
Parsing removes no blanks if the template contains only one variable. For example:
parse value ' Pluto ' with var1 /* var1=' Pluto ' */
A period in a template is a placeholder. It is used instead of a variable name, but it receives no data. It is useful as a dummy variable in a list of variables, or to discard unwanted data at the end of a string.
The period in the next example is a placeholder. Be sure to separate adjacent periods with spaces; otherwise, an error results.
    stars='Arcturus Betelgeuse Sirius Rigil'
    parse var stars . . brightest .
    /* brightest='Sirius' */
Compare with this:
    stars='Arcturus Betelgeuse Sirius Rigil'
    parse var stars drop junk brightest rest
    /* brightest='Sirius' */
A placeholder saves the overhead of unneeded variables.
| We already mentioned that the last variable in the template gets the remainder of the source, including blanks. So, for example, parse value 'A string    ' with var1 var2 has the effect that var2 contains 'string    ' (with four trailing blanks). If you want to avoid this, just add a placeholder at the end of the template, as in the first example of this topic. |
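The effect is easy to verify with a small sketch that displays the length of the last variable with and without a trailing placeholder. The quotation marks in the output are only there to make the blanks visible.

```rexx
/* Last variable keeps the trailing blanks */
parse value 'A string    ' with var1 var2
say '"'var2'"' length(var2)     /* "string    " 10 */

/* A trailing placeholder absorbs them instead */
parse value 'A string    ' with var1 var2 .
say '"'var2'"' length(var2)     /* "string" 6 */
```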
A string pattern matches characters in the source string to indicate where to split it. A string pattern can be a:
Literal string pattern | One or more characters within quotation marks. |
Variable string pattern | A variable within parentheses. |
Here are two templates: a simple template and a template containing a literal string pattern:
    var1 var2       /* simple template */
    var1 ', ' var2  /* template with literal string pattern */
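The literal string pattern is ', '. This template puts the characters from the start of the source string up to (but not including) the match into var1, and the characters following the match through the end of the string into var2.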
A template with a string pattern can omit some of the data in a source string when assigning data into variables. The next two examples contrast simple templates with templates containing literal string patterns.
    name='Smith, John'
    parse var name lastname firstname
    /* Assigns: lastname='Smith,'  */
    /*          firstname='John'   */
Notice that the comma remains (the variable lastname contains Smith,). In the next example the template is lastname ', ' firstname. This removes the comma.
    name='Smith, John'
    parse var name lastname ', ' firstname
    /* Assigns: lastname='Smith'   */
    /*          firstname='John'   */
First, the language processor scans the source string for ', '. It splits the source string at that point. The variable lastname receives data starting with the first character of the source string and ending with the last character before the match. The variable firstname receives data starting with the first character after the match and ending with the end of string.
A template with a string pattern omits data in the source string that matches the pattern. We used the pattern ', ' (with a blank) instead of ',' (no blank) because, without the blank in the pattern, the variable firstname receives ' John' (including a blank). Alternatively, a placeholder could be added to the end of the template to remove the blank, as is demonstrated here:
    name='Smith, John'
    parse var name lastname ',' firstname .
    /* Assigns: lastname='Smith'   */
    /*          firstname='John'   */
As firstname is no longer the last variable in the template, its leading and trailing blanks are removed.
If the source string does not contain a match for a string pattern, then the variables preceding the unmatched string pattern get all the data in question. Any variables after that pattern receive the null string. For example:
    parse value 'Smith, John' with lastname ',  ' firstname
    /* lastname='Smith, John' */
    /* firstname=''           */

As the source does not contain two spaces after the comma, lastname gets all the data and firstname is a null string. Yet another case:

    parse value 'Van Beethoven, Ludwig' with lastname1 lastname2 ',  ' firstname
    /* lastname1='Van'               */
    /* lastname2='Beethoven, Ludwig' */
    /* firstname=''                  */
A null string is never found. It always matches the end of the source string.
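A minimal sketch of this rule: with a null string as the pattern, the match is placed at the end of the source string, so everything goes to the variable before the pattern and nothing is left for the variable after it.

```rexx
/* The null string '' as a pattern matches the end of the source */
parse value 'time and tide' with before '' after
say '"'before'"'              /* "time and tide" */
say '"'after'"'               /* ""              */
```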
You can use a variable to specify the string pattern in a template. To do this, place the name of the variable in parentheses.
In the next example, both parse instructions have the same result:

    parse var name firstname init '. ' lastname
    strngptrn='. '
    parse var name firstname init (strngptrn) lastname
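Here is a complete, runnable variant of this comparison. The value given to name is our own sample data; the slashes in the output only separate the three results.

```rexx
name = 'John Q. Public'

/* Literal string pattern */
parse var name firstname init '. ' lastname
say firstname'/'init'/'lastname          /* John/Q/Public */

/* Same pattern supplied through a variable in parentheses */
strngptrn = '. '
parse var name firstname init (strngptrn) lastname
say firstname'/'init'/'lastname          /* John/Q/Public */
```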
A positional pattern is a number that identifies the character position at which to split data in the source string. The number must be a whole number.
An absolute positional pattern is a number with no + or - preceding it. The number specifies the absolute character position at which to split the source string.
Here is a template with absolute positional patterns:
variable1 11 variable2 21 variable3
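The numbers 11 and 21 are absolute positional patterns. The number 11 refers to the 11th position in the input string, 21 to the 21st position. This template puts characters 1 through 10 of the source string into variable1, characters 11 through 20 into variable2, and characters 21 through the end of the string into variable3.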
Positional patterns are probably most useful for working with columnar data, such as:
    character positions:
            1          11         21                  40
            +----------+----------+--------------------+end of
    FIELDS: !LASTNAME  !FIRST     !PSEUDONYM           !record
            +----------+----------+--------------------+
The following example uses this record structure.
    record.1='Clemens   Samuel    Mark Twain          '
    record.2='Evans     Mary Ann  George Eliot        '
    record.3='Munro     H.H.      Saki                '
    do n=1 to 3
       parse var record.n lastname 11 firstname 21 pseudonym
       If lastname='Evans' & firstname='Mary Ann' then say 'By George!'
    end /* Says 'By George!' after record 2 */
The source string is first split at character position 11 and at position 21. The language processor assigns characters 1 to 10 into lastname, characters 11 to 20 into firstname, and characters 21 to 40 into pseudonym.
The template could have been:
1 lastname 11 firstname 21 pseudonym
instead of
lastname 11 firstname 21 pseudonym
Specifying the 1 is optional, as it is the default.
A relative positional pattern is a number with a plus (+) or minus (-) sign preceding it.
The number specifies the relative character position at which to split the source string. The plus or minus indicates movement right or left, respectively, from the start of the string (for the first pattern) or the position of the last match. The position of the last match is the first character of the last match. Here is the same example as for absolute positional patterns done with relative positional patterns:
    record.1='Clemens   Samuel    Mark Twain          '
    record.2='Evans     Mary Ann  George Eliot        '
    record.3='Munro     H.H.      Saki                '
    do n=1 to 3
       parse var record.n lastname +10 firstname + 10 pseudonym
       If lastname='Evans' & firstname='Mary Ann' then say 'By George!'
    end /* same results */
Blanks between the sign and the number are insignificant. Therefore, +10 and + 10 have the same meaning. Note that +0 is a valid relative positional pattern.
Absolute and relative positional patterns are interchangeable, except in the special case (discussed later) when a string pattern precedes a variable name and a positional pattern follows the variable name. The templates from the examples of absolute and relative positional patterns give the same results.
Absolute template | Relative template | Effect |
---|---|---|
(implied) | (implied) | Starting point is position 1. |
lastname 11 | lastname +10 | Put characters 1 through 10 in lastname. (Non-inclusive stopping point is 11 (1+10)) |
firstname 21 | firstname + 10 | Put characters 11 through 20 in firstname. (Non-inclusive stopping point is 21 (11+10)) |
pseudonym | pseudonym | Put characters 21 through end of string in pseudonym. |
Only with positional patterns can a matching operation back up to an earlier position in the source string. Here is an example using absolute positional patterns:

    string='astronomers'
    parse var string 2 var1 4 1 var2 2 4 var3 5 11 var4
    say string 'study' var1||var2||var3||var4
    /* Displays: "astronomers study stars" */
The absolute positional pattern 1 backs up to the first character in the source string.
With relative positional patterns, a number preceded by a minus sign backs up to an earlier position. Here is the same example using relative positional patterns:
    string='astronomers'
    parse var string 2 var1 +2 -3 var2 +1 +2 var3 +1 +6 var4
    say string 'study' var1||var2||var3||var4 /* same results */
In this example, the relative positional pattern -3 backs up to the first character in the source string.
The templates in the last two examples are equivalent.
Absolute template | Relative template | Effect |
---|---|---|
2 | 2 | Start at 2. |
var1 4 | var1 +2 | Non-inclusive stopping point is 4 (2+2=4). |
1 | -3 | Go to 1 (4-3=1). |
var2 2 | var2 +1 | Non-inclusive stopping point is 2 (1+1=2). |
4 var3 5 | +2 var3 +1 | Go to 4 (2+2=4). Non-inclusive stopping point is 5 (4+1=5). |
11 var4 | +6 var4 | Go to 11 (5+6=11). |
You can use templates with positional patterns to make multiple assignments (we will come back to this later):

    books='Silas Marner, Felix Holt, Daniel Deronda, Middlemarch'
    parse var books 1 Eliot 1 Evans
    /* Assigns the (entire) value of books to Eliot and to Evans. */
What happens when a template contains patterns that divide the source string into sections containing multiple words? String and positional patterns divide the source string into substrings. The language processor then applies a section of the template to each substring, following the rules for parsing into words.

    name='   John      Q. Public'
    parse var name fn init '.' ln
    /* Assigns: fn='John'     */
    /*          init='     Q' */
    /*          ln=' Public'  */

The pattern divides the template into two sections, fn init and ln. The matching pattern splits the source string into two substrings, '   John      Q' and ' Public'.
The language processor parses these substrings into words based on the appropriate template section.
John had three leading blanks. All are removed because parsing into words removes leading and trailing blanks except from the last variable.
Q has six leading blanks. Parsing removes one word-separator blank and keeps the rest because init is the last variable in that section of the template.
For the substring ' Public', parsing assigns the entire string into ln without removing any blanks. This is because ln is the only variable in this section of the template.
    string='R E X X'
    parse var string var1 var2 4 var3 6 var4
    /* Assigns: var1='R'  */
    /*          var2='E'  */
    /*          var3=' X' */
    /*          var4=' X' */

The positional patterns divide the template into three sections: var1 var2, var3, and var4.
The matching patterns split the source string into three substrings that are individually parsed into words: 'R E', ' X', and ' X'.
The variable var1 receives 'R' ; var2 receives 'E'. Both var3 and var4 receive ' X' (with a blank before the X) because each is the only variable in its section of the template.
You may want to specify a pattern by using the value of a variable instead of a fixed string or number. You do this by placing the name of the variable in parentheses. This is a variable reference. Blanks are not necessary inside or outside the parentheses, but you can add them if you wish.
The template in the next parsing instruction contains the following literal string pattern '. '.
parse var name fn init '. ' ln
Here is how to specify that pattern as a variable string pattern:
    strngptrn='. '
    parse var name fn init (strngptrn) ln
If no equal, plus, or minus sign precedes the parenthesis, the value of the variable is then treated as a string pattern. The variable can be one that has been set earlier in the same template, such as in:
    Say "Enter a date (dd/mm/yy format). ======> " /* assume 17/12/90 is given */
    pull date
    parse var date mday 3 delim +1 month (delim) year

Here, the variable delim gets its value in the template itself. The result is that mday receives the value 17, month the value 12, and year the value 90.
If an equal, a plus, or a minus sign precedes the left parenthesis, then the value of the variable is treated as an absolute or relative positional pattern (footnote 1). The value of the variable must be a nonnegative whole number.
The variable can be one that has been set earlier in the same template. In the following example, the first 2 fields specify the starting character positions of the last 2 fields:
    dataline = '6 20 Samuel ClemensMark Twain'
    parse var dataline pos1 pos2 6 =(pos1) realname =(pos2) pseudonym
    /* Assigns: realname='Samuel Clemens' ; pseudonym='Mark Twain' */
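The same idea works with a relative variable pattern. In the sketch below (the record layout is our own invention), the first two characters give the length of the field that follows, and +(len) uses that value to find the end of the field. As footnote 1 explains, this requires a REXX that supports SAA Level 2.

```rexx
/* First two characters hold the length of the field that follows */
record = '05Hello world'
parse var record len 3 text +(len) rest
say len                       /* 05       */
say text                      /* Hello    */
say '"'rest'"'                /* " world" */
```

The variable len is assigned by the first section of the template, so its value is already available when +(len) is evaluated.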
Remember these rules: all parsing instructions assign parts of the source string to the variables named in the template. The following table summarizes where the source string comes from. It also indicates on which platforms and/or VM/ESA releases each instruction is available.
Instruction | Where the source string comes from | VM/ESA | OS/2 |
---|---|---|---|
PARSE ARG | Arguments you list when you invoke the program or arguments in the call to a subroutine or function. | all | yes |
ARG | Same as PARSE ARG, but arguments are translated to uppercase. | all | yes |
PARSE LINEIN | Next line from terminal input buffer | since R 2.0 | yes |
PARSE EXTERNAL | Identical to PARSE LINEIN, for compatibility with older VM releases | all | no |
PARSE NUMERIC | Numeric control information (from NUMERIC instruction) | all | no |
PARSE PULL | The string at the head of the external data queue. (If the queue is empty, uses default input, typically the terminal). | all | yes |
PULL | Similar to PARSE PULL, but string is translated to uppercase before parsing. | all | yes |
PARSE SOURCE | System-supplied string giving information about the executing program. | all | yes |
PARSE VALUE | Expression between the keyword VALUE and the keyword WITH in the instruction. | all | yes |
PARSE VAR name | parses contents of variable name | all | yes |
PARSE VERSION | System-supplied string telling the language, language level, and (three-word) date. | all | yes |
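As an illustration of the last row of the table, here is a small sketch that splits the PARSE VERSION string into its parts. The variable names are ours, and the values displayed depend on the language processor you run it on.

```rexx
/* PARSE VERSION: language name, language level, and a three-word date */
parse version language level day month year
say 'Language processor:' language
say 'Language level    :' level
say 'Release date      :' day month year
```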
This section includes parsing multiple strings and flow charts depicting a conceptual view of parsing.
Only ARG and PARSE ARG can have more than one source string. To parse multiple strings, you can specify multiple comma-separated templates. Here is an example:
parse arg template1, template2, template3
This instruction consists of the keywords PARSE ARG and three comma-separated templates. (For an ARG instruction, the source strings to parse come from arguments you specify when you invoke a program or CALL a subroutine or function.) Each comma is an instruction to the parser to move on to the next string.
Parsing multiple strings in a subroutine:

    num='3'
    musketeers="Porthos, Athos, Aramis, D'Artagnon"
    CALL Sub num,musketeers  /* Passes num and musketeers to sub */
    SAY total; say fourth    /* Displays: "4" and "D'Artagnon" */
    EXIT

    Sub:
    parse arg subtotal, . . . fourth
    total=subtotal+1
    RETURN

| This example is a bit confusing for novice REXX programmers, so let's explain it in more detail. The CALL passes 2 parameters to the subroutine. The second parameter is a string that itself contains commas between the names of the musketeers. The parse instruction at the start of the subroutine indeed receives only 2 parameters, separated by a comma. The commas in the string are an integral part of the second parameter and are not treated as parameter separators by the parse instruction. An example without commas in the parameters would have been less confusing, but would also have taught you less. |
Note that when a REXX program is started as a command, only one argument string is recognized. You can pass multiple argument strings for parsing in other cases, for example when a routine is invoked with the CALL instruction or as a function.
If there are more templates than source strings, each variable in a leftover template receives a null string. If there are more source strings than templates, the language processor ignores leftover source strings. If a template is empty (two commas in a row) or contains no variable names, parsing proceeds to the next template and source string.
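These three rules can be seen together in one sketch. The routine name show and the argument values are only illustrative: three strings are passed, the second template is empty, and a fourth template has no string to parse.

```rexx
call show 'alpha beta', 'gamma', 'delta'
exit

show:
  /* Template 2 is empty, so the second string ('gamma') is skipped; */
  /* there is no fourth string, so "fourth" receives a null string.  */
  parse arg first second, , third, fourth
  say first second third      /* alpha beta delta */
  say '"'fourth'"'            /* ""               */
  say arg()                   /* 3 strings were passed */
  return
```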
There is a special case in which absolute and relative positional patterns do not work identically. We have shown how string patterns skip over data in the source string. But a template containing the sequence string pattern, variable name, relative positional pattern does not skip over any data. A relative positional pattern moves relative to the first character of a string pattern. As a result, assignment includes the data that is in the string pattern. Thus, the variable receives characters including the matching data.
    /* Template containing string pattern, then variable name, then */
    /* relative positional pattern does not skip over any data.     */
    string='REstructured eXtended eXecutor'
    parse var string var1 3 junk 'X' var2 +1 junk 'X' var3 +1 junk
    say var1||var2||var3 /* Concatenates variables; displays: "REXX" */
Here is how this template works :
Template section | Action | Result |
---|---|---|
var1 3 | Put characters 1 through 2 in var1. (Stopping point is 3) | var1='RE' |
junk 'X' | Starting at 3, put characters up to (not including) first 'X' in junk. | junk='structured e' |
var2 +1 | Starting with first 'X', put 1 (+1) character in var2. | var2='X' |
junk 'X' | Starting with character after first 'X', put up to second 'X' in junk. | junk='tended e' |
var3 +1 | Starting with second 'X', put 1 (+1) character in var3. | var3='X' |
junk | Starting with character after second 'X', put rest of string in junk. | junk='ecutor' |
This ends the part taken from the REXX Reference Guide. The next chapter puts all these parsing rules into practice and shows real-life examples.
(1)
This parsing technique is part of REXX SAA Level 2, and as such is
only available since VM/ESA Release 2.0. For the same reason it is
also available on OS/2 and in Object REXX.