Advanced REXX Course, Lesson 3

This chapter deals with the parsing function of REXX. You probably know that it exists, but parsing is such a powerful feature of REXX, with so many possibilities, that it is worth spending some more time on the subject.

Introduction

The data to parse is a source string. Parsing splits up the source data and assigns pieces of it to the variables named in a template. A template is a model that specifies how the source string should be split.
This leads to a general form: PARSE [UPPER] source template
or in more detail:

Parse instruction format.


  >>---PARSE-+-------+-+-ARG-----------------------+-+----------+-----><
             +-UPPER-+ +-EXTERNAL------------------+ +-template-+
                       +-LINEIN--------------------+
                       +-NUMERIC-------------------+
                       +-PULL----------------------+
                       +-SOURCE--------------------+
                       +-VAR--name-----------------+
                       +-VALUE-+------------+-WITH-+
                       !       +-expression-+      !
                       +-VERSION-------------------+

Where:

UPPER	is an optional keyword instructing REXX to translate the source string to uppercase before parsing it.
ARG	the source consists of the parameters passed to the procedure or subroutine
EXTERNAL	here REXX reads from terminal input buffer or keyboard
LINEIN	identical to EXTERNAL. Preferred keyword to conform to REXX SAA Level 2.
PULL	input source comes from the CMS stack buffer
VAR varname	this is a very good way to analyze the contents of a variable
VALUE expression WITH	the source is then the result of evaluating the expression
SOURCE	can be used to know the name of the exec, the environment in which it executes and how it was called
VERSION	gives access to the level of REXX that is running (see Appendix D. REXX Versions. for more details)
NUMERIC	returns information about the setting of `NUMERIC`. Not available on OS/2.
template	Templates are probably the most difficult part of the PARSE instruction but, as we will see, they are also one of the most powerful REXX features. Most of this document deals with templates.

The simplest form of a template consists of only a list of variable names. Here is an example:

      variable1 variable2 variable3

The source does not at all influence the way REXX uses the template. That's why we mostly use a PARSE VALUE string WITH template construct in our examples as it is then 100% clear what the source is.

The ARG instruction is just a shorter form for PARSE UPPER ARG while the PULL instruction is the short form of PARSE UPPER PULL, so we don't see the need to discuss these short forms in more detail.
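
For readers who want to see the difference once, here is a minimal sketch. The routine name Demo and the variable names kept and folded are ours, chosen only for illustration.

```rexx
/* Sketch: PARSE ARG keeps the case, ARG translates to uppercase */
call Demo 'Mixed Case'
exit

Demo:
  parse arg kept                 /* kept   = 'Mixed Case'          */
  arg folded                     /* folded = 'MIXED CASE'          */
  say kept
  say folded
  return
```

Both instructions read the same argument string; parsing it once does not consume it.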

Sources analyzed in more detail.

ARG
Lets you analyze the parameters passed to your routine. By routine we mean either the procedure itself or a subroutine within the procedure.

 /* APERITIF EXEC , for starters */
 parse arg parameters                   /* get what user gave us */
 Say 'Hello, you asked for' parameters            /* inform user */
 call subrout1 'Cookies'                      /* call subroutine */
 exit
Subrout1:
 parse arg parms          /* get parameters passed to subroutine */
 if parms=parameters then say 'Your' parameters 'will arrive.'
                     else Say 'Sorry, not available anymore.'
 return

 » aperitif cherry and martini
   Hello, you asked for cherry and martini
   Sorry, not available anymore.
   Ready;
 » aperitif Cookies
   Hello, you asked for Cookies
   Your Cookies will arrive.
   Ready;

PULL
In this case, REXX tries to read from the CMS program stack (i.e. data stacked by programs). If no data is stacked, REXX checks whether the terminal input buffer contains data you entered while the program was running. If there is no data there either, the terminal enters the VM READ state.
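
A small sketch of reading stacked data follows. It assumes nothing beyond the standard QUEUE instruction and the QUEUED() built-in function; the strings are made up for the example.

```rexx
/* Sketch: stack two lines, then read them back */
queue 'first line'               /* add to the end of the stack    */
queue 'Second Line'
parse pull line1                 /* line1 = 'first line'           */
pull line2                       /* line2 = 'SECOND LINE'          */
say line1
say line2
say queued()                     /* 0: the stack is empty again    */
```

Because both lines were stacked beforehand, neither PULL has to wait for terminal input.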

LINEIN
Comparable with PULL, except that REXX doesn't first look at the program stack, but immediately looks for data in the terminal input buffer.

EXTERNAL
Identical to LINEIN. LINEIN is preferred because it conforms to REXX SAA Level 2 and, as such, is also available on OS/2, while EXTERNAL is not. REXX has conformed to SAA Level 2 since VM/ESA Version 1, Release 2.1.

VALUE expression WITH
Parses the result of evaluating expression. Many programmers confuse the VAR and VALUE options. VAR parses a REXX variable, whose name must therefore conform to the naming rules for variables (for example, it cannot start with a digit or a period). VALUE lets you parse the result of an expression, which can be virtually anything. The simplest expression is a quoted string, while more complex expressions typically contain function calls.

Note also that the keyword WITH is mandatory here to separate the expression from the template.
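
To make the contrast concrete, here is a minimal sketch. The fileid value and the variable names are invented for the example; WORDS() is the standard built-in function that counts blank-delimited words.

```rexx
/* Sketch: VAR takes a variable name, VALUE takes any expression */
fileid = 'PROFILE EXEC A1'

parse var fileid fn ft fm        /* source: the variable fileid    */
say fn ft fm                     /* PROFILE EXEC A1                */

/* source: a function call concatenated with a variable            */
parse value words(fileid) fileid with count fn ft fm
say count fn ft fm               /* 3 PROFILE EXEC A1              */
```

In the second instruction everything between VALUE and WITH is evaluated first, giving the string '3 PROFILE EXEC A1', and only then is the template applied.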

SOURCE
PARSE SOURCE is typically used to determine the fileid of the procedure (e.g. to identify it in error messages). The invoked-as information may, for example, be useful in XEDIT macros, where the macro can react differently depending on the synonym used to invoke it. This can be achieved by defining appropriate synonyms. For example:

 'COMMAND SET SYNONYM CMDA MYMACRO'
 'COMMAND SET SYNONYM CMDB MYMACRO'
 'COMMAND SET PREFIX SYNONYM CMDP MYMACRO'

Now, macro MYMACRO can analyze the invoked-as name (thus the synonym) by which it was called and execute the proper piece of code. If the user enters CMDA, then one specific routine can be executed. If the user enters CMDP in the prefix area, then another routine may be executed. For another good example, have a look at the QUERY XEDIT macro on the goodies.
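
A sketch of how MYMACRO might test the invoked-as name is shown here. It assumes the CMS layout of the PARSE SOURCE string (system, call type, filename, filetype, filemode, invoked-as name, default address environment); on other platforms the tokens after the second one differ, so check the reference for your system. The messages are placeholders for real routines.

```rexx
/* MYMACRO XEDIT (sketch): branch on the name used to call us      */
/* Assumption: CMS token order for PARSE SOURCE                    */
parse source . calltype fn ft fm invoked_as .
select
  when invoked_as = 'CMDA' then say 'Running the CMDA routine'
  when invoked_as = 'CMDB' then say 'Running the CMDB routine'
  when invoked_as = 'CMDP' then say 'Running the prefix routine'
  otherwise say 'Called as' invoked_as 'from' fn ft fm '('calltype')'
end
```

The placeholders at both ends of the template discard the tokens the macro does not need.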

Templates analyzed in more detail.

This topic is extracted from the REXX Reference Guide. The chapter in the book has been reworked recently and we find they did an excellent job.

As seen in the introduction, the simplest template is a list of variable names. More complicated templates contain patterns in addition to variable names.

String patterns	Match characters in the source string to tell where to split it. (See "Templates Containing String Patterns" for details.)
Positional patterns	Indicate the character positions at which to split the source string. These will be covered later in the course.

Simple Templates for Parsing into Words

      parse value 'time and tide' with var1 var2 var3

The template in this instruction is: var1 var2 var3. The data to be parsed is between the keywords PARSE VALUE and the keyword WITH, namely the source string time and tide. Parsing divides the source string into blank-delimited words and assigns them to the variables named in the template as follows:

      var1='time'
      var2='and'
      var3='tide'

In this example, the source string to parse is a literal string, time and tide. In the next example, the source string is a variable.

      string='time and tide'
      parse value string with var1 var2 var3           /* same results */

The PARSE VALUE does not convert alphabetic characters in the source string to uppercase (lowercase a-z to uppercase A-Z). If you want to convert characters to uppercase, use PARSE UPPER VALUE.
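
A quick sketch of the difference, reusing the source string from above:

```rexx
/* Sketch: the same template with and without UPPER */
parse value 'time and tide' with var1 var2 var3
say var1 var2 var3               /* time and tide                  */

parse upper value 'time and tide' with var1 var2 var3
say var1 var2 var3               /* TIME AND TIDE                  */
```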

The PARSE VAR instruction is similar to PARSE VALUE except that the source string to parse is always a variable. In PARSE VAR, the name of the variable containing the source string follows the keywords PARSE VAR. In the next example, the variable stars contains the source string. The template is star1 star2 star3.

      stars='Sirius Polaris Rigil'
      parse var stars star1 star2 star3             /* star1='Sirius'  */
                                                    /* star2='Polaris' */
                                                    /* star3='Rigil'   */

All variables in a template receive new values. If there are more variables in the template than words in the source string, the leftover variables receive null (empty) values. This is true for all parsing, for parsing into words with simple templates and for parsing with templates containing patterns. Here is an example using parsing into words.

      satellite='moon'
      parse var satellite Earth Mercury               /* Earth='moon'  */
                                                      /* Mercury=''    */

If there are more words in the source string than variables in the template, the last variable in the template receives all leftover data.

      satellites='moon Io Europa Callisto...'
      parse var satellites Earth Jupiter              /* Earth='moon'  */
                                     /* Jupiter='Io Europa Callisto...'*/

Parsing into words removes leading and trailing blanks from each word before it is assigned to a variable. The exception to this is the word or group of words assigned to the last variable. The last variable in a template receives leftover data, preserving extra leading and trailing blanks. Here is an example:

      solar5='Mercury Venus  Earth   Mars     Jupiter  '
      parse var solar5 var1 var2 var3 var4
      /* var1  ='Mercury'                                              */
      /* var2  ='Venus'                                                */
      /* var3  ='Earth'                                                */
      /* var4  ='  Mars     Jupiter  '                                 */

In the source string, Earth has two leading blanks. Parsing removes both of them (the word-separator blank and the extra blank) before assigning Earth to var3. Mars has three leading blanks. Parsing removes one word-separator blank and keeps the other two leading blanks. It also keeps all five blanks between Mars and Jupiter and both trailing blanks after Jupiter.

Parsing removes no blanks if the template contains only one variable. For example:

      parse value '   Pluto   ' with var1       /* var1='   Pluto   ' */

The Period as a Placeholder

A period in a template is a placeholder. It is used instead of a variable name, but it receives no data. It is useful for skipping words you do not need and for discarding unwanted data at the end of a string.

The period in the next example is a placeholder. Be sure to separate adjacent periods with spaces; otherwise, an error results.

      stars='Arcturus Betelgeuse Sirius Rigil'
      parse var stars . . brightest .            /* brightest='Sirius' */

Without placeholders, you would need dummy variables to get the same result:

      stars='Arcturus Betelgeuse Sirius Rigil'
      parse var stars drop junk brightest rest   /* brightest='Sirius' */

We already mentioned that the last variable in the template gets the remainder of the source, including blanks. So, for example,

      parse value 'A string    ' with var1 var2

has the effect that var2 contains 'string    ' (with four trailing blanks).

If you want to avoid this situation, then just add a placeholder at the end of the template, as in the first example of this topic.
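
The following sketch shows the effect side by side. The brackets in the SAY instructions are only there to make the blanks visible.

```rexx
/* Sketch: a trailing placeholder absorbs the leftover blanks */
parse value 'A string    ' with var1 var2
say '['var2']'                   /* [string    ]                   */

parse value 'A string    ' with var1 var2 .
say '['var2']'                   /* [string]                       */
```

In the second instruction var2 is no longer the last item in the template, so it is treated as an ordinary word and loses its surrounding blanks.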

Templates Containing String Patterns

A string pattern matches characters in the source string to indicate where to split it. A string pattern can be a:

Literal string pattern	One or more characters within quotation marks.
Variable string pattern	A variable within parentheses.

Here are two templates: a simple template and a template containing a literal string pattern:

      var1 var2          /* simple template                            */
      var1 ', ' var2     /* template with literal string pattern       */

A template with a string pattern can omit some of the data in a source string when assigning data into variables. The next two examples contrast simple templates with templates containing literal string patterns.

      name='Smith, John'
      parse var name lastname firstname        /* Assigns: lastname='Smith,' */
                                               /*          firstname='John'  */

Notice that the comma remains (the variable lastname contains Smith,). In the next example the template is lastname ', ' firstname. This removes the comma.

      name='Smith, John'
      parse var name lastname ', ' firstname   /* Assigns: lastname='Smith' */
                                               /*          firstname='John' */

First, the language processor scans the source string for ', '. It splits the source string at that point. The variable lastname receives data starting with the first character of the source string and ending with the last character before the match. The variable firstname receives data starting with the first character after the match and ending with the end of string.

A template with a string pattern omits data in the source string that matches the pattern. We used the pattern ', ' (with a blank) instead of ',' (no blank) because, without the blank in the pattern, the variable firstname receives ' John' (including a blank). Alternatively, a placeholder could be added to the end of the template to remove the blank, as is demonstrated here:

      name='Smith, John'
      parse var name lastname ',' firstname .  /* Assigns: lastname='Smith' */
                                               /*          firstname='John' */

If the source string does not contain a match for a string pattern, then the variables preceding the unmatched string pattern get all the data in question. Any variables after that pattern receive the null string. For example:

  parse value 'Smith, John' with lastname ',  ' firstname
                                                  /* lastname='Smith, John' */
                                                  /* firstname=''           */

As the source does not contain two spaces after the comma, lastname gets all the data and firstname is a null string. Yet another case:

  parse value 'Van Beethoven, Ludwig' with lastname1 lastname2 ',  ' firstname
                                        /* lastname1='Van'                  */
                                        /* lastname2='Beethoven, Ludwig'    */
                                        /* firstname=''                     */

Variable String Patterns

You can use a variable to specify the string pattern in a template. To do this, place the name of the variable in parentheses.

      parse var name firstname  init '. ' lastname
      strngptrn='. '
      parse var name firstname init (strngptrn) lastname
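
A variable string pattern is handy when the delimiter is only known at run time. Here is a minimal sketch that splits a list on a delimiter held in a variable; the names list, sep and item and the sample data are ours.

```rexx
/* Sketch: split a list on a delimiter stored in a variable */
list = 'alpha;beta;gamma'
sep  = ';'
count = 0
do while list \== ''
  parse var list item (sep) list   /* first entry, then the rest  */
  count = count + 1
  say count item                   /* 1 alpha / 2 beta / 3 gamma  */
end
```

On the last pass the delimiter is no longer found, so item receives all remaining data and list becomes the null string, which ends the loop.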

Templates Containing Positional (Numeric) Patterns

A positional pattern is a number that identifies the character position at which to split data in the source string. The number must be a whole number.

An absolute positional pattern is a number with no + or - preceding it. The number specifies the absolute character position at which to split the source string.

      variable1 11 variable2 21 variable3

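The numbers 11 and 21 are absolute positional patterns. The number 11 refers to the 11th position in the input string, 21 to the 21st position. This template puts characters 1 through 10 of the source string into variable1, characters 11 through 20 into variable2, and characters 21 through the end of the string into variable3.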

Positional patterns are probably most useful for working with columnar data, such as:

              character positions:
             1          11         21                  40
             +----------+----------+--------------------+end of
     FIELDS: !LASTNAME  !FIRST     !PSEUDONYM           !record
             +----------+----------+--------------------+

      record.1='Clemens   Samuel    Mark Twain          '
      record.2='Evans     Mary Ann  George Eliot        '
      record.3='Munro     H.H.      Saki                '
      do n=1 to 3
         parse var record.n lastname 11 firstname 21 pseudonym
         If lastname='Evans' & firstname='Mary Ann' then say 'By George!'
      end                         /* Says 'By George!' after record 2  */

The source string is first split at character position 11 and at position 21. The language processor assigns characters 1 to 10 into lastname, characters 11 to 20 into firstname, and characters 21 to 40 into pseudonym.
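
Note that fields cut out by position keep their padding. The sketch below, which reuses record 2 from the example, shows why the normal comparison (=) in that example still works and how the standard STRIP() function removes the padding when you need the bare value.

```rexx
/* Sketch: positional fields keep their padding blanks */
record = 'Evans     Mary Ann  George Eliot        '
parse var record lastname 11 firstname 21 pseudonym
say '['lastname']'               /* [Evans     ]                   */
say (lastname = 'Evans')         /* 1: normal compare pads blanks  */
say (lastname == 'Evans')        /* 0: strict compare does not     */
lastname = strip(lastname)
say '['lastname']'               /* [Evans]                        */
```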

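The template could also have been written with an explicit starting position:

      1 lastname 11 firstname 21 pseudonym

instead of:

        lastname 11 firstname 21 pseudonym

Specifying the 1 is optional.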

A relative positional pattern is a number with a plus (+) or minus (-) sign preceding it.

The number specifies the relative character position at which to split the source string. The plus or minus indicates movement right or left, respectively, from the start of the string (for the first pattern) or the position of the last match. The position of the last match is the first character of the last match. Here is the same example as for absolute positional patterns done with relative positional patterns:

      record.1='Clemens   Samuel    Mark Twain          '
      record.2='Evans     Mary Ann  George Eliot        '
      record.3='Munro     H.H.      Saki                '
      do n=1 to 3
        parse var record.n lastname +10 firstname + 10 pseudonym
        If lastname='Evans' & firstname='Mary Ann' then say 'By George!'
      end                                             /* same results  */

Blanks between the sign and the number are insignificant. Therefore, +10 and + 10 have the same meaning. Note that +0 is a valid relative positional pattern.

Absolute and relative positional patterns are interchangeable, except in the special case (covered later, where string and positional patterns are combined) in which a string pattern precedes a variable name and a positional pattern follows the variable name. The templates from the examples of absolute and relative positional patterns give the same results.

    !      !   !lastname  11!   !firstname 21  ! ! pseudonym !
    !      !   !lastname +10!   !firstname + 10! ! pseudonym !
    +--+---+   +------+-----+   +------+-------+ +-----+-----+
       !              !                !               !
    (Implied   Put characters    Put characters   Put characters
    starting   1 through 10      11 through 20    21 through
    point is   in lastname.      in firstname.    end of string
    position   (Non-inclusive    (Non-inclusive   in pseudonym.
    1)         stopping point    stopping point
               is 11 (1+10))     is 21 (11+10))

Only with positional patterns can a matching operation back up to an earlier position in the source string. Here is an example using absolute positional patterns:

      string='astronomers'
      parse var string 2 var1 4 1 var2 2 4 var3 5 11 var4
      say string 'study' var1||var2||var3||var4
      /* Displays: "astronomers study stars"                           */

The absolute positional pattern 1 backs up to the first character in the source string.

With relative positional patterns, a number preceded by a minus sign backs up to an earlier position. Here is the same example using relative positional patterns:

      string='astronomers'
      parse var string 2 var1 +2 -3 var2 +1 +2 var3 +1 +6 var4
      say string 'study' var1||var2||var3||var4      /* same results   */

In this example, the relative positional pattern -3 backs up to the first character in the source string.

    !  2  !   !var1  4 !  !  1   ! !var2  2!  ! 4 var3  5!  !11 var4 !
    !  2  !   !var1 +2 !  ! -3   ! !var2 +1!  !+2 var3 +1!  !+6 var4 !
    +--+--+   +---+----+  +--+---+ +---+---+  +----+-----+  +---+----+
       !          !          !         !           !            !

    Start     Non-        Go to 1. Non-        Go to 4       Go to 11
    at 2.     inclusive   (4-3=1)  inclusive   (2+2=4).      (5+6=11)
              stopping             stopping    Non-inclusive
              point is 4           point is    stopping point
              (2+2=4)              2 (1+1=2)   is 5 (4+1=5)

You can use templates with positional patterns to make multiple assignments (we'll come back to this later):

      books='Silas Marner, Felix Holt, Daniel Deronda, Middlemarch'
      parse var books 1 Eliot 1 Evans
      /* Assigns the (entire) value of books to Eliot and to Evans.    */
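
The same technique is often used to initialize several variables in one instruction. A minimal sketch, with variable names of our own choosing:

```rexx
/* Sketch: give three counters the same starting value */
parse value 0 with count 1 total 1 errors
say count total errors           /* 0 0 0                          */
```

Each absolute pattern 1 moves back to the start of the source string, so every variable receives the whole string.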

Combining Patterns and Parsing Into Words

What happens when a template contains patterns that divide the source string into sections containing multiple words? String and positional patterns divide the source string into substrings. The language processor then applies a section of the template to each substring, following the rules for parsing into words.

      name='    John      Q.   Public'
      parse var name fn init '.' ln        /* Assigns: fn='John'       */
                                           /*          init='     Q'   */
                                           /*          ln='   Public'  */

The pattern divides the template into two sections, fn init and ln. The matching pattern splits the source string into two substrings, '    John      Q' and '   Public'.

The language processor parses these substrings into words based on the appropriate template section.

John has four leading blanks. All are removed because parsing into words removes leading and trailing blanks except from the last variable.

Q has six leading blanks. Parsing removes one word-separator blank and keeps the rest because init is the last variable in that section of the template.

For the substring '   Public', parsing assigns the entire string into ln without removing any blanks. This is because ln is the only variable in this section of the template.

      string='R E X X'
      parse var string var1 var2 4 var3 6 var4   /* Assigns: var1='R'  */
                                                 /*          var2='E'  */
                                                 /*          var3=' X' */
                                                 /*          var4=' X' */

The matching patterns split the source string into three substrings that are individually parsed into words: 'R E', ' X' and ' X'.

The variable var1 receives 'R'; var2 receives 'E'. Both var3 and var4 receive ' X' (with a blank before the X) because each is the only variable in its section of the template.
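
Placeholders combine well with this rule. In the sketch below, a period closes each section of the template, so both variables are parsed as ordinary words and lose their blanks. The source string and the names key and val are made up for the example.

```rexx
/* Sketch: a string pattern plus placeholders to trim both sides */
parse value '  width =  80  ' with key . '=' val .
say '['key']' '['val']'          /* [width] [80]                   */
```

Without the periods, key would receive '  width ' and val '  80  ', because each would be the only variable in its section.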

Parsing with Variable Patterns.

You may want to specify a pattern by using the value of a variable instead of a fixed string or number. You do this by placing the name of the variable in parentheses. This is a variable reference. Blanks are not necessary inside or outside the parentheses, but you can add them if you wish.

The template in the next parsing instruction contains the following literal string pattern '. '.

      parse var name fn  init '. ' ln

Here is the same pattern specified as a variable string pattern:

      strngptrn='. '
      parse var name fn  init (strngptrn) ln

If no equal, plus, or minus sign precedes the parenthesis, the value of the variable is then treated as a string pattern. The variable can be one that has been set earlier in the same template, such as in:

    Say "Enter a date (dd/mm/yy format). ======> " /* assume 17/12/90 is given */
    pull date
    parse var date mday 3 delim +1 month (delim) year

Here, the variable delim gets its value in the template itself. The result is that mday receives value 17, month value 12 and year value 90.

If an equal, a plus, or a minus sign precedes the left parenthesis, then the value of the variable is treated as an absolute or relative positional pattern (see footnote 1). The value of the variable must be a nonnegative whole number.

The variable can be one that has been set earlier in the same template. In the following example, the first 2 fields specify the starting character positions of the last 2 fields:

    dataline = '6 20 Samuel ClemensMark Twain'
    parse var dataline pos1 pos2 6 =(pos1) realname =(pos2) pseudonym
    /* Assigns: realname='Samuel Clemens' ; pseudonym='Mark Twain' */
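
A relative variable pattern works the same way. The sketch below parses a field whose length is given by its first character; the data layout and the names len, word and rest are assumptions made for the example.

```rexx
/* Sketch: a one-digit length prefix drives a relative pattern */
field = '5Hello world'
parse var field len 2 word +(len) rest
say '['word']' '['rest']'        /* [Hello] [ world]               */
```

The variable len is set from character 1, and +(len) then moves 5 positions to the right of position 2, so word receives characters 2 through 6.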

Parsing Instructions Summary

Remember this rule: all parsing instructions assign parts of the source string into the variables named in the template. The following table summarizes where the source string comes from. It also indicates on which platforms and/or VM/ESA releases each instruction is available.

Instruction	Where the source string comes from	VM/ESA	OS/2
PARSE ARG	Arguments you list when you invoke the program or arguments in the call to a subroutine or function.	all	yes
ARG	Same as `PARSE ARG`, but arguments are translated to uppercase.	all	yes
PARSE LINEIN	Next line from terminal input buffer	since R 2.0	yes
PARSE EXTERNAL	Identical to `PARSE LINEIN`, for compatibility with older VM releases	all	no
PARSE NUMERIC	Numeric control information (from `NUMERIC` instruction)	all	no
PARSE PULL	The string at the head of the external data queue. (If the queue is empty, uses default input, typically the terminal).	all	yes
PULL	Similar to `PARSE PULL`, but string is translated to uppercase before parsing.	all	yes
PARSE SOURCE	System-supplied string giving information about the executing program.	all	yes
PARSE VALUE	Expression between the keyword `VALUE` and the keyword `WITH` in the instruction.	all	yes
PARSE VAR name	Contents of the variable name.	all	yes
PARSE VERSION	System-supplied string telling the language, language level, and (three-word) date.	all	yes

Advanced Topics in Parsing

This section covers parsing multiple strings and a special case that arises when string and positional patterns are combined.

Parsing Multiple Strings

Only ARG and PARSE ARG can have more than one source string. To parse multiple strings, you can specify multiple comma-separated templates. Here is an example:

      parse arg template1, template2, template3

This instruction consists of the keywords PARSE ARG and three comma-separated templates. (For an ARG instruction, the source strings to parse come from arguments you specify when you invoke a program or CALL a subroutine or function.) Each comma is an instruction to the parser to move on to the next string.

Parsing multiple strings in a subroutine


   num='3'
   musketeers="Porthos, Athos, Aramis, D'Artagnon"
   CALL Sub num,musketeers         /* Passes num and musketeers to sub */
   SAY total; say fourth           /* Displays: "4" and "D'Artagnon"   */
   EXIT

   Sub:
     parse arg subtotal, . . . fourth
     total=subtotal+1
     RETURN

This example is a bit confusing for novice REXX programmers, so let's explain it in more detail. The CALL passes 2 parameters to the subroutine. The second parameter is a string that itself contains commas between the names of the musketeers. The parse instruction at the start of the subroutine indeed receives only 2 parameters, separated by a comma. The commas in the string are an integral part of the second parameter and are not considered parameter separators by the parse instruction.

An example where no commas are included in the parameters would have been less confusing, but would also have taught you less.

Note that when a REXX program is started as a command, only one argument string is recognized. You can pass multiple argument strings for parsing when a routine is invoked with the CALL instruction or as a function call.

If there are more templates than source strings, each variable in a leftover template receives a null string. If there are more source strings than templates, the language processor ignores leftover source strings. If a template is empty (two commas in a row) or contains no variable names, parsing proceeds to the next template and source string.
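
These rules can be checked with a short sketch. The routine name Show and the argument values are invented for the example; ARG() without arguments is the standard built-in function that returns the number of argument strings.

```rexx
/* Sketch: an omitted argument and a leftover template */
call Show 'one two', , 'three'
exit

Show:
  parse arg first rest, middle, last, extra
  say arg()                      /* 3: argument strings passed     */
  say first'/'rest               /* one/two                        */
  say '['middle']'               /* []: argument was omitted       */
  say last                       /* three                          */
  say '['extra']'                /* []: no fourth source string    */
  return
```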

Combining String and Positional Patterns : A Special Case

There is a special case in which absolute and relative positional patterns do not work identically. We have shown how string patterns skip over data in the source string. But a template containing the sequence string pattern, variable name, relative positional pattern does not skip over any data. A relative positional pattern moves relative to the first character of the match for the string pattern. As a result, assignment includes the data that matched the string pattern. Thus, the variable receives characters including the matching data.

      /* Template containing string pattern, then variable name, then  */
      /* relative positional pattern does not skip over any data.      */
      string='REstructured eXtended eXecutor'
      parse var string var1 3 junk 'X' var2 +1 junk 'X' var3 +1 junk
      say var1||var2||var3 /* Concatenates variables; displays: "REXX" */

  !var1  3!   !junk 'X'!   !var2 +1!   !junk  'X'!   !var3 +1 !  ! junk !
  +---+---+   +---+----+   +---+---+   +----+----+   +---+----+  +--+---+
      !           !            !            !            !          !
  Put         Starting     Starting     Starting     Starting    Starting
  characters  at 3, put    with first   with char-   with        with char-
  1 through   characters   'X' put 1    acter after  second 'X'  after sec-
  2 in var1.  up to (not   (+1)         first 'X'    put 1 (+1)  ond 'X'
  (Stopping   including)   character    put up to    character   put rest
  point is    first 'X'    in var2.     second 'X'   in var3.    of string
  3)          in junk.                  in junk.                 in junk.

  var1='RE'   junk=        var2='X'     junk=        var3='X'    junk=
              'structured               'tended e'              'ecutor'
               e'
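
A practical use of this special case is to capture the delimiter itself. Here is a minimal sketch with a made-up source string and variable names:

```rexx
/* Sketch: keep the matched delimiter in a variable */
parse value 'name=value' with key '=' sep +1 val
say key sep val                  /* name = value                   */
```

Because sep is followed by the relative pattern +1, it starts at the matched '=' rather than after it, and val receives the rest of the string.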

As an exercise, what template would you have to specify in the following parse instruction

   parse var dirid ???????
   if colon=':' then do_something
                else do_something_else

so that you can test if there is a colon (:) character specified in directory identifications (dirid) such as:

    IECSYSU:FSCIPOE.TC.TCVM1
    IECSYSU:
    IECSYSU

This ends the part taken from the REXX Reference Guide. The next chapter puts all these parsing rules into practice and shows real-life examples.

(1) This parsing technique is part of REXX SAA Level 2, and as such is only available since VM/ESA Release 2.0. It is therefore also available on OS/2 or Object REXX.