Chapter 10. Practical use of Parsing.

In this chapter, we try to give you some more hints and tips for using the full power of the parse instruction.

Initializing many variables at once

Compare the next three alternatives, which produce exactly the same results:

 /* 1. -------"standard" way of working----- */
  ok=1
  three=3
  gap=0       /* allow no gaps */
  xedit=0     /* suppose XEDIT not wanted */
  test=1      /* suppose we run in testmode */
  names=''    /* we'll collect student names here */
  empty=''

 /* 2. ---------- clever way -----------------*/
  Parse value '1 3 0 1 0' with ok three gap test xedit names empty

 /* 3. ----- self-documenting clever way -----*/
  Parse value '1  3      0   1    0',
        with  ok  three  gap test xedit . ,
              names empty

In all three cases, seven variables are initialized.

Notes:

  1. The advantage of the first method is that you can write a comment for each variable.  As such comments can be very important, this is the only reason why we still use this method from time to time. 
  2. The second alternative has the advantage of occupying a single source line.  Another advantage is performance: REXX interprets only one (albeit long) statement, and storage for all seven variables is obtained in one operation instead of seven separate GETMAIN operations. 
  3. The third alternative adds improved readability: note the alignment used in the last example, which makes it completely clear what value is assigned to each variable.  Note as well the dot after the last assigned variable; it is used to exclude any leading (or trailing) blanks from the variable xedit.
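The effect of that final dot can be checked with a small sketch (the sample data is our own):

   parse value 'A   B' with x y      /* y <- '  B' : extra blanks kept  */
   parse value 'A   B' with x y .    /* y <- 'B'   : the dot takes the  */
                                     /* remainder, so y is a clean word */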

Besides the readability, the last alternative has even more advantages over the first one.

In our first example, we have initialized seven variables.  The variables names and empty are initialized to a null string.

The same can be achieved as follows:

  Parse value '1 3 0 1 0' with ok three gap test xedit '' names empty

This time, a string pattern is used in the template.  It is a null-string ('').  REXX will look for the first appearance of this null-string in the source, and find it only after it has processed the whole source string.  The variables names and empty will thus be initialized to null strings, as there is nothing left from the source.

The only advantage of coding it this way is that it makes perfectly clear that the last two variables are set to null strings.


Parsing same source multiple times

It often happens that you have to do several things with the same data.  If you have noticed that a single parse initializing many variables performs better (at least on VM/ESA), and that templates can contain positional (numeric) patterns, then the following 'traditional' way of coding can be enhanced:

   do queued()
      parse pull line
      select
        when left(line,1)='*' then iterate          /* ignore comments */
        when word(line,1)='KING' then call chess substr(line,10,20)
        when word(line,1) word(line,3)='FROM BOSS' then call myboss
        when word(line,1)='SKIP' then iterate
        otherwise Say 'Invalid card:' line
      end /* select */
   end /* queued */

Question 19

Why is performance impacted in the above figure, especially when many records are to be processed?

Using a clever parsing template, we can thus improve both the readability and the performance:

   do queued()
      parse pull line  1 col1 2   1  w1 . w3 .  1 10 part +20
      select
        when col1 ='*' then iterate                 /* ignore comments */
        when w1   ='KING' then call chess part
        when w1 w3='FROM BOSS' then call myboss
        when w1   ='SKIP' then iterate
        otherwise Say 'Invalid card:' line
      end /* select */
   end /* queued */

It clearly demonstrates that you can parse a source as many times as you want if you use numeric patterns.  The rule here is: when a numeric pattern points to a position equal to, or smaller than, the current parse position, REXX first assigns the rest of the source to the preceding part of the template, and then continues parsing from the indicated position.  Applied to the example here, the variable line will contain the complete record from the stack, col1 will take column 1 (the pattern 2 ends it there), while w1 gets the first word.  Readability can be further improved by splitting the parse over separate lines, like here:

      parse pull line  ,
               1 col1 2    10 part +20 ,
               1 W1 . W3 .
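The backstepping rule can be verified with a small self-contained sketch (our own sample data):

      parse value 'ONE TWO THREE' with w1 w2 w3  1 whole  1 c1 4
      /* w1 <- 'ONE', w2 <- 'TWO', w3 <- 'THREE' : normal word parsing */
      /* 1 whole : the pattern 1 backsteps, so whole <- 'ONE TWO THREE'*/
      /* 1 c1 4  : c1 <- 'ONE', i.e. columns 1 up to (not including) 4 */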

Here is another case where parsing multiple times and backstepping can be useful.  The following statement is typical:

   parse upper arg args '(' options

It splits the arguments from the command options.  The literal string '(' in the template is used to split the options off from the parameters.  But this left-parenthesis character is lost from the data.

Sometimes, however, you may want to keep the literal string as well.  This is possible by backstepping in the template. 

If, for example, you code the statement like this,

   parse upper arg args '(' options 1 argstring

then you have your initial parsing, but the same statement also returns the complete string in the variable argstring, thanks to the backstepping to column 1.  So the left parenthesis is not lost, and argstring can be reused in other ways.
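A small sketch (the sample invocation is invented) shows the effect:

   /* Suppose the exec was started as:  MYEXEC profile exec a ( quiet */
   parse upper arg args '(' options 1 argstring
   /* args      <- 'PROFILE EXEC A '  : everything before the '('     */
   /* options   <- ' QUIET'           : everything after the '('      */
   /* argstring <- 'PROFILE EXEC A ( QUIET' : the complete, upper-    */
   /*              cased argument string, '(' included                */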

When you issue the command Q LIMITS (1) you may get:

 + q limits *
   Userid    Storage Group  4K Block Limit  4K Blocks Committed Threshold
   DECEULAE          3               19000         12039-63%       90%
 + q limits for EREP
   Userid    Storage Group  4K Block Limit  4K Blocks Committed Threshold
   EREP              -                   -              -            -

In the first response, the number of committed blocks and the usage percentage are separated by a dash (-).  But in the second example, where the user is not enrolled in SFS, the returned information changes and we get four dashes.  So, parsing on the string constant '-' to find the number of committed blocks and the percentage is not safe here.  If we change the parsing as follows,

   parse var qlimits userid . '%' -10 blocks '-' percent '%' .

then we can find the exact information (study the statement carefully). 
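To help the study, here is a hedged walk-through of that statement against the first response line above (our own comments; the spacing is approximate):

   qlimits='DECEULAE     3     19000     12039-63%    90%'
   parse var qlimits userid . '%' -10 blocks '-' percent '%' .
   /* userid  <- 'DECEULAE' : the first word                       */
   /* .          swallows the other words up to the first '%'      */
   /* '%' -10    finds the first '%', then backsteps 10 columns,   */
   /*            landing just before '12039-63'                    */
   /* blocks  <- '12039' (possibly with leading blanks), up to '-' */
   /* percent <- '63' : between that '-' and the next '%'          */
   /* the final . drops the rest of the line ('    90%')           */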

You will agree that parse is indeed a very powerful instruction.  Can you imagine the flood of code needed if you had to use functions such as POS() and SUBSTR() instead?

Parsing into words is in any case better than hard-coding columns, as the responses of CP or CMS commands may vary from release to release, and some words may become longer or shorter (e.g. virtual addresses changed from 3 digits to 4 digits in the migration from VM/SP to VM/ESA).

A secret ?

If the next piece of code still holds secrets for you, then issue the HELP REXX DIAGRC command and/or run this sample with TRACE ?I (after having supplied valid parameters for the LINK command, of course).

    /* Let's quietly LINK, but give Error Message if it would fail */
    parse value diagrc(8,'CP LINK' uid addr cuu lmode) ,
          with rc . 17 cpans '15'x
    if rc\=0 then call errexit rc,'Sorry, can''t work, LINK failed:' cpans

Note that we call our general error exit routine.
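That errexit routine is not listed in this chapter.  Purely as an illustration, a minimal sketch of what such a general error exit might look like (the name comes from the example above, but the body is our assumption, not the book's actual routine):

    /* Sketch of a general error exit (assumed implementation):   */
    /* show a message, then leave the exec with the return code   */
    errexit:
       parse arg retcode, message
       if message\='' then say message
       exit retcode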

Handling a list of words by eating the list

Another technique that uses PARSE to give big performance gains is one where we manipulate all elements of a list.  Look at the next figure (it seems simple, but is not used enough):

  do while list^=''
     parse var list item list
     .... handle "item" ..
  end
  /* or if you still need the "list" for later use */
  tlist=list
  do while tlist^=''
     parse var tlist item tlist
     .... handle "item" ..
  end
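A small sketch shows the technique at work (the list content is invented):

  list='KING QUEEN ROOK'
  do while list^=''
     parse var list item list     /* item <- first word,          */
                                  /* list <- the shrinking rest   */
     say 'handling:' item         /* KING, then QUEEN, then ROOK  */
  end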

If you have understood that example, then you will also understand the following, which parses the results of CP commands:

 parse value diag(8,'Q RDR * ALL'),                   /* issue Q RDR * ALL */
       with  head '15'X buffer        /* all except title line in "buffer" */
 if buffer='' then                  /* no files found, let's show CP's msg */
    call errexit 28,'Sorry:' head
 do while buffer^=''
    parse var buffer ,                         /* strip one line of Buffer */
              origin spid typ ......... '15'x ,                /* parse it */
              buffer                       /* and put tail in buffer again */
    ........ handle the file ..............
 end

The QUERY RDR * ALL command returns either the message NO RDR FILES, or a header line followed by a list of reader files.  As CP adds a newline character (X'15') between the records, the individual reader files can be handled one by one as they are stripped off the front of CP's response.

Please take the time to fully understand this very useful technique.

The better performance is true for VM/ESA, but we discovered that on OS/2 the current implementation of storage management does not give the same performance gain when using this parsing technique.

We think the reason is the following: on VM/ESA, the storage for the string to be parsed is acquired only once, and when the parse instruction eats the first word from it, REXX only updates the internal pointer to point just after the first word.

On OS/2 however, each iteration results in acquiring new storage for the (diminishing) string, and this seems to result in a higher overhead.

Parsing with variable patterns, another example

You may already have tried to use the parse instruction to split up file records into fields, whereby the positions of the fields are variable.

The next piece of code would appear to be the solution:

  /* Procedure GIMME - returns user specified columns out of a file */
  address command
  parse upper arg col1 col2 col3 col4 .
  ... (verify if parameters are valid)
  'EXECIO * DISKR INPUT FILE A (FINIS'
  do queued()
     parse pull (col1) str1 (col2) (col3) str2 (col4)
     say str1'-'str2
  end

Coded like this, if col1 has the value 10, then parse will interpret (col1) as a variable string pattern and look for the string '10' instead of parsing at absolute column 10.  To achieve what we want, we have to add an equal, plus, or minus sign, as in this adapted procedure:

  /* Procedure GIMME - returns user specified columns out of a file */
  address command
  parse upper arg col1 col2 .
  ... (verify if parameters are valid)
  'EXECIO * DISKR INPUT FILE A (FINIS'
  leng=col2 - col1 + 1
  do queued()
     parse pull . =(col1) string +(leng) .
     say string
  end
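A quick sketch (the sample record is invented) shows the adapted template at work:

   record='0123456789ABCDEFGHIJ'
   col1=11; col2=15
   leng=col2-col1+1
   parse var record . =(col1) string +(leng) .
   /* the leading . takes columns 1 to 10                         */
   /* =(col1)     : parsing continues at absolute column 11       */
   /* string      <- 'ABCDE' : leng characters from column 11 on  */
   /* the final . drops the remainder of the record               */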

If you prefer, you can now make the first coding exercise for this lesson (select Exercises) or continue reading the chapters 11 to 13.


Footnotes:

(1) QUERY LIMITS is the command that tells you how much of the disk space allocated to you in the SFS filepool you have used up.