In this chapter, we try to give you some more hints and tips for using the full power of the parse instruction.
Compare the next three alternatives, which produce exactly the same results:

    /* 1. -------"standard" way of working----- */
    ok=1
    three=3
    gap=0        /* allow no gaps */
    xedit=0      /* suppose XEDIT not wanted */
    test=1       /* suppose we run in testmode */
    names=''     /* we'll collect student names here */
    empty=''

    /* 2. ---------- clever way -----------------*/
    Parse value '1 3 0 1 0' with ok three gap test xedit names empty

    /* 3. ----- self-documenting clever way -----*/
    Parse value '1   3     0   1    0',
          with   ok  three gap test xedit . ,
                 names empty

In all three cases, seven variables are initialized.
Notes:
Besides its readability, the last alternative has further advantages over the first one. Compare the traditional coding on the left with the parse-based coding on the right:

    if abc=1 then do        !  if abc=1 then do
      if def=0 then do      !    if def=0 then parse value 'YES NO 1',
         zorro='YES'        !                  with zorro king xedit .
         king ='NO'         !    else parse value 'NO NO' ,
         xedit=1            !         with zorro king .
      end                   !  end
      else do               !
         zorro='NO'         !
         king ='NO'         !
      end                   !
    end                     !
In our first example, we have initialized seven variables. The variables names and empty are initialized to a null string.
The same can be achieved as follows:
Parse value '1 3 0 1 0' with ok three gap test xedit '' names empty
This time, a string pattern is used in the template. It is a null string (''). REXX looks for the first appearance of this null string in the source, and finds it only after it has processed the whole source string. The variables names and empty are thus initialized to null strings, as nothing is left of the source.
The only advantage of coding it this way is that it makes it perfectly clear that the last 2 variables are set to null strings.
It often happens that you have to do several things with the same data. Once you know that a single parse that initializes many variables performs better (at least on VM/ESA), and that templates can contain positional (numeric) patterns, you can improve on the following 'traditional' way of coding:

    do queued()
       parse pull line
       select
          when left(line,1)='*' then iterate    /* ignore comments */
          when word(line,1)='KING' then call chess substr(line,10,20)
          when word(line,1) word(line,3)='FROM BOSS' then call myboss
          when word(line,1)='SKIP' then iterate
          otherwise Say 'Invalid card:' line
       end /* select */
    end /* queued */
Why is performance impacted in the figure above, especially when many records are to be processed?
Using a clever parsing template, we can thus improve both the readability and the performance:

    do queued()
       parse pull line 1 col1 2 1 w1 . w3 . 1 10 part +20
       select
          when col1 ='*' then iterate           /* ignore comments */
          when w1 ='KING' then call chess part
          when w1 w3='FROM BOSS' then call myboss
          when w1 ='SKIP' then iterate
          otherwise Say 'Invalid card:' line
       end /* select */
    end /* queued */

This clearly demonstrates that you can parse a source as many times as you want if you use numeric patterns. The rule is: when a number is equal to or smaller than the current parse position, the variable just before it receives everything from the current position to the end of the source, and parsing then restarts at the column given by that number. Applied to the example, variable line will contain the complete record from the stack, col1 will take column 1 only (the data from column 1 up to, but not including, column 2), w1 gets the first word and w3 the third, and part gets the 20 characters starting at column 10. Readability can be further improved by splitting the parse over separate lines, like here:

    parse pull line ,
               1 col1 2  10 part +20 ,
               1 W1 . W3 .
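As a minimal sketch of the backstepping rule, the fragment below applies the same template to a literal record instead of a line pulled from the stack. The record text and the variable name record are made up for illustration.

```rexx
/* Sketch: the sample record is hypothetical, not taken from a real stack */
record = 'FROM THE BOSS please report now'
parse var record line ,
          1 col1 2  10 part +20 ,
          1 w1 . w3 .
say 'line =' line    /* the complete record                     */
say 'col1 =' col1    /* F    (column 1 only)                    */
say 'w1   =' w1      /* FROM (first word)                       */
say 'w3   =' w3      /* BOSS (third word)                       */
say 'part =' part    /* the 20 characters starting at column 10 */
```

Each backstep to column 1 restarts the scan, so the order of the three template lines can be chosen for readability.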
Here is another case where parsing multiple times and backstepping can be useful. The following statement is typical:
parse upper arg args '(' options
It separates the arguments from the command options. The literal string '(' in the template marks where the parameters end and the options begin. However, the left parenthesis itself is lost from the data.
Sometimes, however, you may want to keep the literal string as well. This is possible by backstepping in the template.
If, for example, you code the statement like this,
parse upper arg args '(' options 1 argstring
then you still have your initial parsing, but the same statement also returns the complete string in the variable argstring, thanks to the backstepping to column 1. So the left parenthesis is not lost and argstring can be reused in other ways.
When you issue the command Q LIMITS (footnote 1) you may get:
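The effect can be tried outside an EXEC with arguments. In this small sketch, a hypothetical string named input stands in for the ARG data, and the quotes in the output only make leading and trailing blanks visible.

```rexx
/* Sketch: a hypothetical argument string stands in for ARG */
input = 'myfile data a (type nolist'
parse upper var input args '(' options 1 argstring
say 'args      = "'args'"'        /* "MYFILE DATA A "             */
say 'options   = "'options'"'     /* "TYPE NOLIST"                */
say 'argstring = "'argstring'"'   /* "MYFILE DATA A (TYPE NOLIST" */
```

Because options is followed by the backstep to column 1, it receives everything after the parenthesis, while argstring receives the whole string, parenthesis included.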
    + q limits *
    Userid    Storage Group  4K Block Limit  4K Blocks Committed  Threshold
    DECEULAE        3             19000          12039-63%           90%
    + q limits for EREP
    Userid    Storage Group  4K Block Limit  4K Blocks Committed  Threshold
    EREP            -               -                -                 -

In the first response, the number of committed blocks and the usage percentage are separated by a dash (-). But in the second example, where the user is not enrolled in SFS, the returned information changes and we get four dashes. So parsing on the string constant '-' to find the number of committed blocks and the percentage is not safe here. If we change the parsing as follows,
parse var qlimits userid . '%' -10 blocks '-' percent '%' .
then, we can find the exact information (study the statement carefully).
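To study the statement, it can help to run it against a literal line. The sketch below assumes a response line laid out roughly like the first example; the exact column spacing of a real Q LIMITS response is not reproduced here, only the requirement that the block count and percentage fit in the 10 columns before the first '%'.

```rexx
/* Sketch: qlimits holds a hypothetical response line, spacing approximated */
qlimits = 'DECEULAE       3          19000          12039-63%        90%'
/* find the first '%', step back 10 columns, then split on '-' and '%' */
parse var qlimits userid . '%' -10 blocks '-' percent '%' .
say 'User' userid 'has committed' strip(blocks) 'blocks ('percent'%)'
/* prints: User DECEULAE has committed 12039 blocks (63%) */
```

The relative pattern -10 is counted from the position where '%' was found, so blocks may carry leading blanks; STRIP() removes them for display.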
You will agree that parse is indeed a very powerful instruction. Can you imagine the amount of code needed when using other functions such as POS() and SUBSTR()?
Parsing into words is in any case better than hard-coding columns, as the responses of CP or CMS commands may vary from release to release and some words may become longer or shorter (e.g. the virtual addresses changed from 3 digits to 4 digits when migrating from VM/SP to VM/ESA).
If the next piece of code still holds secrets for you, then issue the HELP REXX DIAGRC command and/or run this sample with a TRACE ?I (after having supplied valid parameters for the LINK command, of course).

    /* Let's quietly LINK, but give Error Message if it would fail */
    parse value diagrc(8,'CP LINK' uid addr cuu lmode) ,
          with rc . 17 cpans '15'x
    if rc\=0 then call errexit rc,'Sorry, can''t work, LINK failed:' cpans

Note that we call our general error exit routine.
Another technique that uses PARSE to give big performance gains is one where we handle all elements of a list. Look at the next figure (it seems simple but is not used enough):

    do while list^=''
       parse var list item list
       .... handle "item" ..
    end

    /* or if you still need the "list" for later use */
    tlist=list
    do while tlist^=''
       parse var tlist item tlist
       .... handle "item" ..
    end
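A runnable sketch of the second variant follows. The list contents and the counter are invented for illustration, and the SAY instruction stands in for whatever handling the item needs.

```rexx
/* Sketch: walk a blank-delimited list; the names are hypothetical */
list  = 'alpha beta gamma'
tlist = list                      /* work on a copy, keep "list" intact     */
count = 0
do while tlist \= ''
   parse var tlist item tlist     /* first word to item, rest back to tlist */
   count = count + 1
   say count':' item
end
say 'Original list still holds:' list
```

The loop ends by itself when the last word has been taken, because tlist then becomes a null string.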
If you have understood the example, then you will be able to understand the following too. It parses results from CP commands:

    parse value diag(8,'Q RDR * ALL'),    /* issue Q RDR * ALL                   */
          with head '15'X buffer          /* all except title line in "buffer"   */
    if buffer='' then                     /* no files found, let's show CP's msg */
       call errexit 28,'Sorry:' head
    do while buffer^=''
       parse var buffer ,                 /* strip one line of Buffer            */
             origin spid typ ......... '15'x ,    /* parse it                    */
             buffer                       /* and put tail in buffer again        */
       ........ handle the file ..............
    end

The QUERY RDR * ALL command returns either the message NO RDR FILES or a header line followed by a list of reader files. As CP puts a newline character (x'15') between the records, each reader file is handled after its line has been stripped off the front of the CP response.
Please take the time to fully understand this very useful technique.
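The same loop can be tried without CP. In the sketch below, the variable response imitates a CP reply; its header and file lines are invented, and the field names origin, spid, class and typ are only illustrative.

```rexx
/* Sketch: "response" imitates a CP reply; its contents are invented */
nl = '15'x                                 /* separator CP puts between lines */
response = 'ORIGINID FILE CLASS' || nl ||,
           'OPERATOR 0012 A PRT' || nl ||,
           'MAINT    0034 A PUN'
parse var response head (nl) buffer        /* title line goes to head */
do while buffer \= ''
   /* take one line apart and put the tail back in buffer */
   parse var buffer origin spid class typ . (nl) buffer
   say 'File' spid 'from' origin 'is of type' typ
end
```

When the last line carries no trailing separator, the pattern is not found, the whole remainder is parsed as one line and buffer becomes a null string, which ends the loop.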
The better performance holds for VM/ESA, but we discovered that on OS/2 the current implementation of the storage management does not give the same performance gain when using this parsing technique. We think the reason is the following: on VM/ESA, the storage for the string to be parsed is acquired only once, and when the parse instruction eats the first word from it, REXX only updates the internal pointer to point just after the first word. On OS/2, however, each iteration results in acquiring new storage for the (diminishing) string, and this seems to result in a higher overhead.
You may already have tried to use the parse instruction to split up file records into fields where the positions of the fields are variable.
The next piece of code would appear to be the solution:

    /* Procedure GIMME - returns user specified columns out of a file */
    address command
    parse upper arg col1 col2 col3 col4 .
    ... (verify if parameters are valid)
    'EXECIO * DISKR INPUT FILE A (FINIS'
    do queued()
       parse pull (col1) str1 (col2) (col3) str2 (col4)
       say str1'-'str2
    end

Coded like this, if col1 has a value of 10, then parse will interpret (col1) as a variable string pattern and look for the string '10' instead of parsing at absolute column 10. In order to achieve what we want, we have to add an equal, plus or minus sign, as in this adapted procedure:

    /* Procedure GIMME - returns user specified columns out of a file */
    address command
    parse upper arg col1 col2 .
    ... (verify if parameters are valid)
    'EXECIO * DISKR INPUT FILE A (FINIS'
    leng=col2 - col1 + 1
    do queued()
       parse pull . =(col1) string +(leng) .
       say string
    end
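The difference between the two forms can be seen on literal data. This sketch assumes an interpreter that supports variable positional patterns such as =(col1) and +(leng); the records and variable names are made up and replace the EXECIO input.

```rexx
/* Sketch: literal records replace the EXECIO input; values are invented */
record = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
col1 = 10
col2 = 14
leng = col2 - col1 + 1
parse var record . =(col1) field +(leng) .
say field                       /* JKLMN: columns 10 to 14 */

/* Without the sign, (col1) is a string pattern that searches for '10' */
record = 'PRICE 10 EUR'
parse var record before (col1) after
say '"'before'" "'after'"'      /* "PRICE " " EUR" */
```

In the second parse the text '10' is consumed as a delimiter, which is exactly what the first GIMME procedure did by mistake.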
If you prefer, you can now do the first coding exercise for this lesson (select Exercises) or continue reading chapters 11 to 13.
(1)
QUERY LIMITS is the command that shows how much of the disk space
allocated to you in the SFS filepool you have used.