This is a large document, but it is worth reading from A to Z. You may consider printing it for later reference too.
Write a procedure that lists the different filetypes found on an accessed disk or SFS directory, and also displays the frequency at which they are found. The output we expect is thus something like this:
There are 7 files with filetype EXEC on your B-disk There are 13 files with filetype XEDIT on your B-disk ...
The procedure should accept a filemode as parameter (so don't hard-code a filemode !). We also explicitly ask you not to use CMS Pipelines but to search for the best solution using REXX logic (this is a very good exercise to practice the possibilities of REXX).
Test your procedure with a disk containing only a few files so that you can easily verify your results via a FILELIST. Then, stress test it on a larger disk (e.g. the Y-disk).
Several students sorted the filetypes. Sometimes, this was to have a nicer output, but in other cases, the sort served the logic. Indeed, by sorting the filetypes, it was easier to test for the different filetypes. If you look for example at our solution, you'll see that the sorting is not needed to produce a correct result.
It is a pity that REXX doesn't provide a SORT function. Sorting can be done via external CMS functions, such as
CMS Pipelines is the easiest, and fast too. You may remember from now on, that if you have, for example, to sort a REXX array, you can use:
'PIPE STEM ARRAY.!SORT!STEM ARRAY.'
If, for any reason (e.g. portability) you want to sort using pure REXX functions, then you could use the SORT routine listed here (it uses the Quick Sort recursive algorithm). For a very small number of elements, it's fast, but once the number augments, it becomes rapidly very CPU consuming.
REXX Sort routine | |
---|---|
/***********************************************************************/ /* Subroutine SORT */ /* Function : Sort a REXX array called ARRAY. */ /***********************************************************************/ SORT: procedure expose array. stack.1=1 ; stack.2=array.0 ; top=2 do until top=0 last=stack.top ; top=top-1 ; first=stack.top ; top=top-1 ; i=first do until first>=last j=last ; k=trunc((first+last)/2) ; dividingline=array.k do until i>j do while array.i |
To use it, have the elements you want to sort in a REXX stem with name ARRAY. and then call sort. On return, the array will be sorted. Total CPU consumption is exponential, and may go from 0.02 seconds for 10 elements to 0.59 seconds for 650 elements and 10.2 seconds for 8950 elements. Compiling triples the performance.
One could also use XEDIT to sort. It is fast too, surely faster than pure REXX for larger lists. This is an example on how to use XEDIT' SORT:
/* This exec is a sample showing how one can use XEDIT to SORT */ address command /*1 ---Get data to sort in STACK --------------------------------------*/ 'LISTFILE * EXEC A (STACK' /* We want the filenames sorted */ /*2 ---Stack commands to make XEDIT place stacked data in file --------*/ push 'INPUT' /* tell XEDIT to place stack in file */ queue '' /* to stop XEDIT reading from stack */ push '-* DEL *' /* but make file empty first */ /*3 ---Stack command to SORT the data and STACK them afterwards -------*/ queue '-* DMSXMS D 1 8' /* Call XEDIT's SORT module on cols 1-8 */ queue 'STACK *' /* Ask XEDIT to stack the sorted data */ queue 'COMMAND QUIT' /* And XEDIT can QUIT the file */ /*4 ---Call XEDIT, it'll handle the stacked lines ---------------------*/ 'XEDIT DUMB FILE S (NOPROF NOMSG' /* use a dummy file */ /*5 ---Here we have our sorted data -----------------------------------*/ do queued() parse pull fid ; say fid end
This solution does no I/O's as the file is never written to disk !
On VM/CMS, file identifiers cannot contain blanks, but on personal systems, this is possible. When you parse the strings to find the filetypes (extensions to use the correct term), you can't work with word parsing, but need to use positional parsing.
When you test your program, you need sometimes to provide a specific test case. When we ran this course, we used a disk containing following files:
PROFILE EXEC T0 PROFILE XEDIT T1 T EXEC T1 +T EXEC T1 TEST Document T2 NEW DOCUMENT T2
Note the special character (+) in a filename, and a lowercase filetype.
At the end of this chapter, we provide our solutions and a benchmark that compares them with different other solutions. You should review this as it will once more indicate that it is possible to drastically augment the performance of your procedures by using the good programming techniques.
Different techniques can be used, such as:
Some students included extra logic in their procedures to make a distinction in the output messages when there is only one file with a certain filetype (singular) as opposed to the plural when there are more files with a filetype. That's nice. Their logic resembles to this :
if nft.i=1 then say 'There is 1 file with filetype' ft.i 'on disk' fm else say 'There are' nft.i 'files with filetype' ft.i 'on disk' fm
This can be shorted if you use a programming trick we learned from John Hartmann (the creator of CMS Pipelines), although, it's complex programming style, we must admit:
say 'There' word('are is',(nft.i=1)+1) left('files',4+(nft.i>1)), 'with filetype' ft.i 'on disk' fm
The array nft. contains the number of files with a specific filetype ft.
This technique proves to be useful in other circumstances. For example, when we want to calculate the number of days since Jan 1, 1900 and to take the leap years into account :
/* Calculate days since Jan 1, 1900 */ DAYS_SINCE_1900: Parse arg yy'/'mm'/'dd Days.01= 00; Days.02= 31; Days.03= 59; Days.04= 90 Days.05= 120; Days.06= 151; Days.07= 181; Days.08= 212 Days.09= 243; Days.10= 273; Days.11= 304; Days.12= 334 now_to_1900 = (yy-1)*365 + (yy-1)%4 + days.mm + dd + (yy//4=0)*(mm>2) return now_to_1900
The last operand in the calculation adds 1 to the number of days if we're having a leap year (modulo 4) and if we're past the month February.
The exercise was meant as a combination of File I/O and
parsing. We'd like to come back to the solution
we gave for the last exercise of lesson 2 (the MULTILIN procedure), and
analyze this with you in detail now:
Having collected all one-lined
records in an array rec., we issue following statements to
split the records and to build the output array :
nr = 0 /* number of single records */ do i=1 to rec.0 do while rec.i<>'' parse value nr+1 rec.i with nr . 0 out.0 out.nr ';' rec.i end end
Let's simulate an iteration :
nr+1 rec.i 0+1 statement1;statement2;statement3 1 statement1;statement2;statement3
nr ==> 1 . ==> statement1;statement2;statement3
nr has now the value 1, and then we backstep to the beginning of the source again, thus:
out.0 ==> 1 out.nr ==> out.1 ==> statement1 rec.i ==> rec.1 ==> statement2;statement3
out.0 gets also value 1. out.nr becomes variable out.1 as nr became 1 in previous steps. out.1 is matched to statement1. Similarly, as i is 1 for the first iteration, rec.1 gets the rest of the record after the first semicolon.
nr+1 rec.i 1+1 statement2;statement3 2 statement2;statement3
so that the template gives:
nr ==> 2 . ==> statemtent2;statement3 out.0 ==> 2 out.nr ==> out.2 ==> statement2 rec.i ==> rec.2 ==> statement3
Both nr and out.0 have now value 2. out.2 gets the next logical record and rec.1 is eaten further.
What do we achieve by this technique ? Well, this single parse statement allows us to:
First time we ran this course, one of our students used this technique in an even more elaborated technique when coding its solution. It was Ivo Toelen of Kredietbank (now KBC). We want to give him again the full praise he deserves, and to be honest, he taught your instructors to push the limits of the parse instruction even further. We therefore repeat here the details of his solution, so that you can learn from it too.
Ivo's code resumes to this (after having put the result of a LISTFILE in an array LST.) :
1 ! TB. = 0 /* Table of counters, indexed on filetype */ 2 ! TBN. = 0 /* Table of existing filetypes */ 3 ! do i = 1 to lst.0 4 ! parse var lst.i . ft . 5 ! parse value TB.ft + 1 with new . 1 TB.ft . 6 ! if new = 1 then /* add the new filetype to the TBN table */ 7 ! parse value TBN.0 +1 ft with newindex . 1 TBN.0 TBN.newindex . 8 ! end 9 ! trc=trace('r') 10 ! do i = 1 to TBN.0 11 ! say "There are" right(value('TB.'TBN.i),8), ! "files with type '"left(TBN.i,8)"' on filemode" fm 12 ! end
Suppose our disk contains just three files:
PROFILE EXEC TEST EXEC LASTING GLOBALV
then, a trace of the procedure shows (we eliminated some irrelevant lines for editorial reasons):
3 *-* do i = 1 to lst.0 4 *-* parse var lst.i . ft . >>> "EXEC" 5 *-* parse value TB.ft + 1 with new . 1 TB.ft . >C> "TB.EXEC" >V> "0" >L> "1" >O> "1" >>> "1" >.> "" >C> "TB.EXEC" >>> "1" 6 *-* if new = 1 then /* add the new filetype to the TBN table */ >O> "1" 7 *-* parse value TBN.0 +1 ft with newindex . 1 TBN.0 TBN.newindex . >V> "0" >L> "1" >O> "1" >V> "EXEC" >O> "1 EXEC" >>> "1" >.> "EXEC" >>> "1" >C> "TBN.1" >>> "EXEC" 8 *-* end 3 *-* do i = 1 to lst.0 4 *-* parse var lst.i . ft . >>> "GLOBALV" 5 *-* parse value TB.ft + 1 with new . 1 TB.ft . >C> "TB.GLOBALV" >V> "0" >L> "1" >O> "1" >>> "1" >C> "TB.GLOBALV" >>> "1" 6 *-* if new = 1 then /* add the new filetype to the TBN table */ >O> "1" 7 *-* parse value TBN.0 +1 ft with newindex . 1 TBN.0 TBN.newindex . >V> "1" >L> "1" >O> "2" >V> "GLOBALV" >O> "2 GLOBALV" >>> "2" >.> "GLOBALV" >>> "2" >C> "TBN.2" >>> "GLOBALV" 8 *-* end 3 *-* do i = 1 to lst.0 4 *-* parse var lst.i . ft . >>> "EXEC" 5 *-* parse value TB.ft + 1 with new . 1 TB.ft . >C> "TB.EXEC" >V> "1" >L> "1" >O> "2" >>> "2" >C> "TB.EXEC" >>> "2" 6 *-* if new = 1 >O> "0" 8 *-* end 3 *-* do i = 1 to lst.0 9 *-* trc=trace('r') 10 *-* do i = 1 to TBN.0 11 *-* say "There are" right(value('TB.'TBN.i),8) "files with typ e '"left(TBN.i,8)"' on filemode" fm There are 2 files with type 'EXEC ' on filemode T 12 *-* end ...
After having read the file-ids in a REXX stem and initialized two REXX stems TB. and TBN. to 0, the file-ids are processed one by one.
The first filetype is EXEC. Let's analyze statement 5 in detail.
>C> "TB.EXEC" | as ft='EXEC', TB.ft becomes TB.EXEC |
>V> "0" | as the array was initialized to default 0, TB.EXEC gets a value of 0. |
>L> "1" | Then we add literal value 1 |
>O> "1" | to become "1" as the source for the parse. |
>>> "1" | As a result, new gets the value 1. |
>.> "" | dot as placeholder to remove leading and trailing blanks (if any) |
>C> "TB.EXEC" | and after backstepping in the template, |
>>> "1" | element EXEC of array TB. is also set to value 1 |
Thus, new and TB.EXEC both have value 1. The test at line 6 is thus true, and we come to analyze line 7:
>V> "0" | TBN.0 has indeed value 0 due to initialization of the array. |
>O> "1" | so, adding 1 gives 1 as output |
>V> "EXEC" | and as the filetype is EXEC |
>O> "1 EXEC" | the source for the parse becomes "1 EXEC". |
>>> "1" | Thus, newindex gets the first word, (1) |
>.> "EXEC" | the placeholder gets the rest |
>>> "1" | TBN.0 gets also value 1 due to backstepping. |
>C> "TBN.1" | and, as newindex just became 1, TBN.1 is a new array element, and... |
>>> "EXEC" | it gets the second word of the source (EXEC). |
So, for each new filetype, a new element is added to the TBN. array, and the TBN.0 element increments too so that we know how many elements the array contains.
When the second file is analyzed (filetype GLOBALV) the first parse does exactly the same as above for the EXEC filetype (as it is the first time we encounter this filetype, a new TBN. element is created). The second parse then gives:
>V> "1" | TBN.0 had value 1 after first loop, thus... |
>O> "2" | the sum is now 2. |
>O> "2 GLOBALV" | Our source string becomes thus "2 GLOBALV" |
>>> "2" | newindex gets value 2, and |
>>> "GLOBALV" | TBN.2 gets value GLOBALV. |
We had two different filetypes till now, and so we have two elements in our TBN. array. The third file is a filetype EXEC again. The first parse sets both new and TB.EXEC to a value of 2 (which is indeed the number of EXEC filetypes we encountered). Consequently, the if condition at line 6 is false and no new element is added to the TBN. array.
The first loop ends, and we are ready to analyze the second one. It produces the output at the terminal. The only tricky thing here is the value() function. These are the intermediate interpretation steps in the first iteration:
value('TB.'TBN.i) becomes value('TB.'"EXEC") as i=1 and TBN.1="EXEC" becomes value(TB.EXEC) after concatation of the strings becomes 2 as TB.EXEC is 2
Note the quotes around TB. in the value() function. This is needed, as otherwise REXX would try to find the value of the variable TB.TBN.1.
As this is definitely a key exercise of the telecourse, we add comments to several solutions we got from our former students.
! /*would normally contain author info etc */ 1 ! Address COMMAND 2 ! 'MAKEBUF' 3 ! parse upper arg fm . 4 ! types = '' 5 ! if fm = '' then fm = 'A' 6 ! 'LISTFILE * *' fm '(STACK' 7 ! if rc ¬= 0 then exit rc 8 ! x=0 9 ! do queued() 10 ! parse pull . type . 11 ! if wordpos( type, types) = 0 then types=types type 12 ! x=x+1 13 ! ftype.x = type 14 ! end 15 ! 'DROPBUF' 16 ! do loop = 1 to words(types) 17 ! count = 0 18 ! parse var types thistype types 19 ! do z = 1 to x 20 ! if thistype = ftype.z then 21 ! count = count+1 22 ! end 23 ! say 'There are 'count' file(s) of type 'thistype' on your 'fm'-Disk' 24 ! end 25 ! return
This procedure produces a correct output, but will be one of the worst performers in our benchmark later on. This is certainly due to the heavy use of the REXX functions word() and wordpos() and the general logic. At each iteration in the second loop, the number of words in the string has to be re-calculated. Furthermore, a large array is built in order to remember the filetypes and to be able to count them later in a double-loop. Our solutions show better techniques not that different from this after all. It means that some further analysis can drastically improve the performance of a procedure.
And, if a general exit routine had been used, it would become easier to give feedback that is more complete (e.g. for line 7).
! /************************************************************/ ! /* The Filename is QTYPES ********/ ! /* Function : Count FILETYPES ********/ ! /************************************************************/ 1 ! parse upper arg fm . 2 ! if fm = '' then fm = 'A' 3 ! queue 'FT FILELIST' 4 ! queue 'FN TEMP' 5 ! queue 'SORT * 16 23' 6 ! queue 'FFILE' 7 ! address command 'EXEC FILELIST * * 'fm 8 ! address command 'EXECIO 1 DISKR TEMP FILELIST A (VAR FLIST' 9 ! rcexecio = rc 10 ! parse var flist . fti . 11 ! do while rcexecio = 0 12 ! i = 1 13 ! address command 'EXECIO 1 DISKR TEMP FILELIST A (VAR FLIST' 14 ! rcexecio = rc 15 ! parse var flist . fto . 16 ! do while fti = fto & rcexecio = 0 17 ! address command 'EXECIO 1 DISKR TEMP FILELIST A (VAR FLIST' 18 ! rcexecio = rc 19 ! parse var flist . fti . 20 ! i = i + 1 21 ! end; 22 ! if i > 1 then say i' files with filetype 'fto' on 'fm'-DISK.' 23 ! else say '1 file with filetype 'fti' on 'fm'-DISK.' 24 ! if i = 1 then fti = fto; 25 ! end 26 ! address command 'ERASE TEMP FILELIST A'
This procedure produces incorrect results, but has a clean programming style.
These are our comments:
The only advantage of this method is that there is a sorted output which simplifies the subsequent logic.
There is yet another problem due to the FILELIST as opposed to the LISTFILE. The FILELIST lists also any SFS sub-directories. Due to this, we get for example an extra result like this:
1 file with filetype N on N-disk
This is because a directory has no filetype, and the filemode is taken instead. This is why the output is incorrect.
queue 'FFILE TEMP FILELIST'
or even better:
queue 'COMMAND FILE TEMP FILELIST'
FFILE is indeed the synonym for COMMAND FILE, while FILE is the synonym for COMMAND PFILE (protected file), just as QQUIT is a synonym for COMMAND QUIT and QUIT is synonym for COMMAND PQUIT (protected quit).
! /**********************************************************************/ ! /* QTYPES - Created by Rudi */ ! /* Function : count the different filetypes on an accessed disk */ ! /* Parameters : filemode */ ! /**********************************************************************/ 1 ! address command 2 ! parse source . . myname . 3 ! parse upper arg fm 4 ! if fm='' then fm='A' 5 ! address '' 'ESTATE * * 'fm 6 ! if RC¬=0 then call Error_Exit rc,' 'fm' is not accessed' 7 ! address CMS 'LISTFILE * * 'fm' (exec' 8 ! if RC¬=0 then call Error_Exit rc,' No A disk accessed in R/W' 9 ! set cmstype ht 10 ! queue '16 25' 11 ! address '' 'SORT CMS EXEC A CMS TEMP A' 12 ! address '' 'ERASE CMS EXEC A ' 13 ! set cmstype rt 14 ! 'EXECIO * DISKR CMS TEMP A (FINIS margins 17 25 STEM FT.' 15 ! x=0;ftc.x=0 16 ! do i=1 to ft.0 by 1 17 ! if ft.i=fto then do /* did we have already such FT ? */ 18 ! ftc.x=ftc.x+1 /* increment counter */ 19 ! end 20 ! else do /* New FT */ 21 ! x=x+1 /* prepare next values */ 22 ! ftc.x=0 /* idem */ 23 ! ftc.x=ftc.x+1 /* increment new counter */ 24 ! ft.x=ft.i /* set new FT */ 25 ! end 26 ! fto=ft.i 27 ! end 28 ! do y=1 to x by 1 29 ! say 'you have 'ftc.y' files with filetype 'ft.y' on the disk accessed', ! 'as 'fm 30 ! end 31 ! say 'Total : 'ft.0' files ' 32 ! say 'Sorry, no PARSE used, where could i have used it ????' 33 ! say 'What did i miss ???, could it perform better ??' 34 ! address '' 'ERASE CMS TEMP A' 35 ! call Error_Exit 00,'All done' ! /****************** Exit routine ********************************/ 36 ! Error_Exit: 37 ! parse arg retc,errmsg 38 ! if errmsg<>'' then say myname':' errmsg 39 ! exit retc
This solution produces correct output, is clean but has a rather complex programming style.
[address 'COMMAND'] 'CMDCALL LISTFILE ...'
But, the following statement suggests that you want to intercept the case where the A-disk is not accessed in R/W (rare, but can happen). By using ESTATEW, the same simple technique as with line 5 could be used (note the extra W to test for R/W access).
Anyway, this discussion wouldn't matter if the stack would have been used instead of a temporary file (STACK option). The temporary file allows sorting the output (line 11), but a second output file is needed for the sort.
cmstype=cmsflag('CMSTYPE') /* returns 0 if HT, else 1 */ if cmstype then 'SET CMSTYPE HT' /* do your commands */ if cmstype then 'SET CMSTYPE RT'
! /* ========================================================== */ ! /* QTYPES - created by: Fred Flinstone */ ! /* Function: list # of files with filetypes */ ! /* Parameters: filemode (default: A) */ ! /* Options: none */ ! /* ========================================================== */ 1 ! address command 2 ! parse upper arg fm . 3 ! if fm = '' then fm = 'A' 4 ! 'ERASE CMS EXEC A' 5 ! 'ERASE $$CMS$$ $$EXEC$$ A' 6 ! 'LISTFILE * * ' fm '(FN FT EXEC' 7 ! if rc¬=0 then do 8 ! say 'disk' fm 'not currently accessed' 9 ! exit 1 10 ! end 11 ! queue '17 24' 12 ! 'PIPE COMMAND SORT CMS EXEC A $$CMS$$ $$EXEC$$ A' 13 ! parse value linein($$CMS$$ $$EXEC$$ A) 1 with . . . ftk nfk . 14 ! do until lines($$CMS$$ $$EXEC$$ A) = 1 15 ! parse value linein($$CMS$$ $$EXEC$$ A) with . . . ft . 16 ! if ft = ftk then nfk = nfk + 1 17 ! else do 18 ! if nfk = 1 then 19 ! say 'there is 1 file with filetype' ftk 'on disk' fm 20 ! else 21 ! say 'there are' nfk 'files with filetype' ftk 'on disk' fm 22 ! parse value 1 ft , 23 ! with nfk ftk 24 ! end 25 ! end 26 ! exit
This procedure produces incorrect output and is a very bad performer.
We appreciate that at least one student tried to use the linein() function for the method. Lesson 4 will show that this is not the best performer, but at least we have an extra example now. It also partly explains why it is a bad performer (see benchmark at the end of the chapter). We have to blame the authors of LINEIN, not the student. The benchmark will also show why it is not good to work with a temporary file.
Other comments:
! /*******************************************************************/ ! /* QTYPES - Created by Valerie */ ! /*******************************************************************/ 1 ! address command 2 ! arg fmode . 3 ! 'MAKEBUF' 4 ! 'LISTFILE * *' fmode '(STACK FT' 5 ! if rc=36 then exit 6 ! x=queued() 7 ! if fmode='' then fmode='A' 8 ! parse value ' ' with f t 9 ! do i=1 to x 10 ! pull fname stem.i 11 ! f=f stem.i 12 ! end 13 ! do i=1 to x 14 ! parse value '0 1' with r j 15 ! if wordpos(stem.i,t)¬=0 then iterate 16 ! do until y=0 17 ! y=wordpos(stem.i,f,j) 18 ! if y¬=0 then do 19 ! r=r+1 20 ! j=y+1 21 ! end 22 ! if y=0 & r>1 then t=t stem.i 23 ! end 24 ! say 'There are' r 'files with filetype' stem.i 'on your' fmode'-disk' 25 ! end 26 ! 'DROPBUF' 27 ! exit
This solution produces incorrect output and is a bad performer, but shows a clean programming style.
Our comments:
else if r>1 then...
as this allows to follow the logic more easily.
! /*******************************************************************/ ! /* QTYPES - Created by Wim */ ! /* Function : Count filetypes */ ! /* Parameters : filemode */ ! /*******************************************************************/ 1 ! address command 2 ! parse upper source . . myname mytype . syn. 3 ! parse upper arg fm . 4 ! if fm = '' then fm = 'A' 5 ! 'DROPBUF' 6 ! 'LISTFILE * *' fm '(NOHEADER STACK' /* listfile of the filemode fm */ 7 ! if rc <> 0 then call errexit rc,'Problem with listfile * *' fm 8 ! ft_array = '' 9 ! aantal_ft. = 0 10 ! do queued() 11 ! parse pull . ft . /* get the filetype */ 12 ! ft_pos = wordpos(ft,ft_array) /* is it already in the list ? */ 13 ! if ft_pos = 0 then do /* not it isn't */ 14 ! ft_array = ft_array ft /* so add it to the list */ 15 ! ft_pos = words(ft_array) /* what is the position in the list ? */ 16 ! end ! /* increment element in array with 1 */ 17 ! aantal_ft.ft_pos = aantal_ft.ft_pos + 1 18 ! end 19 ! do i=1 to words(ft_array) /* print result */ 20 ! if aantal_ft.i = 1 then 21 ! say 'there is 1 file with filetype' word(ft_array,i) 'on your', 22 ! fm'-disk.' 23 ! else 24 ! say 'there are' aantal_ft.i 'files with filetype' word(ft_array,i), 25 ! 'on your' fm'-disk.' 26 ! end 27 ! exit ! 28 ! ERREXIT: /* exit with errormessage(s) */ 29 ! do n=2 to arg() /* get all errormessages to show */ 30 ! say myname ':' arg(n) /* give the n'th message */ 31 ! end n 32 ! exit arg(1)
This program produces correct output, has a clean programming style and has an acceptable performance.
Our comments:
if wordpos(ft,ft_array)=0 then do...
Let's now review our solutions :
/*******************************************************************/ /* Procedure QTYPES1 List filetype occurrences on any disk */ /*******************************************************************/ address command parse upper arg mode . /* get parameters */ if mode='' then mode='A' /* defaults to A */ oldqueue=queued() ; 'MAKEBUF' ; buffer=rc /* make stack buffer */ filetypes='' /* initialize string */ 'LISTFILE * * 'mode' (STACK' /* issue the command */ if rc<>0 then call errexit rc,'Error reading filelist of disk 'mode total=queued()-oldqueue /* total number of files */ do queued()-oldqueue parse pull . ft . /* get filetype from stack */ if wordpos(ft,filetypes)<>0 then nbr.ft=nbr.ft+1 /* increment */ else do /* if not yet found */ filetypes=filetypes ft /* add to string */ nbr.ft=1 /* initialize counter */ end end say 'From a total of 'total' files on disk' mode '...' l=length(total) do i=1 to words(filetypes) /* perform output */ ft=word(filetypes,i) say right(nbr.ft,l) 'have a type' ft end call exit 0 /* Subroutines ** you know our ERREXIT routine by now isn't it ? ***/
Our first solution uses the technique whereby we build a string that contains all different filetypes, but instead of using an array with numeric indices (1 to n) for counting the occurrences of filetypes, we build an array whereby the elements (the tails of the stem) are the filetypes themselves.
This means, we build an array where the elements could be nbr.MODULE, nbr.EXEC, nbr.XEDIT, nbr.SCRIPT, etc., and these will contain the number of times a filetype is present on the disk.
The main advantage of this technique is that, while counting, we don't have to look up the filetype in the filetypes string. When we analyze a filetype, its name is the index in our array immediately.
We have however still to build up the filetypes string in order to be able to run the second loop producing the output.
You should note that REXX accepts anything as index (tail) for a stem, even a string containing blanks or special characters ! In addition, this can give problems. If we had coded the first loop like this:
do queued()-oldqueue parse pull 10 ft +8 /* get filetype from stack */ if wordpos(ft,filetypes)<>0 then nbr.ft=nbr.ft+1 /* increment */ else do /* if not yet found */ filetypes=filetypes ft /* add to string */ nbr.ft=1 /* initialize counter */ end end
then, the second loop wouldn't find back the correct indices anymore, as the parse will create filetypes with a length of 8 characters (so, with trailing blanks). In that case we would need to add a ft=left(ft,8) in the second loop to make it a string of exactly 8 characters.
A minor change to the code can still improve performance.
Take a close look at this statement from the first loop:
if wordpos(ft,filetypes)¬=0 ...
For each filetype, REXX has to compare it to each word in the variable filetypes ; when this list grows, it can cost quite a lot of CPU when REXX has to scan it sequentially.
If you accept that REXX uses a much faster binary search algorithm when it has to find one of its variables, then it is possible to start looking for improvements.
nbr.=0 /* Initialize array */ do queued()-oldqueue parse pull . ft . if nbr.ft=0 then filetypes=filetypes ft /* add new file to list */ nbr.ft=nbr.ft + 1 /* increment counter */ end
Instead of looking up the large filetypes string, we initialize the array before the loop(footnote 2), and we simply test whether the element of the array is greater than zero or not.
For example, the very first time a filetype LISTING is encountered, the element nbr.LISTING will be 0. So, we then assign it a value 1. For next occurrences, the array element will not be 0 anymore, and we just add 1 to the counter but don't add the filetype to the list anymore.
We still need the filetypes string for our second loop.
But, this second loop is also negatively impacted by the use of the words() and word() functions. The words() function forces REXX to scan the whole string at each iteration, while for word(), it depends ; obviously, the first word is found immediately, but to find the thousandth word, REXX has again to scan for 999 blanks...(footnote 3)
So, let's replace it by the parsing the string while eating it technique that we learned in the lesson. We implemented this in our second solution, and, as you can see, the procedure did not loose readability due to our performance improvements. It gave a gain of 36% for the test run on the Y-disk.
Please, don't conclude from this that you shouldn't use the REXX functions at all. Remember, in order of priority:
Here comes our second solution:
/*******************************************************************/ /* Procedure QTYPES2 List filetype occurrences on any disk */ /*******************************************************************/ address command parse upper arg mode . /* get parameters */ if mode='' then mode='A' /* defaults to A */ oldqueue=queued() ; 'MAKEBUF' ; buffer=rc /* make stack buffer */ filetypes='' /* initialize string */ nbr.=0 /* ftype counter */ 'LISTFILE * * 'mode' (STACK' /* issue the command */ if rc<>0 then call errexit rc,'Error reading filelist of disk 'mode total=queued()-oldqueue /* total number of files */ do queued()-oldqueue parse pull . ft . /* get filetype from stack */ if nbr.ft=0 then filetypes=filetypes ft /* append new file */ nbr.ft=nbr.ft+1 /* increment counter */ end say 'From a total of 'total' files on disk' mode '...' l=length(total) do while filetypes^='' /* perform output */ parse var filetypes ft filetypes /* eat our list */ say right(nbr.ft,l) 'have a type' ft end call exit 0 /* Subroutines ** you know our ERREXIT routine by now isn't it ? ***/
Another improvement we applied to this classic REXX coding has to deal with storage management. With all operating systems, storage management is somehow costly. On VM systems, we know REXX does the following:
So, in our procedure, we have the variable filetypes that grows continuously. We can help REXX if we first pre-allocate a large variable that is filled in afterwards. The gains are measurable both for the interpreter as for the compiler, but of course depends on the number of times the variable grows in the loop.
Here is our third solution where storage pre-allocation is used:
/*******************************************************************/ /* Procedure QTYPES25 : Using storage pre-allocation. */ /*******************************************************************/ address command parse upper arg mode . /* get parameters */ if mode='' then mode='A' /* defaults to A */ oldqueue=queued() ; 'MAKEBUF' ; bufno=rc /* make stack buffer */ filetypes=left('',1000) /* initialize string */ filetypes='' /* initialize string */ nbr.=0 /* ftype counter */ 'LISTFILE * * 'mode' (STACK' /* issue the command */ if rc<>0 then call emsg rc,'Error reading filelist of disk 'mode total=queued()-oldqueue /* total number of files */ do queued()-oldqueue parse pull . ft . /* get filetype from stack */ if nbr.ft=0 then filetypes=filetypes ft /* append new file */ nbr.ft=nbr.ft+1 /* increment counter */ end say 'From a total of 'total' files on disk' mode '...' l=length(total) do while filetypes^='' /* perform output */ parse var filetypes ft filetypes /* eat our list */ say right(nbr.ft,l) 'have a type' ft end call exit 0
We couldn't resist searching for a solution using CMS Pipelines. This solution, allows us to produce a sorted output. To make a distinction between singular and plural is not so easy however, and therefore not included in this pipeline.
/*******************************************************************/ /* Procedure QTYPES3 Solution using PIPE */ /*******************************************************************/ address command parse upper arg mode . /* get parameters */ if mode='' then mode='A' /* defaults to A */ 'PIPE COMMAND LISTFILE * *' mode '(FT HEADER', '!VAR MSG', /* Header or Errmsg to REXX */ '!DROP 1', /* drop Header or Errmsg */ '!SPEC Word 2 1', /* take ftype only */ '!SORT 1.8', /* sort on Ftype */ '!UNIQUE COUNT LAST', /* now count Ftypes */ '!SPEC 1.10 2 /have a type/ Nextword 11-* Nextword', '!Literal On disk' mode '...', /* add headerline */ '!CONSOLE' if rc^=0 then Say msg /* show LISTFILE's errormsg */ exit rc
We could even go a bit further and sort the output a second time to list the filetypes by usage-frequency.
/*******************************************************************/ /* Procedure QTYPES35 : Using PIPE and sorting twice. */ /*******************************************************************/ address command parse upper arg mode . /* get parameters */ if mode='' then mode='A' /* defaults to A */ 'PIPE COMMAND LISTFILE * *' mode '(FT HEADER', '!VAR MSG', /* Header or Errmsg to REXX */ '!DROP 1', /* drop Header or Errmsg */ '!SPEC Word 2 1', /* take ftype only */ '!SORT 1.8', /* sort on Ftype */ '!UNIQUE COUNT LAST', /* now count Ftypes */ '!SORT 1.10 D', /* sort most occuring first */ '!SPEC 1.10 2 /have a type/ Nextword 11-* Nextword', '!Literal On disk' mode '...', /* add headerline */ '!CONSOLE' if rc^=0 then Say msg /* show LISTFILE's errormsg */ exit rc
Finally, we were interested in analyzing the performance of all the different solutions. We used the BENCH goodie as described in Appendix C. We ran the procedures four times for the goodies disk containing 685 files at that time.
Note: We had to modify slightly the procedures so that:
We added results from several other solutions, so that we have an even more representative set.
We confess that we sometimes don't compare the same things. Some students made for example a distinction between unique filetypes and multiple filetypes and adapted the output message to the singular or plural version. We think however that this counts only for a minimal overhead as it runs in the second loop with a limited number of iterations.
The first column gives the ranking. It is based on the Service Time. This is a calculated time being the sum of the CPU time and the I/O time. For the I/O time, we count with an average service time of 25 milliseconds.
The last column (before the comments) contains the true Elapsed Time. This time is influenced by the actual performance of the system but as we did the runs one after the other, the variation due to the system load (it was lightly loaded) is minimal. It may surprise you is that for some runs, the Elapsed Time is shorter than the Service Time. This clearly shows what minidisk caching (the I/O Service Time is not 20 ms) can bring to 'bad applications'. If there were no minidisk caching, then the elapsed time would be very near to the calculated service time.
Both for the CPU as for the Service Time, we included a percentage comparing the solutions to the worst performer (which has of course 100%).
Rank! R ! Tot CPU !SIO IO ! Servtime !Elapsed !Command Method ! U ! time % ! time ! time % ! time ! ! N !(Sec) !nbr (Secs) !(Secs) ! (Secs) ! ----+---+----------+-----------+----------+--------+--------------------------- 32 !29 ! 15.48 100! 0 0.00! 15.48 52! 16.77 !ANDY REXX Wordpos 31 !30 ! 5.33 34! 0 0.00! 5.33 18! 6.05 !VALERIE REXX Wordpos 30 ! 5 ! 4.79 31! 991 24.78! 29.57 100! 8.05 !FRED Workfile & Sort 29 !25 ! 2.75 18! 92 2.30! 5.05 17! 4.11 !X94204 Workfile & XEDIT so 28 ! 1 ! 2.15 14! 90 2.25! 4.40 15! 4.23 !PHILIPPE Filelist 27 ! 2 ! 1.76 11! 94 2.35! 4.11 14! 4.75 !X94106 Filelist 26 ! 6 ! 1.53 10! 49 1.23! 2.76 9! 4.27 !X94110 Workfile 25 !20 ! 1.44 9!1042 26.05! 27.49 93! 2.71 !X94101 Workfile & Sort 24 ! 3 ! 1.44 9! 82 2.05! 3.49 12! 4.86 !X94113 Filelist 23 ! 4 ! 1.32 9! 998 24.95! 26.27 89! 8.24 !RUDI Workfile & Sort 22 ! 7 ! 1.32 9! 0 0.00! 1.32 4! 2.49 !X94114 REXX Wordpos 21 !21 ! 1.28 8! 64 1.60! 2.88 10! 6.53 !X94210 Filelist 20 !26 ! 1.27 8!1010 25.25! 26.52 90! 2.41 !X94209 Workfile & Sort 19 ! 8 ! 1.26 8! 0 0.00! 1.26 4! 1.94 !X94108 REXX Wordpos 18 !O1 ! 1.20 8! 0 0.00! 1.20 4! 2.10 !QTYPES1 REXX Wordpos 17 !27 ! 1.14 7! 0 0.00! 1.14 4! 1.36 !X94214 REXX Wordpos 16 !32 ! 1.00 6! 44 1.10! 2.10 7! 1.23 !X95112 Stack + XEDIT Sort 15 !31 ! 0.99 6! 0 0.00! 0.99 3! 1.03 !WIM REXX Wordpos 14 !28 ! 0.96 6! 0 0.00! 0.96 3! 1.03 !X95113 REXX Wordpos 13 ! 9 ! 0.93 6! 0 0.00! 0.93 3! 1.54 !X (Ivo) REXX Stems 12 !22 ! 0.88 6! 0 0.00! 0.88 3! 0.95 !X94213 REXX Wordpos 11 !24 ! 0.88 6! 0 0.00! 0.88 3! 0.99 !X94207 REXX Wordpos 10 !O2 ! 0.77 5! 0 0.00! 0.77 3! 1.79 !QTYPES2 REXX Stems 9 !O3 ! 0.76 5! 0 0.00! 0.76 3! 1.19 !QTYPES25 REXX Stems+storage 8 !10 ! 0.61 4! 0 0.00! 0.61 2! 1.37 !X94102 Pipe 7 !23 ! 0.52 3! 0 0.00! 0.52 2! 0.59 !X94217 REXX Stems 6 !14 ! 0.48 3! 0 0.00! 0.48 2! 0.99 !X94107 Pipe 5 !12 ! 0.46 3! 0 0.00! 0.46 2! 2.26 !X94104 Pipe 4 !11 ! 0.44 3! 0 0.00! 0.44 1! 1.24 !X94112 Pipe 3 !13 ! 0.44 3! 0 0.00! 0.44 1! 0.86 !X94105 Pipe 2 !O5 ! 0.43 3! 0 0.00! 0.43 1! 0.84 !QTYPES35 Pipe 2 sorts 1 !O4 ! 0.42 3! 0 0.00! 0.42 1! 0.62 !QTYPES3 Pipe
What are the main conclusions ?
If you need some publicity for the REXX compiler, look at next table where the procedures are compiled:
Rank! R ! TCPU Gain-!SIO IO ! Servt !Elapsed !Command Method ! U ! time ed ! time ! time ! time ! ! N !(Sec) % !nbr (Secs) !(Secs) ! (Secs) ! ----+---+-----------+-----------+-------+--------+--------------------------- 17 ! 1 ! 2.15 27 ! 90 2.25! 4.40 ! 4.23 !X94109 Filelist 16 ! 2 ! 1.76 44 ! 94 2.35! 4.11 ! 4.75 !X94106 Filelist 15 ! 3 ! 1.44 40 ! 82 2.05! 3.49 ! 4.86 !X94113 Filelist 18 ! 4 ! 1.32 36 ! 998 24.95! 26.27 ! 8.24 !X94111 Workfile & SORT 19 ! 5 ! 4.79 65 !1042 24.78! 29.67 ! 8.05 !X94101 Workfile & SORT 14 ! 6 ! 1.53 70 ! 49 1.23! 2.76 ! 4.27 !X94110 Workfile 13 ! 7 ! 1.32 58 ! 0 0.00! 1.32 ! 2.49 !X94114 Rexx Wordpos 12 ! 8 ! 1.26 56 ! 0 0.00! 1.26 ! 1.94 !X94108 Rexx Wordpos 10 ! 9 ! 0.93 41 ! 0 0.00! 0.93 ! 1.54 !X (Ivo) Rexx stems 7 !10 ! 0.61 2 ! 0 0.00! 0.61 ! 1.37 !X94102 Pipe 3 !11 ! 0.44 5 ! 0 0.00! 0.44 ! 1.24 !X94112 Pipe 5 !12 ! 0.46 9 ! 0 0.00! 0.46 ! 2.26 !X94104 Pipe 4 !13 ! 0.44 11 ! 0 0.00! 0.44 ! 0.86 !X94105 Pipe 6 !14 ! 0.48 8 ! 0 0.00! 0.48 ! 0.99 !X94107 Pipe 11 !15 ! 1.20 55 ! 0 0.00! 1.20 ! 2.10 !QTYPES1 Rexx Wordpos 9 !16 ! 0.77 36 ! 0 0.00! 0.77 ! 1.79 !QTYPES2 Rexx stems 8 !17 ! 0.76 37 ! 0 0.00! 0.76 ! 1.19 !QTYPES25 Rexx stems 2 !18 ! 0.43 2 ! 0 0.00! 0.43 ! 0.62 !QTYPES3 Pipe 1 !19 ! 0.42 0 ! 0 0.00! 0.42 ! 0.84 !QTYPES35 Pipe
We compiled the solutions of previous course and made new runs. As the first column indicates, the relative ranking didn't change much, and that's why we didn't compile all solutions. For the CPU column, we now indicate the relative gain brought by the compiler.
No surprise that there is almost no difference for the CMS Pipelines solutions, as for REXX most of the procedure is one long command.
What's more surprising is that compiling QTYPES25 (optimized REXX coding) shows less gain than compiling QTYPES1 (less optimized). This shows that the compiler is clever enough to optimize when 'heavy' REXX functions are used.
At last, we ran our (not compiled) procedures for all files on all disks (12949 files, 458 different filetypes) and then we see that the difference between the solutions using wordpos() and the others become really apparent.
R V.cpu T.cpu SIO IO Service Elapsed Page Page ! Comment/Command U time time time time time rds wrts ! N (Secs) (Secs) nbr (Secs) (Secs) (Secs) ! --------------------------------------------------------+---------------------- 1 18.79 18.93 0 0.00 18.93 107.93 11 7 QTYPES1 Rexx Wordpos 2 3.53 3.58 0 0.00 3.58 19.69 1 F 0 QTYPES2 Rexx stems 3 3.50 3.55 0 0.00 3.55 12.47 0 0 QTYPES25 Rexx stems 4 1.96 1.98 0 0.00 1.98 8.60 4 0 QTYPES3 Pipe 5 1.95 1.97 0 0.00 1.97 7.14 0 0 QTYPES35 Pipe 2 sorts
Footnotes:
(1) We quote a paragraph from the book "VM and the VM Community : Past, Present and Future", written by Melinda Varian from the Princeton University :
One of the programmers at Lincoln during that period was Jim
March. (Before I go any further with this story, I must assure you that
Jim is a very good programmer). One evening in 1968, Jim was working
late and found he needed to sort a list of a few hundred items, so he
threw together a "quick and dirty" sort/merge program and sorted his
list and forgot all about it. Shortly after that, he left Lincoln to go
to IDC and didn't have the opportunity to use a "vanilla" CMS again until
1977, when he moved to Bank of America. The Bank sent him to a GUIDE
meeting at which there was much complaining about CMS SORT, so when he
got home he printed off a listing and sat down to take a look. To his
horror, he immediately recognized that he was the author of the reviled
CMS SORT command. He was so embarrassed that he wrote a good CMS SORT
and distributed it to all the members of the GUIDE VM Project...
Back to text
(2)
By coding NBR.=0, REXX will have a default for all
tails. Example. NBR.LISTFILE will be 0 initially.
Back to text
(3)
The REXX Compiler tries to optimize this case ; when word(...,i)
is used in a loop, the compiler will remember where it found the
word in the string, and start scanning from that position at the next
iteration.
Back to text