CMS Pipelines Telecourse. Solution to Exercise 5.

This is a large document, but it is worth reading from A to Z. You may consider printing it for later reference too.

The assignment

Write a procedure that lists the different filetypes found on an accessed disk or SFS directory, and also displays the frequency at which they are found. The output we expect is thus something like this:

   There are 7 files with filetype EXEC on your B-disk
   There are 13 files with filetype XEDIT on your B-disk
   ...

The procedure should accept a filemode as parameter (so don't hard-code a filemode !). We also explicitly ask you not to use CMS Pipelines but to search for the best solution using REXX logic (this is a very good exercise to practice the possibilities of REXX).

Test your procedure with a disk containing only a few files so that you can easily verify your results via a FILELIST. Then, stress test it on a larger disk (e.g. the Y-disk).

General comments

Sorting with REXX

Several students sorted the filetypes. Sometimes, this was to have a nicer output, but in other cases, the sort served the logic. Indeed, by sorting the filetypes, it was easier to test for the different filetypes. If you look for example at our solution, you'll see that the sorting is not needed to produce a correct result.

It is a pity that REXX doesn't provide a SORT function. Sorting can be done via external CMS functions, such as

CMS Pipelines is the easiest, and fast too. You may remember from now on, that if you have, for example, to sort a REXX array, you can use:

If, for any reason (e.g. portability) you want to sort using pure REXX functions, then you could use the SORT routine listed here (it uses the Quick Sort recursive algorithm). For a very small number of elements, it's fast, but once the number augments, it becomes rapidly very CPU consuming.

REXX Sort routine
/**********************************************************************/ / Subroutine SORT / / Function : Sort a REXX array called ARRAY. / /*********************************************************************/ SORT: procedure expose array. stack.1=1 ; stack.2=array.0 ; top=2 do until top=0 last=stack.top ; top=top-1 ; first=stack.top ; top=top-1 ; i=first do until first>=last j=last ; k=trunc((first+last)/2) ; dividingline=array.k do until i>j do while array.idividingline ; j=j-1 ; end if i<=j then do temporary=array.i ; array.i=array.j ; array.j=temporary i=i+1 ; j=j-1 end end / do until i>j / if first=last / end /* do until top=0 */ return

REXX Sort routine

/***********************************************************************/
/* Subroutine SORT                                                     */
/* Function : Sort a REXX array called ARRAY.                          */
/***********************************************************************/
SORT: procedure expose array. 
stack.1=1 ; stack.2=array.0 ; top=2
do until top=0
   last=stack.top ; top=top-1 ; first=stack.top ; top=top-1 ; i=first
   do until first>=last
      j=last ; k=trunc((first+last)/2) ; dividingline=array.k
      do until i>j
         do while array.idividingline ; j=j-1 ; end
         if i<=j then do
            temporary=array.i ; array.i=array.j ; array.j=temporary
            i=i+1 ; j=j-1
         end
      end /* do until i>j */
      if first=last */
end /* do until top=0 */
return

To use it, have the elements you want to sort in a REXX stem with name ARRAY. and then call sort. On return, the array will be sorted. Total CPU consumption is exponential, and may go from 0.02 seconds for 10 elements to 0.59 seconds for 650 elements and 10.2 seconds for 8950 elements. Compiling triples the performance.

One could also use XEDIT to sort. It is fast too, surely faster than pure REXX for larger lists. This is an example on how to use XEDIT' SORT:

/* This exec is a sample showing how one can use XEDIT to SORT */
address command
/*1 ---Get data to sort in STACK --------------------------------------*/
   'LISTFILE * EXEC A (STACK'          /* We want the filenames sorted */

/*2 ---Stack commands to make XEDIT place stacked data in file --------*/
   push 'INPUT'              /* tell XEDIT to place stack in file      */
   queue ''                  /* to stop XEDIT reading from stack       */
   push '-* DEL *'           /* but make file empty first              */

/*3 ---Stack command to SORT the data and STACK them afterwards -------*/
   queue '-* DMSXMS D 1 8'   /* Call XEDIT's SORT module on cols 1-8   */
   queue 'STACK *'           /* Ask XEDIT to stack the sorted data     */
   queue 'COMMAND QUIT'      /* And XEDIT can QUIT the file            */

/*4 ---Call XEDIT, it'll handle the stacked lines ---------------------*/
   'XEDIT DUMB FILE S (NOPROF NOMSG' /* use a dummy file               */

/*5 ---Here we have our sorted data -----------------------------------*/
   do queued()
      parse pull fid ; say fid
   end

File Identifiers

On VM/CMS, file identifiers cannot contain blanks, but on personal systems, this is possible. When you parse the strings to find the filetypes (extensions to use the correct term), you can't work with word parsing, but need to use positional parsing.

Test case

When you test your program, you need sometimes to provide a specific test case. When we ran this course, we used a disk containing following files:

Benchmarks

At the end of this chapter, we provide our solutions and a benchmark that compares them with different other solutions. You should review this as it will once more indicate that it is possible to drastically augment the performance of your procedures by using the good programming techniques.

Grammar with REXX

Some students included extra logic in their procedures to make a distinction in the output messages when there is only one file with a certain filetype (singular) as opposed to the plural when there are more files with a filetype. That's nice. Their logic resembles to this :

This can be shorted if you use a programming trick we learned from John Hartmann (the creator of CMS Pipelines), although, it's complex programming style, we must admit:

This technique proves to be useful in other circumstances. For example, when we want to calculate the number of days since Jan 1, 1900 and to take the leap years into account :

   /* Calculate days since Jan 1, 1900 */
   DAYS_SINCE_1900:
   Parse arg yy'/'mm'/'dd
   Days.01=  00; Days.02=  31; Days.03=  59; Days.04=  90
   Days.05= 120; Days.06= 151; Days.07= 181; Days.08= 212
   Days.09= 243; Days.10= 273; Days.11= 304; Days.12= 334
   now_to_1900 = (yy-1)*365 + (yy-1)%4 + days.mm + dd + (yy//4=0)*(mm>2)
   return now_to_1900

The last operand in the calculation adds 1 to the number of days if we're having a leap year (modulo 4) and if we're past the month February.

Parsing

The exercise was meant as a combination of File I/O and parsing. We'd like to come back to the solution we gave for the last exercise of lesson 2 (the MULTILIN procedure), and analyze this with you in detail now:
Having collected all one-lined records in an array rec., we issue following statements to split the records and to build the output array :

 nr = 0                                    /* number of single records */
 do i=1 to rec.0
    do while rec.i<>''
       parse value nr+1 rec.i with nr . 0 out.0 out.nr ';' rec.i
    end
 end

What do we achieve by this technique ? Well, this single parse statement allows us to:

First time we ran this course, one of our students used this technique in an even more elaborated technique when coding its solution. It was Ivo Toelen of Kredietbank (now KBC). We want to give him again the full praise he deserves, and to be honest, he taught your instructors to push the limits of the parse instruction even further. We therefore repeat here the details of his solution, so that you can learn from it too.

Ivo's code resumes to this (after having put the result of a LISTFILE in an array LST.) :

 1 !  TB.  = 0   /* Table of counters, indexed on filetype */
 2 !  TBN. = 0   /* Table of existing filetypes */
 3 !  do i = 1 to lst.0
 4 !     parse var lst.i . ft . 
 5 !     parse value TB.ft + 1 with new . 1 TB.ft . 
 6 !     if new = 1 then       /* add the new filetype to the TBN table */
 7 !        parse value TBN.0 +1 ft with newindex . 1 TBN.0 TBN.newindex .
 8 !  end
 9 !  trc=trace('r')
10 !  do i = 1 to TBN.0
11 !      say "There are" right(value('TB.'TBN.i),8),
   !          "files with type '"left(TBN.i,8)"' on filemode" fm
12 !  end

then, a trace of the procedure shows (we eliminated some irrelevant lines for editorial reasons):

     3 *-* do i = 1 to lst.0
     4 *-*  parse var lst.i . ft . 
       >>>    "EXEC"
     5 *-*  parse value TB.ft + 1 with new . 1 TB.ft . 
       >C>    "TB.EXEC"
       >V>    "0"
       >L>    "1"
       >O>    "1"
       >>>    "1"
       >.>    ""
       >C>    "TB.EXEC"
       >>>    "1"
     6 *-*  if new = 1 then /* add the new filetype to the TBN table */
       >O>    "1"
     7 *-*   parse value TBN.0 +1 ft with newindex . 1 TBN.0 TBN.newindex .
       >V>     "0"
       >L>     "1"
       >O>     "1"
       >V>     "EXEC"
       >O>     "1 EXEC"
       >>>     "1"
       >.>     "EXEC"
       >>>     "1"
       >C>     "TBN.1"
       >>>     "EXEC"
     8 *-* end
     3 *-* do i = 1 to lst.0
     4 *-*  parse var lst.i . ft . 
       >>>    "GLOBALV"
     5 *-*  parse value TB.ft + 1 with new . 1 TB.ft . 
       >C>    "TB.GLOBALV"
       >V>    "0"
       >L>    "1"
       >O>    "1"
       >>>    "1"
       >C>    "TB.GLOBALV"
       >>>    "1"
     6 *-*  if new = 1 then /* add the new filetype to the TBN table */
       >O>    "1"
     7 *-*   parse value TBN.0 +1 ft with newindex . 1 TBN.0 TBN.newindex . 
       >V>     "1"
       >L>     "1"
       >O>     "2"
       >V>     "GLOBALV"
       >O>     "2 GLOBALV"
       >>>     "2"
       >.>     "GLOBALV"
       >>>     "2"
       >C>     "TBN.2"
       >>>     "GLOBALV"
     8 *-* end
     3 *-* do i = 1 to lst.0
     4 *-*  parse var lst.i . ft . 
       >>>    "EXEC"
     5 *-*  parse value TB.ft + 1 with new . 1 TB.ft . 
       >C>    "TB.EXEC"
       >V>    "1"
       >L>    "1"
       >O>    "2"
       >>>    "2"
       >C>    "TB.EXEC"
       >>>    "2"
     6 *-*  if new = 1
       >O>    "0"
     8 *-* end
     3 *-* do i = 1 to lst.0
     9 *-* trc=trace('r')
    10 *-* do i = 1 to TBN.0
    11 *-*  say "There are" right(value('TB.'TBN.i),8) "files with typ
e '"left(TBN.i,8)"' on filemode" fm
There are        2 files with type 'EXEC    ' on filemode T
    12 *-* end
    ...

After having read the file-ids in a REXX stem and initialized two REXX stems TB. and TBN. to 0, the file-ids are processed one by one.

>C> "TB.EXEC"	as ft='EXEC', TB.ft becomes TB.EXEC
>V> "0"	as the array was initialized to default 0, TB.EXEC gets a value of 0.
>L> "1"	Then we add literal value 1
>O> "1"	to become "1" as the source for the parse.
>>> "1"	As a result, new gets the value 1.
>.> ""	dot as placeholder to remove leading and trailing blanks (if any)
>C> "TB.EXEC"	and after backstepping in the template,
>>> "1"	element EXEC of array TB. is also set to value 1

Thus, new and TB.EXEC both have value 1. The test at line 6 is thus true, and we come to analyze line 7:

>V> "0"	TBN.0 has indeed value 0 due to initialization of the array.
>O> "1"	so, adding 1 gives 1 as output
>V> "EXEC"	and as the filetype is EXEC
>O> "1 EXEC"	the source for the parse becomes "1 EXEC".
>>> "1"	Thus, newindex gets the first word, (1)
>.> "EXEC"	the placeholder gets the rest
>>> "1"	TBN.0 gets also value 1 due to backstepping.
>C> "TBN.1"	and, as newindex just became 1, TBN.1 is a new array element, and...
>>> "EXEC"	it gets the second word of the source (EXEC).

So, for each new filetype, a new element is added to the TBN. array, and the TBN.0 element increments too so that we know how many elements the array contains.

When the second file is analyzed (filetype GLOBALV) the first parse does exactly the same as above for the EXEC filetype (as it is the first time we encounter this filetype, a new TBN. element is created). The second parse then gives:

>V> "1"	TBN.0 had value 1 after first loop, thus...
>O> "2"	the sum is now 2.
>O> "2 GLOBALV"	Our source string becomes thus "2 GLOBALV"
>>> "2"	newindex gets value 2, and
>>> "GLOBALV"	TBN.2 gets value GLOBALV.

We had two different filetypes till now, and so we have two elements in our TBN. array. The third file is a filetype EXEC again. The first parse sets both new and TB.EXEC to a value of 2 (which is indeed the number of EXEC filetypes we encountered). Consequently, the if condition at line 6 is false and no new element is added to the TBN. array.

The first loop ends, and we are ready to analyze the second one. It produces the output at the terminal. The only tricky thing here is the value() function. These are the intermediate interpretation steps in the first iteration:

             value('TB.'TBN.i)
  becomes    value('TB.'"EXEC")   as i=1 and TBN.1="EXEC"
  becomes    value(TB.EXEC)       after concatation of the strings
  becomes    2                    as TB.EXEC is 2

Note the quotes around TB. in the value() function. This is needed, as otherwise REXX would try to find the value of the variable TB.TBN.1.

Some typical solutions.

As this is definitely a key exercise of the telecourse, we add comments to several solutions we got from our former students.

   ! /*would normally contain author info etc */
 1 ! Address COMMAND
 2 ! 'MAKEBUF'
 3 ! parse upper arg fm .
 4 ! types = ''
 5 ! if fm = '' then fm = 'A'
 6 ! 'LISTFILE * *' fm '(STACK'
 7 ! if rc ¬= 0 then exit rc
 8 ! x=0
 9 ! do queued()
10 !  parse pull . type .
11 !  if wordpos( type, types) = 0 then types=types type
12 !  x=x+1
13 !  ftype.x = type
14 ! end
15 ! 'DROPBUF'
16 ! do loop = 1 to words(types)
17 !    count = 0
18 !    parse var types thistype types
19 !    do z = 1 to x
20 !       if thistype = ftype.z  then
21 !          count = count+1
22 !    end
23 !    say 'There are 'count' file(s) of type 'thistype' on your 'fm'-Disk'
24 ! end
25 ! return

This procedure produces a correct output, but will be one of the worst performers in our benchmark later on. This is certainly due to the heavy use of the REXX functions word() and wordpos() and the general logic. At each iteration in the second loop, the number of words in the string has to be re-calculated. Furthermore, a large array is built in order to remember the filetypes and to be able to count them later in a double-loop. Our solutions show better techniques not that different from this after all. It means that some further analysis can drastically improve the performance of a procedure.

And, if a general exit routine had been used, it would become easier to give feedback that is more complete (e.g. for line 7).

   ! /************************************************************/
   ! /* The Filename is QTYPES                            ********/
   ! /* Function     : Count FILETYPES                    ********/
   ! /************************************************************/
 1 ! parse upper arg fm .
 2 ! if fm = '' then fm = 'A'
 3 ! queue 'FT FILELIST'
 4 ! queue 'FN TEMP'
 5 ! queue 'SORT * 16 23'
 6 ! queue 'FFILE'
 7 ! address command 'EXEC FILELIST * * 'fm
 8 ! address command 'EXECIO 1 DISKR TEMP FILELIST A (VAR FLIST'
 9 ! rcexecio = rc
10 ! parse var flist . fti .
11 ! do while rcexecio = 0
12 !    i = 1
13 !    address command 'EXECIO 1 DISKR TEMP FILELIST A (VAR FLIST'
14 !    rcexecio = rc
15 !    parse var flist . fto .
16 !    do while fti = fto & rcexecio = 0
17 !       address command 'EXECIO 1 DISKR TEMP FILELIST A (VAR FLIST'
18 !       rcexecio = rc
19 !       parse var flist . fti .
20 !       i = i + 1
21 !    end;
22 !    if i > 1 then say i' files with filetype 'fto' on 'fm'-DISK.'
23 !             else say '1 file with filetype 'fti' on 'fm'-DISK.'
24 !    if i = 1 then fti = fto;
25 ! end
26 ! address command 'ERASE TEMP FILELIST A'

   ! /**********************************************************************/
   ! /* QTYPES - Created by Rudi                                           */
   ! /* Function : count the different filetypes on an accessed disk       */
   ! /* Parameters : filemode                                              */
   ! /**********************************************************************/
 1 ! address command
 2 ! parse source . . myname .
 3 ! parse upper arg fm
 4 ! if fm='' then fm='A'
 5 ! address '' 'ESTATE * * 'fm
 6 ! if RC¬=0 then call Error_Exit rc,' 'fm' is not accessed'
 7 ! address CMS 'LISTFILE * * 'fm' (exec'
 8 ! if RC¬=0 then call Error_Exit rc,' No A disk accessed in R/W'
 9 ! set cmstype ht
10 ! queue '16 25'
11 ! address '' 'SORT CMS EXEC A CMS TEMP A'
12 ! address '' 'ERASE CMS EXEC A '
13 ! set cmstype rt
14 ! 'EXECIO * DISKR CMS TEMP A (FINIS margins 17 25 STEM FT.'
15 ! x=0;ftc.x=0
16 ! do i=1 to ft.0 by 1
17 !    if ft.i=fto then do             /* did we have already such FT ? */
18 !       ftc.x=ftc.x+1                /* increment counter  */
19 !    end
20 !    else do                         /* New FT */
21 !       x=x+1                        /* prepare next values */
22 !       ftc.x=0                      /* idem */
23 !       ftc.x=ftc.x+1                /* increment new counter */
24 !       ft.x=ft.i                    /* set new FT */
25 !    end
26 !    fto=ft.i
27 ! end
28 ! do y=1 to x by 1
29 !   say 'you have 'ftc.y' files with filetype 'ft.y' on the disk accessed',
   !      'as 'fm
30 ! end
31 ! say 'Total : 'ft.0' files '
32 ! say 'Sorry, no PARSE used, where could i have used it ????'
33 ! say 'What did i miss ???, could it perform better ??'
34 ! address '' 'ERASE CMS TEMP A'
35 ! call Error_Exit 00,'All done'
   ! /****************** Exit routine ********************************/
36 ! Error_Exit:
37 !  parse arg retc,errmsg
38 !  if errmsg<>'' then say myname':' errmsg
39 !  exit retc

This solution produces correct output, is clean but has a rather complex programming style.

   ! /* ========================================================== */
   ! /* QTYPES   - created by: Fred Flinstone                      */
   ! /* Function: list # of files with filetypes                   */
   ! /* Parameters: filemode (default: A)                          */
   ! /* Options: none                                              */
   ! /* ========================================================== */
 1 ! address command
 2 ! parse upper arg fm .
 3 ! if fm = '' then fm = 'A'
 4 ! 'ERASE CMS EXEC A'
 5 ! 'ERASE $$CMS$$ $$EXEC$$ A'
 6 ! 'LISTFILE * * ' fm '(FN FT EXEC'
 7 ! if rc¬=0 then do
 8 !    say 'disk' fm 'not currently accessed'
 9 !    exit 1
10 ! end
11 ! queue '17 24'
12 ! 'PIPE COMMAND SORT CMS EXEC A $$CMS$$ $$EXEC$$ A'
13 ! parse value linein($$CMS$$ $$EXEC$$ A) 1 with . . . ftk nfk .
14 ! do until lines($$CMS$$ $$EXEC$$ A) = 1
15 !    parse value linein($$CMS$$ $$EXEC$$ A) with . . . ft .
16 !    if ft = ftk then nfk = nfk + 1
17 !    else do
18 !         if nfk = 1 then
19 !            say 'there is 1 file with filetype' ftk 'on disk' fm
20 !         else
21 !            say 'there are' nfk 'files with filetype' ftk 'on disk' fm
22 !         parse value 1   ft  ,
23 !               with  nfk ftk
24 !    end
25 ! end
26 ! exit

We appreciate that at least one student tried to use the linein() function for the method. Lesson 4 will show that this is not the best performer, but at least we have an extra example now. It also partly explains why it is a bad performer (see benchmark at the end of the chapter). We have to blame the authors of LINEIN, not the student. The benchmark will also show why it is not good to work with a temporary file.

   ! /*******************************************************************/
   ! /* QTYPES   - Created by Valerie                                   */
   ! /*******************************************************************/
 1 ! address command
 2 ! arg fmode .
 3 ! 'MAKEBUF'
 4 ! 'LISTFILE * *' fmode '(STACK FT'
 5 ! if rc=36 then exit
 6 ! x=queued()
 7 ! if fmode='' then fmode='A'
 8 ! parse value '  ' with f t
 9 ! do i=1 to x
10 !    pull fname stem.i
11 !    f=f stem.i
12 ! end
13 ! do i=1 to x
14 !    parse value '0 1' with r j
15 !    if wordpos(stem.i,t)¬=0 then iterate
16 !    do until y=0
17 !       y=wordpos(stem.i,f,j)
18 !       if y¬=0 then do
19 !          r=r+1
20 !          j=y+1
21 !       end
22 !       if y=0 & r>1 then t=t stem.i
23 !    end
24 !    say 'There are' r 'files with filetype' stem.i 'on your' fmode'-disk'
25 ! end
26 ! 'DROPBUF'
27 ! exit

This solution produces incorrect output and is a bad performer, but shows a clean programming style.

   ! /*******************************************************************/
   ! /* QTYPES   - Created by Wim                                       */
   ! /* Function : Count filetypes                                      */
   ! /* Parameters : filemode                                           */
   ! /*******************************************************************/
 1 ! address command
 2 ! parse upper source . . myname mytype . syn.
 3 ! parse upper arg fm .
 4 ! if fm = '' then fm = 'A'
 5 ! 'DROPBUF'
 6 ! 'LISTFILE * *' fm '(NOHEADER STACK' /* listfile of the filemode fm */
 7 ! if rc <> 0 then call errexit rc,'Problem with listfile * *' fm
 8 ! ft_array = ''
 9 ! aantal_ft. = 0
10 ! do queued()
11 !   parse pull . ft .                            /* get the filetype */
12 !   ft_pos = wordpos(ft,ft_array)     /* is it already in the list ? */
13 !   if ft_pos = 0 then do                            /* not it isn't */
14 !     ft_array = ft_array ft                /* so add it to the list */
15 !     ft_pos = words(ft_array) /* what is the position in the list ? */
16 !   end
   !                               /* increment element in array with 1 */
17 !   aantal_ft.ft_pos = aantal_ft.ft_pos + 1
18 ! end
19 ! do i=1 to words(ft_array)                          /* print result */
20 !   if aantal_ft.i = 1 then
21 !     say 'there is 1 file with filetype' word(ft_array,i) 'on your',
22 !        fm'-disk.'
23 !   else
24 !     say 'there are' aantal_ft.i 'files with filetype' word(ft_array,i),
25 !        'on your' fm'-disk.'
26 ! end
27 ! exit
   !
28 ! ERREXIT: /* exit with errormessage(s) */
29 !  do n=2 to arg()                  /* get all errormessages to show */
30 !    say myname ':' arg(n)                  /* give the n'th message */
31 !  end n
32 ! exit arg(1)

This program produces correct output, has a clean programming style and has an acceptable performance.

Our solutions and benchmark.

The very first try.

/*******************************************************************/
/* Procedure QTYPES1  List filetype occurrences on any disk        */
/*******************************************************************/
address command
parse upper arg mode .                           /* get parameters */
if mode='' then mode='A'                          /* defaults to A */
oldqueue=queued() ; 'MAKEBUF' ; buffer=rc     /* make stack buffer */
filetypes=''                                  /* initialize string */
'LISTFILE * * 'mode' (STACK'                  /* issue the command */
if rc<>0 then call errexit rc,'Error reading filelist of disk 'mode
total=queued()-oldqueue                   /* total number of files */
do queued()-oldqueue
   parse pull . ft .                    /* get filetype from stack */
   if wordpos(ft,filetypes)<>0 then nbr.ft=nbr.ft+1   /* increment */
                               else do         /* if not yet found */
      filetypes=filetypes ft                      /* add to string */
      nbr.ft=1                               /* initialize counter */
   end
end
say 'From a total of 'total' files on disk' mode '...'
l=length(total)
do i=1 to words(filetypes)                       /* perform output */
   ft=word(filetypes,i)
   say right(nbr.ft,l) 'have a type' ft
end
call exit 0
/* Subroutines ** you know our ERREXIT routine by now isn't it ? ***/

Our first solution uses the technique whereby we build a string that contains all different filetypes, but instead of using an array with numeric indices (1 to n) for counting the occurrences of filetypes, we build an array whereby the elements (the tails of the stem) are the filetypes themselves.

This means, we build an array where the elements could be nbr.MODULE, nbr.EXEC, nbr.XEDIT, nbr.SCRIPT, etc., and these will contain the number of times a filetype is present on the disk.

The main advantage of this technique is that, while counting, we don't have to look up the filetype in the filetypes string. When we analyze a filetype, its name is the index in our array immediately.

We have however still to build up the filetypes string in order to be able to run the second loop producing the output.

You should note that REXX accepts anything as index (tail) for a stem, even a string containing blanks or special characters ! In addition, this can give problems. If we had coded the first loop like this:

do queued()-oldqueue
   parse pull 10 ft +8                      /* get filetype from stack */
   if wordpos(ft,filetypes)<>0 then nbr.ft=nbr.ft+1       /* increment */
                               else do             /* if not yet found */
      filetypes=filetypes ft                          /* add to string */
      nbr.ft=1                                   /* initialize counter */
   end
end

then, the second loop wouldn't find back the correct indices anymore, as the parse will create filetypes with a length of 8 characters (so, with trailing blanks). In that case we would need to add a ft=left(ft,8) in the second loop to make it a string of exactly 8 characters.

Improving performance

For each filetype, REXX has to compare it to each word in the variable filetypes ; when this list grows, it can cost quite a lot of CPU when REXX has to scan it sequentially.

If you accept that REXX uses a much faster binary search algorithm when it has to find one of its variables, then it is possible to start looking for improvements.

nbr.=0                                         /* Initialize array */
do queued()-oldqueue
   parse pull . ft .
   if nbr.ft=0 then filetypes=filetypes ft /* add new file to list */
   nbr.ft=nbr.ft + 1                       /* increment counter    */
end

Instead of looking up the large filetypes string, we initialize the array before the loop(footnote 2), and we simply test whether the element of the array is greater than zero or not.

For example, the very first time a filetype LISTING is encountered, the element nbr.LISTING will be 0. So, we then assign it a value 1. For next occurrences, the array element will not be 0 anymore, and we just add 1 to the counter but don't add the filetype to the list anymore.

But, this second loop is also negatively impacted by the use of the words() and word() functions. The words() function forces REXX to scan the whole string at each iteration, while for word(), it depends ; obviously, the first word is found immediately, but to find the thousandth word, REXX has again to scan for 999 blanks...(footnote 3)

So, let's replace it by the parsing the string while eating it technique that we learned in the lesson. We implemented this in our second solution, and, as you can see, the procedure did not loose readability due to our performance improvements. It gave a gain of 36% for the test run on the Y-disk.

Please, don't conclude from this that you shouldn't use the REXX functions at all. Remember, in order of priority:

/*******************************************************************/
/* Procedure QTYPES2  List filetype occurrences on any disk        */
/*******************************************************************/
address command
parse upper arg mode .                           /* get parameters */
if mode='' then mode='A'                          /* defaults to A */
oldqueue=queued() ; 'MAKEBUF' ; buffer=rc     /* make stack buffer */
filetypes=''                                  /* initialize string */
nbr.=0                                            /* ftype counter */
'LISTFILE * * 'mode' (STACK'                  /* issue the command */
if rc<>0 then call errexit rc,'Error reading filelist of disk 'mode
total=queued()-oldqueue                   /* total number of files */
do queued()-oldqueue
   parse pull . ft .                    /* get filetype from stack */
   if nbr.ft=0 then filetypes=filetypes ft      /* append new file */
   nbr.ft=nbr.ft+1                            /* increment counter */
end
say 'From a total of 'total' files on disk' mode '...'
l=length(total)
do while filetypes^=''                           /* perform output */
   parse var filetypes ft filetypes                /* eat our list */
   say right(nbr.ft,l) 'have a type' ft
end
call exit 0
/* Subroutines ** you know our ERREXIT routine by now isn't it ? ***/

Another improvement we applied to this classic REXX coding has to deal with storage management. With all operating systems, storage management is somehow costly. On VM systems, we know REXX does the following:

So, in our procedure, we have the variable filetypes that grows continuously. We can help REXX if we first pre-allocate a large variable that is filled in afterwards. The gains are measurable both for the interpreter as for the compiler, but of course depends on the number of times the variable grows in the loop.

/*******************************************************************/
/* Procedure QTYPES25 : Using storage pre-allocation.              */
/*******************************************************************/
address command
parse upper arg mode .                           /* get parameters */
if mode='' then mode='A'                          /* defaults to A */
oldqueue=queued() ; 'MAKEBUF' ; bufno=rc      /* make stack buffer */
filetypes=left('',1000)                       /* initialize string */
filetypes=''                                  /* initialize string */
nbr.=0                                            /* ftype counter */
'LISTFILE * * 'mode' (STACK'                  /* issue the command */
if rc<>0 then call emsg rc,'Error reading filelist of disk 'mode
total=queued()-oldqueue                   /* total number of files */
do queued()-oldqueue
   parse pull . ft .                    /* get filetype from stack */
   if nbr.ft=0 then filetypes=filetypes ft      /* append new file */
   nbr.ft=nbr.ft+1                            /* increment counter */
end
say 'From a total of 'total' files on disk' mode '...'
l=length(total)
do while filetypes^=''                           /* perform output */
   parse var filetypes ft filetypes                /* eat our list */
   say right(nbr.ft,l) 'have a type' ft
end
call exit 0

We couldn't resist searching for a solution using CMS Pipelines. This solution, allows us to produce a sorted output. To make a distinction between singular and plural is not so easy however, and therefore not included in this pipeline.

/*******************************************************************/
/* Procedure QTYPES3  Solution using PIPE                          */
/*******************************************************************/
address command
parse upper arg mode .                           /* get parameters */
if mode='' then mode='A'                          /* defaults to A */

 'PIPE COMMAND LISTFILE * *' mode '(FT HEADER',
    '!VAR MSG',                        /* Header or Errmsg to REXX */
    '!DROP 1',                            /* drop Header or Errmsg */
    '!SPEC Word 2 1',                           /* take ftype only */
    '!SORT 1.8',                                  /* sort on Ftype */
    '!UNIQUE COUNT LAST',                      /* now count Ftypes */
    '!SPEC 1.10 2 /have a type/ Nextword 11-* Nextword',
    '!Literal On disk' mode '...',               /* add headerline */
    '!CONSOLE'

if rc^=0 then Say msg                  /* show LISTFILE's errormsg */
exit rc

We could even go a bit further and sort the output a second time to list the filetypes by usage-frequency.

/*******************************************************************/
/* Procedure QTYPES35 : Using PIPE and sorting twice.              */
/*******************************************************************/
address command
parse upper arg mode .                           /* get parameters */
if mode='' then mode='A'                          /* defaults to A */

 'PIPE COMMAND LISTFILE * *' mode '(FT HEADER',
    '!VAR MSG',                        /* Header or Errmsg to REXX */
    '!DROP 1',                            /* drop Header or Errmsg */
    '!SPEC Word 2 1',                           /* take ftype only */
    '!SORT 1.8',                                  /* sort on Ftype */
    '!UNIQUE COUNT LAST',                      /* now count Ftypes */
    '!SORT 1.10 D',                    /* sort most occuring first */
    '!SPEC 1.10 2 /have a type/ Nextword 11-* Nextword',
    '!Literal On disk' mode '...',               /* add headerline */
    '!CONSOLE'

if rc^=0 then Say msg                  /* show LISTFILE's errormsg */
exit rc

Finally, we were interested in analyzing the performance of all the different solutions. We used the BENCH goodie as described in Appendix C. We ran the procedures four times for the goodies disk containing 685 files at that time.

We added results from several other solutions, so that we have an even more representative set.

We confess that we sometimes don't compare the same things. Some students made for example a distinction between unique filetypes and multiple filetypes and adapted the output message to the singular or plural version. We think however that this counts only for a minimal overhead as it runs in the second loop with a limited number of iterations.

The first column gives the ranking. It is based on the Service Time. This is a calculated time being the sum of the CPU time and the I/O time. For the I/O time, we count with an average service time of 25 milliseconds.

The last column (before the comments) contains the true Elapsed Time. This time is influenced by the actual performance of the system but as we did the runs one after the other, the variation due to the system load (it was lightly loaded) is minimal. It may surprise you is that for some runs, the Elapsed Time is shorter than the Service Time. This clearly shows what minidisk caching (the I/O Service Time is not 20 ms) can bring to 'bad applications'. If there were no minidisk caching, then the elapsed time would be very near to the calculated service time.

Both for the CPU as for the Service Time, we included a percentage comparing the solutions to the worst performer (which has of course 100%).

Rank! R ! Tot CPU  !SIO    IO  ! Servtime !Elapsed !Command     Method
    ! U ! time  %  !      time !  time  % !   time !
    ! N !(Sec)     !nbr (Secs) !(Secs)    ! (Secs) !
----+---+----------+-----------+----------+--------+---------------------------
 32 !29 ! 15.48 100!   0   0.00! 15.48  52! 16.77  !ANDY     REXX Wordpos
 31 !30 !  5.33  34!   0   0.00!  5.33  18!  6.05  !VALERIE  REXX Wordpos
 30 ! 5 !  4.79  31! 991  24.78! 29.57 100!  8.05  !FRED     Workfile & Sort
 29 !25 !  2.75  18!  92   2.30!  5.05  17!  4.11  !X94204   Workfile & XEDIT so
 28 ! 1 !  2.15  14!  90   2.25!  4.40  15!  4.23  !PHILIPPE Filelist
 27 ! 2 !  1.76  11!  94   2.35!  4.11  14!  4.75  !X94106   Filelist
 26 ! 6 !  1.53  10!  49   1.23!  2.76   9!  4.27  !X94110   Workfile
 25 !20 !  1.44   9!1042  26.05! 27.49  93!  2.71  !X94101   Workfile & Sort
 24 ! 3 !  1.44   9!  82   2.05!  3.49  12!  4.86  !X94113   Filelist
 23 ! 4 !  1.32   9! 998  24.95! 26.27  89!  8.24  !RUDI     Workfile & Sort
 22 ! 7 !  1.32   9!   0   0.00!  1.32   4!  2.49  !X94114   REXX Wordpos
 21 !21 !  1.28   8!  64   1.60!  2.88  10!  6.53  !X94210   Filelist
 20 !26 !  1.27   8!1010  25.25! 26.52  90!  2.41  !X94209   Workfile & Sort
 19 ! 8 !  1.26   8!   0   0.00!  1.26   4!  1.94  !X94108   REXX Wordpos
 18 !O1 !  1.20   8!   0   0.00!  1.20   4!  2.10  !QTYPES1  REXX Wordpos
 17 !27 !  1.14   7!   0   0.00!  1.14   4!  1.36  !X94214   REXX Wordpos
 16 !32 !  1.00   6!  44   1.10!  2.10   7!  1.23  !X95112   Stack + XEDIT Sort
 15 !31 !  0.99   6!   0   0.00!  0.99   3!  1.03  !WIM      REXX Wordpos
 14 !28 !  0.96   6!   0   0.00!  0.96   3!  1.03  !X95113   REXX Wordpos
 13 ! 9 !  0.93   6!   0   0.00!  0.93   3!  1.54  !X  (Ivo) REXX Stems
 12 !22 !  0.88   6!   0   0.00!  0.88   3!  0.95  !X94213   REXX Wordpos
 11 !24 !  0.88   6!   0   0.00!  0.88   3!  0.99  !X94207   REXX Wordpos
 10 !O2 !  0.77   5!   0   0.00!  0.77   3!  1.79  !QTYPES2  REXX Stems
  9 !O3 !  0.76   5!   0   0.00!  0.76   3!  1.19  !QTYPES25 REXX Stems+storage
  8 !10 !  0.61   4!   0   0.00!  0.61   2!  1.37  !X94102   Pipe
  7 !23 !  0.52   3!   0   0.00!  0.52   2!  0.59  !X94217   REXX Stems
  6 !14 !  0.48   3!   0   0.00!  0.48   2!  0.99  !X94107   Pipe
  5 !12 !  0.46   3!   0   0.00!  0.46   2!  2.26  !X94104   Pipe
  4 !11 !  0.44   3!   0   0.00!  0.44   1!  1.24  !X94112   Pipe
  3 !13 !  0.44   3!   0   0.00!  0.44   1!  0.86  !X94105   Pipe
  2 !O5 !  0.43   3!   0   0.00!  0.43   1!  0.84  !QTYPES35 Pipe 2 sorts
  1 !O4 !  0.42   3!   0   0.00!  0.42   1!  0.62  !QTYPES3  Pipe

If you need some publicity for the REXX compiler, look at next table where the procedures are compiled:

Rank! R ! TCPU Gain-!SIO    IO  ! Servt !Elapsed !Command     Method
    ! U ! time   ed !      time !  time !   time !
    ! N !(Sec)   %  !nbr (Secs) !(Secs) ! (Secs) !
----+---+-----------+-----------+-------+--------+---------------------------
 17 ! 1 !  2.15  27 !  90   2.25!  4.40 !  4.23  !X94109      Filelist
 16 ! 2 !  1.76  44 !  94   2.35!  4.11 !  4.75  !X94106      Filelist
 15 ! 3 !  1.44  40 !  82   2.05!  3.49 !  4.86  !X94113      Filelist
 18 ! 4 !  1.32  36 ! 998  24.95! 26.27 !  8.24  !X94111      Workfile & SORT
 19 ! 5 !  4.79  65 !1042  24.78! 29.67 !  8.05  !X94101      Workfile & SORT
 14 ! 6 !  1.53  70 !  49   1.23!  2.76 !  4.27  !X94110      Workfile
 13 ! 7 !  1.32  58 !   0   0.00!  1.32 !  2.49  !X94114      Rexx Wordpos
 12 ! 8 !  1.26  56 !   0   0.00!  1.26 !  1.94  !X94108      Rexx Wordpos
 10 ! 9 !  0.93  41 !   0   0.00!  0.93 !  1.54  !X  (Ivo)    Rexx stems
  7 !10 !  0.61   2 !   0   0.00!  0.61 !  1.37  !X94102      Pipe
  3 !11 !  0.44   5 !   0   0.00!  0.44 !  1.24  !X94112      Pipe
  5 !12 !  0.46   9 !   0   0.00!  0.46 !  2.26  !X94104      Pipe
  4 !13 !  0.44  11 !   0   0.00!  0.44 !  0.86  !X94105      Pipe
  6 !14 !  0.48   8 !   0   0.00!  0.48 !  0.99  !X94107      Pipe
 11 !15 !  1.20  55 !   0   0.00!  1.20 !  2.10  !QTYPES1     Rexx Wordpos
  9 !16 !  0.77  36 !   0   0.00!  0.77 !  1.79  !QTYPES2     Rexx stems
  8 !17 !  0.76  37 !   0   0.00!  0.76 !  1.19  !QTYPES25    Rexx stems
  2 !18 !  0.43   2 !   0   0.00!  0.43 !  0.62  !QTYPES3     Pipe
  1 !19 !  0.42   0 !   0   0.00!  0.42 !  0.84  !QTYPES35    Pipe

We compiled the solutions of previous course and made new runs. As the first column indicates, the relative ranking didn't change much, and that's why we didn't compile all solutions. For the CPU column, we now indicate the relative gain brought by the compiler.

No surprise that there is almost no difference for the CMS Pipelines solutions, as for REXX most of the procedure is one long command.

What's more surprising is that compiling QTYPES25 (optimized REXX coding) shows less gain than compiling QTYPES1 (less optimized). This shows that the compiler is clever enough to optimize when 'heavy' REXX functions are used.

At last, we ran our (not compiled) procedures for all files on all disks (12949 files, 458 different filetypes) and then we see that the difference between the solutions using wordpos() and the others become really apparent.

 R  V.cpu  T.cpu  SIO    IO  Service  Elapsed Page Page ! Comment/Command
 U   time   time        time    time     time  rds wrts !
 N (Secs) (Secs)  nbr (Secs)  (Secs)   (Secs)           !
--------------------------------------------------------+----------------------
 1  18.79  18.93    0   0.00   18.93   107.93   11    7  QTYPES1  Rexx Wordpos
 2   3.53   3.58    0   0.00    3.58    19.69    1  F 0  QTYPES2  Rexx stems
 3   3.50   3.55    0   0.00    3.55    12.47    0    0  QTYPES25 Rexx stems
 4   1.96   1.98    0   0.00    1.98     8.60    4    0  QTYPES3  Pipe
 5   1.95   1.97    0   0.00    1.97     7.14    0    0  QTYPES35 Pipe 2 sorts

(1) We quote a paragraph from the book "VM and the VM Community : Past, Present and Future", written by Melinda Varian from the Princeton University :

One of the programmers at Lincoln during that period was Jim March. (Before I go any further with this story, I must assure you that Jim is a very good programmer). One evening in 1968, Jim was working late and found he needed to sort a list of a few hundred items, so he threw together a "quick and dirty" sort/merge program and sorted his list and forgot all about it. Shortly after that, he left Lincoln to go to IDC and didn't have the opportunity to use a "vanilla" CMS again until 1977, when he moved to Bank of America. The Bank sent him to a GUIDE meeting at which there was much complaining about CMS SORT, so when he got home he printed off a listing and sat down to take a look. To his horror, he immediately recognized that he was the author of the reviled CMS SORT command. He was so embarrassed that he wrote a good CMS SORT and distributed it to all the members of the GUIDE VM Project...
Back to text

(2) By coding NBR.=0, REXX will have a default for all tails. Example. NBR.LISTFILE will be 0 initially.
Back to text

(3) The REXX Compiler tries to optimize this case ; when word(...,i) is used in a loop, the compiler will remember where it found the word in the string, and start scanning from that position at the next iteration.
Back to text

Exercise 5. Count Filetypes