Exercise 5.  Count Filetypes

This is a large document, but it is worth reading from A to Z.  You may consider printing it for later reference too.

The assignment

Write a procedure that lists the different filetypes found on an accessed disk or SFS directory, and also displays the frequency at which they are found.  The output we expect is thus something like this:

   There are 7 files with filetype EXEC on your B-disk
   There are 13 files with filetype XEDIT on your B-disk
   ...

The procedure should accept a filemode as parameter (so don't hard-code a filemode !).  We also explicitly ask you not to use CMS Pipelines but to search for the best solution using REXX logic (this is a very good exercise to practice the possibilities of REXX).

Test your procedure with a disk containing only a few files so that you can easily verify your results via a FILELIST.  Then, stress test it on a larger disk (e.g. the Y-disk).

General comments

Sorting with REXX

Several students sorted the filetypes.  Sometimes, this was to have a nicer output, but in other cases, the sort served the logic.  Indeed, by sorting the filetypes, it was easier to test for the different filetypes.  If you look for example at our solution, you'll see that the sorting is not needed to produce a correct result. 

It is a pity that REXX doesn't provide a SORT function.  Sorting can be done via external CMS functions, such as

CMS Pipelines is the easiest, and fast too.  You may remember from now on, that if you have, for example, to sort a REXX array, you can use:

  'PIPE STEM ARRAY.!SORT!STEM ARRAY.'

If, for any reason (e.g. portability) you want to sort using pure REXX functions, then you could use the SORT routine listed here (it uses the Quick Sort recursive algorithm).  For a very small number of elements, it's fast, but once the number augments, it becomes rapidly very CPU consuming. 

REXX Sort routine

/***********************************************************************/
/* Subroutine SORT                                                     */
/* Function : Sort a REXX array called ARRAY.                          */
/***********************************************************************/
SORT: procedure expose array. 
stack.1=1 ; stack.2=array.0 ; top=2
do until top=0
   last=stack.top ; top=top-1 ; first=stack.top ; top=top-1 ; i=first
   do until first>=last
      j=last ; k=trunc((first+last)/2) ; dividingline=array.k
      do until i>j
         do while array.idividingline ; j=j-1 ; end
         if i<=j then do
            temporary=array.i ; array.i=array.j ; array.j=temporary
            i=i+1 ; j=j-1
         end
      end /* do until i>j */
      if first=last */
end /* do until top=0 */
return

To use it, have the elements you want to sort in a REXX stem with name ARRAY. and then call sort.  On return, the array will be sorted.  Total CPU consumption is exponential, and may go from 0.02 seconds for 10 elements to 0.59 seconds for 650 elements and 10.2 seconds for 8950 elements.  Compiling triples the performance. 

One could also use XEDIT to sort.  It is fast too, surely faster than pure REXX for larger lists.  This is an example on how to use XEDIT' SORT:

/* This exec is a sample showing how one can use XEDIT to SORT */
address command
/*1 ---Get data to sort in STACK --------------------------------------*/
   'LISTFILE * EXEC A (STACK'          /* We want the filenames sorted */

/*2 ---Stack commands to make XEDIT place stacked data in file --------*/
   push 'INPUT'              /* tell XEDIT to place stack in file      */
   queue ''                  /* to stop XEDIT reading from stack       */
   push '-* DEL *'           /* but make file empty first              */

/*3 ---Stack command to SORT the data and STACK them afterwards -------*/
   queue '-* DMSXMS D 1 8'   /* Call XEDIT's SORT module on cols 1-8   */
   queue 'STACK *'           /* Ask XEDIT to stack the sorted data     */
   queue 'COMMAND QUIT'      /* And XEDIT can QUIT the file            */

/*4 ---Call XEDIT, it'll handle the stacked lines ---------------------*/
   'XEDIT DUMB FILE S (NOPROF NOMSG' /* use a dummy file               */

/*5 ---Here we have our sorted data -----------------------------------*/
   do queued()
      parse pull fid ; say fid
   end

This solution does no I/O's as the file is never written to disk !

File Identifiers

On VM/CMS, file identifiers cannot contain blanks, but on personal systems, this is possible.  When you parse the strings to find the filetypes (extensions to use the correct term), you can't work with word parsing, but need to use positional parsing. 

Test case

When you test your program, you need sometimes to provide a specific test case.  When we ran this course, we used a disk containing following files:

PROFILE  EXEC     T0
PROFILE  XEDIT    T1
T        EXEC     T1
+T       EXEC     T1
TEST     Document T2
NEW      DOCUMENT T2

Note the special character (+) in a filename, and a lowercase filetype. 

Benchmarks

At the end of this chapter, we provide our solutions and a benchmark that compares them with different other solutions.  You should review this as it will once more indicate that it is possible to drastically augment the performance of your procedures by using the good programming techniques. 

Different techniques can be used, such as:

  1. Use of work files (e.g. CMS EXEC) and sort of the input
  2. Use of pure REXX logic
  3. Use of CMS Pipelines. 

Grammar with REXX

Some students included extra logic in their procedures to make a distinction in the output messages when there is only one file with a certain filetype (singular) as opposed to the plural when there are more files with a filetype.  That's nice.  Their logic resembles to this :

   if nft.i=1 then say 'There is 1 file with filetype' ft.i 'on disk' fm
              else say 'There are' nft.i 'files with filetype' ft.i 'on disk' fm

This can be shorted if you use a programming trick we learned from John Hartmann (the creator of CMS Pipelines), although, it's complex programming style, we must admit:

   say 'There' word('are is',(nft.i=1)+1) left('files',4+(nft.i>1)),
       'with filetype' ft.i 'on disk' fm

The array nft. contains the number of files with a specific filetype ft.

This technique proves to be useful in other circumstances.  For example, when we want to calculate the number of days since Jan 1, 1900 and to take the leap years into account :

   /* Calculate days since Jan 1, 1900 */
   DAYS_SINCE_1900:
   Parse arg yy'/'mm'/'dd
   Days.01=  00; Days.02=  31; Days.03=  59; Days.04=  90
   Days.05= 120; Days.06= 151; Days.07= 181; Days.08= 212
   Days.09= 243; Days.10= 273; Days.11= 304; Days.12= 334
   now_to_1900 = (yy-1)*365 + (yy-1)%4 + days.mm + dd + (yy//4=0)*(mm>2)
   return now_to_1900

The last operand in the calculation adds 1 to the number of days if we're having a leap year (modulo 4) and if we're past the month February. 

Parsing

The exercise was meant as a combination of File I/O and parsing.  We'd like to come back to the solution we gave for the last exercise of lesson 2 (the MULTILIN procedure), and analyze this with you in detail now:
Having collected all one-lined records in an array rec., we issue following statements to split the records and to build the output array :

 nr = 0                                    /* number of single records */
 do i=1 to rec.0
    do while rec.i<>''
       parse value nr+1 rec.i with nr . 0 out.0 out.nr ';' rec.i
    end
 end

Let's simulate an iteration :

  1. nr is initialized to 0
  2. Then outer loop starts (we suppose there is at least one record). 
  3. Then inner loop starts as the record is surely not null the first time. 
  4. Then parse executes.  The source for the parse instruction is interpreted by REXX as follows:
        nr+1 rec.i
        0+1  statement1;statement2;statement3
        1    statement1;statement2;statement3
    
  5. The template is filled in as follows:
        nr  ==> 1
        .   ==> statement1;statement2;statement3
    

    nr has now the value 1, and then we backstep to the beginning of the source again, thus:

        out.0  ==> 1
        out.nr ==> out.1 ==> statement1
        rec.i  ==> rec.1 ==> statement2;statement3
    

    out.0 gets also value 1.  out.nr becomes variable out.1 as nr became 1 in previous steps.  out.1 is matched to statement1.  Similarly, as i is 1 for the first iteration, rec.1 gets the rest of the record after the first semicolon. 

  6. We go for another tour in the inner loop and the source becomes:
        nr+1 rec.i
        1+1  statement2;statement3
        2    statement2;statement3
    

    so that the template gives:

        nr     ==> 2
        .      ==> statemtent2;statement3
        out.0  ==> 2
        out.nr ==> out.2 ==> statement2
        rec.i  ==> rec.2 ==> statement3
    

    Both nr and out.0 have now value 2.  out.2 gets the next logical record and rec.1 is eaten further. 

What do we achieve by this technique ?  Well, this single parse statement allows us to:

  1. increment the record counter (nr)
  2. increment element 0 of our output array at the same rate (out.0).  It is used later for the output. 
  3. create the output array (out.)
  4. and eat the input records. 

First time we ran this course, one of our students used this technique in an even more elaborated technique when coding its solution.  It was Ivo Toelen of Kredietbank (now KBC).  We want to give him again the full praise he deserves, and to be honest, he taught your instructors to push the limits of the parse instruction even further.  We therefore repeat here the details of his solution, so that you can learn from it too. 

Ivo's code resumes to this (after having put the result of a LISTFILE in an array LST.) :

 1 !  TB.  = 0   /* Table of counters, indexed on filetype */
 2 !  TBN. = 0   /* Table of existing filetypes */
 3 !  do i = 1 to lst.0
 4 !     parse var lst.i . ft . 
 5 !     parse value TB.ft + 1 with new . 1 TB.ft . 
 6 !     if new = 1 then       /* add the new filetype to the TBN table */
 7 !        parse value TBN.0 +1 ft with newindex . 1 TBN.0 TBN.newindex .
 8 !  end
 9 !  trc=trace('r')
10 !  do i = 1 to TBN.0
11 !      say "There are" right(value('TB.'TBN.i),8),
   !          "files with type '"left(TBN.i,8)"' on filemode" fm
12 !  end

Suppose our disk contains just three files:

   PROFILE EXEC
   TEST EXEC
   LASTING GLOBALV

then, a trace of the procedure shows (we eliminated some irrelevant lines for editorial reasons):

     3 *-* do i = 1 to lst.0
     4 *-*  parse var lst.i . ft . 
       >>>    "EXEC"
     5 *-*  parse value TB.ft + 1 with new . 1 TB.ft . 
       >C>    "TB.EXEC"
       >V>    "0"
       >L>    "1"
       >O>    "1"
       >>>    "1"
       >.>    ""
       >C>    "TB.EXEC"
       >>>    "1"
     6 *-*  if new = 1 then /* add the new filetype to the TBN table */
       >O>    "1"
     7 *-*   parse value TBN.0 +1 ft with newindex . 1 TBN.0 TBN.newindex .
       >V>     "0"
       >L>     "1"
       >O>     "1"
       >V>     "EXEC"
       >O>     "1 EXEC"
       >>>     "1"
       >.>     "EXEC"
       >>>     "1"
       >C>     "TBN.1"
       >>>     "EXEC"
     8 *-* end
     3 *-* do i = 1 to lst.0
     4 *-*  parse var lst.i . ft . 
       >>>    "GLOBALV"
     5 *-*  parse value TB.ft + 1 with new . 1 TB.ft . 
       >C>    "TB.GLOBALV"
       >V>    "0"
       >L>    "1"
       >O>    "1"
       >>>    "1"
       >C>    "TB.GLOBALV"
       >>>    "1"
     6 *-*  if new = 1 then /* add the new filetype to the TBN table */
       >O>    "1"
     7 *-*   parse value TBN.0 +1 ft with newindex . 1 TBN.0 TBN.newindex . 
       >V>     "1"
       >L>     "1"
       >O>     "2"
       >V>     "GLOBALV"
       >O>     "2 GLOBALV"
       >>>     "2"
       >.>     "GLOBALV"
       >>>     "2"
       >C>     "TBN.2"
       >>>     "GLOBALV"
     8 *-* end
     3 *-* do i = 1 to lst.0
     4 *-*  parse var lst.i . ft . 
       >>>    "EXEC"
     5 *-*  parse value TB.ft + 1 with new . 1 TB.ft . 
       >C>    "TB.EXEC"
       >V>    "1"
       >L>    "1"
       >O>    "2"
       >>>    "2"
       >C>    "TB.EXEC"
       >>>    "2"
     6 *-*  if new = 1
       >O>    "0"
     8 *-* end
     3 *-* do i = 1 to lst.0
     9 *-* trc=trace('r')
    10 *-* do i = 1 to TBN.0
    11 *-*  say "There are" right(value('TB.'TBN.i),8) "files with typ
e '"left(TBN.i,8)"' on filemode" fm
There are        2 files with type 'EXEC    ' on filemode T
    12 *-* end
    ... 

After having read the file-ids in a REXX stem and initialized two REXX stems TB. and TBN. to 0, the file-ids are processed one by one. 

The first filetype is EXEC.  Let's analyze statement 5 in detail. 

>C> "TB.EXEC"as ft='EXEC', TB.ft becomes TB.EXEC
>V> "0"as the array was initialized to default 0, TB.EXEC gets a value of 0.
>L> "1"Then we add literal value 1
>O> "1"to become "1" as the source for the parse.
>>> "1"As a result, new gets the value 1.
>.> "" dot as placeholder to remove leading and trailing blanks (if any)
>C> "TB.EXEC"and after backstepping in the template,
>>> "1"element EXEC of array TB. is also set to value 1

Thus, new and TB.EXEC both have value 1.  The test at line 6 is thus true, and we come to analyze line 7:

>V> "0"TBN.0 has indeed value 0 due to initialization of the array.
>O> "1"so, adding 1 gives 1 as output
>V> "EXEC"and as the filetype is EXEC
>O> "1 EXEC"the source for the parse becomes "1 EXEC".
>>> "1"Thus, newindex gets the first word, (1)
>.> "EXEC"the placeholder gets the rest
>>> "1"TBN.0 gets also value 1 due to backstepping.
>C> "TBN.1"and, as newindex just became 1, TBN.1 is a new array element, and...
>>> "EXEC"it gets the second word of the source (EXEC).

So, for each new filetype, a new element is added to the TBN. array, and the TBN.0 element increments too so that we know how many elements the array contains. 

When the second file is analyzed (filetype GLOBALV) the first parse does exactly the same as above for the EXEC filetype (as it is the first time we encounter this filetype, a new TBN. element is created).  The second parse then gives:

>V> "1"TBN.0 had value 1 after first loop, thus...
>O> "2"the sum is now 2.
>O> "2 GLOBALV"Our source string becomes thus "2 GLOBALV"
>>> "2"newindex gets value 2, and
>>> "GLOBALV"TBN.2 gets value GLOBALV.

We had two different filetypes till now, and so we have two elements in our TBN. array.  The third file is a filetype EXEC again.  The first parse sets both new and TB.EXEC to a value of 2 (which is indeed the number of EXEC filetypes we encountered).  Consequently, the if condition at line 6 is false and no new element is added to the TBN. array. 

The first loop ends, and we are ready to analyze the second one.  It produces the output at the terminal.  The only tricky thing here is the value() function.  These are the intermediate interpretation steps in the first iteration:

             value('TB.'TBN.i)
  becomes    value('TB.'"EXEC")   as i=1 and TBN.1="EXEC"
  becomes    value(TB.EXEC)       after concatation of the strings
  becomes    2                    as TB.EXEC is 2

Note the quotes around TB. in the value() function.  This is needed, as otherwise REXX would try to find the value of the variable TB.TBN.1


Some typical solutions.

As this is definitely a key exercise of the telecourse, we add comments to several solutions we got from our former students. 

   ! /*would normally contain author info etc */
 1 ! Address COMMAND
 2 ! 'MAKEBUF'
 3 ! parse upper arg fm .
 4 ! types = ''
 5 ! if fm = '' then fm = 'A'
 6 ! 'LISTFILE * *' fm '(STACK'
 7 ! if rc ¬= 0 then exit rc
 8 ! x=0
 9 ! do queued()
10 !  parse pull . type .
11 !  if wordpos( type, types) = 0 then types=types type
12 !  x=x+1
13 !  ftype.x = type
14 ! end
15 ! 'DROPBUF'
16 ! do loop = 1 to words(types)
17 !    count = 0
18 !    parse var types thistype types
19 !    do z = 1 to x
20 !       if thistype = ftype.z  then
21 !          count = count+1
22 !    end
23 !    say 'There are 'count' file(s) of type 'thistype' on your 'fm'-Disk'
24 ! end
25 ! return

This procedure produces a correct output, but will be one of the worst performers in our benchmark later on.  This is certainly due to the heavy use of the REXX functions word() and wordpos() and the general logic.  At each iteration in the second loop, the number of words in the string has to be re-calculated.  Furthermore, a large array is built in order to remember the filetypes and to be able to count them later in a double-loop.  Our solutions show better techniques not that different from this after all.  It means that some further analysis can drastically improve the performance of a procedure.

And, if a general exit routine had been used, it would become easier to give feedback that is more complete (e.g. for line 7).

   ! /************************************************************/
   ! /* The Filename is QTYPES                            ********/
   ! /* Function     : Count FILETYPES                    ********/
   ! /************************************************************/
 1 ! parse upper arg fm .
 2 ! if fm = '' then fm = 'A'
 3 ! queue 'FT FILELIST'
 4 ! queue 'FN TEMP'
 5 ! queue 'SORT * 16 23'
 6 ! queue 'FFILE'
 7 ! address command 'EXEC FILELIST * * 'fm
 8 ! address command 'EXECIO 1 DISKR TEMP FILELIST A (VAR FLIST'
 9 ! rcexecio = rc
10 ! parse var flist . fti .
11 ! do while rcexecio = 0
12 !    i = 1
13 !    address command 'EXECIO 1 DISKR TEMP FILELIST A (VAR FLIST'
14 !    rcexecio = rc
15 !    parse var flist . fto .
16 !    do while fti = fto & rcexecio = 0
17 !       address command 'EXECIO 1 DISKR TEMP FILELIST A (VAR FLIST'
18 !       rcexecio = rc
19 !       parse var flist . fti .
20 !       i = i + 1
21 !    end;
22 !    if i > 1 then say i' files with filetype 'fto' on 'fm'-DISK.'
23 !             else say '1 file with filetype 'fti' on 'fm'-DISK.'
24 !    if i = 1 then fti = fto;
25 ! end
26 ! address command 'ERASE TEMP FILELIST A'

This procedure produces incorrect results, but has a clean programming style.

These are our comments:

  1. Instead of writing explicitly an address command for each CMS or CP command, it would be easier to code it at the beginning of the procedure, isn't it ?  REXX needs CPU cycles to interpret these too, several times in the loops.
  2. Lines 3 to 8. This student seems to have forgotten about the LISTFILE command that he takes the long detour via the FILELIST command.  Worse is that there is a lot of extra overhead as:

    The only advantage of this method is that there is a sorted output which simplifies the subsequent logic.

    There is yet another problem due to the FILELIST as opposed to the LISTFILE.  The FILELIST lists also any SFS sub-directories.  Due to this, we get for example an extra result like this:

      1 file with filetype N on N-disk
    

    This is because a directory has no filetype, and the filemode is taken instead.  This is why the output is incorrect.

  3. Lines 3, 4 and 6 could be combined into one single statement:
      queue 'FFILE TEMP FILELIST'
    

    or even better:

      queue 'COMMAND FILE TEMP FILELIST'
    

    FFILE is indeed the synonym for COMMAND FILE, while FILE is the synonym for COMMAND PFILE (protected file), just as QQUIT is a synonym for COMMAND QUIT and QUIT is synonym for COMMAND PQUIT (protected quit).

  4. If one of the EXECIO statements has a problem, this is not shown to the user.  Note however that the end-of-file condition is indicated by a return code 2, which can be considered normal.
  5. Lines 22, 23. Nice to make distinction between singular and plural, but we have demonstrated other alternatives.
   ! /**********************************************************************/
   ! /* QTYPES - Created by Rudi                                           */
   ! /* Function : count the different filetypes on an accessed disk       */
   ! /* Parameters : filemode                                              */
   ! /**********************************************************************/
 1 ! address command
 2 ! parse source . . myname .
 3 ! parse upper arg fm
 4 ! if fm='' then fm='A'
 5 ! address '' 'ESTATE * * 'fm
 6 ! if RC¬=0 then call Error_Exit rc,' 'fm' is not accessed'
 7 ! address CMS 'LISTFILE * * 'fm' (exec'
 8 ! if RC¬=0 then call Error_Exit rc,' No A disk accessed in R/W'
 9 ! set cmstype ht
10 ! queue '16 25'
11 ! address '' 'SORT CMS EXEC A CMS TEMP A'
12 ! address '' 'ERASE CMS EXEC A '
13 ! set cmstype rt
14 ! 'EXECIO * DISKR CMS TEMP A (FINIS margins 17 25 STEM FT.'
15 ! x=0;ftc.x=0
16 ! do i=1 to ft.0 by 1
17 !    if ft.i=fto then do             /* did we have already such FT ? */
18 !       ftc.x=ftc.x+1                /* increment counter  */
19 !    end
20 !    else do                         /* New FT */
21 !       x=x+1                        /* prepare next values */
22 !       ftc.x=0                      /* idem */
23 !       ftc.x=ftc.x+1                /* increment new counter */
24 !       ft.x=ft.i                    /* set new FT */
25 !    end
26 !    fto=ft.i
27 ! end
28 ! do y=1 to x by 1
29 !   say 'you have 'ftc.y' files with filetype 'ft.y' on the disk accessed',
   !      'as 'fm
30 ! end
31 ! say 'Total : 'ft.0' files '
32 ! say 'Sorry, no PARSE used, where could i have used it ????'
33 ! say 'What did i miss ???, could it perform better ??'
34 ! address '' 'ERASE CMS TEMP A'
35 ! call Error_Exit 00,'All done'
   ! /****************** Exit routine ********************************/
36 ! Error_Exit:
37 !  parse arg retc,errmsg
38 !  if errmsg<>'' then say myname':' errmsg
39 !  exit retc

This solution produces correct output, is clean but has a rather complex programming style.

  1. Line 32,33. When we will discuss our solutions, it will become clear that it is indeed possible to make it perform better...
  2. As there is an address command at the beginning of the procedure, it is exaggerated to prefix the CMS commands with an extra address '' !
  3. Line 5. This is a clever way to test if a filemode is accessed or not.  The stack is avoided, only a return code must be tested.  And, we're glad to see someone using the general exit routine...
  4. Line 7. Why is this prefixed with an address CMS this time ?  If it is to see the error message from the LISTFILE command, then you should use (see Lesson 1):
       [address 'COMMAND'] 'CMDCALL LISTFILE ...'
    

    But, the following statement suggests that you want to intercept the case where the A-disk is not accessed in R/W (rare, but can happen).  By using ESTATEW, the same simple technique as with line 5 could be used (note the extra W to test for R/W access).

    Anyway, this discussion wouldn't matter if the stack would have been used instead of a temporary file (STACK option).  The temporary file allows sorting the output (line 11), but a second output file is needed for the sort.

  5. Lines 9 and 13. These are CMS commands... should be between quotes and uppercased.  And, line 13 forces CMSTYPE at HT, independently of its prior setting.  To be able to run our benchmark without producing all the output on the screen, we changed this as follows:
       cmstype=cmsflag('CMSTYPE')            /* returns 0 if HT, else 1 */
       if cmstype then 'SET CMSTYPE HT'
          /* do your commands */
       if cmstype then 'SET CMSTYPE RT'
    
  6. Line 17. Why this then do when there is only one statement ?
  7. Lines 16 to 27. We won't extend on the logic.  It works, but we suggest you look at the benchmarks at the end of this chapter to see how far performance can be improved...
   ! /* ========================================================== */
   ! /* QTYPES   - created by: Fred Flinstone                      */
   ! /* Function: list # of files with filetypes                   */
   ! /* Parameters: filemode (default: A)                          */
   ! /* Options: none                                              */
   ! /* ========================================================== */
 1 ! address command
 2 ! parse upper arg fm .
 3 ! if fm = '' then fm = 'A'
 4 ! 'ERASE CMS EXEC A'
 5 ! 'ERASE $$CMS$$ $$EXEC$$ A'
 6 ! 'LISTFILE * * ' fm '(FN FT EXEC'
 7 ! if rc¬=0 then do
 8 !    say 'disk' fm 'not currently accessed'
 9 !    exit 1
10 ! end
11 ! queue '17 24'
12 ! 'PIPE COMMAND SORT CMS EXEC A $$CMS$$ $$EXEC$$ A'
13 ! parse value linein($$CMS$$ $$EXEC$$ A) 1 with . . . ftk nfk .
14 ! do until lines($$CMS$$ $$EXEC$$ A) = 1
15 !    parse value linein($$CMS$$ $$EXEC$$ A) with . . . ft .
16 !    if ft = ftk then nfk = nfk + 1
17 !    else do
18 !         if nfk = 1 then
19 !            say 'there is 1 file with filetype' ftk 'on disk' fm
20 !         else
21 !            say 'there are' nfk 'files with filetype' ftk 'on disk' fm
22 !         parse value 1   ft  ,
23 !               with  nfk ftk
24 !    end
25 ! end
26 ! exit

This procedure produces incorrect output and is a very bad performer.

We appreciate that at least one student tried to use the linein() function for the method.  Lesson 4 will show that this is not the best performer, but at least we have an extra example now.  It also partly explains why it is a bad performer (see benchmark at the end of the chapter).  We have to blame the authors of LINEIN, not the student.  The benchmark will also show why it is not good to work with a temporary file.

Other comments:

  1. Lines 11 and 12. Hmmm, Rome is a nice city, worth visiting, but do you have to make this detour here ?  Using PIPE to suppress the Enter Sort Fields message is twice as expensive as a SET CMSTYPE HT/RT combination. 
  2. Line 14. Here is a problem with the logic (that's why the output is incorrect).  The do until lines()=1 makes that the last record is not processed.  The test should have been for lines()=0 !
  3. Lines 17 to 23. We demonstrated shorter techniques for this, such as the "John Hartmann" technique.
  4. The file is not closed at the end of the procedure.  We will come back on this aspect in lesson 4, but try to remember from now on that the files your procedure opens should be closed before leaving the procedure.
   ! /*******************************************************************/
   ! /* QTYPES   - Created by Valerie                                   */
   ! /*******************************************************************/
 1 ! address command
 2 ! arg fmode .
 3 ! 'MAKEBUF'
 4 ! 'LISTFILE * *' fmode '(STACK FT'
 5 ! if rc=36 then exit
 6 ! x=queued()
 7 ! if fmode='' then fmode='A'
 8 ! parse value '  ' with f t
 9 ! do i=1 to x
10 !    pull fname stem.i
11 !    f=f stem.i
12 ! end
13 ! do i=1 to x
14 !    parse value '0 1' with r j
15 !    if wordpos(stem.i,t)¬=0 then iterate
16 !    do until y=0
17 !       y=wordpos(stem.i,f,j)
18 !       if y¬=0 then do
19 !          r=r+1
20 !          j=y+1
21 !       end
22 !       if y=0 & r>1 then t=t stem.i
23 !    end
24 !    say 'There are' r 'files with filetype' stem.i 'on your' fmode'-disk'
25 ! end
26 ! 'DROPBUF'
27 ! exit

This solution produces incorrect output and is a bad performer, but shows a clean programming style.

Our comments:

  1. Let's first analyze why the output is incorrect.  There are two files with filetype DOCUMENT on the test disk, but one of these has the filetype with lowercase characters.  At line 10 the PULL instruction is used.  This is equivalent to parse upper pull and thus, the filetypes are translated to uppercase.  We personally avoid PULL and ARG, to avoid situations like this.
  2. We have some difficulties with the choice of variables.  These single-letter variables make it difficult to analyze the logic.
  3. The main reason for the bad performance is because all the filetypes are accumulated in one large string and further on processed with the CPU consuming wordpos() function to look up the filetypes again.  Please look at our solutions below to learn how to improve this.
  4. The MAKEBUF/DROPBUF combination is not used correctly here.  Lesson 4 will learn you why.
  5. Line 22. We would have written:
                else
             if r>1 then...
    

    as this allows to follow the logic more easily.

   ! /*******************************************************************/
   ! /* QTYPES   - Created by Wim                                       */
   ! /* Function : Count filetypes                                      */
   ! /* Parameters : filemode                                           */
   ! /*******************************************************************/
 1 ! address command
 2 ! parse upper source . . myname mytype . syn.
 3 ! parse upper arg fm .
 4 ! if fm = '' then fm = 'A'
 5 ! 'DROPBUF'
 6 ! 'LISTFILE * *' fm '(NOHEADER STACK' /* listfile of the filemode fm */
 7 ! if rc <> 0 then call errexit rc,'Problem with listfile * *' fm
 8 ! ft_array = ''
 9 ! aantal_ft. = 0
10 ! do queued()
11 !   parse pull . ft .                            /* get the filetype */
12 !   ft_pos = wordpos(ft,ft_array)     /* is it already in the list ? */
13 !   if ft_pos = 0 then do                            /* not it isn't */
14 !     ft_array = ft_array ft                /* so add it to the list */
15 !     ft_pos = words(ft_array) /* what is the position in the list ? */
16 !   end
   !                               /* increment element in array with 1 */
17 !   aantal_ft.ft_pos = aantal_ft.ft_pos + 1
18 ! end
19 ! do i=1 to words(ft_array)                          /* print result */
20 !   if aantal_ft.i = 1 then
21 !     say 'there is 1 file with filetype' word(ft_array,i) 'on your',
22 !        fm'-disk.'
23 !   else
24 !     say 'there are' aantal_ft.i 'files with filetype' word(ft_array,i),
25 !        'on your' fm'-disk.'
26 ! end
27 ! exit
   !
28 ! ERREXIT: /* exit with errormessage(s) */
29 !  do n=2 to arg()                  /* get all errormessages to show */
30 !    say myname ':' arg(n)                  /* give the n'th message */
31 !  end n
32 ! exit arg(1)

This program produces correct output, has a clean programming style and has an acceptable performance.

Our comments:

  1. Good use of the general exit routine.
  2. Line 5. Why this DROPBUF ?  If this procedure were called by another procedure, this would kill anything the caller has stacked.  See lesson 4 for correct use of the MAKEBUF/DROPBUF combination.
  3. Lines 12 and 13 could be combined so that there is no new variable to be created:
       if wordpos(ft,ft_array)=0 then do...
    
  4. Nice to have made the distinction between singular and plural.

Our solutions and benchmark.

Let's now review our solutions :

The very first try.

/*******************************************************************/
/* Procedure QTYPES1  List filetype occurrences on any disk        */
/*******************************************************************/
address command
parse upper arg mode .                           /* get parameters */
if mode='' then mode='A'                          /* defaults to A */
oldqueue=queued() ; 'MAKEBUF' ; buffer=rc     /* make stack buffer */
filetypes=''                                  /* initialize string */
'LISTFILE * * 'mode' (STACK'                  /* issue the command */
if rc<>0 then call errexit rc,'Error reading filelist of disk 'mode
total=queued()-oldqueue                   /* total number of files */
do queued()-oldqueue
   parse pull . ft .                    /* get filetype from stack */
   if wordpos(ft,filetypes)<>0 then nbr.ft=nbr.ft+1   /* increment */
                               else do         /* if not yet found */
      filetypes=filetypes ft                      /* add to string */
      nbr.ft=1                               /* initialize counter */
   end
end
say 'From a total of 'total' files on disk' mode '...'
l=length(total)
do i=1 to words(filetypes)                       /* perform output */
   ft=word(filetypes,i)
   say right(nbr.ft,l) 'have a type' ft
end
call exit 0
/* Subroutines ** you know our ERREXIT routine by now isn't it ? ***/

Our first solution uses the technique whereby we build a string that contains all different filetypes, but instead of using an array with numeric indices (1 to n) for counting the occurrences of filetypes, we build an array whereby the elements (the tails of the stem) are the filetypes themselves.

This means, we build an array where the elements could be nbr.MODULE, nbr.EXEC, nbr.XEDIT, nbr.SCRIPT, etc., and these will contain the number of times a filetype is present on the disk.

The main advantage of this technique is that, while counting, we don't have to look up the filetype in the filetypes string.  When we analyze a filetype, its name is the index in our array immediately. 

We have however still to build up the filetypes string in order to be able to run the second loop producing the output. 

You should note that REXX accepts anything as index (tail) for a stem, even a string containing blanks or special characters !  In addition, this can give problems.  If we had coded the first loop like this:

do queued()-oldqueue
   parse pull 10 ft +8                      /* get filetype from stack */
   if wordpos(ft,filetypes)<>0 then nbr.ft=nbr.ft+1       /* increment */
                               else do             /* if not yet found */
      filetypes=filetypes ft                          /* add to string */
      nbr.ft=1                                   /* initialize counter */
   end
end

then, the second loop wouldn't find back the correct indices anymore, as the parse will create filetypes with a length of 8 characters (so, with trailing blanks).  In that case we would need to add a ft=left(ft,8) in the second loop to make it a string of exactly 8 characters.

Improving performance

A minor change to the code can still improve performance.

Take a close look at this statement from the first loop:

  if wordpos(ft,filetypes)¬=0 ...

For each filetype, REXX has to compare it to each word in the variable filetypes ; when this list grows, it can cost quite a lot of CPU when REXX has to scan it sequentially.

If you accept that REXX uses a much faster binary search algorithm when it has to find one of its variables, then it is possible to start looking for improvements.

nbr.=0                                         /* Initialize array */
do queued()-oldqueue
   parse pull . ft .
   if nbr.ft=0 then filetypes=filetypes ft /* add new file to list */
   nbr.ft=nbr.ft + 1                       /* increment counter    */
end

Instead of looking up the large filetypes string, we initialize the array before the loop(footnote 2), and we simply test whether the element of the array is greater than zero or not. 

For example, the very first time a filetype LISTING is encountered, the element nbr.LISTING will be 0.  So, we then assign it a value 1.  For next occurrences, the array element will not be 0 anymore, and we just add 1 to the counter but don't add the filetype to the list anymore. 

We still need the filetypes string for our second loop. 

But, this second loop is also negatively impacted by the use of the words() and word() functions.  The words() function forces REXX to scan the whole string at each iteration, while for word(), it depends ; obviously, the first word is found immediately, but to find the thousandth word, REXX has again to scan for 999 blanks...(footnote 3)

So, let's replace it by the parsing the string while eating it technique that we learned in the lesson.  We implemented this in our second solution, and, as you can see, the procedure did not loose readability due to our performance improvements.  It gave a gain of 36% for the test run on the Y-disk. 

Please, don't conclude from this that you shouldn't use the REXX functions at all.  Remember, in order of priority:

  1. For a REXX beginner: be satisfied if your exec runs as expected. 
  2. Use REXX functions, especially if it allows avoiding do-loops. 
  3. Use special techniques when you need performance.

Here comes our second solution:

/*******************************************************************/
/* Procedure QTYPES2  List filetype occurrences on any disk        */
/*******************************************************************/
address command
parse upper arg mode .                           /* get parameters */
if mode='' then mode='A'                          /* defaults to A */
oldqueue=queued() ; 'MAKEBUF' ; buffer=rc     /* make stack buffer */
filetypes=''                                  /* initialize string */
nbr.=0                                            /* ftype counter */
'LISTFILE * * 'mode' (STACK'                  /* issue the command */
if rc<>0 then call errexit rc,'Error reading filelist of disk 'mode
total=queued()-oldqueue                   /* total number of files */
do queued()-oldqueue
   parse pull . ft .                    /* get filetype from stack */
   if nbr.ft=0 then filetypes=filetypes ft      /* append new file */
   nbr.ft=nbr.ft+1                            /* increment counter */
end
say 'From a total of 'total' files on disk' mode '...'
l=length(total)
do while filetypes^=''                           /* perform output */
   parse var filetypes ft filetypes                /* eat our list */
   say right(nbr.ft,l) 'have a type' ft
end
call exit 0
/* Subroutines ** you know our ERREXIT routine by now isn't it ? ***/

Another improvement we applied to this classic REXX coding has to deal with storage management.  With all operating systems, storage management is somehow costly.  On VM systems, we know REXX does the following:

So, in our procedure, we have the variable filetypes that grows continuously.  We can help REXX if we first pre-allocate a large variable that is filled in afterwards.  The gains are measurable both for the interpreter as for the compiler, but of course depends on the number of times the variable grows in the loop. 

Here is our third solution where storage pre-allocation is used:

/*******************************************************************/
/* Procedure QTYPES25 : Using storage pre-allocation.              */
/*******************************************************************/
address command
parse upper arg mode .                           /* get parameters */
if mode='' then mode='A'                          /* defaults to A */
oldqueue=queued() ; 'MAKEBUF' ; bufno=rc      /* make stack buffer */
filetypes=left('',1000)                       /* initialize string */
filetypes=''                                  /* initialize string */
nbr.=0                                            /* ftype counter */
'LISTFILE * * 'mode' (STACK'                  /* issue the command */
if rc<>0 then call emsg rc,'Error reading filelist of disk 'mode
total=queued()-oldqueue                   /* total number of files */
do queued()-oldqueue
   parse pull . ft .                    /* get filetype from stack */
   if nbr.ft=0 then filetypes=filetypes ft      /* append new file */
   nbr.ft=nbr.ft+1                            /* increment counter */
end
say 'From a total of 'total' files on disk' mode '...'
l=length(total)
do while filetypes^=''                           /* perform output */
   parse var filetypes ft filetypes                /* eat our list */
   say right(nbr.ft,l) 'have a type' ft
end
call exit 0

We couldn't resist searching for a solution using CMS Pipelines.  This solution, allows us to produce a sorted output.  To make a distinction between singular and plural is not so easy however, and therefore not included in this pipeline. 

/*******************************************************************/
/* Procedure QTYPES3  Solution using PIPE                          */
/*******************************************************************/
address command
parse upper arg mode .                           /* get parameters */
if mode='' then mode='A'                          /* defaults to A */

 'PIPE COMMAND LISTFILE * *' mode '(FT HEADER',
    '!VAR MSG',                        /* Header or Errmsg to REXX */
    '!DROP 1',                            /* drop Header or Errmsg */
    '!SPEC Word 2 1',                           /* take ftype only */
    '!SORT 1.8',                                  /* sort on Ftype */
    '!UNIQUE COUNT LAST',                      /* now count Ftypes */
    '!SPEC 1.10 2 /have a type/ Nextword 11-* Nextword',
    '!Literal On disk' mode '...',               /* add headerline */
    '!CONSOLE'

if rc^=0 then Say msg                  /* show LISTFILE's errormsg */
exit rc

We could even go a bit further and sort the output a second time to list the filetypes by usage-frequency. 

/*******************************************************************/
/* Procedure QTYPES35 : Using PIPE and sorting twice.              */
/*******************************************************************/
address command
parse upper arg mode .                           /* get parameters */
if mode='' then mode='A'                          /* defaults to A */

 'PIPE COMMAND LISTFILE * *' mode '(FT HEADER',
    '!VAR MSG',                        /* Header or Errmsg to REXX */
    '!DROP 1',                            /* drop Header or Errmsg */
    '!SPEC Word 2 1',                           /* take ftype only */
    '!SORT 1.8',                                  /* sort on Ftype */
    '!UNIQUE COUNT LAST',                      /* now count Ftypes */
    '!SORT 1.10 D',                    /* sort most occuring first */
    '!SPEC 1.10 2 /have a type/ Nextword 11-* Nextword',
    '!Literal On disk' mode '...',               /* add headerline */
    '!CONSOLE'

if rc^=0 then Say msg                  /* show LISTFILE's errormsg */
exit rc

Finally, we were interested in analyzing the performance of all the different solutions.  We used the BENCH goodie as described in Appendix C.  We ran the procedures four times for the goodies disk containing 685 files at that time.

Note: We had to modify slightly the procedures so that:

We added results from several other solutions, so that we have an even more representative set. 

We confess that we sometimes don't compare the same things.  Some students made for example a distinction between unique filetypes and multiple filetypes and adapted the output message to the singular or plural version.  We think however that this counts only for a minimal overhead as it runs in the second loop with a limited number of iterations. 

The first column gives the ranking.  It is based on the Service Time.  This is a calculated time being the sum of the CPU time and the I/O time.  For the I/O time, we count with an average service time of 25 milliseconds. 

The last column (before the comments) contains the true Elapsed Time.  This time is influenced by the actual performance of the system but as we did the runs one after the other, the variation due to the system load (it was lightly loaded) is minimal.  It may surprise you is that for some runs, the Elapsed Time is shorter than the Service Time.  This clearly shows what minidisk caching (the I/O Service Time is not 20 ms) can bring to 'bad applications'.  If there were no minidisk caching, then the elapsed time would be very near to the calculated service time. 

Both for the CPU as for the Service Time, we included a percentage comparing the solutions to the worst performer (which has of course 100%). 

Rank! R ! Tot CPU  !SIO    IO  ! Servtime !Elapsed !Command     Method
    ! U ! time  %  !      time !  time  % !   time !
    ! N !(Sec)     !nbr (Secs) !(Secs)    ! (Secs) !
----+---+----------+-----------+----------+--------+---------------------------
 32 !29 ! 15.48 100!   0   0.00! 15.48  52! 16.77  !ANDY     REXX Wordpos
 31 !30 !  5.33  34!   0   0.00!  5.33  18!  6.05  !VALERIE  REXX Wordpos
 30 ! 5 !  4.79  31! 991  24.78! 29.57 100!  8.05  !FRED     Workfile & Sort
 29 !25 !  2.75  18!  92   2.30!  5.05  17!  4.11  !X94204   Workfile & XEDIT so
 28 ! 1 !  2.15  14!  90   2.25!  4.40  15!  4.23  !PHILIPPE Filelist
 27 ! 2 !  1.76  11!  94   2.35!  4.11  14!  4.75  !X94106   Filelist
 26 ! 6 !  1.53  10!  49   1.23!  2.76   9!  4.27  !X94110   Workfile
 25 !20 !  1.44   9!1042  26.05! 27.49  93!  2.71  !X94101   Workfile & Sort
 24 ! 3 !  1.44   9!  82   2.05!  3.49  12!  4.86  !X94113   Filelist
 23 ! 4 !  1.32   9! 998  24.95! 26.27  89!  8.24  !RUDI     Workfile & Sort
 22 ! 7 !  1.32   9!   0   0.00!  1.32   4!  2.49  !X94114   REXX Wordpos
 21 !21 !  1.28   8!  64   1.60!  2.88  10!  6.53  !X94210   Filelist
 20 !26 !  1.27   8!1010  25.25! 26.52  90!  2.41  !X94209   Workfile & Sort
 19 ! 8 !  1.26   8!   0   0.00!  1.26   4!  1.94  !X94108   REXX Wordpos
 18 !O1 !  1.20   8!   0   0.00!  1.20   4!  2.10  !QTYPES1  REXX Wordpos
 17 !27 !  1.14   7!   0   0.00!  1.14   4!  1.36  !X94214   REXX Wordpos
 16 !32 !  1.00   6!  44   1.10!  2.10   7!  1.23  !X95112   Stack + XEDIT Sort
 15 !31 !  0.99   6!   0   0.00!  0.99   3!  1.03  !WIM      REXX Wordpos
 14 !28 !  0.96   6!   0   0.00!  0.96   3!  1.03  !X95113   REXX Wordpos
 13 ! 9 !  0.93   6!   0   0.00!  0.93   3!  1.54  !X  (Ivo) REXX Stems
 12 !22 !  0.88   6!   0   0.00!  0.88   3!  0.95  !X94213   REXX Wordpos
 11 !24 !  0.88   6!   0   0.00!  0.88   3!  0.99  !X94207   REXX Wordpos
 10 !O2 !  0.77   5!   0   0.00!  0.77   3!  1.79  !QTYPES2  REXX Stems
  9 !O3 !  0.76   5!   0   0.00!  0.76   3!  1.19  !QTYPES25 REXX Stems+storage
  8 !10 !  0.61   4!   0   0.00!  0.61   2!  1.37  !X94102   Pipe
  7 !23 !  0.52   3!   0   0.00!  0.52   2!  0.59  !X94217   REXX Stems
  6 !14 !  0.48   3!   0   0.00!  0.48   2!  0.99  !X94107   Pipe
  5 !12 !  0.46   3!   0   0.00!  0.46   2!  2.26  !X94104   Pipe
  4 !11 !  0.44   3!   0   0.00!  0.44   1!  1.24  !X94112   Pipe
  3 !13 !  0.44   3!   0   0.00!  0.44   1!  0.86  !X94105   Pipe
  2 !O5 !  0.43   3!   0   0.00!  0.43   1!  0.84  !QTYPES35 Pipe 2 sorts
  1 !O4 !  0.42   3!   0   0.00!  0.42   1!  0.62  !QTYPES3  Pipe

What are the main conclusions ?

  1. The ratio between the best and the worst service time is 100.  This means that if all programs and procedures would use the worst programming techniques, you would need a computer with 100 times more MIPS ! Don't think these figures are exceptional.  We can consider this as a reflection of the real world.  The average of these results is 25%.  This means that the best performer is still 25 times better than the average.  We know about VM systems where a bit of investment, the REXX procedures could easily be optimized by a factor of 30 or more.  This proves that the investment in learning the good programming techniques is worth its money too. 
  2. Among the worst performers, we find of course the solutions using work files (this augments the service time).  Some even need two work files (due to the CMS SORT, which isn't very fast either).  Fred was also extra penalized by the use of the poor performing LINEIN() function.  But, we if we look at the CPU consumption then we see poor performance for those procedures that extensively use the REXX functions. 
  3. On the other side of the picture, this benchmark proves again the tremendous power of CMS Pipelines.  It's not only fast in execution, but also in development (the solutions using PIPE are the shortest). 
  4. We're glad our QTYPES3 is the best performer, ouf !
  5. But if we analyze more closely our different solutions, we see that measurable gains are still possible by excluding all CPU consumers like wordpos() and word()

If you need some publicity for the REXX compiler, look at next table where the procedures are compiled:

Rank! R ! TCPU Gain-!SIO    IO  ! Servt !Elapsed !Command     Method
    ! U ! time   ed !      time !  time !   time !
    ! N !(Sec)   %  !nbr (Secs) !(Secs) ! (Secs) !
----+---+-----------+-----------+-------+--------+---------------------------
 17 ! 1 !  2.15  27 !  90   2.25!  4.40 !  4.23  !X94109      Filelist
 16 ! 2 !  1.76  44 !  94   2.35!  4.11 !  4.75  !X94106      Filelist
 15 ! 3 !  1.44  40 !  82   2.05!  3.49 !  4.86  !X94113      Filelist
 18 ! 4 !  1.32  36 ! 998  24.95! 26.27 !  8.24  !X94111      Workfile & SORT
 19 ! 5 !  4.79  65 !1042  24.78! 29.67 !  8.05  !X94101      Workfile & SORT
 14 ! 6 !  1.53  70 !  49   1.23!  2.76 !  4.27  !X94110      Workfile
 13 ! 7 !  1.32  58 !   0   0.00!  1.32 !  2.49  !X94114      Rexx Wordpos
 12 ! 8 !  1.26  56 !   0   0.00!  1.26 !  1.94  !X94108      Rexx Wordpos
 10 ! 9 !  0.93  41 !   0   0.00!  0.93 !  1.54  !X  (Ivo)    Rexx stems
  7 !10 !  0.61   2 !   0   0.00!  0.61 !  1.37  !X94102      Pipe
  3 !11 !  0.44   5 !   0   0.00!  0.44 !  1.24  !X94112      Pipe
  5 !12 !  0.46   9 !   0   0.00!  0.46 !  2.26  !X94104      Pipe
  4 !13 !  0.44  11 !   0   0.00!  0.44 !  0.86  !X94105      Pipe
  6 !14 !  0.48   8 !   0   0.00!  0.48 !  0.99  !X94107      Pipe
 11 !15 !  1.20  55 !   0   0.00!  1.20 !  2.10  !QTYPES1     Rexx Wordpos
  9 !16 !  0.77  36 !   0   0.00!  0.77 !  1.79  !QTYPES2     Rexx stems
  8 !17 !  0.76  37 !   0   0.00!  0.76 !  1.19  !QTYPES25    Rexx stems
  2 !18 !  0.43   2 !   0   0.00!  0.43 !  0.62  !QTYPES3     Pipe
  1 !19 !  0.42   0 !   0   0.00!  0.42 !  0.84  !QTYPES35    Pipe

We compiled the solutions of previous course and made new runs.  As the first column indicates, the relative ranking didn't change much, and that's why we didn't compile all solutions.  For the CPU column, we now indicate the relative gain brought by the compiler. 

No surprise that there is almost no difference for the CMS Pipelines solutions, as for REXX most of the procedure is one long command. 

What's more surprising is that compiling QTYPES25 (optimized REXX coding) shows less gain than compiling QTYPES1 (less optimized).  This shows that the compiler is clever enough to optimize when 'heavy' REXX functions are used. 

At last, we ran our (not compiled) procedures for all files on all disks (12949 files, 458 different filetypes) and then we see that the difference between the solutions using wordpos() and the others become really apparent. 

 R  V.cpu  T.cpu  SIO    IO  Service  Elapsed Page Page ! Comment/Command
 U   time   time        time    time     time  rds wrts !
 N (Secs) (Secs)  nbr (Secs)  (Secs)   (Secs)           !
--------------------------------------------------------+----------------------
 1  18.79  18.93    0   0.00   18.93   107.93   11    7  QTYPES1  Rexx Wordpos
 2   3.53   3.58    0   0.00    3.58    19.69    1  F 0  QTYPES2  Rexx stems
 3   3.50   3.55    0   0.00    3.55    12.47    0    0  QTYPES25 Rexx stems
 4   1.96   1.98    0   0.00    1.98     8.60    4    0  QTYPES3  Pipe
 5   1.95   1.97    0   0.00    1.97     7.14    0    0  QTYPES35 Pipe 2 sorts

Footnotes:

(1) We quote a paragraph from the book "VM and the VM Community : Past, Present and Future", written by Melinda Varian from the Princeton University :

One of the programmers at Lincoln during that period was Jim March.  (Before I go any further with this story, I must assure you that Jim is a very good programmer).  One evening in 1968, Jim was working late and found he needed to sort a list of a few hundred items, so he threw together a "quick and dirty" sort/merge program and sorted his list and forgot all about it.  Shortly after that, he left Lincoln to go to IDC and didn't have the opportunity to use a "vanilla" CMS again until 1977, when he moved to Bank of America.  The Bank sent him to a GUIDE meeting at which there was much complaining about CMS SORT, so when he got home he printed off a listing and sat down to take a look.  To his horror, he immediately recognized that he was the author of the reviled CMS SORT command.  He was so embarrassed that he wrote a good CMS SORT and distributed it to all the members of the GUIDE VM Project...
Back to text

(2) By coding NBR.=0, REXX will have a default for all tails.  Example. NBR.LISTFILE will be 0 initially.
Back to text

(3) The REXX Compiler tries to optimize this case ; when word(...,i) is used in a loop, the compiler will remember where it found the word in the string, and start scanning from that position at the next iteration. 
Back to text