Appendix F. Host versus Personal Systems

To show that portability and performance across platforms are two entirely different things, we designed a simple benchmark to run on both VM/ESA and OS/2.  The procedure reads a text file (the README file of OS/2 2.11: 1933 records, 72274 bytes) and stores the records in a REXX array.  We compared the LINEIN() solution with the CHARIN() solution.

Reading a text file via LINEIN() - case 1

   /* Read a file */
   parse source environment .
   if environment='CMS' then fileid='README FILE A'
                        else fileid='C:\README'
   start=timer()                                  /* save start timer */
   do i=1 by 1 while lines(fileid)>0        /* read until end-of-file */
      rec.i=linein(fileid)                         /* records in REC. */
   end
   rec.0=i-1                                     /* number of records */
   call stream fileid,'C','CLOSE'                   /* close the file */
   final=timer()                                  /* save final timer */
   say 'LINEIN takes : 'format(final-start,5,2) 'seconds'
   say 'Number of records= 'right(rec.0,8)
   exit
   /******************** subroutine to get time ***********************/
   TIMER:
    if environment<>'CMS' then return time('E')
                          else do
       parse value diag(08,'Q TIME',160) with . 'TOTCPU= 'ttm':'+1 tts0 +5
       timer =tts0 + ttm*60
       return timer
    end

Notes:

  1. Note the parse source instruction, which tells us on which platform we are running.
  2. time('E') returns the elapsed time.  This is the only measurement available on the personal computer.  On CMS, we extract the total CPU time (TOTCPU) from the output of the CP QUERY TIME command, because the elapsed time would not be accurate there: it depends on the load of the system.  On a PC, if no other applications run in the background, we may assume that the elapsed time is an acceptable measurement.
  3. In all solutions, we close the file explicitly (as we learned should be done).  On OS/2 there is even more reason to close files as soon as possible, because on that platform the number of files one application can have open concurrently is limited.

Because CMS knows the number of records in a file, lines(fileid) returns that number there, and we can avoid executing lines(fileid) at each iteration.  A slightly enhanced procedure, which is however incompatible with OS/2 (where lines() only reports whether data remains), looks as follows:

Reading a text file via LINEIN() - case 2

   /* Read a file */
   parse source environment .
   if environment='CMS' then fileid='README FILE A'
                        else fileid='C:\README'
   start=timer()                                  /* save start timer */
   rec.0=lines(fileid)
   do i=1 to rec.0                          /* read until end-of-file */
      rec.i=linein(fileid)                         /* records in REC. */
   end
   call stream fileid,'C','CLOSE'                   /* close the file */
   final=timer()                                  /* save final timer */
   say 'LINEIN takes : 'format(final-start,5,2) 'seconds'
   say 'Number of records= 'right(rec.0,8)
   exit

Note: The TIMER subroutine is not repeated here or in the following programs; it is always the same.

Let's now analyze our program using CHARIN():

Reading a text file via CHARIN() - case 3

   /* Read a file */
   parse source environment .
   if environment='CMS' then do
      address command
      fileid='README FILE A'
      linend='15'x
      'MAKEBUF' ; buffer=rc
      'QUERY DISK A (STACK'
      parse pull header
      parse pull . . . . . . blksize .
      'LISTFILE' fileid '(STACK ALLOC'
      parse pull . . . . lrecl nrecs nblocks .
      numchars=nblocks*blksize + nrecs
   end;                 else do
      fileid='C:\README'
      linend='0D0A'x
      numchars=chars(fileid)
   end
   start=timer()                                  /* save start timer */
   file=charin(fileid,1,numchars)          /* read the file in 1 shot */
   call stream fileid,'C','CLOSE'                     /* and close it */
   inter=timer()                           /* save intermediate timer */
   if right(file,1)='1A'x then
      file=left(file,length(file)-1)          /* drop EOF char if any */
   do i=1 by 1 while file<>''                   /* parse the variable */
      parse var file rec.i (linend) file              /* into records */
   end
   rec.0=i-1
   final=timer()                                  /* save final timer */
   say 'CHARIN takes  : 'format(inter-start,5,2) 'seconds'
   say 'Parsing takes : 'format(final-inter,5,2) 'seconds'
   say 'Total time    : 'format(final-start,5,2) 'seconds'
   say 'Number of records= 'right(rec.0,8)
   if environment='CMS' then 'DROPBUF' buffer
   exit

Notes:

  1. The charin() function allows us to read the complete file into one REXX variable.  To keep the comparison fair, we need to split this single variable into records and create an array, so that we end up in the same situation as with LINEIN().  There are, however, cases where you don't need this step and can process the file immediately.
  2. The first technique for splitting the file into records is the "eat the string by parsing" technique that we learned in lesson 3.  On OS/2 the logical records are separated from each other by CRLF characters ('0D0A'x).  On CMS, CHARIN() does an implicit open, which by default appends a LINEEND character to each record.  This character is '15'x on CMS, which explains the logic at the beginning of the procedure.
  3. On the OS/2 platform, the system knows how many characters are in the file.  CMS, however, knows how many records are in the file.  So, when working on CMS, we have to calculate the (approximate) number of characters in the file in order to specify it as the third parameter of the CHARIN() function.  The initialization routine therefore issues a QUERY DISK to get the physical block size of the minidisk, and a LISTFILE to get the number of blocks the file occupies on the disk and the number of records in the file.  We have to add one character for each record, as the records will be separated by our LINEND character.  The result in our case is 73728 bytes (as opposed to the 72274 bytes on OS/2; the difference is due to rounding to 4K blocks and to the extra LINEND characters).
     In a first attempt, we made the mistake of using only LISTFILE and multiplying the number of records by the logical record length.  This gave us a much larger buffer size (more than 140000), because LISTFILE returns the length of the longest record.
  4. Before we start to parse the big variable into records, one more step is needed.  Remember that files on personal systems may have an end-of-file character ('1A'x) appended to the last record.  For our loop (while file<>'') to terminate, we therefore have to strip that character.  In an earlier solution we didn't perform this step, but coded our parse for OS/2 like this:

       parse var file rec.i (linend) file '1A'x

     The benchmark results were simply horrible (roughly 7 times worse), because REXX had to search the remaining string for the '1A'x character at every iteration.  When we say horrible, we don't exaggerate: it took 134 seconds to parse the file into records, as opposed to 19 seconds in the improved version.

    So, a little bit of investigation can give a tremendous return on performance...
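
The stripping step and the "eat the string" loop can be tried on a small in-memory string before running them against a real file.  What follows is only a sketch: the two sample records are made up for the illustration, and the OS/2 conventions (CRLF as separator, '1A'x as end-of-file character) are assumed.

```rexx
/* Sketch: eat-the-string parsing on a made-up in-memory sample */
linend = '0D0A'x                          /* CRLF, as on OS/2      */
file   = 'first line' || linend || 'second line' || linend || '1A'x

if right(file,1) = '1A'x then             /* drop EOF char if any  */
   file = left(file, length(file)-1)

do i = 1 by 1 while file <> ''            /* eat the string        */
   parse var file rec.i (linend) file     /* one record per pass   */
end
rec.0 = i-1

say rec.0 'records:' rec.1 '/' rec.2      /* 2 records: first line / second line */
```

Without the stripping step, the variable would still hold the '1A'x character after the second record, and the loop would produce a third record containing only that character.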

In our quest for an even faster solution, we replaced the parsing by slightly more complex code that no longer eats the string:

Reading a text file via CHARIN() - case 4

    /* Read a file */
    parse source environment .
    if environment='CMS' then do
       address command
       fileid='README FILE A'
       linend='15'x
       step=1
       'MAKEBUF' ; buffer=rc
       'QUERY DISK A (STACK'
       parse pull header
       parse pull . . . . . . blksize .
       'LISTFILE' fileid '(STACK ALLOC'
       parse pull . . . . lrecl nrecs nblocks .
       numchars=nblocks*blksize + nrecs
    end;                 else do
       fileid='C:\README'
       linend='0D0A'x
       step=2
       numchars=chars(fileid)
    end
    start=timer()                             /* save start timer */
    file=charin(fileid,1,numchars)     /* read the file in 1 shot */
    call stream fileid,'C','CLOSE'                /* and close it */
    inter=timer()                      /* save intermediate timer */
    opos=1                            /* start at position 1 */
    do i=1 by 1
       a=pos(linend,file,opos)             /* search for line-end */
       if a=0 then leave                   /* if no more, leave   */
       rec.i=substr(file,opos,a-opos)      /* take substring      */
       opos=a+step                         /* skip the LINEENDs   */
    end
    rec.0=i-1
    final=timer()                             /* save final timer */
    say 'CHARIN takes  : 'format(inter-start,5,2) 'seconds'
    say 'Parsing takes : 'format(final-inter,5,2) 'seconds'
    say 'Total time    : 'format(final-start,5,2) 'seconds'
    say 'Number of records= 'right(rec.0,8)
    if environment='CMS' then 'DROPBUF' buffer
    exit

Notes:

  1. At first glance, this looks like a silly solution, as the REXX code is more complex and calls POS() and SUBSTR() many times.  But if you look at the results, you'll see that on OS/2 it is significantly better than parsing, whereas on CMS it is worse than parsing.  We already mentioned in lesson 3 that this is probably due to the totally different storage management of the two systems and to the way parse works.
  2. We have to make another distinction between the systems here: the step needed to skip the line-end characters is different (2 for CRLF on OS/2, and 1 for '15'x on CMS).
  3. Finally, we don't have to check whether an EOF character is present on OS/2.  Our logic simply ignores the last record, which contains only that EOF character.
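
The scanning logic of case 4 can be checked on a small in-memory string.  The following is a sketch under a few assumptions: the sample records are made up, and the step is derived with length(linend) instead of being set separately per platform, which is our own variation on the listing above.

```rexx
/* Sketch: split a made-up sample with POS() and SUBSTR() */
linend = '0D0A'x                       /* OS/2 style; '15'x on CMS   */
step   = length(linend)                /* 2 for CRLF, 1 for '15'x    */
file   = 'first line' || linend || 'second line' || linend || '1A'x

opos = 1                               /* start at position 1        */
do i = 1 by 1
   a = pos(linend, file, opos)         /* search for next line-end   */
   if a = 0 then leave                 /* none left: trailing EOF    */
                                       /* character is ignored       */
   rec.i = substr(file, opos, a-opos)  /* take the record            */
   opos = a + step                     /* skip the line-end          */
end
rec.0 = i-1

say rec.0 'records:' rec.1 '/' rec.2   /* 2 records: first line / second line */
```

Note that the same logic also ignores a last record that is not terminated by a line-end, so it assumes that every record in the file ends with one.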

As we also have CMS Pipelines, EXECIO and Callable Services on CMS, we ran the same test with these commands.  The coding is very simple in the first 2 cases:

Reading a text file via CMS Pipelines - case 5

   /* Read a file */
   address command
   parse source environment .
   if environment='CMS' then fileid='README FILE A'
                        else exit             /* not for OS/2 */
   start=timer()                              /* save start timer */
   'PIPE <' fileid '| STEM REC.'
   final=timer()                              /* save final timer */
   say 'CMS Pipelines : 'format(final-start,5,2) 'seconds'
   say 'Number of records= 'right(rec.0,8)
   exit

Reading a text file via EXECIO - case 6

   /* Read a file */
   address command
   parse source environment .
   if environment='CMS' then fileid='README FILE A'
                        else exit             /* not for OS/2 */
   start=timer()                              /* save start timer */
   'EXECIO * DISKR' fileid '(FINIS STEM REC.'
   final=timer()                              /* save final timer */
   say 'EXECIO takes : 'format(final-start,5,2) 'seconds'
   say 'Number of records= 'right(rec.0,8)
   exit

For the CSL solution, the coding is very similar to the one shown in lesson 3.

Reading a text file via CSL - case 7

   /* Read a file */
   address command
   parse source environment .
   if environment='CMS' then fileid='README FILE A'
                        else exit              /* not for OS/2 */
   start=timer()                           /* save start timer */
   parse value 'READ NEWDATEREF;COMMIT  1   100 0    0',
         with   OpenType      ';'commit one twh retc reason token
   parse value length(fileid) length(OpenType) length(commit),
         with  l_fid          l_OpenType       l_commit
   call csl 'DMSOPEN retc reason fileid l_fid OpenType l_OpenType token'
   if retc>4 then do
      if reason=44000 then call errexit 28,'File not found'
      call errexit retc,'----> DMSOPEN error: retc='retc '; reason='reason
   end
   do i=1 until retc\=0
      call csl 'DMSREAD retc reason token one twh record twh lrecl'
      if retc>4 then call errexit retc,,
                     '----> DMSREAD error: retc='retc '; reason='reason
      if reason=90103 then leave i /* end-of-file */
      rec.i=left(record,lrecl)
   end i
   rec.0=i-1
   call csl 'DMSCLOSE retc reason token commit l_commit'
   final=timer()                              /* save final timer */
   say 'CSL call      : 'format(final-start,5,2) 'seconds'
   exit

The test on VM/ESA was run twice.  For the first run, the files were stored in SFS directories; we don't know the time used by the SFS server.  For the second run, the files were on a regular minidisk.

Time now to look at the results:

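Times are in seconds (CPU time on VM/ESA, elapsed time on OS/2).  Percentages are relative to case 1 on the same platform.  For cases 3 and 4, the figures in parentheses are (reading/parsing).

Case  Method                                VM/ESA SFS              VM/ESA Mdisk            OS/2
----  ------------------------------------  ----------------------  ----------------------  ------------------------
1     LINEIN(), multiple calls to LINES()   6.85              100%  2.70              100%  5.90                100%
2     LINEIN(), single call to LINES()      2.85               41%  1.25               46%  n/a
3     CHARIN() with parsing                 1.80 (0.20/1.60)   26%  1.80 (0.15/1.65)   67%  19.00 (0.25/18.75)  322%
4     CHARIN() with REXX logic              4.75 (0.20/4.55)   69%  4.75 (0.15/4.60)  175%  3.20 (0.25/2.95)     54%
5     CMS Pipelines                         0.09                1%  0.09                3%  n/a
6     EXECIO                                0.09                1%  0.09                3%  n/a
7     CSL calls                             1.50               21%  1.50               55%  n/a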

Notes:

  1. The absolute number of seconds should of course not be compared between the 2 systems.  An Intel PC (a 25 MHz 486 at the time of our test) is something totally different from VM/ESA running in a logical partition of a 3090!  You should compare the different solutions on each platform separately.
  2. The figures are averages of a few runs.
  3. Don't jump to conclusions when analyzing the SFS figures.  Only the LINEIN() function gives much higher figures than on regular minidisks (and we don't even know the consumption of the SFS server!).  We learned that with LINEIN() there is an APPC transfer for each logical record, while the other methods transfer the data in large buffers.
  4. The figures for the CHARIN() solutions are split into the file reading part and the record splitting part.  Notice that the file reading part is very small.  This means that if you don't need to create the stem, and can therefore eliminate the REXX logic that builds it, CHARIN() is a clear winner over LINEIN() (on both platforms).
  5. With the CHARIN() solutions, we can't read CMS files that contain '15'x characters, as these would be taken for LINEEND characters.
  6. Coding a DO WHILE function() (compare cases 1 and 2) is costly, as the function gets executed at every iteration.  If the result of the function doesn't change between iterations, you should definitely avoid this construct.
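
The point of note 6 is not limited to LINES().  As a minimal sketch, using a made-up word list instead of a file, the two loop forms below do the same work, but the first calls WORDS() before every iteration while the second evaluates its TO expression only once, when the loop is entered:

```rexx
/* Sketch: evaluate a loop-invariant function once, not per iteration */
list = 'alpha beta gamma delta'        /* made-up sample data         */

/* Costly form: words(list) is re-evaluated before every iteration    */
i = 1
do while i <= words(list)
   item.i = word(list, i)
   i = i + 1
end

/* Cheaper form: the TO expression is evaluated once, at loop entry   */
do i = 1 to words(list)
   item.i = word(list, i)
end
item.0 = i - 1

say item.0 'items, last one is' item.4   /* 4 items, last one is delta */
```

This is the same change as the one between cases 1 and 2, where rec.0=lines(fileid) is evaluated once and then used as the loop limit.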

And the winners are... yes, CMS Pipelines and EXECIO, with a total CPU time of less than 0.10 seconds!  And CMS Pipelines manages to do it in 3 I/Os, while all other cases need 4 I/Os to read the file.

One last, personal opinion.  Portability is (still) a dream.  Each platform has its own typical strong and weak points, and you have to take them into account when performance is important.  For the moment, we think portable programs should use bimodal code, as we did to some extent in our benchmark: test the environment in which you run, and use the appropriate method for the function...
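
As an illustration of such bimodal code, here is a sketch that combines the fastest method we measured on each platform: EXECIO on CMS (case 6) and CHARIN() with POS()/SUBSTR() on OS/2 (case 4).  The routine name ReadFile is our own invention for this example, and the file names are simply those of the benchmark.

```rexx
/* Sketch: bimodal file reader (ReadFile is a hypothetical name) */
parse source environment .
if environment='CMS' then fileid='README FILE A'
                     else fileid='C:\README'
call ReadFile fileid
say 'Number of records= 'right(rec.0,8)
exit

/* Fills the stem REC. ; no PROCEDURE, so ENVIRONMENT is visible */
ReadFile:
   parse arg fid
   if environment='CMS' then          /* CMS: let EXECIO do it   */
      address command 'EXECIO * DISKR' fid '(FINIS STEM REC.'
   else do                            /* OS/2: one big CHARIN()  */
      file=charin(fid,1,chars(fid))
      call stream fid,'C','CLOSE'
      opos=1
      do i=1 by 1
         a=pos('0D0A'x,file,opos)     /* search for CRLF         */
         if a=0 then leave            /* no more records         */
         rec.i=substr(file,opos,a-opos)
         opos=a+2                     /* skip the CRLF           */
      end
      rec.0=i-1
   end
   return
```

The caller only sees the stem REC. and never needs to know which method was used.  The sketch does no error handling (a missing file, for instance), which a real program would have to add for both branches.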