To show that portability and performance across platforms are two entirely different concepts, we designed a simple benchmark that runs on both VM/ESA and OS/2. The procedure reads a text file (the README file of OS/2 2.11: 1933 records, 72274 bytes) and stores the records in a REXX array. We compared the LINEIN() and CHARIN() solutions.
Reading a text file via LINEIN() - case 1

    /* Read a file */
    parse source environment .
    if environment='CMS' then fileid='README FILE A'
    else fileid='C:\README'
    start=timer()                          /* save start timer */
    do i=1 by 1 while lines(fileid)>0      /* read until end-of-file */
       rec.i=linein(fileid)                /* records in REC. */
    end
    rec.0=i-1                              /* number of records */
    call stream fileid,'C','CLOSE'         /* close the file */
    final=timer()                          /* save final timer */
    say 'LINEIN takes : 'format(final-start,5,2) 'seconds'
    say 'Number of records= 'right(rec.0,8)
    exit
    /******************** subroutine to get time ***********************/
    TIMER: if environment<>'CMS' then return time('E')
    else do
       parse value diag(08,'Q TIME',160) with . 'TOTCPU= 'ttm':'+1 tts0 +5
       timer =tts0 + ttm*60
       return timer
    end
Notes:
Because CMS knows the number of records in a file, we can avoid calling lines(fileid) at each iteration. A slightly enhanced procedure, which is however incompatible with OS/2, looks as follows:
Reading a text file via LINEIN() - case 2

    /* Read a file */
    parse source environment .
    if environment='CMS' then fileid='README FILE A'
    else fileid='C:\README'
    start=timer()                          /* save start timer */
    rec.0=lines(fileid)
    do i=1 to rec.0                        /* read until end-of-file */
       rec.i=linein(fileid)                /* records in REC. */
    end
    call stream fileid,'C','CLOSE'         /* close the file */
    final=timer()                          /* save final timer */
    say 'LINEIN takes : 'format(final-start,5,2) 'seconds'
    say 'Number of records= 'right(rec.0,8)
    exit
Note: The TIMER subroutine is not repeated here or in the following programs; it is always the same.
Let's now analyze our program using CHARIN():
Reading a text file via CHARIN() - case 3

    /* Read a file */
    parse source environment .
    if environment='CMS' then do
       address command
       fileid='README FILE A'
       linend='15'x
       'MAKEBUF' ; buffer=rc
       'QUERY DISK A (STACK'
       parse pull header
       parse pull . . . . . . blksize .
       'LISTFILE' fileid '(STACK ALLOC'
       parse pull . . . . lrecl nrecs nblocks .
       numchars=nblocks*blksize + nrecs
    end
    else do
       fileid='C:\README'
       linend='0D0A'x
       numchars=chars(fileid)
    end
    start=timer()                          /* save start timer */
    file=charin(fileid,1,numchars)         /* read the file in 1 shot */
    call stream fileid,'C','CLOSE'         /* and close it */
    inter=timer()                          /* save intermediate timer */
    if right(file,1)='1A'x then
       file=left(file,length(file)-1)      /* drop EOF char if any */
    do i=1 by 1 while file<>''             /* parse the variable */
       parse var file rec.i (linend) file  /* into records */
    end
    rec.0=i-1
    final=timer()                          /* save final timer */
    say 'CHARIN takes : 'format(inter-start,5,2) 'seconds'
    say 'Parsing takes : 'format(final-inter,5,2) 'seconds'
    say 'Total time : 'format(final-start,5,2) 'seconds'
    say 'Number of records= 'right(rec.0,8)
    if environment='CMS' then 'DROPBUF' buffer
    exit
Notes:
The charin() call allows us to read the complete file into a single REXX variable. To keep our comparison fair, we need to split this single variable into records and build an array, so that we end up in the same situation as with LINEIN(). There are, however, cases where you don't need this step and can process the file immediately.
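To illustrate a case where the file can be processed without splitting it into records, here is a minimal sketch that counts how often a string occurs in the whole file. It is not part of the original benchmark: it assumes the OS/2 side of the test (where chars() gives the file size), and the search string in `needle` is a hypothetical example.

```rexx
/* Count occurrences of a string in a file read in one shot (sketch) */
fileid='C:\README'                     /* OS/2 file used in the benchmark */
needle='OS/2'                          /* hypothetical search string      */
file=charin(fileid,1,chars(fileid))    /* read the file in 1 shot         */
call stream fileid,'C','CLOSE'         /* and close it                    */
count=0
p=pos(needle,file)                     /* first occurrence, 0 if none     */
do while p>0
   count=count+1
   p=pos(needle,file,p+length(needle)) /* continue after this occurrence  */
end
say 'Occurrences of "'needle'": 'count
exit
```

Because the string is scanned with pos() and never cut into pieces, no record array is built and the parsing cost measured in the benchmark does not apply.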
An earlier version of this program dropped the end-of-file character inside the loop, using
parse var file rec.i (linend) file '1A'x
and the benchmark results were simply horrible (10 times worse), just because REXX had to search for the '1A'x character at each iteration, only to find it at the last one. When we say horrible, we don't exaggerate: it took 134 seconds to parse the file into records, as opposed to 19 seconds in the improved version.
So, a little bit of investigation can give a tremendous return on performance...
In our quest to find an even faster solution, we changed the parsing into slightly more complex code that no longer eats the string:
Reading a text file via CHARIN() - case 4

    /* Read a file */
    parse source environment .
    if environment='CMS' then do
       address command
       fileid='README FILE A'
       linend='15'x
       step=1
       'MAKEBUF' ; buffer=rc
       'QUERY DISK A (STACK'
       parse pull header
       parse pull . . . . . . blksize .
       'LISTFILE' fileid '(STACK ALLOC'
       parse pull . . . . lrecl nrecs nblocks .
       numchars=nblocks*blksize + nrecs
    end
    else do
       fileid='C:\README'
       linend='0D0A'x
       step=2
       numchars=chars(fileid)
    end
    start=timer()                          /* save start timer */
    file=charin(fileid,1,numchars)         /* read the file in 1 shot */
    call stream fileid,'C','CLOSE'         /* and close it */
    inter=timer()                          /* save intermediate timer */
    opos=1                                 /* start at position 1 */
    do i=1 by 1
       a=pos(linend,file,opos)             /* search for line-end */
       if a=0 then leave                   /* if no more, leave */
       rec.i=substr(file,opos,a-opos)      /* take substring */
       opos=a+step                         /* skip the LINEENDs */
    end
    rec.0=i-1
    final=timer()                          /* save final timer */
    say 'CHARIN takes : 'format(inter-start,5,2) 'seconds'
    say 'Parsing takes : 'format(final-inter,5,2) 'seconds'
    say 'Total time : 'format(final-start,5,2) 'seconds'
    say 'Number of records= 'right(rec.0,8)
    if environment='CMS' then 'DROPBUF' buffer
    exit
As CMS also offers CMS Pipelines, EXECIO and Callable Services (CSL), we ran the same test with these facilities. The coding is very simple for the first two:
Reading a text file via CMS Pipelines - case 5

    /* Read a file */
    address command
    parse source environment .
    if environment='CMS' then fileid='README FILE A'
    else exit                              /* not for OS/2 */
    start=timer()                          /* save start timer */
    'PIPE < 'fileid'!STEM REC.'
    final=timer()                          /* save final timer */
    say 'CMS Pipelines : 'format(final-start,5,2) 'seconds'
    say 'Number of records= 'right(rec.0,8)
    exit
Reading a text file via EXECIO - case 6

    /* Read a file */
    address command
    parse source environment .
    if environment='CMS' then fileid='README FILE A'
    else exit                              /* not for OS/2 */
    start=timer()                          /* save start timer */
    'EXECIO * DISKR' fileid '(FINIS STEM REC.'
    final=timer()                          /* save final timer */
    say 'EXECIO takes : 'format(final-start,5,2) 'seconds'
    say 'Number of records= 'right(rec.0,8)
    exit
For the CSL solution, the coding is very similar to the one shown in lesson 3.
Reading a text file via CSL - case 7

    /* Read a file */
    address command
    parse source environment .
    if environment='CMS' then fileid='README FILE A'
    else exit                              /* not for OS/2 */
    start=timer()                          /* save start timer */
    parse value 'READ NEWDATEREF;COMMIT 1 100 0 0',
          with OpenType ';'commit one twh retc reason token
    parse value length(fileid) length(OpenType) length(commit),
          with l_fid l_OpenType l_commit
    call csl 'DMSOPEN retc reason fileid l_fid OpenType l_OpenType token'
    if retc>4 then do
       if reason=44000 then call errexit 28,'File not found'
       call errexit retc,'----> DMSOPEN error: retc='retc '; reason='reason
    end
    do i=1 until retc^=0
       call csl 'DMSREAD retc reason token one twh record twh lrecl'
       if retc>4 then call errexit retc,,
          '----> DMSREAD error: retc='retc '; reason='reason
       if reason=90103 then leave i        /* end-of-file */
       rec.i=left(record,lrecl)
    end i
    rec.0=i-1
    call csl 'DMSCLOSE retc reason token commit l_commit'
    final=timer()                          /* save final timer */
    say 'CSL call : 'format(final-start,5,2) 'seconds'
    exit
The test on VM/ESA was run twice. For the first run, the files were stored in SFS directories; we do not know the time used by the SFS server. For the second run, the files were on a regular minidisk.
Time now to look at the results:
Case | Method | VM/ESA SFS | VM/ESA Mdisk | OS/2 |
---|---|---|---|---|
1 | LINEIN() Multiple call to LINES() | 6.85 100% | 2.70 100% | 5.90 100% |
2 | LINEIN() Single call to LINES() | 2.85 41% | 1.25 46% | n/a |
3 | CHARIN() with parsing | 1.80 (0.20/1.60) 26% | 1.80 (0.15/1.65) 67% | 19.00 (0.25/18.75) 322% |
4 | CHARIN() with REXX logic | 4.75 (0.20/4.55) 69% | 4.75 (0.15/4.60) 175% | 3.20 (0.25/2.95) 54% |
5 | CMS Pipelines | 0.09 1% | 0.09 3% | n/a |
6 | EXECIO | 0.09 1% | 0.09 3% | n/a |
7 | CSL Calls | 1.50 21% | 1.50 55% | n/a |
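Notes: Times are in seconds (CPU time on VM/ESA, elapsed time on OS/2, as returned by the TIMER subroutine). Percentages are relative to case 1 on the same platform. For cases 3 and 4, the figures in parentheses are the CHARIN() time and the parsing time.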
And the winners are... CMS Pipelines and EXECIO, with a total CPU time of less than 0.10 seconds! CMS Pipelines even manages to do it in 3 I/Os, while all other cases need 4 I/Os to read the file.
One last, personal opinion: portability is (still) a dream. Each platform has its own strong and weak points, and you have to take them into account when performance is important. For the moment, we think portable programs should use bimodal code, as we did to some extent in our benchmark: test the environment you run on, and use the method best suited to it for each function.
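A minimal sketch of such bimodal code, assembled from the benchmark cases above: it uses EXECIO (case 6) on CMS and the CHARIN() plus pos()/substr() logic (case 4) on OS/2, the fastest variant measured on each platform. The file names are those of the benchmark; the simple error handling after EXECIO is our own addition and only an assumption about what a real program would need.

```rexx
/* Bimodal read: choose the reading method per platform (sketch) */
parse source environment .
if environment='CMS' then do
   fileid='README FILE A'
   address command 'EXECIO * DISKR' fileid '(FINIS STEM REC.'
   if rc<>0 then do                    /* assumed minimal error handling */
      say 'EXECIO failed, rc='rc
      exit rc
   end
end
else do
   fileid='C:\README'
   linend='0D0A'x
   file=charin(fileid,1,chars(fileid)) /* read the file in 1 shot */
   call stream fileid,'C','CLOSE'      /* and close it */
   opos=1                              /* start at position 1 */
   do i=1 by 1
      a=pos(linend,file,opos)          /* search for line-end */
      if a=0 then leave                /* if no more, leave */
      rec.i=substr(file,opos,a-opos)   /* take substring */
      opos=a+length(linend)            /* skip the line-end */
   end
   rec.0=i-1
end
say 'Number of records= 'right(rec.0,8)
exit
```

Either branch leaves the records in REC.1 to REC.n with the count in REC.0, so the rest of the program does not need to know which method was used. As in case 4, a last line that is not followed by a line-end is not stored.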