Chapter 14. File handling, side information.

Before we continue to learn the details of how to read or write files, we have to discuss some more general aspects, such as:

  • Stack manipulation (MAKEBUF/DROPBUF/DESBUF) ;
  • Opening and closing files ;
  • The CMS low level I/O interface ;
  • Special file formats.


Stack manipulation (MAKEBUF/DROPBUF/DESBUF).

In a CMS virtual machine, there is only one stack defined.  So, every procedure that uses the stack must include code so that it can coexist nicely with others using the stack.

Procedures must make sure not to read lines that are not meant for them, nor should they leave lines on the stack unintentionally when terminating.  To help control the stack, CMS provides the MAKEBUF, DROPBUF and DESBUF commands, as well as the SENTRIES command(footnote 1).

REXX, on the other hand, provides the queued() built-in function, which returns the number of lines currently on the stack.

Let's start with an example of EXECIO that uses the stack:

 /* Sample reading CMS file */
 address COMMAND
 'ESTATE INPUT FILE A'                            /* does file exist ? */
 if rc<>0 then do
    say 'INPUT FILE not found, can''t continue process...'
    exit rc
 end
 'EXECIO * DISKR INPUT FILE A'          /* put records (FIFO) in stack */
 if rc<>0 then do
    say 'Something abnormal happened during EXECIO * DISKR INPUT FILE'
    exit rc
 end
 do queued()
    parse pull record
    .... process the record ....
 end
 exit

We did not include code to check if the stack can be used safely...  So, what will happen if we run the procedure while there are already lines on the stack ?  Look at the figure below:

Stack buffers without Makebuf/Dropbuf


    +-------+      +-------+      +-------+-
    !records!      !records!      !       !!
    +-------+      +-------+      +-------+!
    !already!      !already!      !       !!
    +-------+      +-------+      +-------+!
    !inqueue!      !inqueue!      !       !!
    +-------+      +-------+      +-------+!
                   !  new  !-     !       !!
                   +-------+!     +-------+!
                   !records!!     !       !v
                   +-------+!     +-------+
                   !stacked!!     !stacked!
                   +-------+!     +-------+
                   !  by   !!     !  by   !
                   +-------+!     +-------+
                   !execio !!     !execio !
                   +-------+v     +-------+
     before      EXECIO (FIFO)      PULL

Our loop that pulls the records from the stack will:

  1. Start reading the records that were already on the stack and that don't belong to the file.  If we had used the LIFO option on the EXECIO command, the file records would have been stacked before the old ones, but in reverse order, which is often not desirable.
  2. Read too far, because we don't know the size of the file or the number of records stacked by EXECIO; we can only read the stack until it is empty (queued()=0).

The MAKEBUF-DROPBUF combination is the solution to our problem.  The MAKEBUF command creates a new 'logical entry point' (a buffer) in the stack, and returns a return code that indicates its level.  Records stacked after the MAKEBUF are then processed first.  Look at the figure:

Stack buffers with Makebuf/Dropbuf.


                   EXECIO(FIFO)    PULL
                    +-------+   +-------+
                    !  new  !+  !       !+
                    +-------+!  +-------+!
                    !records!!  !       !!
                    +-------+!  +-------+!
                    !stacked!!  !       !!
                    +-------+!  +-------+!
                    !  by   !!  !       !!
                    +-------+!  +-------+!
             MAKEBUF!execio !v  !       !vDROPBUF         level 1
    +-------+-------+-------+   +-------+-------+-------+
    !records!       !records!   !records!       !records!
    +-------+       +-------+   +-------+       +-------+
    !already!       !already!   !already!       !already!
    +-------+       +-------+   +-------+       +-------+
    !inqueue!       !inqueue!   !inqueue!       !inqueue!
    +-------+       +-------+   +-------+       +-------+ level 0

To be safe, our procedure thus has to be extended as follows:

 /* Sample reading CMS file */
 address COMMAND
 oldqueue = queued()     /* remember how many records already in stack */
 'MAKEBUF'  ;  buffer=rc              /* create our level and remember */
 'ESTATE INPUT FILE A'                            /* does file exist ? */
 if rc<>0 then do
    say 'INPUT FILE not found, can''t continue process...'
    exit rc
 end
 'EXECIO * DISKR INPUT FILE A'          /* put records (FIFO) in stack */
 if rc<>0 then do
    say 'Something abnormal happened during EXECIO * DISKR INPUT FILE'
    exit rc
 end
 do queued()-oldqueue             /* process only new records in stack */
    parse pull record                  /* and leave others intact...   */
    .... process the record ....
 end
 'DROPBUF' buffer                       /* drop our level stack buffer */
 exit

This is however not perfect yet, as we will see.

Attention !  MAKEBUF does not create a new stack that is separate from the others; it just sets a logical watermark.  Nothing prevents you from pulling records past this watermark.

We therefore have to save the number of records already on the stack in the oldqueue variable before starting the process, so that we know when we reach the watermark again.

Usage notes.

Make sure to DROPBUF all MAKEBUFs at the end of the procedure.  Our procedure is not well written in this respect.  Indeed, if EXECIO or ESTATE gives a non-zero return code, the procedure ends, but the buffer is not dropped.  We'll see how to solve this problem in a moment.

Let's first look at another example where two procedures are combined:

 /* Procedure A */            |  /* Procedure B */
 address command              |  address command
 oldqueue=queued()            |  parse arg fn ft fm .
                              |  'MAKEBUF'
 'MAKEBUF'                    |  'IDENTIFY (STACK'
 'LISTFILE PROFILE * (STACK'  |  parse pull userid . nodeid .
 do queued()-oldqueue         |  'QUERY DISK R/W (STACK'
    parse pull fileid         |  pull .      /* get rid of header */
    'EXEC B' fileid           |  parse pull . . FirstRWMode .
 end                          |  if ft=userid then exit
 'DROPBUF'                    |  ...        /* process file */
 exit                         |  'DROPBUF'
                              |  exit

Suppose you have accessed more than one R/W minidisk, and that a file PROFILE userid (where the filetype is your own user ID) exists.  What will happen then ?

Procedure A finds the PROFILE userid file and passes its fileid as parameter to procedure B.  Procedure B issues its MAKEBUF, IDENTIFY and QUERY DISK commands and then brutally exits, as the ft variable is identical to the userid, thereby leaving lines on the stack as a result of the QUERY DISK.

Procedure A gets control again in the DO loop, and will continue reading the stack...

Question 25

Suppose you have the following files on your A-disk :

   PROFILE your_id
   PROFILE EXEC
   PROFILE XEDIT

and also, that you have both disk A and disk B accessed in R/W mode.

What will the previous procedure A read from the stack when procedure B brutally exits ?  Will it read:

  1. all lines still on the stack ?
  2. only the lines stacked by the LISTFILE of procedure A ?
  3. only lines stacked by procedure B (due to QUERY DISK) ?
  4. none of above ?

From the answer to the question, we now know that procedure A will drop the stack buffer of procedure B after the DO loop, and not its own buffer.

The DROPBUF command can however take a parameter:

  • DROPBUF without a parameter drops the last buffer created by MAKEBUF ;
  • DROPBUF nn drops buffer nn and all higher level buffers ;
  • DROPBUF 0 drops all buffers created by earlier MAKEBUFs ;
  • DESBUF is equivalent to DROPBUF 0, but also clears all terminal input and output buffers.  Thus, the commands you 'typed ahead' while your procedure was running will be discarded too.
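
As a small illustration (assuming no stack buffers exist when the fragment starts), the numbers returned by MAKEBUF and accepted by DROPBUF relate as follows:

 'MAKEBUF' ; say 'Created buffer' rc       /* first buffer : rc = 1    */
 'MAKEBUF' ; say 'Created buffer' rc       /* second buffer: rc = 2    */
 'MAKEBUF' ; say 'Created buffer' rc       /* third buffer : rc = 3    */
 'DROPBUF 2'              /* drops buffer 2 and all higher ones (3)    */
 'DROPBUF'                /* no parameter: drops the last one left (1) */
 'DROPBUF 0'              /* would have dropped all of them at once    */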

Before you code a DESBUF command, you should normally issue a CONWAIT command first, to instruct CMS to write the terminal output buffer to the screen.  If you don't, the DESBUF will flush any output lines (from SAY, for example) that are still in the CMS terminal buffer and not yet displayed on the screen.
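
In practice the two commands are therefore issued as a pair:

 'CONWAIT'               /* wait until pending terminal output is shown */
 'DESBUF'                /* then clear the stack and console buffers    */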

So, our procedure should be improved so that the DROPBUF is issued in all cases.  The best solution is to provide a single exit routine that issues the DROPBUF with the correct buffer number.  Look at this streamlined solution:

 /* Procedure A */            |  /* Procedure B */
 address command              |  address command
 oldqueue=queued()            |  parse arg fn ft fm .
                              |  'MAKEBUF' ; buffer=rc
 'MAKEBUF' ; buffer=rc        |  'IDENTIFY (STACK'
 'LISTFILE PROFILE * (STACK'  |  parse pull userid . nodeid .
 do queued()-oldqueue         |  'QUERY DISK R/W (STACK'
    parse pull fileid         |  pull .      /* get rid of header */
    'EXEC B' fileid           |  parse pull . . FirstRWMode .
 end                          |  if ft=userid then call errexit
 EXIT:                        |
 'DROPBUF' buffer             |  ...        /* process file */
 exit                         |  EXIT:
                              |  ERREXIT:
                              |  'DROPBUF' buffer
                              |  exit

First answer this question:

Question 26

What value will be assigned to variable buffer in procedure B ?

We now combine our earlier exit routine with what we have learned here:

General exit routine.


 oldqueue=queued()
 'MAKEBUF' ; buffer=rc
 ...
 ERREXIT:
 parse source . . myname mytype . mysyn .     /* who are we ? */
 do n=2 to arg()                   /* show all error messages */
    say myname':' arg(n)           /* display the nth message */
 end n
 if symbol('buffer')='VAR' then 'DROPBUF' buffer
 exit arg(1)                  /* exit with return code passed */

We use another REXX trick here.  The symbol() function allows us to test whether a variable has ever been initialized in the procedure.  If it has, symbol() returns VAR, otherwise it returns LIT.  If our procedure did not need a MAKEBUF, we would never have initialized the variable buffer, and so a DROPBUF is not needed either.  This makes our exit routine general enough for any procedure (even those that don't use the stack).  The only drawback is that the variable name buffer is hard-coded and thus becomes a kind of reserved name in your procedures.
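
To illustrate how such a routine might be invoked (the file name and message texts here are just examples), a failing command could be handled like this:

 'ESTATE INPUT FILE A'                            /* does file exist ? */
 if rc<>0 then call errexit rc, 'INPUT FILE A not found, cannot continue.'
 'EXECIO * DISKR INPUT FILE A'
 if rc<>0 then call errexit rc, 'Unexpected return code' rc 'from EXECIO.'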

You might as well include these statements once and for all in your procedures via the PROFILE XEDIT...


The situation on OS/2 is completely different.  A REXX procedure can create a new stack (queue) and give it a name.  The procedure can then reference the buffer by name, which makes things easier to handle.  Another big difference is that nothing special happens when your procedure ends without clearing the stack buffer: OS/2 will not try to "execute" the stacked records.  The data remains in OS/2 storage and is even global to the system, so another procedure can start later and read the stack buffer if it knows the name.
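
A minimal sketch of this, using the OS/2 RXQUEUE() built-in function (the queue name MYDATA is chosen here for illustration):

 /* OS/2 REXX: create, use and delete a named queue */
 qname = rxqueue('Create', 'MYDATA')    /* create the queue, keep its name */
 oldq  = rxqueue('Set', qname)          /* make it the current queue       */
 queue 'first line'                     /* these lines go to MYDATA        */
 queue 'second line'
 do while queued() > 0
    parse pull line
    say line
 end
 call rxqueue 'Set', oldq               /* restore the previous queue      */
 call rxqueue 'Delete', qname           /* the data would otherwise remain */

If the Delete is omitted, the queue and any remaining lines stay in OS/2 storage, and another procedure can pick them up later by name.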


Opening and closing files.

Many programmers leave the burden of opening and closing files to CMS.  We call this implicit opening/closing.

We will however learn that this is neither a good nor a safe practice.

Explicitly opening files is rarely needed, and sometimes not even possible with some methods, such as EXECIO, but explicitly closing a file is good programming practice.

Opening your files.

Closing your files.

CMS will issue an implicit close against all open files at end-of-command, that is, when the Ready; prompt is issued.  We will learn, however, that relying on this is not safe.

In general, you have to use the same kind of function to close a file as the one you used to open the file.  For example, a CMS FINIS command will not close a file opened via the REXX stream I/O functions.
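
As a sketch (the file name is hypothetical), each interface closes what it opened:

 'EXECIO 1 DISKR SOME FILE A (VAR REC'   /* opened implicitly by EXECIO   */
 'FINIS SOME FILE A'                     /* so FINIS closes it            */

 line = linein('SOME FILE A')            /* opened by REXX stream I/O,    */
 call lineout 'SOME FILE A'              /* so close it with stream I/O   */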

Knowing that, we can now explain why closing files is so important.  We will use EXECIO in our examples, but the principles are the same for CSL and REXX stream I/O.

Why is an explicit close needed ?

The FINIS option of EXECIO instructs the command to close the file after the operation is performed.  A file can also be explicitly closed in a procedure via the FINIS command of CMS.

The first consequence of closing the file is that the current record pointer is dropped.  This means that if a subsequent read (implicitly) re-opens the file, reading starts again at the first record.
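
A small sketch of this effect (SOME FILE A is hypothetical and assumed to have several records):

 'EXECIO 1 DISKR SOME FILE A (VAR REC FINIS'   /* reads record 1          */
 'EXECIO 1 DISKR SOME FILE A (VAR REC FINIS'   /* reads record 1 again !  */

 'EXECIO 1 DISKR SOME FILE A (VAR REC'         /* reads record 1          */
 'EXECIO 1 DISKR SOME FILE A (VAR REC'         /* reads record 2          */
 'FINIS SOME FILE A'                           /* close when finished     */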

Therefore, if you want to find all occurrences of a string in a file, you need to use the LOCATE or FIND options of EXECIO, but you should not close the file until you have found all your records !  If you included a FINIS option with your EXECIO command, you would end up in an endless loop, as you would always get the first record containing the requested string.

We want, however, to prove that good programmers always close their files - and only their files - before leaving the procedure.  They should not count on CMS to do it !  Why ?

Well, to understand what might happen, let's analyze an example that may lead to problems:

 /* Procedure A */                    | /* Proc XYZ - closes wrong files */
 address COMMAND                      | address COMMAND
 do forever                           | 'EXECIO * DISKR OTHER FILE A (FINIS'
    'EXECIO 1 DISKR SOME FILE A'      | 'EXECIO * DISKW OUTPUT FILE A',
    if rc<>0 then leave               |      '(STRING My output, don''t touch!'
    parse upper pull word1 .          | 'FINIS * * A'
    if word1='RUN_XYZ' then 'EXEC XYZ'| exit
 end                                  |
 exit                                 |

What will be the result of this ?  Think a while before reading on...

As procedure XYZ closes all open files before returning to the calling procedure A, the latter will read SOME FILE again from the top, and this leads to an endless loop !  The error is that procedure XYZ closes more than just the files it opened itself.
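
A corrected sketch of procedure XYZ would close only the files it used itself (the other weaknesses of XYZ are left aside here):

 /* Proc XYZ - closes only its own files */
 address COMMAND
 'EXECIO * DISKR OTHER FILE A (FINIS'
 'EXECIO * DISKW OUTPUT FILE A',
      '(STRING My output, don''t touch!'
 'FINIS OUTPUT FILE A'                  /* and certainly no FINIS * * A */
 exit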

There is another problem that you can encounter, but we first have to explain how CMS works internally.

When you close a file via FINIS (either the CMS command or the EXECIO option), CMS will indeed close the file and write the data to disk, but the file is only committed when all files on that minidisk are closed.  A commit means that the File Status Table is rewritten to disk.  Indeed, CMS has some difficulty determining the start and end point of a 'task'(footnote 2).

For CMS, the only clear endpoint of a task is when it reaches the Ready; state.  CMS will then implicitly close all files and commit them to disk.

As it is very common to start procedures from other environments, such as XEDIT or FILELIST, you will not reach the Ready; state when you return from your procedure.  That may only happen much later, or maybe never if there is a power failure or if your machine is forced off the system.

And there are even more dangers.  If you use GLOBALV to save variables in your procedure and you have also opened a work file on your A-disk, then the LASTING GLOBALV file will not be committed unless you close the work file !
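
For instance (the group, variable and file names in this sketch are invented):

 'EXECIO 1 DISKW WORK FILE A (STRING scratch data'  /* work file stays open */
 'GLOBALV SELECT MYTOOL SETL LASTRUN' date('S')   /* writes LASTING GLOBALV */
 'FINIS WORK FILE A'           /* without this, neither file is committed   */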

Conclusion:


A good procedure always

  • Closes all files it opened before termination ;
  • Closes ONLY its files, so certainly never uses FINIS * * A.

CMS Low level I/O interface

In Appendix F, "Host versus Personal Systems", you will learn that there are big differences in performance between the various methods that read files.  CSL and stream I/O are not always very fast; in the past, EXECIO was much slower than, for example, XEDIT.

Some of the effects can be explained by what we said before:

Switching environments is costly, so ask for as much as possible in one call.  CSL forces you to read records one by one, which is certainly not good for performance.
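
The sketch below contrasts the two approaches for a hypothetical BIG FILE A:

 /* One call, one environment switch: the whole file at once          */
 'EXECIO * DISKR BIG FILE A (STEM REC. FINIS'
 do i=1 to rec.0
    .... process rec.i ....
 end

 /* One call, and thus one environment switch, per record             */
 do forever
    'EXECIO 1 DISKR BIG FILE A (VAR LINE'
    if rc<>0 then leave
    .... process line ....
 end
 'FINIS BIG FILE A'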
 

Why was XEDIT so much faster than EXECIO ?

Well, even inside CMS commands, our rule applies.  EXECIO asked for one record at a time, whereas XEDIT used a lower level interface, the so-called CMS block interface(footnote 3).

This interface lets you ask CMS for one or more blocks from disk.  This means the requestor must have an in-depth knowledge of the CMS file system.  CSL also provides the DMSOPBLK, DMSRDBLK and DMSWRBLK routines to use the CMS block interface.  Fortunately, both EXECIO and CMS Pipelines use this same interface today.


Special file formats.

Empty files

We have mentioned the existence of empty files when we discussed the solutions of Lesson 1.  These can be created in the SFS system.  Such files contain, of course, no data, but have all the other file attributes like name, date, authorities, aliases, and so on.  They are created via a CREATE FILE command, or remain after an ERASE fileid (DATAONLY.
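
For example (assuming filemode B is an accessed SFS directory; the file names are invented):

 'CREATE FILE EMPTY DATA B'     /* creates an empty file in the directory  */
 'ERASE OLD DATA B (DATAONLY'   /* erases the data, keeps the attributes   */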

SPARSE files

Just to be complete, we have to say a few words about sparse files, as you might encounter this term in the CMS literature.

The easiest definition is to say that a sparse file is a fixed-length record file with holes in it.

For example, you could set up a client database where the client number corresponds to the record number, so that it is possible to retrieve the client information via a direct access to the corresponding record.

If you create the record for client number 1000, the file will indeed have at least 1000 records.  But what about the other records for which no client information yet exists ?  Do they consume disk space ?  No, not in CMS, thanks to the concept of sparse files !

CMS will indeed not write blocks to disk when they contain only binary zeroes; the records in those blocks exist only logically.  Even index blocks that contain nothing but binary zeroes are not written to disk.

Note: Don't confuse binary zeroes with blanks (spaces).

You want to see an example ?  Issue the following commands:

   execio 1 diskw test1 file a 1 F 80 (finis string This is a record
   listfile test1 file (label               /* note it takes 1 block */
   xedit test1 file
      dup 1000
      file
   listfile test1 file (label             /* note it takes 20 blocks */
   execio 1 diskw test2 file a 1000 f 80 (finis string This is a record
   listfile test2 file (label         /* takes 1 block for 1000 recs */

Note: It is possible to read a record that doesn't physically exist; CMS will not have to do any I/O, but simply returns a record of binary zeroes.
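
Continuing the example, a small REXX fragment can check that record 500 of TEST2 FILE (which was never written) comes back as binary zeroes:

 'EXECIO 1 DISKR TEST2 FILE A 500 (VAR REC FINIS'
 if rec = copies('00'x, 80) then say 'Record 500 is all binary zeroes'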

The next chapter will cover the stream I/O functions in more detail.


Footnotes:

(1) SENTRIES is a CMS command that sets the return code to the number of lines in the stack.  It is part of the CMS Utility Feature.  It was needed for EXEC2.  Now we are fortunate to have the queued() function of REXX.

(2) VSE/ESA and MVS/ESA systems have an easier task here, they close the files and do a clean up at end-of-job.

(3) For specialists: this has nothing to do with CP's *BLOCKIO IUCV service.  *BLOCKIO is a way to request CP to read a block (or more) from disk, it is used by the SQL/DS and SFS service machines.