Kilowatt Software's Classic Rexx Tutorial Language Level 4.00 (TRL-2)

Parsing tutorial

One of the most powerful features of Rexx is its ability to parse text values. If you are like many others who are learning Rexx you may be unfamiliar with the word parse. Perhaps you recall parsing sentences during your schooling, but you think that was quite some time ago. Webster's New World Dictionary contains the following definition.

parse vt., vi. parsed, pars'ing [Now Rare]
1. to separate (a sentence) into its parts, explaining
the grammatical form, function, and interrelation of each
part 2. to describe the form, part of speech, and
function of (a word in a sentence)

The above definition has little in common with the Rexx parsing capability, apart from one key phrase: "to separate into its parts". In computer science parlance, to parse is to separate computer input into meaningful parts for subsequent processing actions.

Rexx is one of the few languages that provide parsing as a fundamental instruction. Most languages merely provide lower-level string separation capabilities, leaving parsing facilities to be developed by the user. Within Rexx, these capabilities are immediately available and, as you will find, very powerful!

Preparing to parse

Let us learn about parsing by analyzing the following reduction of Descartes' famous quote:

I think I am

Here is a program that parses the words in the phrase. When a value consists of words that are separated by only one space, and there are no leading or trailing spaces, the value is easy to parse into a known number of words as follows.

  parse value 'I think I am' with word1 word2 word3 word4

  say "'"word1"'"
  say "'"word2"'"
  say "'"word3"'"
  say "'"word4"'"
  
This shows:
  'I'
  'think'
  'I'
  'am'

Here is another program that parses the above phrase.

  phrase = 'I think I am'
  do while phrase <> ''
    parse var phrase word phrase
    say "'"word"'"
    end
  
This shows:
  'I'
  'think'
  'I'
  'am'

This simple program achieved the desired result. The program is a Rexx parsing idiom. In each loop iteration, the parse instruction extracts the first word in the phrase, and assigns the remaining words (after the first word) to the phrase variable. The loop concludes when all of the words in the phrase have been processed.
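The same idiom can also collect the words for later use instead of displaying them immediately. The following is a minimal sketch, assuming a stem variable named w. (a name chosen here only for illustration) whose w.0 element holds the word count.

```rexx
/* Sketch: collect the words of a phrase into the stem w. */
phrase = 'I think I am'
w.0 = 0                            /* w.0 counts the words found so far */
do while phrase <> ''
  parse var phrase next phrase     /* first word to next, the rest back to phrase */
  n = w.0 + 1
  w.n = next
  w.0 = n
  end

do i = 1 to w.0
  say i':' w.i                     /* shows: 1: I, then 2: think, 3: I, 4: am */
  end
```

Keeping the count in element 0 is a common Rexx convention, not a requirement of the parse instruction.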

When there are more words in the value than there are variables in the template, the trailing words are assigned to the last variable in the template. Here is an example.

  parse value 'Sam likes peaches and cream' with subject verb object
  say 'subject:' subject
  say 'verb:' verb
  say 'object:' object
  
This shows:
  subject: Sam
  verb: likes
  object: peaches and cream
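
The opposite case is also worth a look. In the following short sketch the value has fewer words than the template has variables, and the variables left over receive an empty value. The double quotes in the output are added only to make the empty value visible.

```rexx
/* Sketch: two words, three variables */
parse value 'Sam likes' with subject verb object
say 'subject: "'subject'"'    /* shows: subject: "Sam"   */
say 'verb: "'verb'"'          /* shows: verb: "likes"    */
say 'object: "'object'"'      /* shows: object: ""       */
```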

Now let's make Descartes' quote a little more challenging. Additional spaces in the original phrase, and punctuation characters, introduce various difficulties.

   I  think,  I am  .

Here is the same phrase with each space shown as a dot (·) so that the spaces can be seen:

···I··think,··I·am··.··

The first parsing challenge is to extract the words within the quote. Let's try to do it with the words and word built-in functions.

  phrase = '···I··think,··I·am··.··'   /* each · stands for one space */
  do i=1 for words( phrase )
    say "'"word( phrase, i )"'"
    end
  
This shows:
  'I'
  'think,'
  'I'
  'am'
  '.'

This simple program worked well, although the second word includes a trailing comma. In addition, the period is considered a word.

The following is an initial attempt to parse the words in the phrase.

  /* each · in the value below stands for one space */
  parse value '···I··think,··I·am··.··' with word1 word2 word3 word4 word5
  say "'"word1"'"
  say "'"word2"'"
  say "'"word3"'"
  say "'"word4"'"
  say "'"word5"'"
  
This shows:
  'I'
  'think,'
  'I'
  'am'
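  '·.··'

Notice the spaces before and after the period. The last variable in a template receives whatever remains of the value; only the single space that separated it from the preceding word is removed.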

The following program achieves a better result.

  phrase = '···I··think,··I·am··.··'   /* each · stands for one space */
  do while phrase <> ''
    parse var phrase word phrase
    say "'"word"'"
    end
  
This shows:
  'I'
  'think,'
  'I'
  'am'
  '.'

This was our second parsing program. It worked fairly well, although the second word includes a trailing comma. In addition, the period is considered a word. This time there are no spaces before and after the period.

Now let's successfully parse the phrase into words.

  phrase = '···I··think,··I·am··.··'   /* each · stands for one space */
  do while phrase <> ''
    parse var phrase word phrase
    word = strip( translate( word, , ',.;":?()' ) )
    if word <> '' then
      say "'"word"'"
    end
  
This shows:
  'I'
  'think'
  'I'
  'am'

The above program translated punctuation characters to spaces, and then stripped spaces. Any characters remaining after these operations were considered a word.
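
To see the two steps separately, the following sketch applies them to single words. The variable name blanked is introduced here only for illustration.

```rexx
/* Sketch: translate punctuation to spaces, then strip the spaces */
word = 'think,'
blanked = translate( word, , ',.;":?()' )   /* listed characters become spaces */
say "'"blanked"'"                           /* shows: 'think ' */
say "'"strip( blanked )"'"                  /* shows: 'think'  */

word = '.'
say "'"strip( translate( word, , ',.;":?()' ) )"'"   /* shows: '' */
```

The last case explains why the program tests for an empty value before displaying a word.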

Let Rexx know what you mean

When the value that is being parsed contains punctuation that partitions the value into meaningful components, you can easily assign these parts to variables. Consider the following example:

  parse value 'I think, therefore I am (I think)' with precondition ', ' consequence ' (' qualifier ')'
  say 'precondition' precondition
  say 'consequence' consequence
  say 'qualifier' qualifier
  
This shows:
  precondition I think
  consequence therefore I am
  qualifier I think
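
It helps to know what happens when a literal pattern is not present in the value. In the following sketch the comma is missing, so the variable before the pattern receives the entire value and the variable after it is empty. The double quotes in the output only mark where each value begins and ends.

```rexx
/* Sketch: the ', ' pattern does not occur in this value */
parse value 'I think therefore I am' with precondition ', ' consequence
say 'precondition: "'precondition'"'   /* shows: precondition: "I think therefore I am" */
say 'consequence: "'consequence'"'     /* shows: consequence: ""                        */
```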

Suppose the value consists of a sequence of fields separated by tabs. You can easily assign these to variables as follows:

  tab = '09'x /* this is an ASCII tab character */

  parse var request ,
     Company (tab) ,
     Sales (tab) ,
     CostOfGoods (tab) ,
     NetIncome (tab) ,
     Cash (tab) ,
     AccountsReceivable (tab) ,
     AccountsReceivablePrior (tab) ,
     Inventory (tab) ,
     InventoryPrior (tab) ,
     OtherCurrentAssets (tab) ,
     PropertyEquipment (tab) ,
     AccumulatedDepreciation (tab) ,
     OtherAssets (tab) ,
     TotalAssetsPrior (tab) ,
     CurrentLiabilities (tab) ,
     LongTermDebt (tab) ,
     OtherLiabilities (tab) ,
     PreferredStock (tab) ,
     CommonStock (tab) ,
     RetainedEarnings (tab) ,
     StockholdersEquity (tab) ,
     StockValue (tab) ,
     SharesOutstanding (tab) ,
     PreferredDividends (tab)

Note: the commas at the ends of the lines shown above are line continuation requests, not parse template comma separators.
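
The same technique can be tried on a smaller scale. The following sketch builds a three-field request value and parses it; the field values are invented for illustration.

```rexx
/* Sketch: build and parse a short tab-separated value */
tab = '09'x
request = 'Acme' || tab || '1200' || tab || '800'   /* made-up data */

parse var request Company (tab) Sales (tab) CostOfGoods
say 'Company:' Company            /* shows: Company: Acme      */
say 'Sales:' Sales                /* shows: Sales: 1200        */
say 'CostOfGoods:' CostOfGoods    /* shows: CostOfGoods: 800   */
```

The parentheses around tab tell parse to use the variable's value as the pattern, rather than treating tab as another variable to assign.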

Multiple value assignment

You might have seen Rexx programs that have multiple assignment instructions on a single line, especially in books. Your programs will be easier to understand if the assignments are placed on separate lines. Consider the following example.

drop a3; a33 = 7; k = 3; fred='K'; list.5 = '?'

The parse instruction can perform multiple assignments. The above assignments can be accomplished as follows:

drop a3 /* the parse instruction can not drop values */

parse value '7 3 K ?' with a33 k fred list.5

How does parsing work?

The parse statement divides a source string into constituent parts and assigns these to variables, as directed by the parsing template.

The following picture introduces how parsing is performed, with multiple space dividers between the variables to assign.

While the template is processed from left to right, several current positions in the source string are maintained. The motion of these positions is guided by the division specifiers within the template. In the picture above, the positions are those that would be in effect after the template's verb term is processed. The object term will be processed next. The previous start position locates the 'l' in 'likes'. The current end position locates the space between 'likes' and 'peaches'. The next start position locates the 'p' in 'peaches'. With these positions established the value 'likes' is assigned to variable verb. When the object term is processed, it is the only term remaining. Consequently, the remainder of the source string is assigned to the object variable -- it receives the value: 'peaches and cream'.
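
The three positions described above can be reproduced with built-in functions. This is only an illustrative sketch of where the positions lie, not a description of how an interpreter is implemented, and the variable names are chosen here for the example.

```rexx
/* Sketch: the positions in effect after the verb term is processed */
source = 'Sam likes peaches and cream'

prevStart = wordindex( source, 2 )                    /* 5:  the 'l' in 'likes'       */
curEnd    = prevStart + length( word( source, 2 ) )   /* 10: the space after 'likes'  */
nextStart = wordindex( source, 3 )                    /* 11: the 'p' in 'peaches'     */

say prevStart curEnd nextStart                        /* shows: 5 10 11               */
say substr( source, prevStart, curEnd - prevStart )   /* shows: likes                 */
say substr( source, nextStart )                       /* shows: peaches and cream     */
```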

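If a relative position division specifier followed the verb term, the result would change more than you might expect. A positional specifier is located in the source string before any words are assigned. Because no earlier pattern has been matched, +2 is counted from the start of the source string, so the division falls just before the third character. The variables to the left of the specifier, subject and verb, share only the first two characters, and all positions are advanced to that relative position. Study the following effect:

  parse value 'Sam likes peaches and cream' with subject verb +2 object
  say 'subject:' subject
  say 'verb:' verb
  say 'object:' object

This shows:
  subject: Sa
  verb:
  object: m likes peaches and cream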
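
Relative positions are most often used when a value has fixed-width fields. In the following sketch each +n is counted from the position where the previous field began, so the template reads as a list of field widths. The value is made up for illustration.

```rexx
/* Sketch: cut a fixed-width value into three fields */
stamp = '20031023'                     /* made-up value: 4 + 2 + 2 characters */
parse var stamp year +4 month +2 day
say year month day                     /* shows: 2003 10 23 */
```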

The following is another illustration that shows how parsing is performed, with a literal pattern divider between the variables to assign.

The literal pattern in this example is a quoted comma -- ',' . The previous start position locates the 'I' at the beginning of the source string. The current end position locates the ','. The next start position locates the space between the comma and the 't' in 'therefore'. With these positions established the value 'I think' is assigned to variable precondition. When the consequence term is processed, it is the only term remaining. Consequently, the remainder of the source string is assigned to the consequence variable -- it receives the value: ' therefore I am'. This value contains a leading space.

If a relative position division specifier followed the ',' literal pattern, the next start position would be that many characters after the position where the comma was found in the source string.

  parse value 'I think, therefore I am' with precondition ',' +2 consequence

This advances two character positions from the comma: one for the comma itself and one for the space that follows it. As a result, the consequence variable receives the value 'therefore I am' without a leading space.
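
When counting positions is inconvenient, the leading space can be handled in other ways. The following sketch compares the plain ',' pattern, the strip built-in function, and a ', ' pattern that includes the space. The square brackets in the output only mark where each value begins and ends.

```rexx
/* Sketch: three ways to look at the text that follows the comma */
source = 'I think, therefore I am'

parse var source precondition ',' consequence
say '['consequence']'             /* shows: [ therefore I am] */

say '['strip( consequence )']'    /* shows: [therefore I am]  */

parse var source precondition ', ' consequence
say '['consequence']'             /* shows: [therefore I am]  */
```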

Last updated on: 23 Oct 2003