Kilowatt Software's
Classic Rexx Tutorial
Language Level 4.00 (TRL-2)

Parsing tutorial

One of the most powerful features of Rexx is its ability to parse text values. Like many people who are learning Rexx, you may be unfamiliar with the word parse. Perhaps you recall parsing sentences in school, though that may have been quite some time ago. Webster's New World Dictionary contains the following definition.

The above definition has little in common with the Rexx parsing capability. The key phrase is: "to separate into its parts". In computer science, to parse means to separate input into meaningful parts for subsequent processing.

Rexx is one of the few languages that provide parsing as a fundamental instruction. Most languages provide only lower-level string separation functions and leave it to the programmer to build parsing capabilities from them. In Rexx, these capabilities are immediately available and, as you will find, very powerful!

Preparing to parse

Let us learn about parsing by analyzing the following reduction of Descartes' famous quote:

Here is a program that parses the words in the phrase. When a value consists of words that are separated by only one space, and there are no leading or trailing spaces, the value is easy to parse into a known number of words as follows.
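The case described above can be sketched as follows. This is an illustration only: it assumes the phrase is the unpunctuated form of the quote, and the variable names w1 through w5 are chosen for this example.

```rexx
/* Assumed phrase: five words, single spaces, no leading or trailing spaces */
phrase = 'I think therefore I am'

/* One template variable per word */
parse var phrase w1 w2 w3 w4 w5

say w1    /* I         */
say w2    /* think     */
say w3    /* therefore */
say w4    /* I         */
say w5    /* am        */
```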

Here is another program that parses the above phrase.

This simple program achieved the desired result. The program is a Rexx parsing idiom. In each loop iteration, the parse instruction extracts the first word in the phrase, and assigns the remaining words (after the first word) to the phrase variable. The loop concludes when all of the words in the phrase have been processed.
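A minimal sketch of the idiom just described, assuming the same unpunctuated phrase (the variable name w is illustrative):

```rexx
phrase = 'I think therefore I am'   /* assumed phrase */

do while phrase <> ''
  /* The first word goes to w; the remaining words go back into phrase */
  parse var phrase w phrase
  say w
end
```

Each pass shortens phrase by one word, so the loop ends when phrase is empty.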

When there are more words in the value, than there are variables in the template, the trailing words are assigned to the last variable in the template. Here is an example.
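For instance, assuming the same unpunctuated phrase and a template with only three variables (the names first, second and rest are illustrative):

```rexx
phrase = 'I think therefore I am'   /* assumed phrase */

/* Three variables for five words: the last one receives the trailing words */
parse var phrase first second rest

say first     /* I              */
say second    /* think          */
say rest      /* therefore I am */
```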

Now let's make Descartes' quote a little more challenging. Additional spaces in the original phrase, and punctuation characters, introduce various difficulties.

Here is the same phrase with spaces represented as dots: · , so they can be seen!

The first parsing challenge is to extract the words within the quote. Let's try to do it with the words and word built-in functions.

This simple program worked well, although the second word includes a trailing comma. In addition, the period is considered a word.
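A sketch of this approach with the words and word built-in functions. The exact spacing of the phrase is an assumption made for this example; brackets are added to the output so that word boundaries can be seen.

```rexx
/* Assumed phrase with extra spaces and punctuation */
phrase = '  I think,  therefore I am  .  '

do i = 1 to words(phrase)
  say '[' || word(phrase, i) || ']'
end
/* Shows: [I] [think,] [therefore] [I] [am] [.]  (one per line) */
```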

The following is an initial attempt to parse the words in the phrase.

Notice the spaces before and after the period.
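One way this effect can arise is sketched below, assuming the phrase spacing shown and a template with one variable per word. The last variable in a template receives the rest of the value as it stands, so any surrounding spaces are kept.

```rexx
/* Assumed phrase with extra spaces and punctuation */
phrase = '  I think,  therefore I am  .  '

parse var phrase w1 w2 w3 w4 w5 w6

say '[' || w2 || ']'    /* [think,] */
say '[' || w6 || ']'    /* [ .  ]   spaces before and after the period */
```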

The following program achieves a better result.

This was our second parsing program. It worked fairly well, although the second word includes a trailing comma. In addition, the period is considered a word. This time there are no spaces before and after the period.
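One way to obtain this result, offered as a sketch rather than as the original program, is to end the template with a period placeholder. The last named variable is then parsed as a word, so it no longer collects surrounding spaces. The phrase spacing is assumed.

```rexx
/* Assumed phrase with extra spaces and punctuation */
phrase = '  I think,  therefore I am  .  '

/* The final period is a placeholder that absorbs whatever remains */
parse var phrase w1 w2 w3 w4 w5 w6 .

say '[' || w2 || ']'    /* [think,] */
say '[' || w6 || ']'    /* [.]      */
```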

Now let's successfully parse the phrase into words.

The above program translated punctuation characters to spaces, and then stripped spaces. Any characters remaining after these operations were considered a word.
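The steps just described can be sketched with the translate and strip built-in functions. The phrase spacing and the variable name w are assumptions made for this example.

```rexx
/* Assumed phrase with extra spaces and punctuation */
phrase = '  I think,  therefore I am  .  '

phrase = translate(phrase, '  ', ',.')   /* comma and period become spaces */
phrase = strip(phrase)                   /* drop leading and trailing spaces */

do while phrase <> ''
  parse var phrase w phrase              /* extract the first word */
  say '[' || w || ']'
end
/* Shows: [I] [think] [therefore] [I] [am]  (one per line) */
```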

Let Rexx know what you mean

When the value that is being parsed contains punctuation that partitions the value into meaningful components, you can easily assign these parts to variables. Consider the following example:
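As a hypothetical illustration, a time value partitioned by colons can be split with a literal pattern in the template (the value and variable names are invented for this sketch):

```rexx
timestamp = '12:34:56'   /* hypothetical value */

/* Each quoted colon in the template matches a colon in the value */
parse var timestamp hours ':' minutes ':' seconds

say hours      /* 12 */
say minutes    /* 34 */
say seconds    /* 56 */
```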

Suppose the value consists of a sequence of fields separated by tabs. You can easily assign these to variables as follows:

Note: the commas at the ends of the lines shown above are line continuation requests, not parse template comma separators.
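A sketch of tab-separated parsing, assuming a hypothetical record with three fields. The tab character is held in a variable and used as a pattern by placing the variable name in parentheses; the commas at the ends of the lines are continuations.

```rexx
tab = '09'x   /* the tab character */

/* Hypothetical record with three tab-separated fields */
record = 'Alice' || tab || '555-1234' || tab || 'Boston'

parse var record name  (tab) ,
                 phone (tab) ,
                 city

say name     /* Alice    */
say phone    /* 555-1234 */
say city     /* Boston   */
```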

Multiple value assignment

You might have seen Rexx programs that have multiple assignment instructions on a single line, especially in books. Your programs will be easier to understand if the assignments are placed on separate lines. Consider the following example.

The parse instruction can perform multiple assignments. The above assignments can be accomplished as follows:
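For illustration, with hypothetical variables count, total and name, the two styles might look like this:

```rexx
/* Several assignments on one line */
count = 0; total = 0; name = 'none'

/* A similar effect with a single parse instruction */
parse value 0 0 'none' with count total name

say count total name    /* 0 0 none */
```

This form relies on word parsing, so it suits values that contain no spaces.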

How does parsing work?

The parse statement divides a source string into constituent parts and assigns these to variables, as directed by the parsing template.

The following picture introduces how parsing is performed, with multiple space dividers between the variables to assign.

While the template is processed from left to right, several current positions in the source string are maintained. The motion of these positions is guided by the division specifiers within the template. In the picture above, the positions are those that would be in effect after the template's verb term is processed. The object term will be processed next. The previous start position locates the 'l' in 'likes'. The current end position locates the space between 'likes' and 'peaches'. The next start position locates the 'p' in 'peaches'. With these positions established the value 'likes' is assigned to variable verb. When the object term is processed, it is the only term remaining. Consequently, the remainder of the source string is assigned to the object variable -- it receives the value: 'peaches and cream'.
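The walk-through above corresponds to an instruction along these lines. The first word of the source string is not given in the text, so 'Kim' is a hypothetical stand-in, as is the variable name subject.

```rexx
/* 'Kim' is an assumed first word */
parse value 'Kim likes peaches and cream' with subject verb object

say subject    /* Kim               */
say verb       /* likes             */
say object     /* peaches and cream */
```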

If a relative position division specifier followed the verb term, the verb variable would receive that many characters after the previous start position and all positions would be advanced to that relative position. Study the following effect:
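A small sketch of a relative position following a variable, assuming a source string that begins with the verb so that the start position is the 'l' in 'likes':

```rexx
/* +3 ends the verb term three characters after its start position */
parse value 'likes peaches and cream' with verb +3 object

say verb       /* lik                  */
say object     /* es peaches and cream */
```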

The following is another illustration that shows how parsing is performed, with a literal pattern divider between the variables to assign.

The literal pattern in this example is a quoted comma -- ',' . The previous start position locates the 'I' in 'I think'. The current end position locates the ','. The next start position locates the space between the comma and the 't' in 'therefore'. With these positions established the value 'I think' is assigned to variable precondition. When the consequence term is processed, it is the only term remaining. Consequently, the remainder of the source string is assigned to the consequence variable -- it receives the value: ' therefore I am'. This value contains a leading space.
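The description above corresponds to an instruction along these lines, shown as a sketch with brackets added so that the leading space is visible:

```rexx
parse value 'I think, therefore I am' with precondition ',' consequence

say '[' || precondition || ']'    /* [I think]          */
say '[' || consequence || ']'     /* [ therefore I am]  */
```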

If a relative position division specifier followed the ',' literal pattern, the next start position would be that many characters after the comma in the source string.

This advanced one character position after the comma. As a result, the consequence variable receives the value 'therefore I am' without a leading space.



Last updated on: 23 Oct 2003