Chapter 9. Parsing

9.1. Simple Templates for Parsing into Words
9.1.1. Message Term Assignments
9.1.2. The Period as a Placeholder
9.2. Templates Containing String Patterns
9.3. Templates Containing Positional (Numeric) Patterns
9.3.1. Combining Patterns and Parsing into Words
9.4. Parsing with Variable Patterns
9.5. Using UPPER, LOWER, and CASELESS
9.6. Parsing Instructions Summary
9.7. Parsing Instructions Examples
9.8. Advanced Topics in Parsing
9.8.1. Parsing Several Strings
9.8.2. Combining String and Positional Patterns
9.8.3. Conceptual Overview of Parsing
The parsing instructions are ARG, PARSE, and PULL (see Section 2.2, “ARG”, Section 2.18, “PARSE”, and Section 2.20, “PULL”).
The data to be parsed is a source string. Parsing splits the data in a source string and assigns pieces of it to the variables named in a template. A template is a model specifying how to split the source string. The simplest kind of template consists of a list of variable names. Here is an example:
variable1 variable2 variable3
This kind of template parses the source string into whitespace-delimited words. More complicated templates contain patterns in addition to variable names:
String patterns
Match the characters in the source string to specify where it is to be split. (See Section 9.2, “Templates Containing String Patterns” for details.)
Positional patterns
Indicate the character positions at which the source string is to be split. (See Section 9.3, “Templates Containing Positional (Numeric) Patterns” for details.)
Parsing is essentially a two-step process:
  1. Parse the source string into appropriate substrings using patterns.
  2. Parse each substring into words.
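The two steps can be seen in a single instruction. The following is a minimal sketch, not taken from the reference itself; the source string and the variable names last, first, and middle are made up for illustration. The string pattern ", " splits the source string, and each resulting substring is then parsed into words.

```rexx
/* Step 1: the string pattern ", " splits the source string      */
/* Step 2: each substring is parsed into words                   */
parse value "Smith, John Q" with last ", " first middle
say last                                  /* displays Smith      */
say first                                 /* displays John       */
say middle                                /* displays Q          */
```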

9.1. Simple Templates for Parsing into Words

Here is a parsing instruction:

Example 9.1. Parsing templates

parse value "time and tide" with var1 var2 var3

The template in this instruction is var1 var2 var3. The data to be parsed, the source string "time and tide", lies between the keywords PARSE VALUE and the keyword WITH. Parsing divides the source string into whitespace-delimited words and assigns them to the variables named in the template as follows:
var1="time"
var2="and"
var3="tide"
In this example, the source string to be parsed is a literal string, time and tide. In the next example, the source string is a variable.

Example 9.2. Parse value

/* PARSE VALUE using a variable as the source string to parse    */
string="time and tide"
parse value string with var1 var2 var3           /* same results */

PARSE VALUE does not convert lowercase a-z in the source string to uppercase A-Z. If you want to convert characters to uppercase, use PARSE UPPER VALUE. See Section 9.5, “Using UPPER, LOWER, and CASELESS” for a summary of the effect of parsing instructions on the case.
Note that if you specify the CASELESS option on a PARSE instruction, the string comparisons during the scanning operation are made independently of the alphabetic case. That is, a letter in uppercase is equal to the same letter in lowercase.
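As a brief sketch of these two options (the source strings and the variable names before and after are illustrative assumptions), PARSE UPPER VALUE converts the data before it is assigned, whereas CASELESS affects only how a string pattern is matched and leaves the assigned data unchanged. The second instruction uses a string pattern, which Section 9.2 describes in detail.

```rexx
/* PARSE UPPER VALUE converts the source string to uppercase     */
parse upper value "time and tide" with var1 var2 var3
say var1 var2 var3                   /* displays TIME AND TIDE   */

/* CASELESS: the string pattern "and" also matches "AND"         */
parse caseless value "time AND tide" with before "and" after
say "[" || before || "]"             /* displays [time ]         */
say "[" || after || "]"              /* displays [ tide]         */
```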
All of the parsing instructions assign the parts of a source string to the variables named in a template. There are various parsing instructions because of the differences in the nature or origin of source strings. For a summary of all the parsing instructions, see Section 9.6, “Parsing Instructions Summary”.
The PARSE VAR instruction is similar to PARSE VALUE except that the source string to be parsed is always a variable. In PARSE VAR, the name of the variable containing the source string follows the keywords PARSE VAR. In the next example, the variable stars contains the source string. The template is star1 star2 star3.

Example 9.3. Parse var

/* PARSE VAR example                                             */
stars="Sirius Polaris Rigil"
parse var stars star1 star2 star3             /* star1="Sirius"  */
                                              /* star2="Polaris" */
                                              /* star3="Rigil"   */

All variables in a template receive new values. If there are more variables in the template than words in the source string, the leftover variables receive null (empty) values. This is true for all parsing: both parsing into words with simple templates and parsing with templates containing patterns. Here is an example of parsing into words:
/* More variables in template than (words in) the source string  */
satellite="moon"
parse var satellite Earth Mercury               /* Earth="moon"  */
                                                /* Mercury=""    */
If there are more words in the source string than variables in the template, the last variable in the template receives all leftover data. Here is an example:
/* More (words in the) source string than variables in template  */
satellites="moon Io Europa Callisto..."
parse var satellites Earth Jupiter              /* Earth="moon"  */
                                                /* Jupiter="Io Europa Callisto..."*/
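Because the last variable receives all leftover data, a template can take one word at a time off the front of a string. The loop below is a sketch of that common idiom, not part of the original examples; the variable names name and count are illustrative, and the source string is assumed to have no trailing blanks.

```rexx
/* Take words off the front of a string, one per iteration       */
satellites="moon Io Europa Callisto"
count=0
do while satellites \== ""
  parse var satellites name satellites   /* first word, then rest */
  count=count+1
  say count name
end
/* The loop displays four lines: 1 moon, 2 Io, 3 Europa,         */
/* 4 Callisto. Afterward, satellites is the null string.         */
```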
Parsing into words removes leading and trailing whitespace characters from each word before it is assigned to a variable. The exception to this is the word or group of words assigned to the last variable. The last variable in a template receives leftover data, preserving extra leading and trailing whitespace characters. Here is an example:

Example 9.4. Parse var

/* Preserving extra blanks                                       */
solar5="Mercury Venus  Earth   Mars     Jupiter  "
parse var solar5 var1 var2 var3 var4
/* var1  ="Mercury"                                              */
/* var2  ="Venus"                                                */
/* var3  ="Earth"                                                */
/* var4  ="  Mars     Jupiter  "                                 */

In the source string, Earth has two leading blanks. Parsing removes both of them (the word-separator blank and the extra blank) before assigning var3="Earth". Mars has three leading blanks. Parsing removes one word-separator blank and keeps the other two leading blanks. It also keeps all five blanks between Mars and Jupiter and both trailing blanks after Jupiter.
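One way to check which blanks are kept is to display the values between delimiters. This is a small sketch; the square brackets are only a display aid and are not part of the parsed data.

```rexx
/* Make the preserved blanks visible                             */
solar5="Mercury Venus  Earth   Mars     Jupiter  "
parse var solar5 var1 var2 var3 var4
say "[" || var3 || "]"        /* displays [Earth]                */
say "[" || var4 || "]"        /* displays [  Mars     Jupiter  ] */
say length(var4)              /* displays 20                     */
```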
Parsing removes no whitespace characters if the template contains only one variable. For example:
parse value "   Pluto   " with var1        /* var1="   Pluto   "*/

9.1.1. Message Term Assignments

In addition to assigning values to variables, the PARSE instruction can assign values to any message term that can be used on the left side of an assignment instruction (see Section 1.13, “Assignments and Symbols”). For example:

Example 9.5. Parse var

/* Assigning to message terms                                    */
solar5="Mercury Venus  Earth   Mars     Jupiter  "
d = .directory~new
parse var solar5 d~var1 d~var2 d~var3 d~var4
/* d~var1  ="Mercury"                                              */
/* d~var2  ="Venus"                                                */
/* d~var3  ="Earth"                                                */
/* d~var4  ="  Mars     Jupiter  "                                 */

9.1.2. The Period as a Placeholder

A period in a template is a placeholder. It is used instead of a variable name, but it receives no data. It is useful as a "dummy variable" in a list of variables or to collect unwanted information at the end of a string. It also saves the overhead of unneeded variables.
In the first of the following examples, each period is a placeholder. Be sure to separate adjacent periods with whitespace; otherwise, an error results.

Example 9.6. Period placeholder

/* Period as a placeholder                                       */
stars="Arcturus Betelgeuse Sirius Rigil"
parse var stars . . brightest .            /* brightest="Sirius" */

/* Alternative to period as placeholder                          */
stars="Arcturus Betelgeuse Sirius Rigil"
parse var stars drop junk brightest rest   /* brightest="Sirius" */
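A trailing period is also a convenient way to discard leftover words, and it changes how the preceding variable is treated: that variable is no longer the last one in the template, so it receives a single word with leading and trailing blanks removed. The following is a sketch under assumed data; the log entry and the names entry, day, and clock are hypothetical.

```rexx
/* Hypothetical log entry; only the first two words are wanted   */
entry="2024-05-17 12:30:00 UTC backup finished"
parse var entry day clock .
say day                              /* displays 2024-05-17      */
say clock                            /* displays 12:30:00        */

/* A trailing period means var1 is no longer the last variable,  */
/* so its leading and trailing blanks are removed                */
parse value "   Pluto   " with var1 .
say "[" || var1 || "]"               /* displays [Pluto]         */
```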