3.12 Iterators and the FOREACH Loop

3.12 Iterators and the FOREACH Loop

One nifty feature HLA provides is support for true iterators¹. An iterator is a special type of procedure or function that you use in conjunction with the HLA FOREACH..ENDFOR loop. Combined, these two language features (iterators and the FOREACH..ENDFOR loop) provide a very powerful user-defined looping construct.

The HLA FOREACH..ENDFOR statement uses the following basic syntax:
	foreach iteratorID( optional_parameters ) do
 

 
		<< loop body >>
 

 
	endfor;
 

 
The FOREACH statement calls the specified iterator. If the iterator succeeds, then the FOREACH statement executes the loop body; if the iterator fails, then control transfers to the first statement following the ENDFOR clause. On each iteration of the loop body, the program re-enters the iterator code and, once again, the iterator returns success or failure to determine whether to repeat the loop body.

At first glance, you might get the impression that the FOREACH loop is nothing more than a WHILE loop and an iterator is a function that returns true (success) or false (failure). However, this is not an accurate picture of how the FOREACH loop operates. First of all, the FOREACH loop does not CALL the iterator on each iteration of the loop; it re-enters the iterator. Specifically, control does not (necessarily) begin with the first statement of the iterator whenever control returns to the top of the FOREACH loop. The second big difference between a FOREACH/iterator loop and a WHILE/function loop is that the iterator procedure maintains its activation record in memory for the duration of the FOREACH loop. A function you would call from a WHILE loop, by contrast, builds and destroys the function's activation record on each iteration of the loop. This means that the iterator's local (automatic) variables maintain their values until the FOREACH loop terminates. This has important ramifications, especially for recursive iterator functions.

An iterator declaration looks very similar to a procedure declaration. Indeed, about the only syntactical difference is the use of the reserved word ITERATOR rather than PROCEDURE. The following is an example of a simple iterator:
iterator range( start:uns32;  last:uns32 ); nodisplay;
 
begin range;
 

 
	mov( start, eax );
 
	while( eax <= last ) do
 

 
		push( eax );
 
		yield();
 
		pop( eax );
 
		inc( eax );
 

 
	endwhile;
 

 
end range;
 

 
The only thing special about this iterator declaration, other than the use of the ITERATOR reserved word, is that it calls a special procedure named yield. In a few paragraphs you'll see the purpose of the call to the yield procedure.

A typical FOREACH loop that calls the range iterator might look like the following:
	foreach range( 1, 10 ) do
 

 
		stdout.put( "Iteration = ", (type uns32 eax), nl );
 

 
	endfor;
 

 
Here's how the iterator and the FOREACH loop work together. Upon first encountering the FOREACH statement, the program makes an initial call to the range iterator. Except for a few extra parameters HLA pushes on the stack, this call is exactly like a standard procedure call. Upon entry into the iterator, the start parameter has the initial value one and the last parameter has the initial value ten. The iterator loads start into EAX and compares this against the value in last (ten). Since EAX's value is less than or equal to ten, the program enters the loop's body. The loop body pushes EAX's value onto the stack and then calls the yield procedure. The yield procedure transfers control to the body of the FOREACH loop that called the range iterator in the first place. Calling yield is how the iterator returns success to the FOREACH loop. Within the body of the FOREACH loop, above, the code prints out the value of the EAX register as an unsigned integer. During the first iteration of the loop, EAX contains one so the loop body prints this value.

At the bottom of the FOREACH loop, the program re-enters the iterator. When the FOREACH loop re-enters the iterator, it transfers control to the first statement following the call to the yield function. Intuitively, you can view the FOREACH loop body as a procedure that the iterator calls whenever you call the yield function². Whenever the program encounters the ENDFOR clause, it returns to the iterator, executing the first statement beyond the yield call. In the current example, this pops the value of EAX off the stack (preserved before the call to yield), the loop increments EAX and repeats as long as EAX is less than ten.

When the range iterator increments EAX to 11, the WHILE loop in the iterator terminates and control falls off the bottom of the iterator. This is how an iterator returns failure to the calling FOREACH loop. At that point control transfers to the first statement following the ENDFOR in the FOREACH..ENDFOR loop.

By the way, the range iterator, combined with the FOREACH loop above, creates a relatively inefficient implementation of the following loop:
	for( mov( 1, eax ); eax < 10; inc( eax )) do
 

 
		stdout.put( "Iteration = ", (type uns32 eax), nl );
 

 
	endfor;
 

 
However, don't get the impression from this example that iterators are particularly inefficient. Iterators are not a good choice for something like range. However, there are many iterators you can write that are just as efficient as other means of loop control and computation.

An important point to remember when using iterators is that the iterator's activation record remains on the stack as long as the iterator returns success. The program only removes the activation record when the iterator fails. The range iterator takes advantage of this fact since it refers to the value of its last parameter on each re-entry from the FOREACH loop. The fact that parameters and local (automatic) variables maintain their values for the duration of the FOREACH loop is very important to many algorithms that use iterators, especially recursive algorithms.

One side effect of having an iterator maintain its activation record until it fails is that the value of ESP changes considerably between the statement immediately before the FOREACH statement and the first statement in the body of the FOREACH loop. This is because the program "pushes" the activation record onto the stack upon encountering the FOREACH loop and doesn't "pop" this activation record off the stack until the FOREACH loop fails. Therefore, code like the following will not work as expected:
	pushd( 10 );
 
	foreach range( 1, 25 ) do
 
		pop( ebx );
 
		push( ebx );
 
		stdout.put( "eax=", eax, " ebx=", ebx, nl );
 
	endfor;
 
	pop( ebx );
 

 
The problem with this code is that the FOREACH loop pushes a whole lot of data onto the stack after the PUSHD instruction pushes the value 10 onto the stack. Therefore, the POP instruction inside the loop does not pop the value 10 from the stack. Instead, it pops some data pushed on the stack by the iterator (specifically, it pops the return address that transfers control to the first instruction following the yield call). Therefore, you cannot use the stack to transfer data into or out of a FOREACH loop³.

Another problem with the stack and the FOREACH loop occurs if you try to prematurely exit a FOREACH loop before the iterator returns failure. Whenever an iterator fails, it cleans up the stack and restores ESP to the value it had upon encountering the FOREACH statement. However, statements like BREAK, BREAKIF, EXIT, EXITIF, JMP and any other flow of control transfer instructions will not clean up the stack if they transfer control out of a FOREACH loop. For example, the following code will leave the activation record for the range iterator sitting on the stack:
	foreach range( 2, 5 ) do
 

 
		jmp ExitFor;
 

 
	endfor;
 
	ExitFor:
 

 
Depending on the iterator and the code that calls the iterator, prematurely exiting a FOREACH loop without having the iterator return failure and leaving this junk sitting on the stack may have an adverse effect on the operation of your program. Clearly if you've pushed data onto the stack prior to the FOREACH loop, you will not be able to pop that data off unless you manually clean up the stack yourself (this involves saving the value of ESP prior to the FOREACH statement and restoring this value at the ExitFor label, above). Also, don't forget that prematurely exiting a FOREACH loop without letting the iterator finish may wind up grabbing some system resources that the iterator would normally free just before returning failure (e.g., calling free and closing files).

The volume on Advanced Procedures will go into the details concerning the low-level implementation of iterators. Until then, keep in mind that iterators build their activation records differently than standard procedures. Until you read that chapter, you should not attempt to call an iterator directly (i.e., outside a FOREACH loop) nor should you use the "noframe" option with an iterator. See the chapter on Advanced Procedures for more details on the implementation of iterators.

¹HLA's iterators are based on the control structure by the same name from the CLU programming language. Those things that C/C++ programmers refer to as iterators are more properly called cursors. While it is certainly possible to write cursors in HLA, it is important to note that HLA's iterators are quite a bit more powerful than C/C++'s iterators.

²In fact, this is exactly how HLA implements iterators and the FOREACH loop. See the volume on Advanced Procedures for more details.

³Not that it's a good idea to transfer data into or out of any loop using the stack. Such code tends to have lots of errors due to extra pushes or pops appearing in the program.

3.12 Iterators and the FOREACH Loop

Web Site Hits Since Jan 1, 2000

Web Site Hits Since
Jan 1, 2000