History of the Reginald Rexx Interpreter

I never intended to be involved in coding a REXX interpreter. In fact, that's exactly what I had initially hoped to avoid. Let me give you the history of how I became involved with "Reginald".

I had several Win32 freeware programs (ie, written in C/C++) to which I wanted to add a "scripting language", so that endusers could customize the programs with new "features". Rather than spend my time and effort writing my own script language and "interpreter", I went off in search of an existing Win32 offering. Since these were freeware programs, I didn't want to use any commercial offering. That ruled out anything like requiring endusers to purchase Visual Basic in order to write their scripts. Ideally, I wanted a script language that offered the enduser "graphical elements", such as the ability to create windows with "controls" in them and gather user input via a graphical interface. In this way, the enduser could not only add new features to the program, but also create a nice interface that would hopefully integrate seamlessly with the base features/interface offered by my program.

Searching through the net, I quickly discovered that this was going to be more difficult than I thought. Java was my first choice, since it seems to be the "media darling" among interpreted languages and does support a graphical interface. But it didn't appear to me that Java could be used well as a scripting language, because it didn't seem to have any easy and efficient way of passing data back and forth between a non-Java app and a running Java "script" written by an enduser. (At least, I couldn't find any good documentation on how to do it.) I also had to reject the current Active Scripting choices, since MS charges a fee to programmers who want to include the Active Scripting engine in their programs.

Then I happened to remember reading about a freeware REXX interpreter for Windows. I had previously been an OS/2 developer, and had even made several REXX function libraries, as well as written REXX scripts and made executables that used the REXX SAA API. Even before that, I had done the same on the Amiga, where REXX was a commonly supported feature. So I already knew what I needed to know in order to use REXX as my program's scripting language. Furthermore, I had already developed a "Rexx graphical interface" in the form of a SubCommand Handler for OS/2. The OS/2 API is very, very similar to the Windows API, and I figured that it would not take much effort to port it to Windows. Plus, some of my freeware programs were former OS/2 programs that in fact used REXX and this graphical interface library as the basis of their macro support. A freeware REXX interpreter seemed like an ideal choice for me.

I found Regina beta 08.H, downloaded it, and began to experiment with it. Almost immediately, I ran into problems with it. The very first example C program I wrote to test Regina -- a simple program that launched a "Hello World" REXX script -- crashed the first time I ran it.

Here's an excerpt of the C code:

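#include <stdlib.h>   /* free() */
#include "rexxsaa.h"  /* RexxStart(), RXSTRING, RXCOMMAND */

short rcode;          /* script's return code (RexxStart() takes a short *) */
RXSTRING retstr;      /* receives the script's result */
char buffer[256];     /* my own return buffer */
long rc;              /* RexxStart()'s own return code */

retstr.strptr = &buffer[0];
retstr.strlength = 256;

rc = RexxStart(0, 0, "text.rexx", 0, 0, RXCOMMAND, 0, &rcode, &retstr);

/* If the interpreter replaced my buffer with one of its own, I must free it */
if (retstr.strlength && retstr.strptr && retstr.strptr != &buffer[0])
{
   free(retstr.strptr); /* This call crashes */
}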

I was supplying my own "return buffer" to RexxStart(), and upon return, I discovered that Regina had allocated its own return buffer, even though the one I supplied was large enough. This violated what RexxStart() is supposed to do. Even worse, I couldn't free() that buffer, even though the Regina 08.H docs stated that one should use free() to release it. (That is how RexxStart() is supposed to work: the app is responsible for freeing any buffer returned by the interpreter.) It crashed with an exception if I tried. So I had an inherent memory leak in my app: I couldn't supply my own buffer, and I couldn't free the one that Regina returned. I have never released a program with a known memory leak in it, and I didn't want to start now. I thought, "This can't be right. I must be doing something wrong. Surely everyone using the REXX SAA API with Regina (and there must have been several people doing so before me) can't be writing programs that leak memory? Let me ask Mark Hessling (the maintainer of Regina at that time) about this." Mark's answer was simply "It must be a bug in Regina". But he didn't know what the bug was, and admitted that he didn't know much about this part of the REXX interpreter. He said that perhaps someone with "a higher level understanding of the code" (to quote his email) would be better able to answer my question about this apparent bug. That seemed strange to me, given that he was the one maintaining the code and releasing new versions.
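
The result-buffer contract described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Regina's or Reginald's code: `RxString` is a hypothetical stand-in mirroring the SAA RXSTRING fields (`strlength`, `strptr`) so the sketch compiles without rexxsaa.h, and `set_result()` is a hypothetical helper. The interpreter should reuse the caller's buffer whenever the result fits, and allocate a new one (which the caller then frees) only when it does not.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the SAA RXSTRING structure. */
typedef struct {
    unsigned long strlength; /* on entry: capacity of strptr; on exit: result length */
    char *strptr;
} RxString;

/* Hypothetical helper: store a script's result in the caller's RXSTRING.
   Reuses the caller's buffer when the result fits, and only otherwise
   allocates a new one, which the caller must later free().
   Returns 0 on success, -1 if memory could not be allocated. */
int set_result(RxString *ret, const char *value, unsigned long len)
{
    assert(ret != NULL && value != NULL);
    if (ret->strptr == NULL || len > ret->strlength) {
        /* Caller's buffer is missing or too small: allocate a new one. */
        char *p = malloc(len ? len : 1);
        if (p == NULL)
            return -1;
        ret->strptr = p;
    }
    /* The result now fits in whichever buffer strptr points to. */
    memcpy(ret->strptr, value, len);
    ret->strlength = len;
    return 0;
}
```

A host then frees the result only when `strptr` no longer points at its own buffer, which is exactly the check in the excerpt above. Getting that single size comparison wrong is enough to make an interpreter allocate even when the caller's buffer is large enough.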

But while I wanted this to be resolved, I put it aside for the time being and proceeded to my next experiment with Regina. Since I was going to use REXX in a GUI (ie, "Windowed", in Win32 parlance) app, and a REXX script displays its output in a console window by default, I figured that an easy way to handle this would be to simply open a console window via AllocConsole() before calling RexxStart(). I presumed that Regina would obtain the handle to the output window the way any Win32 DLL should: with GetStdHandle(), the documented Windows API for this purpose. Nope. It didn't work. I quickly discovered that the Win32 version of Regina was not very Win32-aware, and didn't follow Win32 standards where it needed to in order to work with all Win32 apps. Again, I sent a question to Mark about this. His response was that he wasn't a Win32 programmer and didn't really know about such issues. He said that Regina relies upon the C library and C startup code for obtaining stdout, and if there was a better way of doing this in a Win32 DLL, perhaps someone more Win32-savvy could come up with some code for it. Again, I thought it odd that the person responsible for producing the Win32 version of Regina was not really a Win32 programmer. The Regina sources are "open", so it seemed like the best thing to do would be to coordinate development among a group of developers, each one well-versed in at least one of the supported platforms, so that issues like these could hopefully be foreseen and resolved.

But again, I set this disturbing issue aside for now and proceeded to my third experiment. I was porting an OS/2 REXX function library to Win32. I made the Win32 version and then tried it out with Regina. It worked while the script was running, but when Regina went to close down after running the script, there were crashes. Sometimes, Regina actually shut down ok, but when I went to run the script again, it would crash. Clearly, something was wrong.

At this point, I decided that I should take a look at the Regina sources to see what was going on with the interpreter. After all, my first two questions to Mark had resulted in him suggesting that I should consult others for solutions. Why not myself?

I got the Regina sources, and that's when things really started to become clear to me. The very first thing that struck me was how little the code was documented. There were barely any comments in much of it, including the files relating to REXX SAA. No wonder Mark wasn't that familiar with parts of the source code. Unless one spent months slogging through it and documenting it, it wasn't readily apparent what the code was doing. Even so, I zeroed in on RexxStart() first, and quickly discovered the bug behind my first problem. It was one line of code toward the end of RexxStart(): a simple miswriting of a test on the supplied buffer size. It was actually quite easy to spot and fix, and I did so the very first day that I got the sources. Yes, every program that used the REXX SAA API had been leaking memory with Regina, and apparently no one had ever noticed, or felt it was important enough to spend the small amount of time it took me to locate and fix the bug. That underscored to me the problems with Regina's continued development under only one maintainer. If that person never spotted the problem, or had other concerns preoccupying him, or didn't believe the problem to be "severe enough" to warrant his attention, or even had the philosophy that the code should undergo as few changes as necessary in order to simply make it "work", then that problem just wasn't going to get solved. In response to my inquiries/qualms about memory leaks in Regina, a Patrick McPhee once wrote on the Regina email list that "all programmers who use Regina write programs that leak memory" and that this was something that they should "just accept". This is precisely the situation that GNU open source tries to circumvent. The philosophy of GNU is that no one person knows what is best for everyone, nor can one given piece of software address everyone's needs.
GNU's solution is to encourage many people to take some given code, modify it to suit their needs, and release the results to others, in the hope that a variety of needs can best be addressed through the diversity of different developers' approaches/concerns.

But it seemed to me that the first thing Regina needed was for its source code to be well-documented, so that many developers could more quickly understand and work with it. So I decided to embark upon an effort to document the Regina sources. I knew that it would be a major undertaking, but I figured that someone should eventually do it so that all "levels" of the code would be understandable to everyone. As I began documenting the code, I started running across bugs in it. That was around the time that I started posting to the Regina mail list. I began posting bug fixes, and I quickly discovered that those who had gotten comfortable with things "the way they are" (mostly because they had gotten comfortable being the "authorities" on the code as-is, and didn't want this to change) didn't react well to this. For example, I posted a bug fix to the WORDPOS function. In fixing the bug, I was also able to eliminate redundant instructions in the function, thus streamlining the code and making it easier to maintain. Of course, I added comments to fully document it. One of the "authorities" on the mail list declared my bug fix to be "broken" without even testing his theory. The bug fix wasn't broken, and actual testing revealed that it didn't result in the broken behavior he claimed it caused. When I pointed out the flaws in his theory, he declared that he wasn't interested in even looking at any bug fixes I submitted, let alone testing the veracity of his statements about them (even though he clearly was interested in attempting to summarily dismiss them). A bug fix was made to Regina which supposedly "incorporated" what I had submitted, but in fact, it didn't. None of the improvements I had made to the code were used; instead, only the minimal amount of change was made. Even my comments weren't used, and the code remained totally devoid of comments.
Mark later explained to me that his own personal philosophy is that Regina sources should undergo only minimal "incremental changes". OK, that's fine if one needs only incremental changes to the existing code, but as I'll explain later, I needed something quite different. Mark then declared that bug fixes should not be posted to the mail list, and instituted private email submission of bugs, thus depriving programmers of any sort of discussion/communication/forum about bugs. It became very clear to me that, if there were ever going to be major changes/improvements to Regina to address my particular needs, I would have to be willing to create such a version from the sources myself. I then resolved to redirect my efforts toward working on my own version of Regina that did incorporate all of the coding improvements I had initially proposed in my very first posting to the Regina mail list.

Historical update: Some of the above things, such as that bug in RexxStart(), have been fixed in more recent versions of Regina, but those fixes came well after several versions of Reginald had already been completed and released. So it was a moot point by then. Reginald had already gotten off the ground and had its own unique feature set (ie, features that were and are not available in Regina) that I wanted to develop further.

The next thing that I noticed was that Regina relied very heavily upon the C library. It hadn't been made portable by isolating OS specific stuff in a well-organized way (for example, by putting all the OS specific stuff in one header file and one source code file, as I have now done with Reginald), and by creating a design that allowed things to be done properly on each OS. I cited the GetStdHandle() issue above. As another example, I discovered why my function library was causing crashes: Regina doesn't properly unload function libraries under Win32. It has no FreeLibrary() call to match its LoadLibrary(). (OS/2 also needs to have libraries unloaded. So does the Amiga. Probably some of the other supported OS's need that too, but I don't know them.) My function library's load code did some things that needed to be cleaned up in its unload code, and the latter was never called properly due to Regina's lack of FreeLibrary(), a bug that still exists in Regina 2.X but which has been fixed in Reginald. After I also reported to Mark that Regina doesn't free some memory associated with function libraries, I noticed that he added an attempt to fix this bug in the final version of 08.H: a function named purge_library. But it still doesn't unload libraries, and resources still aren't freed for apps that use RexxStart(), obviously because there were problems related to attempting to free up these resources. Because of Regina's particular approach toward error handling and script termination, which I'll discuss in a bit, memory cleanup becomes very precarious under Regina. This is an example of exactly why these memory leaks will not be solved via "incremental changes", and can only be solved by making major design changes to the interpreter (as I have already done with Reginald). So, Regina achieves portability by its reliance upon the C library.
But this in itself introduces problems, as it did in not allowing AllocConsole() to work with Win32 Regina: the C library used to create Win32 Regina obviously doesn't use GetStdHandle() to obtain stdout. So now you have C library issues to deal with, which is sort of ironic when you're trying to write code that eliminates discrepancies among different systems. And it really does seem odd to me to create the interpreter for one language such that it depends upon the compiler library of an entirely different language. In fact, requiring that an app be linked with C startup code in order to provide stdin, stdout, and stderr really pulls the rug out from under any developer who wants to use the REXX SAA API but isn't writing a program in C or C++. Regina's reliance upon the C library has been done in such a way that it even prevents programmers from using any language except C/C++ to interface to REXX. (This is no longer a limitation in Reginald. Reginald supports writing to its REXX SAA API in any compilable language, and even includes support that makes it easier to do so in C++.)
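
As a rough illustration of the kind of isolation described above, here is a sketch of a single OS-specific output function of the sort a "wrapper.c" could export. The name `wrap_write_stdout()` is hypothetical and is not Reginald's actual wrapper API. On Win32 it asks GetStdHandle() for the current output handle on every call, so a console the host created with AllocConsole() is picked up; elsewhere it falls back to the C library's stdout.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>
#else
#include <stdio.h>
#endif

/* Hypothetical wrapper.c entry point: write len bytes to standard output.
   Only this file knows which OS it is built for; the rest of the
   interpreter calls wrap_write_stdout() and never touches stdout directly.
   Returns the number of bytes written, or -1 on error. */
long wrap_write_stdout(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0)
        return 0;
#ifdef _WIN32
    /* Query the handle each time, so a console that the host
       created with AllocConsole() is found. */
    HANDLE h = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD written = 0;
    if (h == NULL || h == INVALID_HANDLE_VALUE)
        return -1;
    if (!WriteFile(h, buf, (DWORD)len, &written, NULL))
        return -1;
    return (long)written;
#else
    /* Other platforms: use the C library's stdout. */
    size_t n = fwrite(buf, 1, len, stdout);
    if (n != len)
        return -1;
    fflush(stdout);
    return (long)n;
#endif
}
```

The point of the design is that porting the interpreter means rewriting the body of functions like this one, not hunting for every use of stdout throughout the sources.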

But by far the most insidious use of the C library is in setjmp()/longjmp(). Regina not only uses these to "jump" to error handling code, it also uses them to "return" when a script does an EXIT/RETURN. So when an error occurs (including any NOVALUE or NOTREADY condition handled by SIGNAL in a REXX script, or any SYNTAX, ERROR, FAILURE, or HALT condition raised), or when a script does an EXIT/RETURN, Regina abruptly jumps out of whatever function it is within. The net result is that the function never gets to clean up any resources it has allocated. And since Regina has no memory tracking, its error handling can't know what it can or can't free. That's why it's so precarious for Regina to try to free up resources later on. It can't be sure what is still hanging around because it was "abandoned", and what is still hanging around because it is needed (such as the "leaves" of a stored REXX macro). So Regina simply forgoes cleaning up its memory at all, and leaves that up to the C library's exit code. (Regina uses the C library's malloc() to allocate memory. Well, most of the time it uses malloc(). Sometimes it uses OS specific functions, but that's just yet another wrinkle, and possible obstacle, to resolving Regina's memory leaks. Therefore the C library's exit code is left to clean up resources in conjunction with whatever bookkeeping malloc() does.) The problem is that the C library exit code doesn't get called until a REXX script (and the host which launched it) finally ends. So all the while a script is running, it is constantly leaking more and more memory that doesn't get returned to the system until the script stops running. (But wait, the story is even worse for a program that uses the REXX SAA API. I'll get to that in a bit.)
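
The mechanism can be demonstrated in a small self-contained program. setjmp() and longjmp() are the real C library calls; everything else here (the function names and the `live_blocks` counter) is hypothetical scaffolding for illustration. When an error is raised with longjmp(), the intermediate function's cleanup is skipped and its temporary buffer is abandoned. When the error is instead returned as a code, every caller gets the chance to free what it allocated.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

static jmp_buf error_jump;   /* where a raised condition "lands" */
static int live_blocks = 0;  /* outstanding allocations, for illustration */

static void *tracked_alloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL) {
        live_blocks--;
        free(p);
    }
}

/* Style 1: errors unwind with longjmp(). */
static void inner_longjmp(int fail)
{
    if (fail)
        longjmp(error_jump, 1);      /* jumps straight back to setjmp() */
}

static void work_longjmp(int fail)
{
    char *tmp = tracked_alloc(64);   /* temporary working buffer */
    inner_longjmp(fail);
    tracked_free(tmp);               /* never reached when fail != 0 */
}

/* Returns 0 on success, 1 if an error was raised. */
int run_longjmp(int fail)
{
    if (setjmp(error_jump) != 0)
        return 1;                    /* error: tmp has been abandoned */
    work_longjmp(fail);
    return 0;
}

/* Style 2: errors are returned, and every caller cleans up on the way out. */
static int inner_rc(int fail)
{
    return fail ? 1 : 0;
}

int run_rc(int fail)
{
    char *tmp = tracked_alloc(64);
    int rc = inner_rc(fail);
    tracked_free(tmp);               /* runs on success and on error */
    return rc;
}

int leaked_blocks(void)
{
    return live_blocks;
}
```

After `run_longjmp(1)`, one block remains outstanding and nothing in the program still holds a pointer to it; after `run_rc(1)`, nothing leaks. Returning error codes costs an explicit check at every call site, which is the price of letting each function clean up after itself.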

But the use of setjmp()/longjmp() isn't the only cause of memory leaks. It aggravates the problem, but other aspects of Regina's design also contribute. One is the fact that Regina does a lot of "reformatting" of arguments passed between its functions, and does memory allocations for this. And some functions allocate memory transparently (ie, the caller never knows that memory has been allocated) and then return this memory to the caller. Since the caller never knows that it has been allocated, the caller never frees it. Hence, a memory leak. For example, when an external function is called, Regina formats some args in the REXX script into a structure called a streng. It passes them to an intermediate function, do_an_external(), which then allocates more memory to reformat those args into two arrays: one that holds pointers to the args, and one that holds the length of each arg. Then a low level function named IfcExecFunc() is called, which allocates more memory and reformats the args a third time into standard RXSTRING structures. So what is the point of transparently allocating that extra memory to do the intermediate reformatting in do_an_external()? Why not just rewrite IfcExecFunc() to accept the streng structs and format them directly into RXSTRINGs? Someone on the Regina email list revealed a bit of history that was completely unknown to me (and how could it be otherwise, since the Regina sources are not documented concerning this): Regina was originally designed to support some extra apps (since abandoned) which directly called IfcExecFunc() with args formatted the way IfcExecFunc() expects them. So, even though these extra, unmanaged memory allocations and intermediary functions were no longer needed, they were left in Regina.
I suspected that there were similar stories behind why Regina wasted memory and speed on maintaining two separate lists of external functions (in extlib.c as well as in rexxsaa.c), and behind what appeared to be various unused variables and unused fields of various structures.
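
The shortcut suggested above (going straight from strengs to RXSTRINGs) might look something like the following sketch. The `streng` layout is an assumption (a used length, a capacity, and the character data), not copied from Regina's headers. `RxString` is a hypothetical stand-in for the SAA RXSTRING, and `make_streng()` and `strengs_to_rxstrings()` are hypothetical names. No argument text is copied: each RXSTRING simply points into its streng, so the only allocation is the one array, which the caller frees once the external function returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout of an interpreter string: used length, capacity, data. */
typedef struct {
    size_t len;
    size_t max;
    char value[];   /* C99 flexible array member */
} streng;

/* Hypothetical stand-in for the SAA RXSTRING structure. */
typedef struct {
    unsigned long strlength;
    char *strptr;
} RxString;

/* Hypothetical helper for building strengs from C strings. */
streng *make_streng(const char *s)
{
    size_t n = strlen(s);
    streng *st = malloc(sizeof *st + n);
    if (st == NULL)
        return NULL;
    st->len = st->max = n;
    memcpy(st->value, s, n);
    return st;
}

/* Convert argc strengs straight into RXSTRINGs for an external function
   call. Each RXSTRING points into its streng, so the strengs must outlive
   the call. Returns one malloc'd array that the caller frees after the
   call, or NULL if the allocation fails. */
RxString *strengs_to_rxstrings(streng *const *args, size_t argc)
{
    assert(args != NULL || argc == 0);
    RxString *rx = malloc((argc ? argc : 1) * sizeof *rx);
    if (rx == NULL)
        return NULL;
    for (size_t i = 0; i < argc; i++) {
        if (args[i] == NULL) {          /* omitted argument */
            rx[i].strlength = 0;
            rx[i].strptr = NULL;
        } else {
            rx[i].strlength = (unsigned long)args[i]->len;
            rx[i].strptr = args[i]->value;
        }
    }
    return rx;
}
```

Compared with the three-step path described above, this removes one full round of allocation and copying per external call. It also makes ownership explicit: the caller allocated the array, so the caller frees it.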

Now we get to what I call "The assure() design flaw". Look in the file string.c at the function assure(). The purpose of this function is to make sure that a streng is of at least a minimum size. If it is smaller, then a new streng of the required size is allocated, the contents of the old streng are copied to the new streng, and the new streng is returned. So what happens to the old streng? Aha. That call to Free_string() is commented out. The old streng is never freed. It becomes leaked memory. Now look at who calls assure(): a large number of the other functions in string.c. And who calls all of those functions? A very large number of other functions in the interpreter. There are literally hundreds of calls that wind their way through assure(), and each one has the potential to leak memory. Why is that call to Free_string() commented out? Because the many various callers of assure() don't bother doing error checking, so it's safer to leave the old streng lying around in case someone inadvertently references it. And why should those callers bother with error checking? After all, if an error arises, or the script does an EXIT/RETURN, then longjmp() is inevitably going to be called, and you've got leaked memory that can't be recovered anyway without a much smarter memory manager than the one Regina currently has.
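
Here is a sketch contrasting the two behaviours of a grow-if-needed helper in the spirit of assure(). The streng layout and all names (`assure_leaky()`, `assure_fixed()`, the `live_strengs` counter) are hypothetical and are not Regina's string.c. The fixed variant frees the old string, which is only safe under the discipline the text says callers lacked: every caller must replace its pointer with the returned one and check for failure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t len;
    size_t max;
    char value[];
} streng;

static int live_strengs = 0;  /* allocated but not yet freed */

streng *new_streng(size_t max)
{
    streng *s = malloc(sizeof *s + max);
    if (s != NULL) {
        s->len = 0;
        s->max = max;
        live_strengs++;
    }
    return s;
}

void free_streng(streng *s)
{
    if (s != NULL) {
        live_strengs--;
        free(s);
    }
}

/* Allocate a larger streng holding a copy of s's contents. */
static streng *grow_copy(const streng *s, size_t size)
{
    streng *n = new_streng(size);
    if (n != NULL) {
        memcpy(n->value, s->value, s->len);
        n->len = s->len;
    }
    return n;
}

/* Leaky variant: the old string is abandoned, mirroring the
   commented-out Free_string() call. Every grow leaks one streng. */
streng *assure_leaky(streng *s, size_t size)
{
    assert(s != NULL);
    if (s->max >= size)
        return s;
    return grow_copy(s, size);
}

/* Fixed variant: callers must write s = assure_fixed(s, size); and
   check for NULL, because the old pointer is invalid after a grow. */
streng *assure_fixed(streng *s, size_t size)
{
    assert(s != NULL);
    if (s->max >= size)
        return s;
    streng *n = grow_copy(s, size);
    if (n == NULL)
        return NULL;      /* s is untouched and still owned by the caller */
    free_streng(s);
    return n;
}

int strengs_outstanding(void)
{
    return live_strengs;
}
```

Because a failed grow leaves the original string intact, a caller that checks for NULL can still free what it owns. That check is exactly what a longjmp()-based error path makes it easy to skip.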

I concluded that Regina's error handling and memory management needed a major redesign. Talking with several other people who reported severe problems due to leaked memory in Regina confirmed my suspicion that not everyone's needs were being addressed by one version of these GNU sources.

But let's talk about the REXX SAA API. The memory leaks are particularly drastic for a host program that stays running for a while and makes repeated calls to RexxStart(). (ie, it's a worse situation than a REXX script that gets launched by Regina.exe.) Each time the host calls RexxStart(), all of those memory leaks accumulate. Yes, when the host program ends, they'll be cleaned up by the C library's exit code. But until then it will keep leaking memory faster than a single REXX script launched with Regina.exe does, because the memory leaked by the first script doesn't get freed even after that script is done running. It gets freed only when the host program itself ends. And since my primary interest in using Regina is for the REXX SAA API, I found Regina's design flaws to be a major hurdle.
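
One way to stop leaks from accumulating across RexxStart() calls is to make every allocation belong to the script run that made it. This is a minimal sketch, assuming a hypothetical per-script context `ScriptCtx` and hypothetical functions `script_alloc()`, `script_free()`, and `script_free_all()`; it is not Reginald's actual memory manager. Each block carries a small header linking it into a doubly linked list owned by the context. When the script ends, `script_free_all()` releases whatever is left, including blocks an error path abandoned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every block handed out for one script run.
   The union with max_align_t keeps the user memory after it aligned. */
typedef union Block {
    struct {
        union Block *prev;
        union Block *next;
    } link;
    max_align_t align;
} Block;

/* Hypothetical per-script context: owns every block allocated for it. */
typedef struct {
    Block *head;
    size_t count;
} ScriptCtx;

void *script_alloc(ScriptCtx *ctx, size_t n)
{
    assert(ctx != NULL);
    Block *b = malloc(sizeof *b + n);
    if (b == NULL)
        return NULL;
    b->link.prev = NULL;
    b->link.next = ctx->head;
    if (ctx->head != NULL)
        ctx->head->link.prev = b;
    ctx->head = b;
    ctx->count++;
    return b + 1;            /* user memory starts right after the header */
}

void script_free(ScriptCtx *ctx, void *p)
{
    if (p == NULL)
        return;
    Block *b = (Block *)p - 1;
    if (b->link.prev != NULL)
        b->link.prev->link.next = b->link.next;
    else
        ctx->head = b->link.next;
    if (b->link.next != NULL)
        b->link.next->link.prev = b->link.prev;
    ctx->count--;
    free(b);
}

/* Called once when the script ends: releases everything still
   outstanding, including blocks abandoned by error paths. */
void script_free_all(ScriptCtx *ctx)
{
    Block *b = ctx->head;
    while (b != NULL) {
        Block *next = b->link.next;
        free(b);
        b = next;
    }
    ctx->head = NULL;
    ctx->count = 0;
}
```

A host-facing entry point would call `script_free_all()` before returning from each run, so nothing leaked by one script carries over into the next. Memory that must outlive a single run, such as a stored macro, would have to come from a different allocator.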

I started to make a list of all of the goals that I thought should be done. This included all of the goals in my initial posting to the Regina mail list, but also included numerous other goals I set after becoming a little more familiar with the Regina sources.

1) Create a nice, intuitive install program for the interpreter so that any enduser can easily install the package (without having to unzip files, manually copy files to important system directories, or run special batch files to create file associations). The install program should properly set up the file association so that a user can easily run REXX scripts right after installation, create desktop shortcuts, and also come with an uninstall utility to cleanly remove the interpreter. It should follow Microsoft's guidelines for the installation of a "system wide shared library". It should do proper version checking to prevent endusers from inadvertently installing older versions over newer versions.

2) The frontend for the interpreter should be a lot more user friendly, allowing a user to pick out a script via a File Dialog, and better support associations with filenames that have embedded spaces, and handle error messages better (for example keeping the window open to display an error message to the user), and even make it possible for REXX scripts to "autorun" off of a CDROM.

3) The sources should be explicitly commented in an organized way. Each source file should begin with a detailed description of the purpose of the functions in it. And every single function should be preceded by a standard "comment header" that describes what the function does, lists each of the args it takes, describes what the function returns, and includes any additional notes that are important concerning use of the function. Additionally, the code in the function itself should be profusely commented so that someone can merely read the comments and come up with a flowchart of the function. These comment headers should be formatted so that they can be easily searched for, and programmatically extracted to create documentation. It is imperative that the code be documented so that anyone can work with the sources without spending many months just slogging through the code to try to deduce what it all does.

4) setjmp()/longjmp() must be eliminated, and proper error handling should be added to eliminate all memory leaks/failures associated with the removal of setjmp/longjmp. This requires *major* overhaul of the interpreter. Some functions need to be rewritten to return error codes. Some functions need to be rewritten to check for errors and handle them sanely. All functions must either clean up after themselves, or make sure that their callers are informed of any needed cleanup and eventually do that cleanup.

5) The reliance upon the C library should be eliminated, especially to the extent that it requires a host to be linked with C startup code and some sort of C library. (For those platforms where it's not really that important, some reliance upon the C library can be left. But it must be done in a way that allows programmers to easily eliminate the C library for any supported platforms). But the setjmp()/longjmp() design has to go for all platforms.

6) Where possible, OS specific features should be supported in order to improve performance and feature sets. For example, Win32 allows the sharing of read-only memory between processes. That eliminates the unnecessary, wasteful duplication of many strings and variables in Regina. But this should be done with goal #7 in mind.

7) As much as possible, all platform specific code should be confined to only two files, the header "wrapper.h" and the source file "wrapper.c". In this way, porting to other platforms becomes less of a nightmare since only two files should need to be edited. These two files should be explicitly, and carefully, documented so that someone interested in porting the interpreter can know exactly what code he needs to write. Except for these two files, he shouldn't have to mess with the other source files.

8) The obsolete, intermediate functions should be eliminated by rewriting the lowest level functions to better interface with the higher level functions. Transparent memory allocations should either be eliminated, or their management must be made more "sane" by having the interpreter at least clean up those allocations when a script ends -- even a script launched by RexxStart(). The entire Ifc* API should go. The source files should be streamlined and reorganized better now that the need to support extra, obsolete, proprietary apps is gone.

9) All other nagging memory and resource leaks should be found and fixed. This includes everything from properly unloading libraries, to chasing down anything else that is simply not being freed by the interpreter itself (but rather by the C library's exit code). The assure() design flaw *must* be fixed. Redundancies should be tracked down and removed, for example, the unnecessary need for two separate lists of external functions.

10) Make various speed and memory usage improvements. A lot of the Regina code can be rewritten to do this. One big improvement I made is to introduce what I call "preset RXSTRs". Many functions that previously allocated memory, no longer need to do so, by using these preset RXSTRs, thus making them faster and more memory efficient.

11) Complete missing features of the interpreter, such as implementing RexxRegisterSubcomDll() and RexxRegisterExitDll(). (ie, Implement Exit Handlers and Subcommand Handlers in dynamic link libraries).

12) Add some new features to the interpreter, such as "auto-loading" of external function libraries, and the ability to use the Windows registry for making "Global interpreter settings" that the enduser may wish to set (instead of the now obsolete and problematic use of Environment blocks under Win32, or needing the interpreter code to be recompiled with redefined "settings"). Allow scripts to "load" Environments on their own, as well as allow the user to "mount" Exit handlers that work transparently in conjunction with any script and host program that uses the interpreter. Allow scripts to tap into the macro table, so that they don't have to always load a script from disk when calling it as a subroutine.

13) Clean up inconsistent error handling in the REXXSAA API, and potentially dangerous error handling that itself may be susceptible to failing with errors. (ie, Regina's exiterror() has the potential to get sucked into endless recursion until it blows up the stack).

14) The parser/scanner should be rewritten to eliminate flex/yacc. Using flex/yacc results not only in a rather generic (read: large and inefficient) parser, but also requires a reliance upon some extra, sort-of-archaic programming tools. It produces highly unreadable code and therefore obscures important design details from people wishing to work with the sources. That's one big reason why the parser code is still not really reentrant in Regina 2.0, and can present problems even in the "thread-safe" 2.0.

15) Some memory "coalescing" feature should be added to the interpreter so that runtime memory allocations can be minimized, and "garbage collection" can be performed.

16) Make a thread-safe version of the DLL, and use a technique that will work with all supported, multi-threading platforms.

So far, I have achieved the first 13 goals in the first release of Reginald. Goal #14 will be a *major* piece of rewriting. Although I have done quite a bit of work already on the scanner/parser, I have yet to totally rewrite it as I would like to do ultimately. Goals #15 and #16 will likely be somewhat tricky too.

Nevertheless, I have already expended a lot of time and effort on reworking the Regina sources, and today have a version up and running that meets the majority of my above goals.

And that's my history of getting involved with Reginald, and where I am today.

Jeff Glatt