29 December 2015

Regina Rexx is a very complete and well-documented implementation of the Rexx programming language. Unfortunately its documentation is very much reference documentation, a problem that makes using rarer facilities more difficult.

One such facility is the whole foreign language interface. Regina follows IBM’s SAA API, but this is an API that is not particularly popular outside of IBM’s notoriously inward-looking world. It is definitely not an API that is well-endowed with end-user documentation, especially on the Rexx integration side of things.

There are many examples of Rexx libraries which expose facilities to Rexx users that can be looked at, but unfortunately almost all of them rely on compatibility shims that make it very hard to piece together what’s actually going on when you’re learning. Further, the compatibility shims are a layer of unnecessary obfuscation if you, like me, only ever really intend to use Regina Rexx in your code.

Because of these deficiencies in accessible documentation I am embarking on a small series of blog entries to ease entry into the Rexx extension field. Today’s outing will introduce simply adding a single external function to Rexx’s capabilities.

The example

For this baby step into extending Rexx we will be exposing a single, simple function: ExternalMultiply(). ExternalMultiply() will accept any number of integers (each within the range of a C long) and multiply them together, returning the product (provided the product fits into a C long long). For example, the following line will print the number '720':

say ExternalMultiply(2, 3, 4, 5, 6)

Complete source for the example, a test driver, and a build script are provided at the end of this blog entry.

Note

I’m a Linux user and don’t really have a lot of interest in programming under Windows. The code I have written should work under Windows (famous last words) but there will definitely need to be some changes to the build script. There may also be minor tweaks needed here or there in both the C implementation and the Rexx driver. If you use Windows you’ll have to figure these out for yourself.

Concepts

There are some concepts you’ll have to get used to first before extending Rexx.

It’s strings all the way down

Everything in Rexx is a string. Behind the scenes things may not be so simple-mindedly implemented, but at any public level of visibility Rexx values are all strings. This implies that all of your functions will have a set of strings as parameters.

C? What’s that?

Rexx is not based on the assumption that C is the lingua franca of computing. Many languages have assumptions that are thinly veiled C assumptions. Numbers are based on C numerical types (usually long for integers and double for floating point, for example). Strings are NUL-terminated arrays of byte-sized characters. Rexx is not based on this assumption since it, as a language, predates the C-as-lingua-franca era. As a result you will need to understand Rexx’s exposed data types and you will spend a lot of time converting back and forth between them and C’s.

First steps

There is a lot of boilerplate in building an FFI binding for most languages. Rexx is no exception. Here are some of the things you’ll have to do in most function-exposing modules.

rexxsaa.h

All of the Regina API is specified in a single included header: rexxsaa.h. Unusually for such a system, it is not enough to merely include the header. You have to activate specific subsystems before including it. This is done by using #define of relevant symbols before inclusion:

#define INCL_RXFUNC     /* (1) */
#include <rexxsaa.h>

(1) Here we activate the external function subsystem. INCL_RXSHV would be used instead (or in addition) to include access to Rexx’s variable pool.

When you include rexxsaa.h with the relevant symbols defined you are given access to the prototypes, data types, and symbols for the subsystem interface you wish to use. In our case we are given access to the prototypes, data types, and symbols for the external function interface API.

Declarations

Of course you’ll have to declare the function that’s being exported. The plain version of this looks something like:

APIRET APIENTRY ExternalMultiply(PCSZ, ULONG, PRXSTRING, PCSZ, PRXSTRING);

That’s quite a mouthful, however, and would get tedious to type for each and every exported function. Thankfully the Regina API gives you a nice typedef for it:

RexxFunctionHandler ExternalMultiply;

If you must you can go ahead and use the repetitive, long-winded version but I really strongly recommend using RexxFunctionHandler instead.

Of course the types involved have to be known. APIENTRY is something you place there "just because". (It has to do with linkage types on OS/2 and Windows. Putting it in the signature makes sure that your code will work in OS/2 and Windows environments.) APIRET is an alias for ULONG. ULONG is an alias for unsigned long. PCSZ is a typedef for a pointer to a constant C-style (NUL-terminated) string. PRXSTRING is a pointer to an RXSTRING.

RXSTRING is the kicker. That’s the representation Regina exposes for its internal string values. It is defined as:

typedef struct {
  unsigned long strlength;  /* (1) */
  char *strptr;             /* (2) */
} RXSTRING;

(1) Contains the length of the string contents.
(2) Points to the string contents: an array of bytes.

Important

strptr can contain any 8-bit values, including NUL. It is emphatically not a C-style string! strlength contains the length of the content pointed at by strptr, not the size of the buffer. This will be an issue later.

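To make the "not a C-style string" point concrete, here is a minimal sketch that compares two counted strings by length and content instead of reaching for strcmp(). DemoString is a hypothetical stand-in for RXSTRING so the snippet compiles without rexxsaa.h, and demo_equal() is an illustrative name of mine, not part of the Regina API.

```c
#include <string.h>

/* Hypothetical stand-in for RXSTRING (an assumption, so this compiles
   without rexxsaa.h): a byte count plus a pointer to the bytes. */
typedef struct {
  unsigned long strlength;
  char *strptr;
} DemoString;

/* Compare two counted strings byte for byte.
   Returns 1 if they are equal, 0 otherwise.
   strcmp() would stop at the first NUL and could call different
   values equal; memcmp() with the stored length does not. */
static int demo_equal(DemoString a, DemoString b)
{
  if (a.strlength != b.strlength)
    return 0;
  if (a.strlength == 0)
    return 1;
  return memcmp(a.strptr, b.strptr, a.strlength) == 0;
}
```

With the three-byte values 'a', NUL, 'b' and 'a', NUL, 'c', strcmp() reports a match because it never looks past the NUL, while demo_equal() correctly reports a difference.
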
Tip

There are a number of helper macros provided to help work with RXSTRINGs. They are briefly summarized here; consult the Regina documentation for full details.

MAKERXSTRING()

Builds an RXSTRING from constituent parts.

RXNULLSTRING()

Tests if an RXSTRING is a 'null string' (one whose strptr is NULL, as opposed to a zero-length string).

RXSTRLEN()

Returns the length of an RXSTRING.

RXSTRPTR()

Returns the content pointer of an RXSTRING.

RXVALIDSTRING()

Returns true if a string has contents, that is, it is neither null nor zero-length.

RXZEROLENGTHSTRING()

Returns true if an RXSTRING is a zero-length (but not null) string.
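The three predicates are easy to mix up, so here is a small sketch that restates them as plain functions. It assumes the usual meanings (null means no content pointer, zero-length means a pointer with no bytes, valid means a pointer with at least one byte); check your own rexxsaa.h for the real macro definitions. DemoString and the demo_ functions are hypothetical stand-ins so the snippet compiles without the Regina headers.

```c
#include <stddef.h>

/* Hypothetical stand-in for RXSTRING (an assumption for illustration). */
typedef struct {
  unsigned long strlength;
  char *strptr;
} DemoString;

/* Same idea as RXNULLSTRING(): there is no content pointer at all. */
static int demo_is_null(DemoString s)
{
  return s.strptr == NULL;
}

/* Same idea as RXZEROLENGTHSTRING(): a pointer exists but holds no bytes. */
static int demo_is_zero_length(DemoString s)
{
  return s.strptr != NULL && s.strlength == 0;
}

/* Same idea as RXVALIDSTRING(): a pointer exists and holds some bytes. */
static int demo_is_valid(DemoString s)
{
  return s.strptr != NULL && s.strlength != 0;
}
```

The distinction matters in practice because an omitted argument, as in f(1,,3), is typically handed over as a null string, while an explicit '' shows up as a zero-length one.
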

In addition to the above, there are also two values which need defining (for readability):

#define RX_OK     0
#define RX_ERROR  1

The Regina APIs all want a return value of 0 for "worked fine" and a non-zero return value for "failed somehow". Note that this is not the value that the function returns to the script! This is the value the function returns to the interpreter to tell it whether the function call was a success or not. When you return non-zero, Rexx’s condition mechanism leaps into action, signalling or calling error handlers as appropriate.
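The difference between the status code and the script-visible result is easier to see in code. The following is only a sketch: DemoString stands in for RXSTRING, demo_count_args() is a made-up function, and the caller is assumed to supply a result buffer big enough for the digits (as Regina does with its return string).

```c
#include <stdio.h>

#define RX_OK     0
#define RX_ERROR  1

/* Hypothetical stand-in for RXSTRING (an assumption for illustration). */
typedef struct {
  unsigned long strlength;
  char *strptr;
} DemoString;

/* Made-up handler body: the script-visible result is the argument count.
   The C return value is only the status reported to the interpreter. */
static unsigned long demo_count_args(unsigned long argc, DemoString *result)
{
  if (argc == 0)
    return RX_ERROR;    /* status: the call failed, result is left untouched */

  /* result: what the script would see as the function's value */
  result->strlength = (unsigned long)sprintf(result->strptr, "%lu", argc);
  return RX_OK;         /* status: the call worked */
}
```
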

Local declarations

Of course in any non-trivial function package you’ll have to declare helper functions and other such things. This is a trivial function package, but for show here are two helpers:

static long rexx_to_long(RXSTRING);                   /* (1) */
static void long_long_to_rexx(long long, PRXSTRING);  /* (2) */

(1) Converts an RXSTRING into a long.
(2) Converts a long long into an RXSTRING.

Note

These two helpers are probably overkill for a module as trivial as ours, but they show one thing: you will need a bunch of conversion functions whenever making a Rexx extension. (Indeed in a serious module you’ll probably want to build a library of them as the need arises, specifically for bringing into other projects as you make them. Or you can just steal one of the ones from any of the existing foreign function extensions already provided.)

Implementation

Now that we’ve declared everything of interest, we need to implement the functionality. Let’s start by looking at the helper functions.

static long rexx_to_long(RXSTRING rexxval)
{
  return strtol(RXSTRPTR(rexxval), NULL, 10);       /* (1) */
}

static void long_long_to_rexx(long long val, PRXSTRING rexxval)
{
  sprintf(RXSTRPTR(*rexxval), "%lld", val);         /* (2) */
  rexxval->strlength = strlen(RXSTRPTR(*rexxval));
}

(1) FLAW! Better code would explicitly point to the end of the string!
(2) FLAW! Better code would allocate a local buffer instead of using whatever was passed to it!

The first thing that the observant reader will spot is that this code is not very safe! This is because it is trying to illustrate the concepts of the API without burying them underneath a pile of security boilerplate. Serious code would make use of proper techniques including properly framing the strtol() call with an end pointer, checking for errors, and generally not being a one-liner. (This is why I recommended building up a conversion library earlier; there’s a lot of potential boilerplate overhead that you’re not going to want to type repeatedly.)

That being said, Regina in particular will pass, for numbers, C-style strings in the strptr member, so use of C-style string manipulation functions is fine for demonstration purposes.
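For situations where you cannot lean on that behaviour, a more defensive conversion might look like the sketch below. It copies the counted bytes into a local NUL-terminated buffer, then uses strtol()'s end pointer and errno to reject anything that is not entirely a number in range. DemoString is a hypothetical stand-in for RXSTRING and demo_to_long() is my own illustrative name.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for RXSTRING (an assumption for illustration). */
typedef struct {
  unsigned long strlength;
  char *strptr;
} DemoString;

/* Convert a counted string to a long.
   Returns 1 and stores the value in *out on success, 0 on any problem. */
static int demo_to_long(DemoString s, long *out)
{
  char buf[32];          /* comfortably larger than any long in decimal */
  char *end;
  long val;

  if (s.strptr == NULL || s.strlength == 0 || s.strlength >= sizeof buf)
    return 0;

  memcpy(buf, s.strptr, s.strlength);
  buf[s.strlength] = '\0';           /* now it really is a C string */

  errno = 0;
  val = strtol(buf, &end, 10);
  if (errno == ERANGE)               /* value does not fit in a long */
    return 0;
  if (end != buf + s.strlength)      /* nothing parsed, junk, or embedded NUL */
    return 0;

  *out = val;
  return 1;
}
```

Note that this version is stricter than Rexx itself about trailing blanks; whether that is what you want depends on your function.
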

The second flaw is a bit more reasonable. In the context of Regina this is not a flaw at all. Regina allocates a 256-character string for return values. This is documented and fixed. What is at issue is whether all interpreters do this. If portability across interpreters is not your concern, then using the presupplied buffer is fine.
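If you do use the presupplied buffer, it costs very little to respect its size. Here is a minimal sketch using snprintf(); since strlength does not tell you the buffer's capacity, the capacity is passed in separately. DemoString, demo_from_long_long() and DEMO_RETURN_BUFFER_SIZE are all hypothetical names, with the 256 taken from the size mentioned above.

```c
#include <stdio.h>

/* Assumed capacity of the interpreter-supplied return buffer. */
#define DEMO_RETURN_BUFFER_SIZE 256

/* Hypothetical stand-in for RXSTRING (an assumption for illustration). */
typedef struct {
  unsigned long strlength;
  char *strptr;
} DemoString;

/* Format val into the buffer that out->strptr points at.
   capacity is the size of that buffer in bytes.
   Returns 1 on success, 0 if the text would not fit. */
static int demo_from_long_long(long long val, DemoString *out, size_t capacity)
{
  int n = snprintf(out->strptr, capacity, "%lld", val);

  if (n < 0 || (size_t)n >= capacity)
    return 0;                         /* encoding error or truncated output */

  out->strlength = (unsigned long)n;  /* content length, not buffer size */
  return 1;
}
```
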

Of course it goes without saying (but I will say it anyway) that you’ll need to include the appropriate C library headers for any of this to work:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

And now the exported function

APIRET APIENTRY ExternalMultiply(PCSZ name, ULONG argc, PRXSTRING argv,
                                 PCSZ queuename, PRXSTRING returnstring)
{
  long long product = 1;
  long i;

  for (i = 0; i < argc; i++)
  {
    product *= rexx_to_long(argv[i]);
  }
  long_long_to_rexx(product, returnstring);

  return RX_OK;
}

This is the function that Rexx scripts will call directly. It is quite straightforward, but has some unexpected issues. Let’s look at the parameters passed in one at a time:

PCSZ name

This is a C-style string containing the name of the function being called. Why is this needed? Because it’s possible to have a single registered function entry point that implements several related functions. (This is not something I’d personally recommend, but it’s something you can do!) We ignore this in our code.

ULONG argc

Your function will have argc RXSTRINGs as arguments.

PRXSTRING argv

An array of argc RXSTRINGs. Your arguments, in short.

PCSZ queuename

A C-style string containing the name of the current data queue. This is out of scope for this tutorial (and out of scope for most code!).

PRXSTRING returnstring

This points to a single RXSTRING which is used to return a value to the caller. The special variable RESULT will be set to this value on return. By default Regina supplies a ready-made 256-byte buffer in this pointer that you can use to set up your return value. This may not be portable.

The rest of the code is straightforward. product is initialized to 1. Each passed-in argument is converted into a long and multiplied in place with product. When the arguments have all been processed, product is converted into the RXSTRING pointed at by returnstring. RX_OK (0) is then returned to signal that everything is hunky dory.
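As an aside on the name parameter described above, this is roughly what "one entry point, several functions" could look like. It is a deliberately simplified sketch: the arguments are plain longs rather than RXSTRINGs, DemoAdd and DemoMul are invented function names, and the comparison ignores case so it does not matter how the interpreter spells the name it passes in.

```c
#include <ctype.h>

/* Case-insensitive comparison of two C strings. Returns 1 if they match. */
static int demo_same_name(const char *a, const char *b)
{
  while (*a != '\0' && *b != '\0')
  {
    if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;   /* both must have ended at the same time */
}

/* One entry point serving two invented functions.
   Returns 1 and stores the answer in *out, or 0 for an unknown name. */
static int demo_dispatch(const char *name, long x, long y, long *out)
{
  if (demo_same_name(name, "DemoAdd"))
  {
    *out = x + y;
    return 1;
  }
  if (demo_same_name(name, "DemoMul"))
  {
    *out = x * y;
    return 1;
  }
  return 0;
}
```
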

The Rexx side

On the Rexx side, the interpreter must first be informed of the existence of the exported function and its location. This is done in this chunk of code:

if RxFuncAdd('ExternalMultiply', 'external', 'ExternalMultiply') <> 0 then
  do
    say RxFuncErrMsg()
    exit E_ERROR
  end

The key code is the call to RxFuncAdd(). It maps the first argument (internal name of the function) to the third argument (external name of the function) in the library named by the second argument.

In this case we’re calling the function ExternalMultiply in Rexx (although it will be callable as externalMultiply or even ExTeRnAlMuLtIpLy, since Rexx is case insensitive).

The external name is ExternalMultiply, the name you exported the function by.

The library name is 'external' which, in a Linux environment, will have 'lib' prepended and '.so' appended, so in this case it will be looking for 'libexternal.so'. Again this will change by environment.

Tip

Rexx itself is a very portable language, but it is quite natural that when interfacing with the outside world through an FFI there will be platform differences. It is strongly advised that you be familiar with all of your target platforms' development tools and their quirks if making code for multiple platforms.

Calling the function

After all of this, calling the function is an anticlimax. ExternalMultiply is now used just like any Rexx BIF:

say ExternalMultiply(2, 3, 4, 5, 6)
product = 1
do i = 100 to 1000 by 145
  product = ExternalMultiply(product, i)
end
say product

Of course there will be some issues relating to C type limitations. Rexx has arbitrary-precision arithmetic that doesn’t wrap. Most C implementations have 64-bit long long values, and overflowing a signed type in C is undefined behavior (in practice it usually wraps silently). This particular code will, as a result, not be seamless.
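One way to at least notice the problem is to test each multiplication before performing it. The sketch below uses the well-known divide-before-multiply checks against LLONG_MAX and LLONG_MIN; demo_checked_mul() is an illustrative name, and a real handler would presumably return RX_ERROR when it reports failure.

```c
#include <limits.h>

/* Multiply a by b without ever overflowing.
   Returns 1 and stores the product in *out, or 0 if it would not fit. */
static int demo_checked_mul(long long a, long long b, long long *out)
{
  if (a == 0 || b == 0)
  {
    *out = 0;
    return 1;
  }

  if (a > 0)
  {
    if (b > 0)
    {
      if (a > LLONG_MAX / b) return 0;   /* positive result too large */
    }
    else
    {
      if (b < LLONG_MIN / a) return 0;   /* negative result too small */
    }
  }
  else
  {
    if (b > 0)
    {
      if (a < LLONG_MIN / b) return 0;   /* negative result too small */
    }
    else
    {
      if (b < LLONG_MAX / a) return 0;   /* positive result too large */
    }
  }

  *out = a * b;
  return 1;
}
```
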

Rexx-ifying the code

Of course this function is trivial, not particularly well-matched to Rexx, and not very safe. Using it will not give the programmer the feeling that they’re using something intended for Rexx. Here are some improvements that could be made.

Safety first!

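Use secure code. The endptr argument to strtol() should be used to confirm that the whole argument was consumed, and the argument should be copied into a NUL-terminated buffer of your own first instead of assuming that the number passed by Regina will be NUL-terminated. Allocate a fresh buffer for returnstring (on the heap, not the stack; Regina provides RexxAllocateMemory() for this) and use that instead of the Regina-provided one. (Don’t worry: you won’t leak. If you change the returnstring->strptr member, Regina will deallocate it for you when finished using it.)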

Rexx numbers are different

Rexx numbers are arbitrary precision decimal representations. (Indeed they are the inspiration and much of the design behind the more recent IEEE 754 decimal format!) They are not like C’s float or double types and they are not like C’s integer forms, unsigned or otherwise. Using something like this decimal representation package instead of C’s native types is probably a smart idea.

(Actually ISO/IEC TS 18661-2 enhances ISO C with decimal floating point support. Catch up!)
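To give a feel for what working on decimal digits looks like, here is a small sketch of schoolbook multiplication on digit strings. It is not the decimal package mentioned above and not how Rexx does it internally; it only handles non-negative integers, and demo_decimal_mul() is an invented name. What it does show is a product that no longer cares how wide a long long is.

```c
#include <stdlib.h>
#include <string.h>

/* Multiply two non-negative decimal integers given as digit strings.
   Returns a freshly malloc()ed NUL-terminated string (the caller frees it),
   or NULL on empty input, non-digit input, or allocation failure. */
static char *demo_decimal_mul(const char *a, const char *b)
{
  size_t la = strlen(a), lb = strlen(b), i, j, start;
  int *acc;
  char *out;

  if (la == 0 || lb == 0)
    return NULL;
  for (i = 0; i < la; i++)
    if (a[i] < '0' || a[i] > '9') return NULL;
  for (j = 0; j < lb; j++)
    if (b[j] < '0' || b[j] > '9') return NULL;

  acc = calloc(la + lb, sizeof *acc);   /* a product has at most la+lb digits */
  out = malloc(la + lb + 1);
  if (acc == NULL || out == NULL)
  {
    free(acc);
    free(out);
    return NULL;
  }

  /* Digit a[i] times digit b[j] lands in column i+j+1 (column 0 is spare
     room for the final carry). Fine for inputs of ordinary length; very
     long inputs could overflow the int accumulators. */
  for (i = 0; i < la; i++)
    for (j = 0; j < lb; j++)
      acc[i + j + 1] += (a[i] - '0') * (b[j] - '0');

  /* Propagate carries from the least significant column upwards. */
  for (i = la + lb - 1; i > 0; i--)
  {
    acc[i - 1] += acc[i] / 10;
    acc[i] %= 10;
  }

  /* Skip leading zeros but always keep at least one digit. */
  start = 0;
  while (start < la + lb - 1 && acc[start] == 0)
    start++;

  for (i = start; i < la + lb; i++)
    out[i - start] = (char)('0' + acc[i]);
  out[la + lb - start] = '\0';

  free(acc);
  return out;
}
```

Squaring "4294967296" this way gives "18446744073709551616", one more than the largest value an unsigned 64-bit integer can hold.
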

Check, check, and check again

The implementation of ExternalMultiply() doesn’t check any of the input for validity. Nor do its helper functions. There’s no check for 0 values, so no short-circuit return of 0 when one turns up. There’s no check for whether the answer will overflow the returnstring buffer (although with a long long that is not a meaningful risk).

Proper code will check all of this. Do that when writing real code.

Complete source

The following blocks contain the full source code for the external function implementation, the test driver, as well as a simple build script usable in a Linux environment. They should serve as a good basis for making a proper, useful Rexx extension library.

Simple example

/* external.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define INCL_RXFUNC
#include <rexxsaa.h>

/* helper function declarations */
static long rexx_to_long(RXSTRING);
static void long_long_to_rexx(long long, PRXSTRING);

/* external API function declarations */
RexxFunctionHandler ExternalMultiply;

/* symbolic return values */
#define RX_OK     0
#define RX_ERROR  1

/* external API functions */
APIRET APIENTRY ExternalMultiply(PCSZ name, ULONG argc, PRXSTRING argv,
                                 PCSZ queuename, PRXSTRING returnstring)
{
  long long product = 1;
  long i;

  for (i = 0; i < argc; i++)
  {
    product *= rexx_to_long(argv[i]);
  }
  long_long_to_rexx(product, returnstring);

  return RX_OK;
}

/* helper functions */

static long rexx_to_long(RXSTRING rexxval)
{
  return strtol(RXSTRPTR(rexxval), NULL, 10);
}

static void long_long_to_rexx(long long val, PRXSTRING rexxval)
{
  sprintf(RXSTRPTR(*rexxval), "%lld", val);
  rexxval->strlength = strlen(RXSTRPTR(*rexxval));
}

Test driver

/* test-external.rx */
E_OK         = 0
E_SYNTAX     = 1
E_ERROR      = 2
E_FAILURE    = 3
E_HALT       = 4
E_NOTREADY   = 5
E_NOVALUE    = 6
E_LOSTDIGITS = 7
E_UNKNOWN    = 255

signal on syntax      name  error
signal on error       name  error
signal on failure     name  error
signal on halt        name  error
signal on notready    name  error
signal on novalue     name  error
signal on lostdigits  name  error

if RxFuncAdd('ExternalMultiply', 'external', 'ExternalMultiply') <> 0 then
  do
    say RxFuncErrMsg()
    exit E_ERROR
  end

say 'ExternalMultiply(2, 3, 4, 5, 6) returned' ExternalMultiply(2, 3, 4, 5, 6)

exit E_OK

error:
  type = condition('C')
  if condition('I') = 'SIGNAL' then
    say 'Error' type || '(' || rc || ') signalled on line' sigl || '.'
  else
    say 'Error' type || '(' || rc || ') called on line' sigl || '.'
  say 'Description:' condition('D')

  select
    when type = 'SYNTAX' then
      code = E_SYNTAX
    when type = 'ERROR' then
      code = E_ERROR
    when type = 'FAILURE' then
      code = E_FAILURE
    when type = 'HALT' then
      code = E_HALT
    when type = 'NOTREADY' then
      code = E_NOTREADY
    when type = 'NOVALUE' then
      code = E_NOVALUE
    when type = 'LOSTDIGITS' then
      code = E_LOSTDIGITS
    otherwise
      code = E_UNKNOWN
  end

  exit code

Build script

/* build-external.rx */
'gcc -shared -fpic -o libexternal.so external.c'