Less error-prone PROTECT in the R API

Radford Neal
 

Dear R API people,

The new PROTECT scheme that I proposed in a previous email is now
implemented in the version of pqR released a few days ago. So if you
like, you can try it out. You can get pqR from pqR-project.org. The
documentation on the new facility is found in this section of pqR's
R-exts.html: http://www.pqr-project.org/R-exts.html#Garbage-Collection

I think the new scheme is substantially less error prone in
complicated situations. In situations where just a few things need
protecting in an obvious manner, using PROTECT and UNPROTECT will be
be similarly simple, and slightly faster, but the new scheme would
also work fine, and could be recommended in all situations. PROTECT
and UNPROTECT are of course still present, for backwards compatibility
if nothing else.


Another way of reducing PROTECT/UNPROTECT bugs has ocurred to me,
which is also in effect a form of documentation.

The idea is to define two macros, BEGIN_NO_ALLOCATION and
END_NO_ALLOCATION, which can be used to bracket a region of code in
which memory allocation is not supposed to be done. If a memory
allocation does occur (directly, or in a function called from this
region), an error would be signaled.

Typically, these would enclose the whole body of a function. Eg,

SEXP a_function (SEXP arg)
{
BEGIN_NO_ALLOCATION;
... body of function ...
END_NO_ALLOCATION;
}

This also clearly documents that the function is guaranteed to not
allocate memory, and hence it is safe to called without protecting
some objects.

We also want to document that some functions do allocate memory. More
precisely, we want to document that some functions are not defined as
not allocating memory - they may or may not actually allocate memory
in the present implementation, but even if not, callers should assume
that they may in some future implementation. If such as function is
called when NO_ALLOCATION is in effect, an error should be signalled,
even if no allocation would actually have occurred (and hence no error
would be signaled as a result of an allocation).

To implement this, a ALLOCATION_POSSIBLE macro can be defined, which
signals an error if NO_ALLOCATION is in effect. It would typically be
the first statement of a function, thereby documenting that the
function is not guaranteed to not allocate.

This macro also helps with functions that do sometimes allocate, but
often not. A prevalent example is "install", which will allocate if
the symbol doesn't exist yet, but not if it does. Putting
ALLOCATION_POSSIBLE at the front of install will ensure that if it is
called from a NO_ALLOCATION region, an error will result, even if in
this particular run the symbol happens to have already been installed
before.

These macros are very easy to implement. Here is a simple way

int R_no_allocation_depth; /* global variable, initialized to 0 */

#define BEGIN_NO_ALLOCATION (R_no_allocation += 1)
#define END_NO_ALLOCATION (R_no_allocation -= 1)
#define ALLOCATION_POSSIBLE \
(R_no_allocation > 0 ? error("Violation of no allocation region") : 0)

These might be made more elaborate in order to produce compile-time
errors if the BEGIN_NO_ALLOCATION / END_NO_ALLOCATION macros are not
properly nested.

Of course, allocVector and other memory allocators would also signal
an error if R_no_allocation is greater than 0.

Any comments?

Radford

Join Rconsortium-wg-api@lists.r-consortium.org to automatically receive all group messages.