Re: Less error-prone PROTECT in the R API

Lukas Stadler
 

Hi!

I have to admit that the BEGIN_PROTECT / END_PROTECT scheme looks very C++’ish to me (which is not meant to be a bad thing).
It reminds me of smart pointers, where it’s also not the value, but the container that keeps the value alive.
It’s a shame that you need the RETURN_SEXP_INSIDE_PROTECT macro in C...

I think that the interface should be even simpler, e.g., not have the INNER versions of the macros.
The fact that the INNER macros can be used instead of a statement makes it harder to understand the structure of the code - it’s easy to miss errors if the indentation is incorrect.
Also, with some macro magic, it should be possible to maintain a nesting level without the user explicitly specifying it.
Having only one pair of BEGIN/END macros that can be nested makes for a simpler API.

Is there a specific reason why you’d need different RETURN_FROM for SEXP and non-SEXP?
That macro is basically just “{ END_PROTECT; return value; }”, which should work the same for both types.

I would even cut down the API to only contain BEGIN_PROTECT, ALSO_PROTECT, END_PROTECT and RETURN_INSIDE_PROTECT.
Without the multi-variable versions - it does lead to more lines, but I would consider that to be even cleaner.
(but that may be because that fits my programming style best, there’s no harm in having the multi-var versions)

Concerning BEGIN_NO_ALLOCATION / END_NO_ALLOCATION:
That seems like a very good idea.
Not sure if you can force people to use it, but if it’s used consistently in the R implementation, users of the native API have an easy way to diagnose PROTECT problems.
It’s useful in that it adds documentation and has a very clear definition.
Functions that unexpectedly allocate under rare circumstances are an frequent source of bugs.
I guess Tomas Kalibera, having worked on the protect analysis tool, can tell stories about this :-)

The problem I see is that, the same as with many other areas in the API, people have started to depend on the actual behavior instead of some defined behavior.
I.e., it may not be possible to put an “ALLOCATION_POSSIBLE” at the front of install, because this function is expected not to allocate if the symbol already exists.

- Lukas

On 10 Oct 2016, at 5:14, Radford Neal <radford@...> wrote:

Dear R API people,

The new PROTECT scheme that I proposed in a previous email is now
implemented in the version of pqR released a few days ago. So if you
like, you can try it out. You can get pqR from pqR-project.org. The
documentation on the new facility is found in this section of pqR's
R-exts.html: http://www.pqr-project.org/R-exts.html#Garbage-Collection

I think the new scheme is substantially less error prone in
complicated situations. In situations where just a few things need
protecting in an obvious manner, using PROTECT and UNPROTECT will be
be similarly simple, and slightly faster, but the new scheme would
also work fine, and could be recommended in all situations. PROTECT
and UNPROTECT are of course still present, for backwards compatibility
if nothing else.


Another way of reducing PROTECT/UNPROTECT bugs has ocurred to me,
which is also in effect a form of documentation.

The idea is to define two macros, BEGIN_NO_ALLOCATION and
END_NO_ALLOCATION, which can be used to bracket a region of code in
which memory allocation is not supposed to be done. If a memory
allocation does occur (directly, or in a function called from this
region), an error would be signaled.

Typically, these would enclose the whole body of a function. Eg,

SEXP a_function (SEXP arg)
{
BEGIN_NO_ALLOCATION;
... body of function ...
END_NO_ALLOCATION;
}

This also clearly documents that the function is guaranteed to not
allocate memory, and hence it is safe to called without protecting
some objects.

We also want to document that some functions do allocate memory. More
precisely, we want to document that some functions are not defined as
not allocating memory - they may or may not actually allocate memory
in the present implementation, but even if not, callers should assume
that they may in some future implementation. If such as function is
called when NO_ALLOCATION is in effect, an error should be signalled,
even if no allocation would actually have occurred (and hence no error
would be signaled as a result of an allocation).

To implement this, a ALLOCATION_POSSIBLE macro can be defined, which
signals an error if NO_ALLOCATION is in effect. It would typically be
the first statement of a function, thereby documenting that the
function is not guaranteed to not allocate.

This macro also helps with functions that do sometimes allocate, but
often not. A prevalent example is "install", which will allocate if
the symbol doesn't exist yet, but not if it does. Putting
ALLOCATION_POSSIBLE at the front of install will ensure that if it is
called from a NO_ALLOCATION region, an error will result, even if in
this particular run the symbol happens to have already been installed
before.

These macros are very easy to implement. Here is a simple way

int R_no_allocation_depth; /* global variable, initialized to 0 */

#define BEGIN_NO_ALLOCATION (R_no_allocation += 1)
#define END_NO_ALLOCATION (R_no_allocation -= 1)
#define ALLOCATION_POSSIBLE \
(R_no_allocation > 0 ? error("Violation of no allocation region") : 0)

These might be made more elaborate in order to produce compile-time
errors if the BEGIN_NO_ALLOCATION / END_NO_ALLOCATION macros are not
properly nested.

Of course, allocVector and other memory allocators would also signal
an error if R_no_allocation is greater than 0.

Any comments?

Radford
_______________________________________________
Rconsortium-wg-api mailing list
Rconsortium-wg-api@...
https://lists.r-consortium.org/mailman/listinfo/rconsortium-wg-api

Join Rconsortium-wg-api@lists.r-consortium.org to automatically receive all group messages.