Re: Less error-prone PROTECT in the R API

Radford Neal
 

Lukas writes:

I think that the interface should be even simpler, e.g., not have
the INNER versions of the macros.
You mean not allow an inner scope? From below, maybe you instead mean
allowing the BEGIN_PROTECT/END_PROTECT macros to be nested. The
problem with that is that I don't know how to implement it. A
RETURN... has to undo all the protection in the whole function, which
is doable when that means going back to the situation before the sole
BEGIN_PROTECT. But I haven't thought of a way to do this if they can
be nested.

Note that I've managed to set things up so that attempting to next
BEGIN_PROTECT will produce a compile error. (I arrange for various
other mis-uses to also produce compile errors.)

The fact that the INNER macros can be used instead of a statement
makes it harder to understand the structure of the code - it's easy
to miss errors if the indentation is incorrect.
The non-inner BEGIN_PROTECT / END_PROTECT can also be used as a
statement, though it wouldn't be typical usage. I suppose this is
contrary to the usual C convention of grouping with {}, but it seems
sort of unavoidable. I can't think of a way of producing a compile
error if it is used as a statement, so it's best to make it work
correctly in that context, and document it.

Also, with some macro magic, it should be possible to maintain a
nesting level without the user explicitly specifying it. Having
only one pair of BEGIN/END macros that can be nested makes for a
simpler API.
Well, if you can figure out how to do that, I'd be interested!

Is there a specific reason why you'd need different RETURN_FROM for
SEXP and non-SEXP? That macro is basically just { END_PROTECT;
return value; }, which should work the same for both types.
After END_PROTECT, the variables declared by the BEGIN_PROTECT will no
longer exist, so that would often not work. Also, the expression in
the return may involve calls to functions that allocate storage, so
protection may need to be maintained while the return value expression
is evaluated. This requires storing the return value in a variable,
which for RETURN_SEXP_INSIDE_PROTECT can be declared to be of type SEXP.

For other return types, there is no way I know of of declaring a
variable in the macro that will be of the same type as the value to be
returned. So it's necessary to just do the return with the given
expression, which means the protection has to be undone before the
return. Fortunately, expressions containing function calls that
allocate memory seem less likely if the value is not a SEXP.

I would even cut down the API to only contain BEGIN_PROTECT,
ALSO_PROTECT, END_PROTECT and RETURN_INSIDE_PROTECT. Without the
multi-variable versions - it does lead to more lines, but I would
consider that to be even cleaner. (but that may be because that
fits my programming style best, there???s no harm in having the
multi-var versions)
I'm not sure what your proposing. Would BEGIN_PROTECT be equivalent
to BEGIN_PROTECT1, declaring just one variable? But if you really
wanted BEGIN_PROTECT0, you'd have to declare a dummy variable that you
don't actually need. And if you need seven variable, you'd have seven
BEGIN_PROTECTs (assuming one avoided the need for an INNER version),
and later seven END_PROTECTs, with the numbers of course having to
match. This seems rather awkward and error-prone.

Or would your BEGIN_PROTECT declare no variables, with all protection
done with ALSO_PROTECT? That would require that the variables be
declared separately, and crucially, that they be initialized to a
valid SEXP value. That seems very error prone, which is why I wanted
to have the variables be declared and initialized in the same macro
that protects them. That way, people would have to work hard to
manage to protect a variable that doesn't have a valid SEXP value.

Concerning BEGIN_NO_ALLOCATION / END_NO_ALLOCATION: That seems like
a very good idea. Not sure if you can force people to use it, but
if it's used consistently in the R implementation, users of the
native API have an easy way to diagnose PROTECT problems.
It's useful in that it adds documentation and has a very clear
definition. Functions that unexpectedly allocate under rare
circumstances are an frequent source of bugs. I guess Tomas
Kalibera, having worked on the protect analysis tool, can tell
stories about this :-)
The problem I see is that, the same as with many other areas in the
API, people have started to depend on the actual behavior instead of
some defined behavior. I.e., it may not be possible to put an
ALLOCATION_POSSIBLE at the front of install, because this function
is expected not to allocate if the symbol already exists.
That would be an issue if we were introducing ALLOCATION_POSSIBLE
after BEGIN_NO_ALLOCATION / END_NO_ALLOCATION had been around for a
while, but if they're introduced simultaneously, I think it probably
wouldn't be. (Though it could be if ALLOCATION_PROSSIBLE was
introduced, but not actually used in much of the R interpreter, with
more widespread usage coming later.)

Some additional points: Implementing BEGIN_NO_ALLCOATION /
END_NO_ALLOCATION will also require storing the current counter in the
"context" structure, so it can be restored on error exits. This
wouldn't be hard, and would add only a slight overhead.

All overhead could be eliminated with a configuration option to make
all these macros do nothing. I'm not sure whether or not this would
be the recommended configuration for producution use.

Finally, when gc_torture is enabled, the ALLOCATION_POSSIBLE macro
could initiate a garbage collection, which might ferret out additional
bugs. Whether this is done could also, of course, be a configuration
option.

Radford

Join Rconsortium-wg-api@lists.r-consortium.org to automatically receive all group messages.