Re: Less error-prone PROTECT in the R API

Jan Vitek <prof.vitek.j@...>

Tomas Kalibera sent the following comments.

- I think that PROTECT errors are not a pressing problem; they seem to affect users quite rarely (or do they?)

- my checking tool would solve most of the issues easily, but there is very little interest in using it. This may be because of the previous point: it does not affect people much (or people believe it does not). It was discussed at DSC that the tool could be used for regular CRAN checks, but as far as I know that has not happened yet. Technically this would be quite easy to achieve; it is just a question of having access to the hardware, of some maintenance, and of getting package maintainers to look at the reports and fix their code. The last part may be the hardest (a number of reports from the tool are false alarms), but perhaps not nearly as hard as asking them to rewrite code for a new API.

- If there is a switch to C++, all of a sudden we can have a much easier-to-use API for protection; the time needed for wide adoption of a new C-based API may well be longer than the time before such a switch. Note that the R VM has been checked by my tool (perhaps I could do that again from time to time); for the packages, it is unlikely a transition to the new API could happen fast.

- the ideas are indeed reasonable, perhaps particularly having a "no_allocation" annotation for a function. It would have to imply that all recursively called functions also have the "no_allocation" property. It still does not solve all of the problems; the trickiest decisions on protection are around functions that sometimes allocate and sometimes do not, where we depend on knowing when (e.g. getAttrib). Tooling can help add the annotations and check them, easily using what I wrote. The annotations would make some checks intra-procedural, which might make wider adoption easier.

On Oct 10, 2016, at 5:28 PM, Radford Neal <radford@...> wrote:

Lukas writes:

I think that the interface should be even simpler, e.g., not have
the INNER versions of the macros.
You mean not allow an inner scope? From below, maybe you instead mean
allowing the BEGIN_PROTECT/END_PROTECT macros to be nested. The
problem with that is that I don't know how to implement it. A
RETURN... has to undo all the protection in the whole function, which
is doable when that means going back to the situation before the sole
BEGIN_PROTECT. But I haven't thought of a way to do this if they can
be nested.

Note that I've managed to set things up so that attempting to nest
BEGIN_PROTECT will produce a compile error. (I arrange for various
other misuses to also produce compile errors.)

The fact that the INNER macros can be used instead of a statement
makes it harder to understand the structure of the code - it's easy
to miss errors if the indentation is incorrect.
The non-inner BEGIN_PROTECT / END_PROTECT can also be used as a
statement, though it wouldn't be typical usage. I suppose this is
contrary to the usual C convention of grouping with {}, but it seems
sort of unavoidable. I can't think of a way of producing a compile
error if it is used as a statement, so it's best to make it work
correctly in that context, and document it.

Also, with some macro magic, it should be possible to maintain a
nesting level without the user explicitly specifying it. Having
only one pair of BEGIN/END macros that can be nested makes for a
simpler API.
Well, if you can figure out how to do that, I'd be interested!

Is there a specific reason why you'd need different RETURN_FROM for
SEXP and non-SEXP? That macro is basically just { END_PROTECT;
return value; }, which should work the same for both types.
After END_PROTECT, the variables declared by the BEGIN_PROTECT will no
longer exist, so that would often not work. Also, the expression in
the return may involve calls to functions that allocate storage, so
protection may need to be maintained while the return value expression
is evaluated. This requires storing the return value in a variable,
which for RETURN_SEXP_INSIDE_PROTECT can be declared to be of type SEXP.

For other return types, there is no way I know of to declare a
variable in the macro that will be of the same type as the value to be
returned. So it's necessary to just do the return with the given
expression, which means the protection has to be undone before the
return. Fortunately, expressions containing function calls that
allocate memory seem less likely if the value is not a SEXP.

I would even cut down the API to only contain BEGIN_PROTECT,
multi-variable versions - it does lead to more lines, but I would
consider that to be even cleaner. (but that may be because that
fits my programming style best, there's no harm in having the
multi-var versions)
I'm not sure what you're proposing. Would BEGIN_PROTECT be equivalent
to BEGIN_PROTECT1, declaring just one variable? But if you really
wanted BEGIN_PROTECT0, you'd have to declare a dummy variable that you
don't actually need. And if you needed seven variables, you'd have seven
BEGIN_PROTECTs (assuming one avoided the need for an INNER version),
and later seven END_PROTECTs, with the numbers of course having to
match. This seems rather awkward and error-prone.

Or would your BEGIN_PROTECT declare no variables, with all protection
done with ALSO_PROTECT? That would require that the variables be
declared separately, and crucially, that they be initialized to a
valid SEXP value. That seems very error prone, which is why I wanted
to have the variables be declared and initialized in the same macro
that protects them. That way, people would have to work hard to
manage to protect a variable that doesn't have a valid SEXP value.

a very good idea. Not sure if you can force people to use it, but
if it's used consistently in the R implementation, users of the
native API have an easy way to diagnose PROTECT problems.
It's useful in that it adds documentation and has a very clear
definition. Functions that unexpectedly allocate under rare
circumstances are a frequent source of bugs. I guess Tomas
Kalibera, having worked on the protect analysis tool, can tell
stories about this :-)
The problem I see is that, the same as with many other areas in the
API, people have started to depend on the actual behavior instead of
some defined behavior. I.e., it may not be possible to put an
ALLOCATION_POSSIBLE at the front of install, because this function
is expected not to allocate if the symbol already exists.
That would be an issue if we were introducing ALLOCATION_POSSIBLE
after BEGIN_NO_ALLOCATION / END_NO_ALLOCATION had been around for a
while, but if they're introduced simultaneously, I think it probably
wouldn't be. (Though it could be if ALLOCATION_POSSIBLE was
introduced, but not actually used in much of the R interpreter, with
more widespread usage coming later.)

Some additional points: Implementing BEGIN_NO_ALLOCATION /
END_NO_ALLOCATION will also require storing the current counter in the
"context" structure, so it can be restored on error exits. This
wouldn't be hard, and would add only a slight overhead.

All overhead could be eliminated with a configuration option to make
all these macros do nothing. I'm not sure whether or not this would
be the recommended configuration for production use.

Finally, when gc_torture is enabled, the ALLOCATION_POSSIBLE macro
could initiate a garbage collection, which might ferret out additional
bugs. Whether this is done could also, of course, be a configuration option.

Rconsortium-wg-api mailing list
