Re: Doxygen vs. texinfo (was: Documenting the C API)

Radford Neal

I think it's quite useful to have split the discussion up this way.

Regarding texi vs. doxygen, I think the first thing to keep in mind is
that texi is what is being used now. It's generally a bad idea to
change stuff in an existing project just because you have a personal
preference for doing it some other way - think of the disorganization
and wasted effort if everyone reformats the function they're working
on at the moment to use their preferred indentation style. Beyond the
problems it causes, the urge to change things for no particular reason
is also a big distraction from getting actually useful work done.

In this instance, the real gains from using doxygen seem trivial to me
(see below for a simple alternative for getting what gains there are),
and there is plenty of scope for (possibly unanticipated) downside.
Merely disliking texi (and I don't like it much myself) is not a
sufficient reason to start using doxygen, which would be in addition
to texi, since I presume no one is proposing to rewrite all the manuals.

For instance, how would doxygen documentation integrate with the
existing texi documentation? This gets into what we think the
documentation will look like at the top level. Currently, the API
documentation, such as it is, is part of the "Writing R Extensions"
manual, in which section 6 is called "The R API: entry points for C
code". Would this section 6 be replaced by a reference to a new
separate document that discusses the R API in a narrow sense (omitting
stuff like how packages are organized)? But if you look at the whole
document, you'll see that lots of stuff essential for understanding
the API is actually in other sections. So is the idea to somehow
shoehorn the output of doxygen into the middle of the texi document?
Or is the proposal to redo this whole manual (roughly 12000 lines of
texi) in doxygen, and moreover, make the entirety of the new version
be contained in the C header file for the API? (Even the parts about
interfacing to Fortran?)

Frankly, this last possibility strikes me as ridiculous. The header
file would be approximately 99% doxygen comments, and 1% actual C
code, consisting exclusively of function prototypes. I think there
may be differing visions here of what API documentation
should consist of. I think that documentation on individual
functions, while essential, is not at all sufficient. There needs to
be much broader discussion of how to write C code to interface to R,
which does not break down into sections for individual functions.

Lukas writes:

It's true that, if the documentation is in the header files, it is
somewhat separate from the actual implementation. But still - it's
impossible to document a function that doesn't exist any more in the
API, and a new function in the API with no documentation will
immediately be noticed (whoever adds it to the header file will
see that all other functions are documented). Other changes, like
parameters and their types, also happen in the header file, so it's
more likely that the change will make it into the documentation.
Even this limited amount of synchronization between API and
documentation would take a lot of discipline with a texi file.

I think there's a much easier way to ensure synchronization. Change
the texi documentation to systematically document all functions,
including, of course, their prototypes. At the end of each line
containing a C function prototype, put a comment like @c C-proto.
Then put the following in the Makefile:

grep "@c C-proto" R-exts.texi | sed "s/@c C-proto//" > R-api-protos.h
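A minimal sketch of what that command produces, run by hand in a scratch directory rather than from a Makefile, and assuming a POSIX shell with grep and sed. The texi fragment is hypothetical, and the two prototypes in it are illustrative rather than copied from the manual:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"    # scratch directory, so no real file is overwritten

# Hypothetical fragment of R-exts.texi. Only the lines carrying the
# "@c C-proto" marker should reach the generated header; the prose line,
# which merely mentions PROTECT, must not.
cat > R-exts.texi <<'EOF'
@example
SEXP Rf_protect(SEXP s); @c C-proto
void Rf_unprotect(int n); @c C-proto
@end example
Objects passed to @code{PROTECT} stay protected until @code{UNPROTECT}.
EOF

# The extraction command proposed above, unchanged.
grep "@c C-proto" R-exts.texi | sed "s/@c C-proto//" > R-api-protos.h

cat R-api-protos.h
```

It prints the two marked prototype lines and nothing from the prose line. The sed expression removes only the marker, so each generated line keeps the space that preceded it, which is harmless in a C header.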

Now you've got a C header file containing exactly the prototypes that
are in the documentation. This isn't the header file included in user
programs. That one would define implementation-specific things (like
what SEXP is), and then include R-api-protos.h. Implementations that
really need to can fiddle the output of the command above to, for
instance, make one of the functions be a macro. Or of course an
implementation could just not do this, taking on the task of keeping
things in sync manually, while other implementations can proceed as
described above.

This gets all the actual benefit of using doxygen in a way that's much
simpler, that doesn't change the current way of doing things
unnecessarily, and that doesn't constrain implementations.

I think some people may be thinking that doxygen will bring other
benefits than just keeping the function prototypes in sync between the
documentation and the header file. But I don't think it does. All
the other things in C code that doxygen might automatically put in
documentation are things that we want to *hide*, not document.

