Re: [PPL-devel] pointer "ownership" in the C API?

27 Mar 2009


Hello PPL maintainers, and Xavier Leroy & Damien Doligez
(first.last@inria.fr), who are in BCC.
To Xavier & Damien: PPL is a marvelous GPL-ed library,
http://www.cs.unipr.it/ppl/
Enea Zaffanella wrote:
...
Basile Starynkevitch wrote:
[...]
...
In particular, a detailed explanation of the Ocaml binding to PPL is
welcome, since I happen to know Ocaml quite well (and the C coding rules
required by its runtime).
Hello Basile.
I am trying to improve our OCaml language interface and it really 
looks like you are the right person I can ask a few simple questions.
I'll try. The best people to ask are of course Xavier Leroy & Damien
Doligez (Xavier being the father/guru of Ocaml, and Damien being the
father/guru of its runtime), both from INRIA. I am BCC-ing them just in
case!
...
First of all, I believe we are currently doing something bad even when 
doing simple things such as interfacing PPL variable indices.
In the C++ world, a variable index has type `dimension_type', which is 
just a typedef for the standard unsigned integer type `size_t'.
I never understood why you don't use int or int32_t for dimension_type.
IMHO, PPL dimension_type-s are reasonably small integers: several PPL
data structures have a memory size proportional to the value of some
dimension_type. So if my program has a dimension_type whose value is one
billion = 1e9, I would expect the memory requirements to be a multiple
of a billion bytes (so dozens of gigabytes), and the CPU time to be
probably a billion squared times some small CPU step. Even at one
nanosecond per step, that is a very long time: 1e18 nanoseconds is
more than my remaining lifetime (I think that 1e9 sec = 31 years), and
certainly more than the lifetime of any project able to use PPL.
Therefore, I don't think PPL would realistically handle a dimension_type
bigger than a billion. Hence my int32_t suggestion for dimension_type.
On AMD64/Linux, size_t is unsigned long, i.e. 64 bits, while int is
32 bits (like int32_t).
And changing dimension_type is only changing one typedef in ppl.h! Of 
course, the ABI would be incompatible: people would have to recompile PPL!
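
As a rough check of the estimate above, the following C sketch computes
how many whole years n*n steps would take at one nanosecond per step.
The helper name quadratic_cost_years, the quadratic cost model and the
one-nanosecond step are assumptions made for illustration only; nothing
here comes from PPL itself.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds in a non-leap year: 365 days of 86400 seconds. */
#define SECONDS_PER_YEAR (365ULL * 86400ULL)

/* Hypothetical helper: whole years needed for n*n steps of 1 ns each. */
static uint64_t quadratic_cost_years(uint64_t n)
{
    /* n*n must fit in 64 bits, which holds for any n below 2^32. */
    assert(n <= UINT32_MAX);
    uint64_t total_ns = n * n;                    /* 1e18 for n = 1e9 */
    uint64_t seconds  = total_ns / 1000000000ULL; /* ns -> s */
    return seconds / SECONDS_PER_YEAR;
}
```

For n = 1e9 this gives 31 whole years, in line with the "1e9 sec = 31
years" figure quoted above.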
...
As far as I have understood, OCaml uses tagged signed integers (`int').
Yes. In practice, their least significant bit is 1 and they are stored
in an intptr_t (i.e. a machine word, so 64 bits on AMD64/Linux, hence
63-bit integers since one bit is lost to the tag).
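
To make the tagging concrete, here is a minimal sketch of the
arithmetic, assuming the usual encoding where an integer n is stored as
2n+1 in a machine word. tag_long and untag_long are hypothetical
stand-ins that mirror what Val_long and Long_val compute; real glue code
should use the macros from the Ocaml headers instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Val_long: store n as 2n+1. */
static intptr_t tag_long(intptr_t n)
{
    /* Shift as unsigned to avoid signed-overflow issues, then set the
       low bit that marks the word as an integer rather than a pointer. */
    return (intptr_t)(((uintptr_t)n << 1) | 1u);
}

/* Hypothetical stand-in for Long_val: drop the tag bit. */
static intptr_t untag_long(intptr_t v)
{
    assert(v & 1);  /* a tagged integer always has its low bit set */
    /* Right-shifting a negative value is implementation-defined in C;
       this sketch assumes the usual arithmetic (sign-preserving) shift. */
    return v >> 1;
}
```

The top bit of n is lost in tag_long, which is why only 63 bits are
usable on a 64-bit machine.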
...
What is the right way to translate one into the other?
We are currently using
  Int_val / Val_int
but we just discovered that this casts the value to the C native "int"
datatype. Hence, we should probably be using Long_val / Val_long.
However, there also is Unsigned_long_val ... but it is unclear to me 
if/when I can use it and feel confident that I am not misreading the 
input of the OCaml user.
You might use Long_val / Val_long and explicitly cast dimension_type to 
(long) in the Ocaml glue code in C or C++.
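
The difference between the two macro pairs can be illustrated without
the Ocaml headers. In this sketch, long_val_like and int_val_like are
hypothetical stand-ins: the first untags into a full machine word, the
second additionally narrows the result to a C int. The narrowing is what
loses information on platforms where int is narrower than a machine
word (AMD64/Linux, for instance); the sketch assumes such a platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Long_val: untag into a full machine word. */
static intptr_t long_val_like(intptr_t v)
{
    assert(v & 1);  /* tagged integers have their low bit set */
    return v >> 1;
}

/* Hypothetical stand-in for Int_val: same untagging, then a cast to
   int. Values that do not fit in an int are silently mangled here. */
static int int_val_like(intptr_t v)
{
    return (int)long_val_like(v);
}
```

Small values survive both conversions, but a value such as 2^32 + 5
comes back intact only from long_val_like.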
...
So, what is the right way to code helper functions such as
dimension_type caml_value_to_ppl_dimension(value);
I have a recent git of PPL, and cannot find these functions.
You could code a quick variant ie
    inline dimension_type caml_value_to_ppl_dimension(value v)
    {
        return  (dimension_type)Long_val(v);
    }
However, this is an optimisation which happens to work only because
you know that the Ocaml GC [or malloc or new] is never called in
that function. To be very safe, you had better code the safe variant
    dimension_type caml_value_to_ppl_dimension(value v)
    {
        CAMLparam1(v);  /* register v with the GC before anything else */
        dimension_type r = (dimension_type)Long_val(v);
        CAMLreturnT(dimension_type, r);
    }
Such a function would probably stay safe even in the improbable event
that Xavier Leroy wanted Ocaml integers to be more polymorphic...
(Imagine that Xavier wanted Ocaml 5.43 integers to be tagged ints, or
boxed int64_t, or boxed GMP mpz_t! But knowing Xavier, I don't believe
that could happen.)
...
value ppl_dimension_to_caml_value(dimension_type);
I leave this one as an exercise. (both quick & safe variants).
...
taking into account that:
a) we would like to react properly (throwing an exception) when an 
OCaml user wrongly passes in a negative value (or, if possible, a 
value that is too big to fit into a dimension_type);
Where do you want to react? If it is inside caml_value_to_ppl_dimension,
you have to use my safer variant! I was assuming you would react in the
caller of caml_value_to_ppl_dimension.
...
b) we would like to place assertions warning us whenever we try to 
convert a dimension_type that does not fit into an OCaml integer.
I'm not sure I understand that one. Ocaml integers always have one bit
less than the machine integers. How and where do you handle the case
when the integer is (on a 32-bit machine) 2^30 or more, i.e. fits as a
positive C int but not in an Ocaml int?
...
c) we want to strive for maximum portability (i.e., support both 
different computing platforms and, if possible, different OCaml 
releases).
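
The range checks asked for in (a) and (b) can be sketched in portable C
as follows. This is only an illustration under stated assumptions:
dimension_type is taken to be size_t as described earlier in the thread,
the Ocaml integer is assumed to be already untagged into an intptr_t,
and the names OCAML_MAX_LONG, long_fits_dimension,
dimension_fits_ocaml_int and dimension_to_long_checked are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef size_t dimension_type;  /* assumption: as stated in the thread */

/* Largest Ocaml int: one bit of the machine word goes to the tag. */
#define OCAML_MAX_LONG (INTPTR_MAX >> 1)

/* (a) Ocaml -> PPL: is the untagged integer n a valid dimension? */
static int long_fits_dimension(intptr_t n)
{
    return n >= 0 && (uintmax_t)n <= (uintmax_t)SIZE_MAX;
}

/* (b) PPL -> Ocaml: can d be represented as an Ocaml int? */
static int dimension_fits_ocaml_int(dimension_type d)
{
    return (uintmax_t)d <= (uintmax_t)OCAML_MAX_LONG;
}

/* (b) with the assertion in place, as requested. */
static intptr_t dimension_to_long_checked(dimension_type d)
{
    assert(dimension_fits_ocaml_int(d));
    return (intptr_t)d;
}
```

In real glue code, a failed check in direction (a) would presumably
raise an Ocaml exception (for example through caml_invalid_argument)
rather than return a flag, and in that case the enclosing function has
to use the CAMLparam/CAMLreturn macros.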
Second question:
recently we corrected several GC-related issues reported by Kenneth 
MacKenzie. There were several places where our code was not GC safe.
However, there are other places where, IMHO, we are too conservative. 
For instance:
extern "C"
CAMLprim value
ppl_version_major(value unit) try {
  CAMLparam1(unit);
  CAMLreturn(Val_long(version_major()));
}
CATCH_ALL
Is there any need to wrap unit and the return value (an unboxed CAML 
int, afaict) with the calls to CAMLparam1 and CAMLreturn ?
There is no need to wrap it in that precise case (because we know that
version_major() is a constant), but I strongly recommend keeping the
wrapping and staying conservative. If you don't keep the wrapping, leave
a big fat comment in your code (really a big fat comment) to warn the
future PPL maintainer! And I don't think any sensible Ocaml application
using PPL would call that ppl_version_major function a million times
(but only once at most).
...
Similarly, in the following example,
extern "C"
CAMLprim value
ppl_set_rounding_for_PPL(value unit) try {
  CAMLparam1(unit);
  set_rounding_for_PPL();
  CAMLreturn(Val_unit);
}
CATCH_ALL
is there any need to wrap Val_unit ?
If set_rounding_for_PPL calls some complex code which might apply an
Ocaml closure or raise an Ocaml exception or allocate an Ocaml boxed
value, you need to keep it.
My advice is to always keep the CAML* macros everywhere. In the rare
functions which you believe are called billions of times and which you
are sure never call the Ocaml GC, apply an Ocaml closure, or raise an
Ocaml exception, even indirectly, you might remove them. IIRC, there is
a nightmare scenario where your code doesn't call the Ocaml GC but does
call new or malloc; later the malloc-ed zone is released, still later it
is used by the Ocaml GC (or is it the reverse?), and days later the
Ocaml GC deallocates it, etc. I forgot the details, ask Damien, but they
are really ugly!
...
Thanks in advance,
Enea.
-- 
Basile STARYNKEVITCH         http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mines, sont seulement les miennes} ***