
Hello PPL maintainers, and Xavier Leroy & Damien Doligez (first.last@inria.fr), who are in BCC. To Xavier & Damien: PPL is a marvelous GPL-ed library, http://www.cs.unipr.it/ppl/
Enea Zaffanella wrote:
Basile Starynkevitch wrote: [...]
In particular, detailed explanation of Ocaml binding to PPL is welcome, since I happen to know quite well Ocaml (and the C coding rules required by its runtime).
Hello Basile.
I am trying to improve our OCaml language interface and it really looks like you are the right person I can ask a few simple questions.
I'll try. The best people to ask are of course Xavier Leroy & Damien Doligez (Xavier being the father/guru of Ocaml, and Damien being the father/guru of its runtime), both from INRIA. I am BCC-ing them just in case!
First of all, I believe we are currently doing something bad even when doing simple things such as interfacing PPL variable indices.
In the C++ world, a variable index has type `dimension_type', which is just a typedef for the standard unsigned integer type `size_t'.
I never understood why you don't use int or int32_t for dimension_type. IMHO, PPL dimension_type-s are reasonably small integers: several PPL data structures have a memory size proportional to the value of some dimension_type. So if my program has a dimension_type whose value is one billion = 1e9, I would expect the memory requirements to be some multiple of a billion bytes (so dozens of gigabytes), and the CPU time to be probably a billion squared times some small CPU step time. At one nanosecond per step that is 1e18 nanoseconds = 1e9 seconds, about 31 years, which is more than my remaining lifetime and certainly more than the lifetime of any project able to use PPL. Therefore, I don't think PPL would realistically handle a dimension_type bigger than a billion. Hence my int32_t suggestion for dimension_type.
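The back-of-the-envelope estimate can be checked with a few lines of C++. This is only a sketch: the names `kSecondsPerYear`, `years_for_steps` and `kMaxInt32Dimension` are made up for illustration, and the one-nanosecond step is the same rough assumption as in the estimate itself.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper names; nothing here comes from PPL.
constexpr double kSecondsPerYear = 365.25 * 24.0 * 3600.0;  // Julian year

// Years needed to run `steps` elementary steps of `ns_per_step` nanoseconds each.
constexpr double years_for_steps(double steps, double ns_per_step) {
    return steps * ns_per_step * 1e-9 / kSecondsPerYear;
}

// Largest dimension an int32_t could hold: 2^31 - 1, a bit over two billion.
constexpr std::int64_t kMaxInt32Dimension =
    std::numeric_limits<std::int32_t>::max();
```

With these definitions, `years_for_steps(1e18, 1.0)` comes out slightly under 32 years, and `kMaxInt32Dimension` is still larger than the one-billion dimension used in the argument.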
On AMD64/Linux, size_t is unsigned long, i.e. 64 bits, while int is 32 bits (like int32_t).
And changing dimension_type is only changing one typedef in ppl.h! Of course, the ABI would be incompatible: people would have to recompile PPL!
As far as I have understood, OCaml uses tagged signed integers (`int').
Yes. Practically, their least significant bit is 1 and they are intptr_t (i.e. machine words, so 64 bits on AMD64/Linux, hence 63-bit integers since one bit is lost to the tag).
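To make the tagging concrete, here is a small stand-alone model of it. It is a sketch, not the OCaml runtime: `model_value`, `model_val_long`, `model_long_val` and `model_is_long` are hypothetical names that mimic the arithmetic of Val_long, Long_val and Is_long so that the example compiles without the OCaml headers. It also assumes the usual two's-complement behaviour of shifts and casts.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for OCaml's `value`: one machine word.
using model_value = std::intptr_t;

// Same arithmetic as Val_long: shift left by one bit, then set the low bit.
inline model_value model_val_long(std::intptr_t n) {
    return static_cast<model_value>((static_cast<std::uintptr_t>(n) << 1) + 1);
}

// Same arithmetic as Long_val: an arithmetic right shift drops the tag bit.
inline std::intptr_t model_long_val(model_value v) {
    return v >> 1;
}

// A tagged integer is recognised by its low bit being 1.
inline bool model_is_long(model_value v) {
    return (v & 1) != 0;
}

// Range of a tagged integer: one bit fewer than the machine word.
constexpr std::intptr_t kMaxOcamlInt =
    std::numeric_limits<std::intptr_t>::max() >> 1;
constexpr std::intptr_t kMinOcamlInt = -kMaxOcamlInt - 1;
```

On a 64-bit machine `kMaxOcamlInt` is 2^62 - 1, which is why only 63 bits are usable.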
What is the right way to translate one into the other?
We are currently using Int_val / Val_int, but we just discovered that Int_val casts the value to the C native "int" datatype. Hence, we should probably be using Long_val / Val_long. However, there is also Unsigned_long_val ... but it is unclear to me if/when I can use it and feel confident that I am not misreading the input of the OCaml user.
You might use Long_val / Val_long and explicitly cast dimension_type to (long) in the Ocaml glue code in C or C++.
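The difference between the two macro families only shows up for values outside the range of a C int. A minimal sketch of that boundary, with a made-up helper name `fits_in_c_int` and assuming the untagged OCaml integer is held in a 64-bit variable:

```cpp
#include <cstdint>
#include <limits>

// True when an untagged OCaml integer would survive a cast to C `int`
// unchanged, i.e. when narrowing it to int would not lose information.
inline bool fits_in_c_int(std::int64_t n) {
    return n >= std::numeric_limits<int>::min()
        && n <= std::numeric_limits<int>::max();
}
```

Where int is 32 bits, any OCaml integer above 2^31 - 1 fails this check, and those are exactly the values that need the long variants.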
So, what is the right way to code helper functions such as
dimension_type caml_value_to_ppl_dimension(value);
I have a recent git of PPL, and cannot find these functions.
You could code a quick variant, i.e.

    inline dimension_type caml_value_to_ppl_dimension(value v) {
      return (dimension_type)Long_val(v);
    }

However, this is an optimisation which usually works only because you happen to know that the Ocaml GC [or malloc or new] is never called in that function. To be very safe, you had better code the safe variant

    dimension_type caml_value_to_ppl_dimension(value v) {
      CAMLparam1(v);                  /* register v as a GC root */
      dimension_type r = 0;
      r = (dimension_type)Long_val(v);
      CAMLreturnT(dimension_type, r); /* unregister the root, return r */
    }
Such a function would probably be safe even in the improbable event that Xavier Leroy wanted Ocaml integers to be more polymorphic... (Imagine that Xavier wanted Ocaml 5.43 integers to be tagged int, or boxed int64_t, or boxed GMP mpz_t! But knowing Xavier, I don't believe that could happen.)
value ppl_dimension_to_caml_value(dimension_type);
I leave this one as an exercise. (both quick & safe variants).
taking into account that: a) we would like to react properly (throwing an exception) when an OCaml user wrongly passes in a negative value (or, if possible, a value that is too big to fit into a dimension_type);
Where do you want to react? If it is inside caml_value_to_ppl_dimension, you have to use my safer variant! I was assuming you would react in the caller of caml_value_to_ppl_dimension.
b) we would like to place assertions warning us whenever we try to convert a dimension_type that does not fit into an OCaml integer.
I'm not sure I understand that one. Ocaml integers always have one bit fewer than the machine integers. How and where do you handle the case when the integer is (on a 32-bit machine) 2^30 or more, i.e. fits positively in a C int but not in an Ocaml int?
c) we want to strive for maximum portability (i.e., support both different computing platforms and, if possible, different OCaml releases).
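Requirements a) and b) could be sketched roughly as below. This is an illustration under assumptions, not PPL's code: `checked_to_dimension`, `checked_from_dimension` and `kLargestOcamlInt` are hypothetical names, `dimension_type` is taken to be `size_t` as described earlier, and the functions work on plain machine integers (what Long_val returns and what Val_long accepts) so the example compiles without the OCaml headers.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

using dimension_type = std::size_t;  // assumption: as described for PPL

// Largest OCaml int: one bit fewer than a machine word.
constexpr std::intptr_t kLargestOcamlInt =
    std::numeric_limits<std::intptr_t>::max() >> 1;

// (a) Validate an untagged OCaml integer before using it as a dimension.
inline dimension_type checked_to_dimension(std::intptr_t n) {
    if (n < 0)
        throw std::invalid_argument("negative dimension");
    if (static_cast<std::uintmax_t>(n)
            > std::numeric_limits<dimension_type>::max())
        throw std::out_of_range("value too large for dimension_type");
    return static_cast<dimension_type>(n);
}

// (b) Validate a dimension before it is tagged as an OCaml integer.
inline std::intptr_t checked_from_dimension(dimension_type d) {
    if (d > static_cast<std::uintmax_t>(kLargestOcamlInt))
        throw std::out_of_range("dimension does not fit in an OCaml int");
    return static_cast<std::intptr_t>(d);
}
```

In real glue code the C++ exception would still have to be turned into an OCaml exception before control returns to OCaml, which is presumably the job of a wrapper such as CATCH_ALL. Since raising can involve the OCaml runtime, the conservative CAMLparam / CAMLreturn style is the safer choice around such checks.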
Second question: recently we corrected several GC-related issues reported by Kenneth MacKenzie. There were several places where our code was not GC safe. However, there are other places where, IMHO, we are too conservative. For instance:
    extern "C" CAMLprim value
    ppl_version_major(value unit) try {
      CAMLparam1(unit);
      CAMLreturn(Val_long(version_major()));
    }
    CATCH_ALL
Is there any need to wrap unit and the return value (an unboxed CAML int, afaict) with the calls to CAMLparam1 and CAMLreturn ?
There is no need to wrap it in that precise case (because we know that version_major() is a constant), but I strongly recommend keeping the wrapping and staying conservative. If you don't keep the wrapping, leave a big fat comment in your code (really a big fat comment) to warn the future PPL maintainer! And I don't think any sensible Ocaml application using PPL would call that ppl_version_major function a million times (but only once at most).
Similarly, in the following example,
    extern "C" CAMLprim value
    ppl_set_rounding_for_PPL(value unit) try {
      CAMLparam1(unit);
      set_rounding_for_PPL();
      CAMLreturn(Val_unit);
    }
    CATCH_ALL
is there any need to wrap Val_unit ?
If set_rounding_for_PPL calls some complex code which might apply an Ocaml closure, raise an Ocaml exception or allocate an Ocaml boxed value, you need to keep it.
My advice is to always keep the CAML* macros everywhere. In the rare functions which you believe are called billions of times, and which you are sure never call the Ocaml GC, apply an Ocaml closure or raise an Ocaml exception, even indirectly, you might remove them. IIRC, there is a nightmare scenario where your code doesn't call the Ocaml GC but does call new or malloc; later the malloc-ed zone is released, still later it is used by the Ocaml GC (or is it the reverse?), and days later the Ocaml GC deallocates it, etc. I forgot the details, ask Damien, but they are really ugly!
Thanks in advance, Enea.