pointer "ownership" in the C API?

Hello All,
A newbie question regarding the C API. What is the pointer ownership convention, and where could I find tiny examples or tests of the C API of PPL?
So, I made a ppl_Coefficient_t (let's call it k) using ppl_new_Coefficient_from_mpz_t. [BTW, perhaps a ppl_new_Coefficient_from_long_long or a ppl_new_Coefficient_from_int32 might be useful]
Then I am filling a ppl_Linear_Expression_t (let's call it l) using ppl_Linear_Expression_add_to_coefficient with the same coefficient k.
Should I explicitly delete by calling ppl_delete_Coefficient(k), or is k now "owned" by l and not available anymore?
Could I use twice the same k in different calls to ppl_Linear_Expression_add_to_coefficient?
I would suppose that PPL uses a refcounting scheme as its memory management (or garbage collection) scheme. Is that correct? Who increments and decrements the refcounters, and when? Is there a C API to e.g. force incrementation of the refcounter (like GTK has)?
BTW, a more general question is about the philosophy of memory management in PPL, especially when interfacing PPL to some language. Since my GCC MELT can be perceived as some language (the MELT lisp dialect) with its runtime (the MELT garbage collector, above the GGC inside GCC), I could view the interfacing of PPL inside MELT as a language binding of PPL.
In particular, detailed explanation of Ocaml binding to PPL is welcome, since I happen to know quite well Ocaml (and the C coding rules required by its runtime).
Again, apologies for asking such questions (whose answers are probably inside PPL source code).
Regards

Basile Starynkevitch wrote:
A newbie question regarding the C API. What is the pointer ownership convention, and where could I find tiny examples or tests of the C API of PPL?
The C language interface of the PPL is meant to replicate, as closely as possible, what is available in the C++ language interface.
Usually, the C++ interface provides for each of its datatypes:
- (copy) constructors ==> (in C) ppl_new_*
- copy assignment ==> (in C) ppl_assign_*
- destructor ==> (in C) ppl_delete_*
with the semantics usually adopted when defining so-called "concrete types" having value (not reference) semantics. Unless otherwise stated, copy construction and assignment are implemented as _deep_ copy operations and provide no reference counting. (NOTE: a form of reference counting is implemented for the elements of powerset objects.)
Regarding ownership:
1) After a call to a ppl_new_<TYPE>* method, the caller has to take responsibility for the newly created object (usually, the first argument of the call, explicitly passed as a pointer, i.e., ppl_<TYPE>_t*). That is, the rule of thumb is that you get ownership if there is a "new" in the function name. An example of a resource that you do *not* own is the resource pointed to by pcs after a call to the function:
int ppl_Polyhedron_get_minimized_constraints (ppl_const_Polyhedron_t ph, ppl_const_Constraint_System_t* pcs);
2) The one and only way to release a resource r *that you own* is by calling ppl_delete_<TYPE>(r).
3) Except for case 2) above, ownership is preserved for all input arguments passed as ppl_<TYPE>_t or ppl_const_<TYPE>_t. No matter if these arguments are modified or not during the call, the responsibility for their resources does not change (i.e., typically the caller will be responsible for calling ppl_delete at some time).
Some of the constructors for PPL objects avoid the need of deep copies. For instance, we have
int ppl_new_C_Polyhedron_recycle_Constraint_System (ppl_Polyhedron_t* pph, ppl_Constraint_System_t cs);
This will build a new polyhedron *reusing* the data structures provided in the input argument cs, thereby avoiding the expensive copy.
BEWARE: after calling the function above, the input argument cs is still a well-behaved Constraint_System object owned by the caller: you simply cannot predict its contents, which might have changed. That is, the caller is still responsible for properly destroying cs when it is no longer needed, by calling
ppl_delete_Constraint_System(cs);
So, I made a ppl_Coefficient_t (let's call it k) using ppl_new_Coefficient_from_mpz_t. [BTW, perhaps a ppl_new_Coefficient_from_long_long or a ppl_new_Coefficient_from_int32 might be useful]
Then I am filling a ppl_Linear_Expression_t (let's call it l) using ppl_Linear_Expression_add_to_coefficient with the same coefficient k.
Should I explicitly delete by calling ppl_delete_Coefficient(k), or is k now "owned" by l and not available anymore?
As said above, a _copy_ of k has gone into l. k is still owned by you: you can reuse it, assign to it, etc. and finally you have to call ppl_delete_Coefficient(k).
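To make the copy semantics concrete, here is a minimal sketch in plain C. It does not use the PPL at all: toy_coef, toy_linexpr and every toy_* function are hypothetical stand-ins invented for illustration, and only the ownership pattern (the caller owns what a "new" function returns, and the callee copies what it needs) mirrors the rules stated above.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins, NOT the PPL API: a heap-allocated coefficient and a
   two-variable linear expression that stores coefficients by value. */
typedef struct { long value; } toy_coef;
typedef struct { long coef[2]; } toy_linexpr;

/* "new" in the name: on success the caller owns *pc and must delete it. */
int toy_new_coef(toy_coef **pc, long v) {
  *pc = malloc(sizeof **pc);
  if (*pc == NULL) return -1;
  (*pc)->value = v;
  return 0;
}
void toy_delete_coef(toy_coef *c) { free(c); }

int toy_new_linexpr(toy_linexpr **pl) {
  *pl = calloc(1, sizeof **pl);   /* all coefficients start at 0 */
  return *pl == NULL ? -1 : 0;
}
void toy_delete_linexpr(toy_linexpr *l) { free(l); }

/* Adds a copy of k's value to l; k stays owned by the caller. */
void toy_add_to_coefficient(toy_linexpr *l, int var, const toy_coef *k) {
  assert(var == 0 || var == 1);
  l->coef[var] += k->value;
}

/* Reuses one k for both variables, then changes k afterwards.
   Returns coef[0] + coef[1], or -1 if an allocation failed. */
long toy_ownership_demo(void) {
  toy_coef *k = NULL;
  toy_linexpr *l = NULL;
  long sum = -1;
  if (toy_new_coef(&k, 3) == 0 && toy_new_linexpr(&l) == 0) {
    toy_add_to_coefficient(l, 0, k);
    toy_add_to_coefficient(l, 1, k);  /* the same k, used twice */
    k->value = 100;                   /* does not affect l: l holds copies */
    sum = l->coef[0] + l->coef[1];
  }
  toy_delete_linexpr(l);              /* the caller releases what it owns */
  toy_delete_coef(k);
  return sum;
}
```

With the real interface the shape would be the same: one ppl_delete_<TYPE> call for each object obtained from a ppl_new_<TYPE>* call, regardless of how many times the object was passed as an input argument in between.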
Could I use twice the same k in different calls to ppl_Linear_Expression_add_to_coefficient?
Yes.
I would suppose that PPL uses a refcounting scheme as its memory management (or garbage collection) scheme. Is that correct? Who increments and decrements the refcounters, and when? Is there a C API to e.g. force incrementation of the refcounter (like GTK has)?
No, usually there is no reference counting scheme. As said above, we have reference counting in our Powerset datatypes, but this is transparent to the users of all the language interfaces.
More generally, you can regard the C interface as providing the basic services around which the client application can build its own abstractions. Reference counting schemes are one such possibility.
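As an illustration of such a client-side abstraction, the following is a rough sketch of a reference-counted handle layered over any new/delete style interface. Nothing here is provided by the PPL: rc_handle, rc_wrap, rc_retain and rc_release are hypothetical names, and the release callback is assumed to be a small adapter around the matching ppl_delete_<TYPE> function.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical client-side wrapper (not part of the PPL): pairs an owned
   resource with a counter and the function that releases it. */
typedef struct {
  void *resource;
  void (*release)(void *);
  unsigned count;
} rc_handle;

/* Takes ownership of resource on success. On failure (NULL result) the
   caller still owns the resource. */
rc_handle *rc_wrap(void *resource, void (*release)(void *)) {
  rc_handle *h = malloc(sizeof *h);
  if (h == NULL) return NULL;
  h->resource = resource;
  h->release = release;
  h->count = 1;
  return h;
}

/* Registers one more owner of the handle. */
rc_handle *rc_retain(rc_handle *h) {
  assert(h->count > 0);
  h->count++;
  return h;
}

/* Drops one owner. Returns 1 if this call released the resource and the
   handle itself, 0 if other owners remain. */
int rc_release(rc_handle *h) {
  assert(h->count > 0);
  if (--h->count > 0) return 0;
  h->release(h->resource);
  free(h);
  return 1;
}
```

A garbage-collected runtime could use the same idea in its finalizers: each managed object holds one reference, and the underlying resource is deleted exactly once, when the last reference is dropped.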
BTW, a more general question is about the philosophy of memory management in PPL, especially when interfacing PPL to some language. Since my GCC MELT can be perceived as some language (the MELT lisp dialect) with its runtime (the MELT garbage collector, above the GGC inside GCC), I could view the interfacing of PPL inside MELT as a language binding of PPL.
Not sure we understand the question. The basic design principles of the PPL interfaces are:
1) try to provide interfaces that are "natural" in the interfaced language;
2) do the basic things efficiently and do not interfere with further abstractions the user may want to implement.
In particular, detailed explanation of Ocaml binding to PPL is welcome, since I happen to know quite well Ocaml (and the C coding rules required by its runtime).
We suggest you look into the library's manuals first (core, C interface and OCaml interface). If something is missing there, please let us know.
For the C interface, an additional source of examples is the ppl_lpsol program in demos/ppl_lpsol.
Cheers,
Enea and Roberto

Basile Starynkevitch wrote: [...]
In particular, detailed explanation of Ocaml binding to PPL is welcome, since I happen to know quite well Ocaml (and the C coding rules required by its runtime).
Hello Basile.
I am trying to improve our OCaml language interface and it really looks like you are the right person I can ask a few simple questions.
First of all, I believe we are currently doing something bad even when doing simple things such as interfacing PPL variable indices.
In the C++ world, a variable index has type `dimension_type', which is just a typedef for the standard unsigned integer type `size_t'. As far as I have understood, OCaml uses tagged signed integers (`int').
What is the right way to translate one into the other?
We are currently using Int_val / Val_int but we just discovered that this casts the value to the C native "int" datatype. Hence, we should probably be using Long_val / Val_long. However, there is also Unsigned_long_val ... but it is unclear to me if/when I can use it and feel confident that I am not misreading the input of the OCaml user.
So, what is the right way to code helper functions such as
dimension_type caml_value_to_ppl_dimension(value);
value ppl_dimension_to_caml_value(dimension_type);
taking into account that:
a) we would like to react properly (throwing an exception) when an OCaml user wrongly passes in a negative value (or, if possible, a value that is too big to fit into a dimension_type);
b) we would like to place assertions warning us whenever we try to convert a dimension_type that does not fit into an OCaml integer;
c) we want to strive for maximum portability (i.e., support both different computing platforms and, if possible, different OCaml releases).
Second question: recently we corrected several GC-related issues reported by Kenneth MacKenzie. There were several places where our code was not GC safe. However, there are other places where, IMHO, we are too conservative. For instance:
extern "C" CAMLprim value
ppl_version_major(value unit) try {
  CAMLparam1(unit);
  CAMLreturn(Val_long(version_major()));
}
CATCH_ALL
Is there any need to wrap unit and the return value (an unboxed CAML int, afaict) with the calls to CAMLparam1 and CAMLreturn ?
Similarly, in the following example,
extern "C" CAMLprim value
ppl_set_rounding_for_PPL(value unit) try {
  CAMLparam1(unit);
  set_rounding_for_PPL();
  CAMLreturn(Val_unit);
}
CATCH_ALL
is there any need to wrap Val_unit ?
Thanks in advance, Enea.

Hello PPL maintainers & Xavier Leroy & Damien Doligez (first.last@inria.fr), who are in BCC.
To Xavier & Damien: PPL is a marvelous GPL-ed library http://www.cs.unipr.it/ppl/
Enea Zaffanella wrote:
Basile Starynkevitch wrote: [...]
In particular, detailed explanation of Ocaml binding to PPL is welcome, since I happen to know quite well Ocaml (and the C coding rules required by its runtime).
Hello Basile.
I am trying to improve our OCaml language interface and it really looks like you are the right person I can ask a few simple questions.
I'll try. The best persons to ask are of course Xavier Leroy & Damien Doligez (Xavier being the father/guru of Ocaml, and Damien being the father/guru of its runtime), both from INRIA. I am BCC-ing them just in case!
First of all, I believe we are currently doing something bad even when doing simple things such as interfacing PPL variable indices.
In the C++ world, a variable index has type `dimension_type', which is just a typedef for the standard unsigned integer type `size_t'.
I never understood why you don't use int or int32_t for dimension_type. IMHO, PPL dimension_type-s are reasonable integers: several PPL data structures have a memory size proportional to the value of some dimension_type. So if my program has some dimension_type whose value is one billion = 1e9, I would expect the memory requirements to be a multiple of billions of bytes (so dozens of gigabytes), and the CPU time to be probably a billion squared times some small CPU step time. That would amount to a very long time, since 1e18 nanoseconds is more than my remaining lifetime (I think that 1e9 sec = 31 years), and certainly more than the lifetime of any project able to use PPL. Therefore, I don't think PPL would realistically handle a dimension_type bigger than a billion. Hence my int32_t suggestion for dimension_type.
On AMD64/Linux, size_t is unsigned long, i.e. 64 bits, while int is 32 bits (like int32_t).
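The back-of-the-envelope figure above can be checked with a few lines of C. This is only a sketch of the arithmetic: the function name steps_to_years is made up here, and the one-nanosecond step time and the 365.25-day year are assumptions.

```c
#include <assert.h>

/* Converts a number of elementary steps into years, assuming each step
   takes one nanosecond and a year has 365.25 days. */
double steps_to_years(double steps) {
  assert(steps >= 0.0);
  const double seconds = steps * 1e-9;
  return seconds / (365.25 * 24.0 * 3600.0);
}
```

For a billion squared steps (1e18), this gives a bit under 32 years, in line with the estimate of about 31 years for 1e9 seconds.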
And changing dimension_type is only changing one typedef in ppl.h! Of course, the ABI would be incompatible: people would have to recompile PPL!
As far as I have understood, OCaml uses tagged signed integers (`int').
Yes. Practically, their LSBit is 1 and they are intptr_t (ie machine words, so 64 bits on AMD64/Linux, hence 63 bits integers since one bit is lost).
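The tagging arithmetic can be sketched in standalone C as follows. These toy_* functions are illustrative re-implementations written for this note; real glue code should use Val_long and Long_val from <caml/mlvalues.h>. The sketch assumes that right-shifting a negative signed integer is an arithmetic shift, which holds on the usual compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: a machine word holding a tagged integer. */
typedef intptr_t toy_value;

/* Tag: shift left by one and set the lowest bit. The shift is done on the
   unsigned type to avoid undefined behaviour for negative n. */
toy_value toy_val_long(intptr_t n) {
  return (toy_value)(((uintptr_t)n << 1) + 1);
}

/* Untag: drop the lowest bit (assumes an arithmetic right shift). */
intptr_t toy_long_val(toy_value v) {
  assert((v & 1) == 1);   /* must be a tagged integer, not a pointer */
  return v >> 1;
}

/* Lowest bit set means "immediate integer", clear means "pointer". */
int toy_is_long(toy_value v) { return (int)(v & 1); }
```

Because one bit is spent on the tag, the representable range is one bit narrower than the machine word, which is where the 63-bit (or 31-bit) limit comes from.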
What is the right way to translate one into the other?
We are currently using Int_val / Val_int but we just discovered that this casts the value to the C native "int" datatype. Hence, we should probably be using Long_val / Val_long. However, there is also Unsigned_long_val ... but it is unclear to me if/when I can use it and feel confident that I am not misreading the input of the OCaml user.
You might use Long_val / Val_long and explicitly cast dimension_type to (long) in the Ocaml glue code in C or C++.
So, what is the right way to code helper functions such as
dimension_type caml_value_to_ppl_dimension(value);
I have a recent git of PPL, and cannot find these functions.
You could code a quick variant, i.e.
inline dimension_type caml_value_to_ppl_dimension(value v) {
  return (dimension_type)Long_val(v);
}
However, this is an optimisation which usually happens to work, because you know that the Ocaml GC [or malloc or new] is never called in that function. To be very safe, you had better code the safe variant
dimension_type caml_value_to_ppl_dimension(value v) {
  dimension_type r = 0;
  CAMLparam1(v);
  r = (dimension_type)Long_val(v);
  CAMLreturnT(dimension_type, r);
}
Such a function would probably be safe even in the improbable event that Xavier Leroy would want Ocaml integers to be more polymorphic... (Imagine that Xavier would want Ocaml 5.43 integers to be tagged int, or boxed int64_t, or boxed GMP mpz_t! But knowing Xavier, I won't believe that could happen.)
value ppl_dimension_to_caml_value(dimension_type);
I leave this one as an exercise. (both quick & safe variants).
taking into account that: a) we would like to react properly (throwing an exception) when an OCaml user wrongly passes in a negative value (or, if possible, a value that is too big to fit into a dimension_type);
Where do you want to react? If it is inside caml_value_to_ppl_dimension you have to use my safer variant! I was supposing you would react in the caller of caml_value_to_ppl_dimension.
b) we would like to place assertions warning us whenever we try to convert a dimension_type that does not fit into an OCaml integer.
I'm not sure I understand that one. Ocaml integers always have one bit less than the machine integers. How and where do you handle the case when the integer is (on a 32-bit machine) bigger than 2^30, i.e. fits positively in a C int but not in an Ocaml int?
c) we want to strive for maximum portability (i.e., support both different computing platforms and, if possible, different OCaml releases).
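Requirements a) and b) boil down to two range checks, which can be sketched in portable C without the OCaml headers. The names below (TOY_MAX_OCAML_INT and the toy_* functions) are hypothetical; the sketch works on already-untagged integers and reports failure through a return code. In real glue code the failure branch would instead raise an OCaml exception (for instance with caml_invalid_argument from <caml/fail.h>) or trip an assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest untagged OCaml int: one bit of the machine word is the tag. */
#define TOY_MAX_OCAML_INT (INTPTR_MAX >> 1)

/* a) OCaml int -> dimension (size_t). Returns 0 on success, -1 if n is
   negative or too large for size_t. */
int toy_ocaml_int_to_dimension(intptr_t n, size_t *out) {
  assert(out != NULL);
  if (n < 0) return -1;
  /* Cannot trigger where size_t is at least as wide as intptr_t;
     kept for portability. */
  if ((uintmax_t)n > (uintmax_t)SIZE_MAX) return -1;
  *out = (size_t)n;
  return 0;
}

/* b) dimension (size_t) -> OCaml int. Returns 0 on success, -1 if d does
   not fit in the tagged integer range. */
int toy_dimension_to_ocaml_int(size_t d, intptr_t *out) {
  assert(out != NULL);
  if ((uintmax_t)d > (uintmax_t)TOY_MAX_OCAML_INT) return -1;
  *out = (intptr_t)d;
  return 0;
}
```

Computing the limits from INTPTR_MAX and SIZE_MAX, rather than hard-coding 2^30 or 2^62, is one way to address requirement c) across 32-bit and 64-bit platforms.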
Second question: recently we corrected several GC-related issues reported by Kenneth MacKenzie. There were several places where our code was not GC safe. However, there are other places where, IMHO, we are too conservative. For instance:
extern "C" CAMLprim value
ppl_version_major(value unit) try {
  CAMLparam1(unit);
  CAMLreturn(Val_long(version_major()));
}
CATCH_ALL
Is there any need to wrap unit and the return value (an unboxed CAML int, afaict) with the calls to CAMLparam1 and CAMLreturn ?
There is no need to wrap it in that precise case (because we know that version_major() is a constant), but I strongly recommend keeping the wrapping and staying conservative. If you don't keep the wrapping, leave a big fat comment in your code (really a big fat comment) to warn the future PPL maintainer! And I don't think any sensible Ocaml application using PPL would call that ppl_version_major function a million times (but only once at most).
Similarly, in the following example,
extern "C" CAMLprim value
ppl_set_rounding_for_PPL(value unit) try {
  CAMLparam1(unit);
  set_rounding_for_PPL();
  CAMLreturn(Val_unit);
}
CATCH_ALL
is there any need to wrap Val_unit ?
If set_rounding_for_PPL calls some complex code which might apply an Ocaml closure or raise an Ocaml exception or allocate an Ocaml boxed value, you need to keep it.
My advice is to always keep the CAML* macros everywhere. In the rare functions which you believe are called billions of times and which you are sure never call the Ocaml GC, apply an Ocaml closure, or raise an Ocaml exception, even indirectly, you might remove them. IIRC, there is a nightmare scenario where your code doesn't call the Ocaml GC but does call new or malloc; later the malloc-ed zone is released, still later it is used by the Ocaml GC (or is it the reverse?), and days later the Ocaml GC deallocates it, etc. I forgot the details, ask Damien, but they are really ugly!
Thanks in advance, Enea.
participants (4): Basile Starynkevitch, Basile STARYNKEVITCH, Enea Zaffanella, Roberto Bagnara