
On Wed, 4 Feb 2009, Abramo Bagnara wrote:
with that two checks to avoid calls to mpz_addmul and mpz_submul permits to avoid 110958265 calls to mpz_addmul (94.3%) and 50079219 calls to mpz_submul (88,6%).
Indeed those percentages are the interesting data. For such a distribution it is obviously more efficient to inline the frequent trivial case.
This show me that the inlining of some trivial tests can give good results when the test is true very often.
Ideally, this would be the compilers job. It would need: - link time optimization, to have access to the gmp code when compiling your program. Several compilers can do this, and maybe gcc-4.5 - profile feedback, to know about this distribution - partial inlining or arbitrary de-inlining
The last point seems like the hardest to find in a compiler, and writing the function as:
mpz_mul(...){ if(test for 0){trivial case} else{call mpz_mul_nonzero} } mpz_mul_nonzero(...){current mpz_mul code, minus the check for 0}
may help the compiler a lot.
- the GMP library does not check for any special values unless this is
needed for correctness and the user code is responsible to add the trivial tests that increase efficiency for specific data set
- the GMP library move the trivial checks in gmp.h to permit their
inlining. Note that this approach does not conflict with 1: it would be feasible to have mpz_addmul defined in gmp.h and mpz_addmul_raw defined in GMP library, then the user code could call the version more suitable for its needs, the important part is that library version does not check for special values.
Indeed, with this approach, the compiler does not even need LTO or profiling. However, this is making the code a bit heavier by forcing those inlines for every operation, even for codes where the numbers are never 0. Don't know how much that matters.
- (current situation) the GMP library does some special value checks
and user code may decide to inline the very same checks. This makes the 'else' path heavier than needed.
I believe that the check for 0 in mpz_mul actually falls in your first category, it is necessary for correctness (however, mpz_mul also checks whether one of the arguments fits in a single limb, which fits your description better). And the else path is supposedly already heavy enough that the penalty should not be too bad. Did you see a significant performance gain using 2) instead of 3)?
(I am not a gmp developer, just another user, so feel free to ignore my questions if you think the answers won't help convince the real developers)