Bug affecting PPL 0.10 on big-endian architectures

Dear maintainers of PPL packages,
this message is to let you know about a bug in PPL 0.10 that affects some of the packages you maintain. All the information, including a simple workaround, is available here:
http://www.cs.unipr.it/ppl/Bugs/
Please do not hesitate to get back to us if you need further information. All the best,
Roberto
P.S. The bug proves that no one ever ran `make check' on PPL 0.10 on a PowerPC machine and reported back the results. If you have any idea how to make sure this does not happen again...

> Dear maintainers of PPL packages,
> this message is to let you know about a bug in PPL 0.10 that affects some of the packages you maintain. All the information, including a simple workaround, is available here:
> http://www.cs.unipr.it/ppl/Bugs/
> Please do not hesitate to get back to us if you need further information. All the best,
> Roberto
Hmm, this workaround looks quite ugly; I'd rather wait for a proper fix than hack the build script in an architecture-specific way.
In the statement on the web page you don't give any details on what actually breaks on big-endian architectures. If you could provide us with some details, we could file a bug with high severity, which would hopefully allow us to ship an updated version with lenny, or at least with lenny r1.
> P.S. The bug proves that no one ever ran `make check' on PPL 0.10 on a PowerPC machine and reported back the results. If you have any idea how to make sure this does not happen again...
We should probably not ignore errors from `make check'. See, e.g., here:
http://buildd.debian.org/fetch.cgi?&pkg=ppl&ver=0.10-1&arch=powe...
There were quite a few errors, but we ignored all of them. In our packaging git repository I've got a 0.10-2 version sitting around in which the errors would not have been ignored, but apparently I never uploaded it :-(
Sorry, Michael

Michael Tautschnig wrote:
> Hmm, this workaround looks quite ugly; I'd rather wait for a proper fix than hack the build script in an architecture-specific way.
Hi Michael,
you are right. A much better workaround (a trivial patch) is now available at http://www.cs.unipr.it/ppl/Bugs/
> In the statement on the web page you don't give any details on what actually breaks on big-endian architectures. If you could provide us with some details, we could file a bug with high severity, which would hopefully allow us to ship an updated version with lenny, or at least with lenny r1.
Well, the bug affects all the floating-point computations of the PPL. Luckily, GCC 4.4 does not use the affected code, but other applications would (silently) obtain wrong results.
>> P.S. The bug proves that no one ever ran `make check' on PPL 0.10 on a PowerPC machine and reported back the results. If you have any idea how to make sure this does not happen again...
> We should probably not ignore errors from `make check'. See, e.g., here:
> http://buildd.debian.org/fetch.cgi?&pkg=ppl&ver=0.10-1&arch=powe...
> There were quite a few errors, but we ignored all of them. In our packaging git repository I've got a 0.10-2 version sitting around in which the errors would not have been ignored, but apparently I never uploaded it :-(
I see. It is very important that `make check' be run on every architecture you care about, particularly those we do not have access to (such as ppc and ppc64). All the best,
Roberto

Hi all,
[...]
> I see. It is very important that `make check' be run on every architecture you care about, particularly those we do not have access to (such as ppc and ppc64). All the best,
I've just uploaded 0.10-2, which includes the bugfix and has the regression tests properly enabled without discarding their results. I'll check whether it works on all architectures and will let you know if any further problem pops up.
Best, Michael

Michael Tautschnig wrote:
> I've just uploaded 0.10-2, which includes the bugfix and has the regression tests properly enabled without discarding their results. I'll check whether it works on all architectures and will let you know if any further problem pops up.
Great, thanks!

Hi all,
> P.S. The bug proves that no one ever ran `make check' on PPL 0.10 on a PowerPC machine and reported back the results. If you have any idea how to make sure this does not happen again...
Finally, ppl 0.10-2 has made its way through nearly all the build daemons (armel is missing, but arm was OK). Only a single check fails, and only on alpha: generalizedaffineimage2 (from Octagonal_Shape) results in an uncaught exception. See also http://buildd.debian.org/build.php?&pkg=ppl&ver=0.10-2&arch=alph...
Do you have access to an alpha machine, or can you tell from the source what could be going wrong there? Of course, it may also be an architecture-specific compiler problem...
If you need any help, just contact me.
Best, Michael

Michael Tautschnig wrote:
> Hi all,
>> P.S. The bug proves that no one ever ran `make check' on PPL 0.10 on a PowerPC machine and reported back the results. If you have any idea how to make sure this does not happen again...
> Finally, ppl 0.10-2 has made its way through nearly all the build daemons (armel is missing, but arm was OK). Only a single check fails, and only on alpha: generalizedaffineimage2 (from Octagonal_Shape) results in an uncaught exception. See also http://buildd.debian.org/build.php?&pkg=ppl&ver=0.10-2&arch=alph...
> Do you have access to an alpha machine, or can you tell from the source what could be going wrong there? Of course, it may also be an architecture-specific compiler problem...
> If you need any help, just contact me.
We've tried hard, but we've been unable to reproduce this behavior using stock PPL 0.10 configured as found in your log file:
configure --build alpha-linux-gnu --host alpha-linux-gnu --enable-interfaces=c,cxx --disable-ppl_lpsol --disable-ppl_lcdd CFLAGS="-Wall -g -O2"
It would be very interesting to check with gdb where the uncaught exception is thrown.
Below you will find information about our testing machine (the only alpha we have access to).
$ cat /proc/cpuinfo
cpu : Alpha
cpu model : EV56
cpu variation : 7
cpu revision : 0
cpu serial number :
system type : Rawhide
system variation : Tincup
system revision : 0
system serial number : AY74642662
cycle frequency [Hz] : 399638195
est. timer frequency [Hz] : 1200.00
page size [bytes] : 8192
phys. address bits : 40
max. addr. space # : 127
BogoMIPS : 738.12
kernel unaligned acc : 0 (pc=0,va=0)
user unaligned acc : 17680637 (pc=20b13c08,va=783d2004)
platform string : AlphaServer 1200 5/400 4MB
cpus detected : 1
cpus active : 1
cpu active mask : 0000000000000001
L1 Icache : 8K, 1-way, 32b line
L1 Dcache : 8K, 1-way, 32b line
L2 cache : 96K, 3-way, 64b line
L3 cache : 4096K, 1-way, 64b line
$ cat /proc/version
Linux version 2.6.18-6-alpha-smp (Debian 2.6.18.dfsg.1-18etch6) (dannf@debian.org) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Fri Jun 6 23:28:12 UTC 2008
$ g++ -v
Using built-in specs.
Target: alphaev56-unknown-linux-gnu
Configured with: ../gcc-4.3.2/configure --prefix=/opt/cfarm/release/4.3.2 --enable-languages=c,c++ --enable-__cxa_atexit --disable-nls --enable-threads=posix --disable-multilib --with-mpfr=/opt/cfarm/mpfr-2.3.1
Thread model: posix
gcc version 4.3.2 (GCC)

Abramo Bagnara wrote:
> [...]
> We've tried hard, but we've been unable to reproduce this behavior using stock PPL 0.10 configured as found in your log file:
> configure --build alpha-linux-gnu --host alpha-linux-gnu --enable-interfaces=c,cxx --disable-ppl_lpsol --disable-ppl_lcdd CFLAGS="-Wall -g -O2"
> [...]
Hi Michael,
would it be possible to access a machine where the problem manifests itself? Alternatively, can someone run that test program under GDB and tell us which exception is thrown? All the best,
Roberto

[...]
> would it be possible to access a machine where the problem manifests itself? Alternatively, can someone run that test program under GDB and tell us which exception is thrown? All the best,
I will take care of that, but it may take me a few days until I find the time. I'll report back as soon as possible!
Best, Michael

[...]
>> would it be possible to access a machine where the problem manifests itself? Alternatively, can someone run that test program under GDB and tell us which exception is thrown? All the best,
> I will take care of that, but it may take me a few days until I find the time. I'll report back as soon as possible!
It seems to be a problem with the compiler optimizations of g++ 4.2 on alpha: the exception (::std::invalid_argument) is thrown in test11 of generalizedaffineimage3.cc as expected, but the catch clauses do not handle it correctly when the test is compiled with -O2. It works with -O0, and the bug seems to be fixed in g++ 4.3. Rather than reporting a bug against the gcc package, I'll add a hack that enforces the use of g++ 4.3 when compiling the tests.
I'm testing this hack at the moment and will upload 0.10-3 to unstable once the build succeeds.
Sorry for the noise, Michael

[...]
> It seems to be a problem with the compiler optimizations of g++ 4.2 on alpha: the exception (::std::invalid_argument) is thrown in test11 of generalizedaffineimage3.cc as expected, but the catch clauses do not handle it correctly when the test is compiled with -O2. It works with -O0, and the bug seems to be fixed in g++ 4.3. Rather than reporting a bug against the gcc package, I'll add a hack that enforces the use of g++ 4.3 when compiling the tests.
> I'm testing this hack at the moment and will upload 0.10-3 to unstable once the build succeeds.
It seems that g++ 4.3 also has problems with exception handling, but in every case the affected test is test11 of generalizedaffineimage3 (with g++ 4.3 it fails for the mpq_class instance; with g++ 4.2 it failed for double). I will therefore have the preprocessor disable this test if __alpha__ is defined.
Best, Michael
participants (3)
- Abramo Bagnara
- Michael Tautschnig
- Roberto Bagnara