[lucas@lucas-nussbaum.net: Bug#552959: ppl: FTBFS: build blocks]

Hi all,
I was notified about some very strange build failure that I cannot reproduce myself, but apparently for Lucas it _is_ reproducible. Have you by chance already come across this problem, or have some idea what could be the problem, or maybe even have a fix already in git?
Thanks in advance, Michael
----- Forwarded message from Lucas Nussbaum lucas@lucas-nussbaum.net -----
Date: Fri, 30 Oct 2009 12:03:59 +0100
From: Lucas Nussbaum lucas@lucas-nussbaum.net
To: Michael Tautschnig mt@debian.org
CC: 552959@bugs.debian.org
Subject: Bug#552959: ppl: FTBFS: build blocks
Reply-To: Lucas Nussbaum lucas@lucas-nussbaum.net, 552959@bugs.debian.org
User-Agent: Mutt/1.5.20 (2009-06-14)
On 29/10/09 at 09:02 +0100, Michael Tautschnig wrote:
Hi!
[...]
if [ . != `pwd` ]; then \
rm -f ppl_prolog_generated_test_common.pl; \
fi
rm -f ppl_prolog_generated_test_main.pl; \
diff -u --ignore-all-space ./../tests/expected_pgt obtained_pgt
make[7]: *** [pl_check_test] Terminated
make[3]: *** [check-recursive] Terminated
E: Caught signal 'Terminated': terminating immediately
make[5]: *** [check-recursive] Terminated
make[4]: *** [check] Terminated
make[2]: *** [check] Terminated
make[1]: *** [check-recursive] Terminated
make: *** [check] Terminated
make[6]: *** [check-am] Terminated
Build killed with signal TERM after 240 minutes of inactivity
────────────────────────────────────────────────────────────────────────────────
Build finished at 20091028-0518
The full build log is available from: http://people.debian.org/~lucas/logs/2009/10/28/ppl_0.10.2-3_lsid64.buildlog
[...]
Is it possible to schedule another build? I'm really clueless about what could be going wrong here, other than some problem with the buildd, which seems somewhat more likely given the excerpt of daemon.log provided at the end of this log. Looking at other build logs, the above diff seems to be about the last step before install, and it worked fine on all Debian buildds just a few days ago!?
The fact that it blocks is reproducible.
Output on the terminal:
% ppl_prolog_generated_test_main.pl compiled 0.18 sec, 2,099,112 bytes
% ./swi_prolog_generated_test compiled 0.18 sec, 2,104,128 bytes
true.
true.
% halt
if [ . != `pwd` ]; then \
rm -f ppl_prolog_generated_test_common.pl; \
fi
rm -f ppl_prolog_generated_test_main.pl; \
diff -u --ignore-all-space ./../tests/expected_pgt obtained_pgt
However, it's not diff that is blocking, it's ppl_pl. It's eating all the available memory and causing swapping. I've attached the output of ps. The status of the process is:
Name: ppl_pl
State: T (stopped)
Tgid: 20160
Pid: 20160
PPid: 20152
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 256
Groups: 0
VmPeak: 303031364 kB
VmSize: 303031364 kB
VmLck: 0 kB
VmHWM: 31854660 kB
VmRSS: 31756644 kB
VmData: 302960040 kB
VmStk: 84 kB
VmExe: 5284 kB
VmLib: 4792 kB
VmPTE: 100764 kB
Threads: 1
SigQ: 1/270336
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001000
SigCgt: 0000000187802083
CapInh: 0000000000000000
CapPrm: ffffffffffffffff
CapEff: ffffffffffffffff
CapBnd: ffffffffffffffff
Cpus_allowed: 000000ff
Cpus_allowed_list: 0-7
Mems_allowed: 00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 6796
nonvoluntary_ctxt_switches: 2400
(The T state is expected: I stopped the process with kill -STOP so I could get the ps output, but it was in state R or D before that.)
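
The memory figures in the status dump above are easier to judge once converted from kB to GiB. A minimal Python sketch, assuming the dump is available as text in the /proc/<pid>/status format; the helper names `parse_vm_fields` and `kb_to_gib` are made up for illustration and are not part of any tool used in this report:

```python
def parse_vm_fields(status_text):
    """Return {field: size_in_kB} for the Vm* lines of a /proc/<pid>/status dump."""
    sizes = {}
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not name.startswith("Vm"):
            continue
        parts = value.split()
        # Vm* lines look like "VmRSS:  31756644 kB"; skip anything else.
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            sizes[name.strip()] = int(parts[0])
    return sizes


def kb_to_gib(kb):
    """Convert kB as reported by the kernel (units of 1024 bytes) to GiB."""
    return kb / (1024 * 1024)


# Values copied from the status dump in the report.
SAMPLE = """\
Name: ppl_pl
State: T (stopped)
VmPeak: 303031364 kB
VmSize: 303031364 kB
VmHWM: 31854660 kB
VmRSS: 31756644 kB
VmData: 302960040 kB
"""

if __name__ == "__main__":
    for field, kb in parse_vm_fields(SAMPLE).items():
        print(f"{field}: {kb_to_gib(kb):.1f} GiB")
```

With the reported values, VmSize works out to roughly 289 GiB of address space and VmRSS to roughly 30 GiB resident, which fits the swapping described above.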

Michael Tautschnig wrote:
I was notified about some very strange build failure that I cannot reproduce myself, but apparently for Lucas it _is_ reproducible. Have you by chance already come across this problem, or have some idea what could be the problem, or maybe even have a fix already in git?
Hi Michael,
this week we are all busy, but next week we will look into this. Can you give us all the information required to try to reproduce the problem (which version of Debian, which architecture, which version of SWI-Prolog, ...)? Please forgive me if that information is already present somewhere in the attachment you sent... as I said, this week we are snowed under.
All the best,
Roberto

Michael Tautschnig wrote:
I was notified about some very strange build failure that I cannot reproduce myself, but apparently for Lucas it _is_ reproducible. Have you by chance already come across this problem, or have some idea what could be the problem, or maybe even have a fix already in git?
Hi Michael,
this week we are all busy, but next week we will look into this. Can you give us all the information required to try to reproduce the problem (which version of Debian, which architecture, which version of SWI-Prolog, ...)? Please forgive me if that information is already present somewhere in the attachment you sent... as I said, this week we are snowed under.
All the best,
Sorry for responding this late; the versions should be as follows:
- Debian sid/unstable
- Builds running on amd64
- SWI-Prolog version 5.6.64
Now I'm building ppl on such systems as well, so I'm really puzzled about this. The full build log is available at [1]. What seems noteworthy to me is:
- make is run with -j10, meaning highly parallel.
- The first invocation of ppl_pl seems to block; I haven't looked into the corresponding makefiles yet, but could it be some simple dependency problem that only occurs in parallel builds? The blocking part is the ppl_pl execution around here:
make[7]: Entering directory `/build/user-ppl_0.10.2-3-amd64-s4Cp4e/ppl-0.10.2/interfaces/Prolog/SWI'
if [ . != `pwd` ]; then \
cp -f ./../tests/pl_check.pl . ; \
fi ;\
echo "ensure_loaded('./swi_pl_check'). main." > script_pchk
/bin/bash ../../../libtool --mode=execute \
-dlopen ../../../src/libppl.la \
-dlopen ../../../Watchdog/src/libpwl.la \
-dlopen libppl_swiprolog.la \
./ppl_pl < script_pchk
HTH, Michael
[1] http://people.debian.org/~lucas/logs/2009/10/28/ppl_0.10.2-3_lsid64.buildlog
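
To try the blocking step locally without risking the swapping described in the report, the command can be run under a memory cap and a wall-clock timeout. A rough Python sketch for Linux; the function name `run_capped`, the 4 GiB default cap and the 600 s default timeout are arbitrary illustrative choices, not values taken from the report or from the PPL build system:

```python
import resource
import subprocess


def run_capped(cmd, stdin_path=None, mem_bytes=4 * 1024**3, timeout_s=600):
    """Run cmd with an address-space cap and a wall-clock timeout.

    Returns ("timeout", None) if the timeout expired, otherwise
    ("exited", returncode).
    """
    def limit_memory():
        # Runs in the child just before exec: cap its virtual memory so a
        # runaway allocation fails instead of pushing the machine into swap.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    stdin = open(stdin_path, "rb") if stdin_path else subprocess.DEVNULL
    try:
        proc = subprocess.run(cmd, stdin=stdin, preexec_fn=limit_memory,
                              timeout=timeout_s)
        return ("exited", proc.returncode)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the direct child at this point.
        return ("timeout", None)
    finally:
        if stdin_path:
            stdin.close()


# Hypothetical usage, mirroring the command from the build log and assuming
# the current directory is interfaces/Prolog/SWI inside the build tree:
#
# run_capped(["/bin/bash", "../../../libtool", "--mode=execute",
#             "-dlopen", "../../../src/libppl.la",
#             "-dlopen", "../../../Watchdog/src/libpwl.la",
#             "-dlopen", "libppl_swiprolog.la",
#             "./ppl_pl"], stdin_path="script_pchk")
```

The limit is inherited by whatever the command executes, but the timeout only kills the direct child, so a wrapper that forks rather than execs could leave a grandchild running.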

Michael Tautschnig wrote:
Michael Tautschnig wrote:
I was notified about some very strange build failure that I cannot reproduce myself, but apparently for Lucas it _is_ reproducible. Have you by chance already come across this problem, or have some idea what could be the problem, or maybe even have a fix already in git?
Hi Michael,
this week we are all busy, but next week we will look into this. Can you give us all the information required to try to reproduce the problem (which version of Debian, which architecture, which version of SWI-Prolog, ...)? Please forgive me if that information is already present somewhere in the attachment you sent... as I said, this week we are snowed under.
All the best,
Sorry for responding this late; the versions should be as follows:
- Debian sid/unstable
- Builds running on amd64
- SWI-Prolog version 5.6.64
Now I'm building ppl on such systems as well, so I'm really puzzled about this. The full build log is available at [1]. What seems noteworthy to me is:
- make is run with -j10, meaning highly parallel.
- The first invocation of ppl_pl seems to block; I haven't looked into the corresponding makefiles yet, but could it be some simple dependency problem that only occurs in parallel builds? The blocking part is the ppl_pl execution around here:
make[7]: Entering directory `/build/user-ppl_0.10.2-3-amd64-s4Cp4e/ppl-0.10.2/interfaces/Prolog/SWI'
if [ . != `pwd` ]; then \
cp -f ./../tests/pl_check.pl . ; \
fi ;\
echo "ensure_loaded('./swi_pl_check'). main." > script_pchk
/bin/bash ../../../libtool --mode=execute \
-dlopen ../../../src/libppl.la \
-dlopen ../../../Watchdog/src/libpwl.la \
-dlopen libppl_swiprolog.la \
./ppl_pl < script_pchk
HTH, Michael
[1] http://people.debian.org/~lucas/logs/2009/10/28/ppl_0.10.2-3_lsid64.buildlog
Hi Michael,
I am sorry about the delay, but I am snowed under with other commitments. We routinely test the PPL with make -j (for j in the range [2, 8]), so I am pretty sure parallelism is not the cause. I currently have no access to the machine where I have a Debian VM; I will have to wait until the weekend for that.
Israel, perhaps you can help us debug this problem?
All the best,
Roberto

Excerpts from Roberto Bagnara's message of Wed Nov 18 08:51:24 +0100 2009:
I am sorry about the delay, but I am snowed under with other commitments. We routinely test the PPL with make -j (for j in the range [2, 8]), so I am pretty sure parallelism is not the cause. I currently have no access to the machine where I have a Debian VM; I will have to wait until the weekend for that.
Israel, perhaps you can help us debug this problem?
Yes, I saw the bug report in Debian, and I did some tests, but I am also puzzled about this bug.
The problem only appears on amd64 architectures. Unfortunately I don't have access to an amd64 machine.
However, I think that there are two workarounds that can be tried out here:
- The first one is building the package without running the tests (I think that the problem appears only when running the tests). I am not sure whether this would be a Debian policy violation or not (I guess not).
- The other one is trying to recompile gmp with -fexceptions and then ppl again to see if the problem persists. For details, see http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=552959#25
Interestingly, this problem does not always appear:
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=552959#35
That is someone reporting that the package built successfully on amd64.
So it is a weird thing.
Israel
participants (3)
- Israel Herraiz
- Michael Tautschnig
- Roberto Bagnara