Hi,
I'm running a big workflow using qmake.
Last night the workflow was 'frozen'. I pressed Ctrl-C to re-run
the analysis.
Now qmake doesn't work any more: there is no message on stdout/stderr
and the exit code seems to be '139'.
qmake still works with some other 'Makefiles'.
$ qmake -cwd -v PATH -l arch=lx24-amd64 -- -j 50 -n all ; echo $status
139
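If I read this exit code correctly, shells usually report 128 + N when a process is killed by signal N, so 139 would mean signal 11 (SIGSEGV), which matches the core dump further down. Here is a minimal sketch of that arithmetic; the helper name `describe_exit_status` is made up for illustration, and the "128 + N" rule is only the usual shell convention, so a program could in principle also exit with such a value on its own.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret an exit status as reported by a shell (hypothetical helper).

    Assumes the common convention that a status above 128 means
    'killed by signal (status - 128)'.
    """
    if status > 128:
        signum = status - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited normally with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
```

So the '139' above is most likely just another way of saying that qmake itself segfaulted.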
If I run the classic 'make', it prints the commands to be run:
$ make -n all
mkdir -p ../align20140124/Samples/Sample1/VCF/ALL/ && \
gunzip -c
../align20140124/Samples//Sample1/VCF/ALL//Sample1.vcf.gz | \
awk -F ' ' '/^#/ {print;next;} {OFS="
";gsub(/,/,".",$6); if($6!="." && $6<0) $6=0; print;}' |\
(...)
Some core dumps have been generated:
$ gdb --core=core.55770
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
[New Thread 55770]
Core was generated by `qmake -inherit -cwd -v PATH -l arch=lx24-amd64 --
-j 50 -n all'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000412fb1 in ?? ()
(gdb) bt
#0 0x0000000000412fb1 in ?? ()
#1 0x000000000238f150 in ?? ()
#2 0x0000000000000000 in ?? ()
Here is some other information, but I don't know whether it is related:
$ dmesg
Out of memory: Kill process 21010 (hapHunt) score 975 or sacrifice child
Killed process 21010, UID 502, (hapHunt) total-vm:104669264kB,
anon-rss:96930100kB, file-rss:4kB
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
samtools[28071]: segfault at 30 ip 000000000043f8d8 sp 00007fff1485f550
error 4 in samtools[400000+60000]
udev: starting version 147
nfsd: last server has exited, flushing export cache
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
circo[26747]: segfault at 4 ip 0000003c2b60f6f1 sp 00007fffdebb83f0
error 4 in libcairo.so.2.10800.8[3c2b600000+76000]
dot[4620]: segfault at 4 ip 0000003c2b60f6f1 sp 00007fff701d59c0 error 4
in libcairo.so.2.10800.8[3c2b600000+76000]
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
samtools[30216] trap divide error ip:40630d sp:7fffd0178dd0 error:0 in
samtools[400000+60000]
And in /path/to/sge_root/name/spool/qmaster/messages:
01/29/2014 09:48:31|worker|master|W|job 1171361.1 failed on host node04
assumedly after job because: job 1171361.1 died through signal SEGV (11)
Any idea how to fix this?
Thank you,
Pierre