Hi,
I'm running a big workflow using qmake.

Last night the workflow was 'frozen', so I pressed Ctrl-C in order to re-run the analysis.

Now qmake doesn't work any more: there is no message on stdout/stderr, and the exit code seems to be '139'.
qmake still works with some other 'Makefiles'.


$ qmake -cwd -v PATH -l arch=lx24-amd64 -- -j 50 -n all ; echo $status
139
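
For reference, this exit code is consistent with the segmentation fault reported by gdb further down: shells conventionally report 128 plus the signal number when a process is killed by a signal, and 139 - 128 = 11 (SIGSEGV). A minimal sketch of that decoding, assuming a POSIX-style shell (`decode_status` is a hypothetical helper name, not part of qmake or Grid Engine):

```shell
# decode_status: hypothetical helper, not part of qmake or Grid Engine.
# Assumes the shell reports 128 + N for a process killed by signal N.
decode_status() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        # kill -l N prints the name of signal number N, without the SIG prefix
        echo "killed by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited with status $code (not a signal)"
    fi
}

decode_status 139
```

For 139 this reports signal 11, which is the same signal that appears in the core dump and in the qmaster messages.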

If I run the classic 'make', it prints the commands to be run:

$ make -n all

  mkdir -p ../align20140124/Samples/Sample1/VCF/ALL/ && \
gunzip -c ../align20140124/Samples//Sample1/VCF/ALL//Sample1.vcf.gz | \
awk -F ' ' '/^#/ {print;next;} {OFS=" ";gsub(/,/,".",$6); if($6!="." && $6<0) $6=0; print;}' |\
       (...)


Some core dumps have been generated:

$ gdb --core=core.55770
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-60.el6_4.1)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
[New Thread 55770]
Core was generated by `qmake -inherit -cwd -v PATH -l arch=lx24-amd64 -- -j 50 -n all'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000412fb1 in ?? ()
(gdb) bt
#0  0x0000000000412fb1 in ?? ()
#1  0x000000000238f150 in ?? ()
#2  0x0000000000000000 in ?? ()


Here is some other information, but I don't know whether it is related:

$ dmesg

Out of memory: Kill process 21010 (hapHunt) score 975 or sacrifice child
Killed process 21010, UID 502, (hapHunt) total-vm:104669264kB, anon-rss:96930100kB, file-rss:4kB
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
samtools[28071]: segfault at 30 ip 000000000043f8d8 sp 00007fff1485f550 error 4 in samtools[400000+60000]
udev: starting version 147
nfsd: last server has exited, flushing export cache
NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
NFSD: starting 90-second grace period
circo[26747]: segfault at 4 ip 0000003c2b60f6f1 sp 00007fffdebb83f0 error 4 in libcairo.so.2.10800.8[3c2b600000+76000]
dot[4620]: segfault at 4 ip 0000003c2b60f6f1 sp 00007fff701d59c0 error 4 in libcairo.so.2.10800.8[3c2b600000+76000]
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
[drm:output_poll_execute] *ERROR* delayed enqueue failed -125
samtools[30216] trap divide error ip:40630d sp:7fffd0178dd0 error:0 in samtools[400000+60000]


In /path/to/sge_root/name/spool/qmaster/messages:

01/29/2014 09:48:31|worker|master|W|job 1171361.1 failed on host node04 assumedly after job because: job 1171361.1 died through signal SEGV (11)


Any idea how to fix this?

Thank you,

Pierre

