Apparently the sys admin did not include (at least) that package on the
server build.
It is included on our client builds though.
I've never used it before, but I ran it as Rayson instructed.
In the terminal where it was running, after it crashed, all it said was:

(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x47df6940 (LWP 2193)]
0x000000000055c698 in lCopySwitchPack ()
(gdb) 

which I guess we already knew.  I'll look into monit and into getting a
core dump.
We had added 'ulimit -c unlimited' to the master init.d script in hopes
of getting a dump that way, but no such luck.
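
One thing that might be worth checking is whether the limit actually
reached the running qmaster process. Below is a minimal sketch, assuming
a Linux kernel that exposes /proc/<pid>/limits (older kernels may not).
The function names parse_core_limits and core_limits_for_pid are
hypothetical helpers made up for illustration, not part of Grid Engine.

```python
import os

# Label used for the core-size row in /proc/<pid>/limits on Linux.
CORE_LABEL = "Max core file size"


def parse_core_limits(limits_text):
    """Return (soft, hard) core-size limits as strings, or None if absent.

    limits_text is the full text of a /proc/<pid>/limits file.
    """
    for line in limits_text.splitlines():
        if line.startswith(CORE_LABEL):
            # Remaining columns are: soft limit, hard limit, units.
            fields = line[len(CORE_LABEL):].split()
            if len(fields) >= 2:
                return fields[0], fields[1]
    return None


def core_limits_for_pid(pid):
    """Read and parse the limits file for a running process (assumes Linux)."""
    with open(os.path.join("/proc", str(pid), "limits")) as handle:
        return parse_core_limits(handle.read())
```

Calling core_limits_for_pid() with the qmaster PID should return a pair
such as ("0", "unlimited"). A soft limit of "0" would suggest the ulimit
line in the init.d script is not taking effect for the daemon, though
other settings can also prevent a core file from being written.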
It's probably all a moot point anyway since it appears to be a confirmed
bug with 6.2u5.
From my perspective (grid admin) I hope my management's meeting with
Univa on Friday will be fruitful :-)
Thanks to all for the help.

--murph

-----Original Message-----
From: Dave Love [mailto:[email protected]] 
Sent: Monday, May 02, 2011 6:49 PM
To: Murphy, Brian (E IT F 45)
Cc: [email protected]
Subject: Re: [gridengine users] Qmaster Failing

"Murphy, Brian (E IT F 45)" <[email protected]> writes:

> Thanks Rayson.
> Downloaded, compiled and installed GDB 7.2.

[What's wrong with the system gdb?]

Running under gdb isn't so useful in this situation, where you're trying
to keep qmaster up and running.  (I used monit, rather than cron, in the same
situation.)  You can get a post mortem core dump using
http://arc.liv.ac.uk/repos/darcs/sge/source/libs/libcore/, per
https://arc.liv.ac.uk/trac/SGE/ticket/507.  However, it's probably not
very useful anyway unless what you're running was compiled with
debugging information, and it would be easier just to try a patched
version.

I have RH 5 rpms that have been in production, but you probably won't
want to run binaries from random sources, and presumably you'll get a
fix from Univa.
