When some of our users run leaky code on our cluster, it triggers the Linux kernel's Out of Memory (OOM) process killer. Usually this works well and kills the offending process with no other problems, but sometimes gmond also appears to die at the same time.

The kernel does not log that it has killed any gmond processes, and usually only a few of the 8 threads die or become zombies, so I don't think it is the OOM killer that is killing gmond. It appears that only the threads that read /proc and send the multicast data die, while the threads that receive the data and respond to XML requests continue to run.

Has anyone else noticed a problem like this? Could there be something internal to gmond that does not handle low-memory conditions?
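For anyone who wants to check their own nodes, here is a rough Python sketch of how the surviving and zombie gmond threads could be counted from /proc. It is only an illustration, not anything that ships with Ganglia: the function names (parse_stat, find_threads, summarize) are made up, the expected count of 8 is just what I see on our nodes, and it assumes every gmond thread reports the command name "gmond" in its stat file. It looks in /proc/<pid>/task/ when that directory exists and falls back to /proc/<pid>/stat otherwise, since on some kernels each thread shows up as its own PID.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of a /proc stat file."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the LAST ')' rather than on whitespace.
    left, right = text.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def find_threads(name="gmond", proc="/proc"):
    """List (id, state) for every process or thread whose comm equals name."""
    results = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        task_dir = os.path.join(proc, entry, "task")
        try:
            # Kernels that expose per-thread entries list them here.
            paths = [os.path.join(task_dir, tid, "stat")
                     for tid in os.listdir(task_dir)]
        except OSError:
            # No task directory (or the process just exited): use the
            # per-process stat file instead.
            paths = [os.path.join(proc, entry, "stat")]
        for path in paths:
            try:
                with open(path) as handle:
                    pid, comm, state = parse_stat(handle.read())
            except (OSError, ValueError):
                continue  # entry vanished or could not be parsed
            if comm == name:
                results.append((pid, state))
    return results


def summarize(threads, expected=8):
    """Count live, zombie, and missing threads (expected=8 is an assumption)."""
    zombies = [pid for pid, state in threads if state == "Z"]
    return {
        "alive": len(threads) - len(zombies),
        "zombies": zombies,
        "missing": max(0, expected - len(threads)),
    }
```

Running summarize(find_threads()) on a node right after the OOM killer has fired, and comparing the result with the kernel log, should show whether threads are disappearing without any matching kill message.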
~Jason

--
Jason A. Smith
Atlas Computing Facility, Bldg. 510M
Brookhaven National Lab, P.O. Box 5000
Upton, NY 11973-5000
Email: [EMAIL PROTECTED]
Phone: (631)344-4226
Fax: (631)344-7616