Hello,
we've started having a problem with the qmaster process on one of my
SGE cells.
Basically, it starts to eat up large amounts of memory (and then dies).
It seemed to happen more or less out of the blue: it ran fine for a
couple of months, and then about a month ago we saw this happen for the
first time. However, a couple of tests I ran today seem to indicate that
the number of jobs being submitted is what triggers it (we managed to
trigger it simply by submitting a lot of jobs in a short space of time).
Unfortunately, a lot of jobs being submitted in a short space of time is
our standard use case :)
This is on SGE 8.1.3.
It did make me think of the old 'schedd_job_info can cause immense memory
consumption' issue, as I do currently collect job info, but I thought
that had been fixed. Should it be fixed in SGE 8.1.3?
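For reference, the setting in question lives in the scheduler configuration (viewable with `qconf -ssconf`, editable with `qconf -msconf`). A sketch of the line as it would look with job info collection switched off, assuming the goal is simply to rule it out as the cause; `true` enables collection for all jobs, and `job_list` followed by comma-separated job IDs restricts it to those jobs. Whether turning it off actually avoids the memory growth on 8.1.3 is untested here.

```
schedd_job_info                   false
```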
Tina
--
Tina Friedrich, Computer Systems Administrator, Diamond Light Source Ltd
Diamond House, Harwell Science and Innovation Campus - 01235 77 8442
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users