Hi,

I see this in our config (qconf -sconf):

qmaster_params               MAX_DYN_EC=500 gdi_timeout=120 gdi_retries=4

Have you tried fiddling with the MAX_DYN_EC parameter?
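
To check what a given qmaster is currently using, a minimal sketch along these lines may help. The function name max_dyn_ec is made up for illustration, and it assumes the qmaster_params entries are separated by spaces or commas, as in the line above.

```shell
# Hypothetical helper: print the MAX_DYN_EC value found in
# `qconf -sconf` output supplied on stdin.
# Assumption: qmaster_params entries are separated by spaces or commas.
max_dyn_ec() {
    awk '$1 == "qmaster_params" {
        n = split($0, parts, /[ \t,]+/)
        for (i = 2; i <= n; i++)
            if (parts[i] ~ /^MAX_DYN_EC=/) {
                sub(/^MAX_DYN_EC=/, "", parts[i])
                print parts[i]
            }
    }'
}

# Usage (needs a working Grid Engine client environment):
#   qconf -sconf | max_dyn_ec
```

If nothing is printed, the parameter is not set explicitly in the global configuration. The value itself can be changed by editing qmaster_params through qconf -mconf.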

Regards,
Alex

On 06/12/2015 02:44 PM, Grau, Federico (NIH/NHGRI) [C] wrote:
Hello gridengine users,

Our organization has been running SoGE 8.1.0 for a couple of years and
upgraded to SoGE 8.1.8 last week.  We’ve started seeing “event client”
errors reporting “Only 950 event clients are allowed in the system”.
However, when we list the active event clients with “qconf -secl”, we
only ever see a handful; similarly, we only see a couple of connections
when reviewing “qping -info fooserver 1234 qmaster 1”.  It seems that
there is a problem releasing the resources of completed event clients,
almost like a “leak”.  Reviewing the logs, we also noted an increase in
“event client reregistered” error messages.

# sample “reregistered” log entry

|worker|fooserver|E|event client "qsub" (foobazcompute/qsub/5702)
reregistered - it will need a total update
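
To see which client types account for the increase, the entries could be tallied per client name. This is a minimal sketch: the function name count_reregistered is hypothetical, and it assumes each entry sits on a single line in the qmaster messages file in the format of the sample above (the sample is wrapped here only by the mail client).

```shell
# Hypothetical helper: count "reregistered" entries per event client name.
# Reads qmaster log lines on stdin. Assumes one entry per line, formatted
# like the sample entry: ...|E|event client "NAME" (...) reregistered - ...
count_reregistered() {
    sed -n '/reregistered/s/.*event client "\([^"]*\)".*/\1/p' |
        sort | uniq -c | sort -rn
}

# Usage (the log path is an assumption, adjust to your spool directory):
#   count_reregistered < "$SGE_ROOT/$SGE_CELL/spool/qmaster/messages"
```

Each output line is a count followed by a client name, most frequent first.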

Our current band-aid is to restart qmaster every couple of days.  We’re
working to isolate recent changes in the environment (the PacBio
smrtanalysis code was upgraded to 2.3.0p4, and we had changed some local
Perl libraries from using qrsh to “qsub -sync”, a change we’re now
temporarily reverting).  Any input on this bug, or on possible approaches
to diagnose it further, is welcome.

regards,

--

Federico Grau

Sr. Linux Administrator

NIH/NHGRI Contractor – Digicon

m: 240-506-7993

[email protected] <mailto:[email protected]>



_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
