Hello gridengine users,

Our organization has been running SoGE 8.1.0 for a couple of years, and upgraded to SoGE 8.1.8 last week. We've started seeing "event client" errors reporting "Only 950 event clients are allowed in the system". However, when we list the active event clients with "qconf -secl", we only ever see a handful; similarly, we only see a couple of connections when reviewing "qping -info fooserver 1234 qmaster 1". It looks as though qmaster is not releasing the resources of completed event clients, almost like a "leak".

Reviewing the logs, we also noted an increase in "event client reregistered" error messages.
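One way to put numbers on that increase is to tally the reregister messages per submitting host and client name. The following is only a rough sketch: the regular expression is inferred from the single sample entry in this post rather than from the gridengine source, and the function name count_reregisters is made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the one sample entry in this post (an assumption,
# not taken from the gridengine source). It captures the client name, the
# host, and the numeric id inside the parentheses.
REREGISTER_RE = re.compile(
    r'event client "(?P<name>[^"]+)" '
    r'\((?P<host>[^/]+)/[^/]+/(?P<id>\d+)\) reregistered'
)


def count_reregisters(lines):
    """Count reregister messages, keyed by (host, client name)."""
    counts = Counter()
    for line in lines:
        match = REREGISTER_RE.search(line)
        if match:
            counts[(match.group("host"), match.group("name"))] += 1
    return counts


# Small demonstration using the sample entry from this post.
sample = ('|worker|fooserver|E|event client "qsub" '
          '(foobazcompute/qsub/5702) reregistered - it will need a total update')
for (host, name), n in count_reregisters([sample, sample]).most_common():
    print(n, host, name)
```

To run it against real data, pass an open file handle for the qmaster messages file (its location depends on the local spool configuration) in place of the sample list. Sorting by count should show whether the reregistrations are concentrated on particular submit hosts.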
# sample reregister messages entry
|worker|fooserver|E|event client "qsub" (foobazcompute/qsub/5702) reregistered - it will need a total update

Our current band-aid is to restart qmaster every couple of days. We're working to isolate recent changes in the environment (the PacBio smrtanalysis code was upgraded to 2.3.0p4; we had also changed some local Perl libraries from using qrsh to "qsub -sync", a change we're now temporarily reverting).

Any input on this bug, or on possible approaches to diagnose it further, is welcome.

regards,
--
Federico Grau
Sr. Linux Administrator
NIH/NHGRI Contractor - Digicon
m: 240-506-7993
[email protected]
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
