Hello gridengine users,

Our organization had been running SoGE 8.1.0 for a couple of years and upgraded 
to SoGE 8.1.8 last week.  We've since started seeing "event client" errors 
reporting "Only 950 event clients are allowed in the system".  However, when we 
list the active event clients with "qconf -secl", we only ever see a handful; 
similarly, "qping -info fooserver 1234 qmaster 1" shows only a couple of 
connections.  It looks as though resources held by completed event clients are 
not being released, almost like a "leak".  In the logs, we have also noticed an 
increase in "event client reregistered" error messages.

# sample reregister entry from the messages log
|worker|fooserver|E|event client "qsub" (foobazcompute/qsub/5702) reregistered - it will need a total update
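
To put numbers on that increase, a small script along these lines can tally the 
reregistration messages per client host and program.  This is only a sketch: it 
assumes every entry has the same text layout as the sample above, and the 
function names and file path argument are our own placeholders, not anything 
shipped with SoGE.

```python
import re
import sys
from collections import Counter

# Matches the message text seen in the sample entry, e.g.
#   event client "qsub" (foobazcompute/qsub/5702) reregistered - ...
# Groups: client name, host, program, numeric id (assumed layout).
REREG_RE = re.compile(
    r'event client "([^"]+)" \(([^/]+)/([^/]+)/(\d+)\) reregistered'
)


def count_reregistrations(lines):
    """Count reregistration messages per (host, program) pair."""
    counts = Counter()
    for line in lines:
        match = REREG_RE.search(line)
        if match:
            _name, host, program, _numeric_id = match.groups()
            counts[(host, program)] += 1
    return counts


def main(path):
    # The messages file location depends on your SGE_ROOT/SGE_CELL spool
    # layout, so it is passed in rather than hard-coded.
    with open(path, errors="replace") as fh:
        counts = count_reregistrations(fh)
    # Busiest (host, program) pairs first.
    for (host, program), n in counts.most_common():
        print(f"{n:8d}  {host}  {program}")


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print(f"usage: {sys.argv[0]} /path/to/messages", file=sys.stderr)
```

Saved as, say, count_rereg.py and run as "python3 count_rereg.py 
/path/to/messages", it prints one line per host/program pair; comparing runs 
from before and after reverting the "qsub -sync" change may show whether qsub 
clients account for most of the reregistrations.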

Our current band-aid is to restart qmaster every couple of days.  We're working 
to isolate recent changes in the environment (PacBio SMRT Analysis was upgraded 
to 2.3.0p4, and we changed some local Perl libraries to use "qsub -sync" instead 
of qrsh, a change we're now temporarily reverting).  Any input on this bug, or 
on possible ways to diagnose it further, is welcome.

Regards,

--

Federico Grau
Sr. Linux Administrator
NIH/NHGRI Contractor - Digicon
m: 240-506-7993
[email protected]

_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users