Hi Joshua,

How many cores do you have on your Grid Engine master? Do you have any per-host quotas set? (You need at least 2 cores for scheduling decisions involving per-host/per-user quotas to be made in a timely manner.)

How large is your accounting file? Are any other programs, jobs, or people accessing your accounting file heavily? Are you using the Grid Engine master for anything besides scheduling, such as a cron job that copies out the common area, huge accounting file included?

If you are not already doing so, try running your scheduler on a 4-core, 16 GB virtual machine whose underlying hardware provides at least one 10 Gbit/s uplink.

Is your shared NFS area slow or getting hammered by users?

Just some points to help you identify the issues.

Ed Lauzier

-----Original Message-----
From: Joshua Baker-LePain [mailto:[email protected]]
Sent: Friday, November 1, 2013 01:44 PM
To: 'users'
Subject: [gridengine users] Debugging *really* long scheduling runs

I'm currently running Grid Engine 2011.11p1 on CentOS-6. I'm using classic spooling to a local disk, local $SGE_ROOT (except for $SGE_ROOT/$SGE_CELL/common), and local spooling directories on the nodes (of which there are more than 600).

I'm occasionally seeing *really* long scheduling runs (the last two were 4005 and 4847 seconds). This leads to extra fun like:

11/01/2013 08:35:39|event_|sortinghat|W|acknowledge timeout after 600 seconds for event client (schedd:0) on host "$SGE_MASTER"
11/01/2013 08:35:39|event_|sortinghat|E|removing event client (schedd:0) on host "$SGE_MASTER" after acknowledge timeout from event client list

I have "PROFILE=1" set, and of course most of the time is spent in "job dispatching". But I'm really not sure how else to track down the cause of this. Where should I be looking? Are there any other options I can set to get more info?

Thanks.
--
Joshua Baker-LePain
QB3 Shared Cluster Sysadmin
UCSF
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users
