Hello,

>> {
>>    name         shortlimit
>>    description  NONE
>>    enabled      TRUE
>>    limit        queues short.q hosts * to slots=32
> I think you can leave the "hosts *" out here and the other RQS below. It
> means "used slots across all machines" limited to 32 in this queue. The same
> can be achieved by specifying only the queue.

Yes, I ended up making some things overly explicit while trying to debug the
issue.

>> }
>> {
>>    name         longlimit
>>    description  NONE
>>    enabled      TRUE
>>    limit        queues long.q hosts * to slots=16
>> }
>> {
>>    name         verylonglimit
>>    description  NONE
>>    enabled      TRUE
>>    limit        queues verylong.q hosts * to slots=4
>> }
>> {
>>    name         urgentlimit
>>    description  NONE
>>    enabled      TRUE
>>    limit        users {*} queues urgent.q hosts * to slots=1
>> }
>> {
>>    name         debuglimit
>>    description  NONE
>>    enabled      TRUE
>>    limit        users {*} queues debug.q hosts {*} to slots=1
>> }

>As the above 5 limits are disjunct, they can also be put in one and the same
>RQS. You can give each a name to get it listed instead of the number of the
>rule, which is always 1 right now.

I originally had these as one RQS, but again tried to make things more
explicit (or at least easier for me to understand) while debugging.

>> This will cause a parallel job across multiple queues to never schedule. If
>> I get rid of the "nodelimit" and instead set the number of slots using
>> the complex value in the host configuration, then everything works (except
>> my debug queue).

>Do you have many machinetypes? What happens, if you don't use $num_proc there
>but specify a hard coded limit per hostgroup for a machinetype or so?
>
>limit queues !debug.q hosts {@quadcore} to slots=4
>limit queues !debug.q hosts {@hexacore} to slots=6

I don't have many machine types, in fact I don't have many machines! I tried
to replace the nodelimit RQS with:

{
   name         nodelimit
   description  NONE
   enabled      TRUE
   limit        queues !debug.q hosts {animal.ohsu.edu,kermit.ohsu.edu} to slots=24
   limit        queues !debug.q hosts {piggy.ohsu.edu} to slots=8
}

This gives the same result as the original nodelimit RQS that used $num_proc
(the job never gets scheduled).

>> Below I give an example of a hanging job (with the scheduler output enabled).
>> I set h_rt to 3:50:00 as this will allow the queues short.q, long.q, and
>> verylong.q. I request 40 slots as that will have to span multiple queues.

>If I get you right, SGE could find different combinations for the slot
>allocation, depending on the algorithm which is used as all the queues are on
>the same machines?

All the queues are on the same machines. I am not sure which "algorithm" you
refer to. As mentioned, the scheduler sorts by sequence number so the queues
are checked in shortest to longest order. Thus my job that requests 40 slots
with the given h_rt value should take 32 slots from short.q and 8 slots from
long.q (provided nothing else is running on the cluster, which is the case for
my testing).

Thanks,
Brendan
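
P.S. To make the arithmetic in the last paragraph concrete, here is a minimal
Python sketch of the expected fill order. It only illustrates the expectation
stated above (queues visited in sequence-number order, each capped by its RQS
slot limit, cluster otherwise idle). It is not how the SGE scheduler is
implemented, and the function name and the list of limits are hypothetical
names chosen for illustration.

```python
def expected_allocation(requested, queue_limits):
    """Greedily fill queues in the given order until the request is met.

    queue_limits is a list of (queue_name, slot_limit) pairs, already sorted
    by sequence number. Returns a dict of slots taken per queue, or None if
    the limits cannot cover the request.
    """
    allocation = {}
    remaining = requested
    for queue, limit in queue_limits:
        if remaining == 0:
            break
        take = min(limit, remaining)
        allocation[queue] = take
        remaining -= take
    if remaining > 0:
        return None  # not enough slots across the eligible queues
    return allocation


# Queues eligible for h_rt=3:50:00, shortest first, with the per-queue
# slot limits quoted in this thread (assumed to be the only constraint).
queue_limits = [("short.q", 32), ("long.q", 16), ("verylong.q", 4)]

print(expected_allocation(40, queue_limits))
# {'short.q': 32, 'long.q': 8}
```

The sketch ignores the per-host nodelimit rules entirely, so it shows only
what the queue limits alone would permit for a 40-slot request.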