>> All the queues are on the same machines. I am not sure which "algorithm" you
>> refer to.
>
>I refer to the internal algorithm of SGE how to collect slots from various
>queues.
>
>> As mentioned, the scheduler sorts by sequence number so the queues are
>> checked in shortest to longest order.
>
>Not for parallel jobs. Only the allocation_rule is used (except for $pe_slots).
>
>http://blogs.oracle.com/sgrell/entry/grid_engine_scheduler_hacks_least
>
>Does your observation fit to the aspects of parallel jobs at the end of the
>above link?

There is definitely still some interaction between the scheduler
configuration and the PE allocation rule. The allocation rule for the "mpi"
PE is $round_robin. When I run this example successfully (with the per-node
slot limits set through complex values), Grid Engine does round-robin
allocation in short.q (animal and kermit get 12 slots, piggy gets 8),
followed by round-robin allocation in long.q (animal and kermit get 4 slots).

>Interesting. Collecting slots from different queues has some implications
>anyway:
>
>- the name of the $TMPDIR depends on the name of the queue, hence it's not the
>same on all nodes

This should not be an issue for correctly written software, right?

>- `qrsh -inherit ...` can't distinguish between the granted queues:
>https://arc.liv.ac.uk/trac/SGE/ticket/813

I don't think this will affect us. We only run MPI programs with a tightly
integrated MPICH2, or SMP programs with the allocation rule set to $pe_slots.

So is it safe to say that I have found a bug? It seems like my original RQS
should work. Or at the very least, running qsub with '-w e' should fail
immediately instead of letting the job wait in 'qw' state forever.

Thanks,
Brendan
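
P.S. To make the allocation order described above concrete, here is a minimal
Python sketch of "round robin within a queue, queues visited in sequence
order". It is only a toy model of the observed behaviour, not the actual SGE
scheduler code. Everything in it is an assumption for illustration: the
function name allocate_round_robin, the free-slot counts per queue instance,
the per-host limits (16/16/8), the 40-slot request, and the reading of the
slot counts above as per-host figures.

```python
def allocate_round_robin(requested, queues, host_limit):
    """Toy model: visit queues in the given order and, inside each queue,
    hand out one slot per host per pass until nothing more can be granted.

    requested  -- total number of slots the job asks for (assumed value)
    queues     -- list of (queue_name, {host: free_slots}) in sequence order
    host_limit -- {host: max slots on that host across all queues}
    """
    remaining = requested
    used_per_host = {host: 0 for host in host_limit}
    plan = {}

    for queue_name, free_slots in queues:
        free = dict(free_slots)            # do not mutate the caller's data
        granted = {host: 0 for host in free}
        progress = True
        while remaining > 0 and progress:
            progress = False
            for host in free:              # one pass = one slot per eligible host
                if remaining == 0:
                    break
                if free[host] > 0 and used_per_host[host] < host_limit[host]:
                    free[host] -= 1
                    granted[host] += 1
                    used_per_host[host] += 1
                    remaining -= 1
                    progress = True
        # keep only hosts that actually received slots in this queue
        plan[queue_name] = {h: n for h, n in granted.items() if n > 0}

    return plan, remaining


# Hypothetical numbers, chosen only to mirror the distribution reported above.
QUEUES = [
    ("short.q", {"animal": 12, "kermit": 12, "piggy": 8}),
    ("long.q", {"animal": 16, "kermit": 16, "piggy": 8}),
]
HOST_LIMIT = {"animal": 16, "kermit": 16, "piggy": 8}

if __name__ == "__main__":
    plan, unfilled = allocate_round_robin(40, QUEUES, HOST_LIMIT)
    for queue_name, hosts in plan.items():
        print(queue_name, hosts)
    print("unfilled slots:", unfilled)
```

With these assumed numbers the sketch fills short.q first (12, 12 and 8 slots)
and then takes the remaining 4 + 4 slots from long.q on animal and kermit,
which is the same shape as the allocation I see. Whether the real scheduler
gets there the same way is exactly what I am unsure about.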