>> All the queues are on the same machines. I am not sure which "algorithm" you 
>> refer to.
>
>I am referring to SGE's internal algorithm for collecting slots from various 
>queues.
>
>> As mentioned, the scheduler sorts by sequence number so the queues are 
>> checked in shortest to longest order.
>
>Not for parallel jobs. Only the allocation_rule is used (except for $pe_slots).
>
>http://blogs.oracle.com/sgrell/entry/grid_engine_scheduler_hacks_least
>
>Does your observation match the notes on parallel jobs at the end of the 
>above link?

There is definitely still some interaction between the scheduler configuration 
and the PE allocation rule. The allocation rule for the "mpi" PE is 
$round_robin. When I run this example successfully (with the per-node slot 
limits enforced through complex values), Grid Engine does round-robin 
allocation in short.q first (animal and kermit get 12 slots, piggy gets 8), 
followed by round-robin allocation in long.q (animal and kermit get 4 slots).
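
To make the fill order I am describing concrete, here is a toy model in 
Python. It is only a sketch of the behaviour I observe, not SGE's actual 
scheduler code: the function name round_robin_fill is made up, the 40-slot 
request is an assumed figure, and I am reading the slot counts above as 
per-host limits.

```python
def round_robin_fill(queues, wanted):
    """Toy model: visit queues in order, dealing out one slot per host per pass.

    queues: list of (queue_name, {host: free_slots}) in the order visited.
    wanted: total number of slots the parallel job asks for.
    Returns (grants per queue, number of slots still unfilled).
    """
    granted = []
    remaining = wanted
    for qname, free in queues:
        free = dict(free)              # work on a copy, leave the input alone
        taken = {host: 0 for host in free}
        # Keep cycling over the hosts until the job is satisfied
        # or this queue has no free slots left.
        while remaining > 0 and any(free.values()):
            for host in free:
                if remaining == 0:
                    break
                if free[host] > 0:
                    free[host] -= 1
                    taken[host] += 1
                    remaining -= 1
        granted.append((qname, {h: n for h, n in taken.items() if n}))
    return granted, remaining


# Assumed per-host limits, taken from the numbers quoted above.
queues = [
    ("short.q", {"animal": 12, "kermit": 12, "piggy": 8}),
    ("long.q", {"animal": 4, "kermit": 4}),
]
grants, unfilled = round_robin_fill(queues, 40)
```

In this model short.q is exhausted before any slot is taken from long.q, 
which matches what I see, even though the allocation inside each queue is 
round robin.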

>Interesting. Collecting slots from different queues has some implications 
>anyway:
>
>- the name of the $TMPDIR depends on the name of the queue, hence it's not the 
>same on all nodes

This should not be an issue for correctly written software, right?
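
By "correctly written" I mean something like the sketch below, where every 
process looks up $TMPDIR in its own environment instead of reusing a path 
computed on the master node. The helper name local_scratch_dir is 
hypothetical, and the fallback to the system default is my own choice.

```python
import os
import tempfile


def local_scratch_dir():
    """Return the scratch directory for the current process.

    Reads TMPDIR from this process's own environment, so the queue name
    embedded in the path on this node does not matter. Falls back to the
    system default when TMPDIR is unset (an assumption for this sketch).
    """
    return os.environ.get("TMPDIR") or tempfile.gettempdir()
```

As long as each rank calls this locally rather than receiving a path from 
rank 0, differing directory names across nodes should be harmless.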

>- `qrsh -inherit ...` can't distinguish between the granted queues:
>https://arc.liv.ac.uk/trac/SGE/ticket/813

I don't think this will affect us. We only run MPI programs with a tightly 
integrated MPICH2 or SMP programs with the allocation rule set to $pe_slots.

So is it safe to say that I have found a bug? It seems like my original RQS 
should work. Or, at the very least, qsub with '-w e' should reject the job 
immediately instead of letting it wait in the 'qw' state forever.

Thanks,
Brendan
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
