On 01/26/2015 10:03 PM, Reuti wrote:
Hi,
I'm trying to find a solution for an environment running serial jobs as well
as MPI jobs on 6 hosts, where each host has 32 cores/slots. Due to the small
number of nodes, assigning each sort of job to separate nodes (e.g. nodes 1-2
for serial, nodes 3-6 for MPI jobs) is not an option, especially because the
serial:MPI ratio is quite variable.
I tried setting up 2 queues with "serial" as a subordinate queue to "mpi". But
that is only efficient if the MPI job(s) use ~32 slots per host. Otherwise
there are serial jobs which could run but remain unnecessarily in a suspended
state, because the whole queue "serial" is suspended.
The other possible option would be slot-wise subordination, but that doesn't
work either, because the scheduler apparently is not able (as far as
subordination is concerned) to figure out how many slots an MPI job is
actually requesting, and so it stubbornly suspends only one serial job, which
of course causes core oversubscription.
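To make the oversubscription concrete, here is a small arithmetic sketch for a
single host. Only the 32 slots per host come from the description above; the
number of running serial jobs and the number of slots the MPI job receives on
this host are made-up values for illustration.

```shell
#!/bin/sh
# Arithmetic sketch for one host. Only HOST_CORES is taken from the thread;
# the other numbers are hypothetical.
HOST_CORES=32
SERIAL_RUNNING=24      # serial jobs running on the host (assumed)
MPI_SLOTS_ON_HOST=16   # slots the MPI job is granted on this host (assumed)
SUSPENDED=1            # only one serial job gets suspended, as observed

# Cores in use after the MPI job starts and one serial job is suspended.
busy=$((SERIAL_RUNNING - SUSPENDED + MPI_SLOTS_ON_HOST))
# How far the host is oversubscribed.
over=$((busy - HOST_CORES))
# Serial jobs that would have to be suspended to stay within the core count.
needed=$((SERIAL_RUNNING + MPI_SLOTS_ON_HOST - HOST_CORES))

echo "busy cores: $busy of $HOST_CORES"
echo "oversubscribed by: $over"
echo "serial jobs that would have to be suspended: $needed"
```

With these assumed numbers, suspending a single serial job leaves more busy
processes than cores; the host would only stay within 32 cores if one serial
job were suspended per slot the MPI job takes beyond the free ones.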
Does anybody have an idea how to solve this problem in a satisfactory way?
Why not submit all jobs to one and the same queue?
It might be good to provide a suitable scheduler configuration:
$ qconf -ssconf
...
max_reservation 20
default_duration 8760:00:00
and submit the parallel jobs with "-R y" to avoid starvation. To use
backfilling properly, an h_rt value also needs to be provided at submission.
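As a rough sketch of what such submissions could look like, the following
prints two qsub command lines without executing them. The parallel environment
name "mpi", the slot count, the runtimes, and the script names are all
hypothetical and need to be adapted to the actual cluster; only "-R y" and the
h_rt request come from the advice above.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints the qsub command lines, submits nothing.
# PE name, slot count, runtimes, and script names are hypothetical.
PE_NAME="mpi"
MPI_SLOTS=64

# Parallel job: "-R y" asks for a reservation so the job cannot be starved
# by a steady stream of serial jobs; h_rt gives the scheduler the runtime
# limit it needs to plan the reservation.
mpi_cmd="qsub -R y -pe $PE_NAME $MPI_SLOTS -l h_rt=24:00:00 mpi_job.sh"

# Serial job: no reservation, but the h_rt request lets the scheduler judge
# whether the job can be backfilled ahead of a pending reservation.
serial_cmd="qsub -l h_rt=02:00:00 serial_job.sh"

echo "$mpi_cmd"
echo "$serial_cmd"
```

Jobs submitted without an h_rt request are assumed to run for
default_duration, so with a value as large as 8760:00:00 they are unlikely to
be backfilled in front of a reservation.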
-- Reuti
Hi,
I had hoped I could avoid that. On all the other clusters we have separate
nodes for each queue, and that works fine without runtime limits/requests.
I wanted to provide the same (usage) conditions on the new cluster too,
but OK, if it is not to be...
I'll make the configuration changes you proposed and see how it works.
Thank you, Reuti.
Ursula
_______________________________________________
users mailing list
[email protected]
https://gridengine.org/mailman/listinfo/users