Hi,

> On 27.01.2015 at 09:26, Ursula Winkler <[email protected]> wrote:
>
> On 01/26/2015 10:03 PM, Reuti wrote:
>>> I'm trying to find a solution for an environment running serial jobs as
>>> well as mpi jobs on 6 hosts where each host has 32 cores/slots. Due to
>>> the small number of nodes, assigning each sort of job to separate nodes
>>> (e.g. nodes 1-2 for serial, nodes 3-6 for mpi jobs) is not an option,
>>> especially because the ratio serial:mpi is quite variable.
>>>
>>> I tried setting up 2 queues with "serial" as a subordinate queue to
>>> "mpi". But that only avoids waste if the mpi job(s) use ~32 slots per
>>> host. Otherwise there are serial jobs which could run but remain
>>> unnecessarily suspended, because the whole queue "serial" is suspended.
>>>
>>> The other possible option would be the subordination of slots, but that
>>> doesn't work either, because the scheduler (concerning subordination) is
>>> obviously not capable of figuring out how many slots an mpi job is
>>> actually requesting, and so stubbornly suspends only one serial job -
>>> which of course causes core oversubscription.
>>>
>>> Does somebody have an idea how to solve this problem in a satisfying way?
>> Why not submit all jobs to one and the same queue?
>>
>> It might be good to provide a suitable:
>>
>> $ qconf -ssconf
>> ...
>> max_reservation 20
>> default_duration 8760:00:00
>>
>> and submit the parallel jobs with "-R y" to avoid starvation. To use
>> backfilling properly, a value for h_rt needs to be provided at submission
>> time as well.
>>
>> -- Reuti
>
> Hi,
>
> I hoped I could avoid that. On all the other clusters we have separate
> nodes for each queue, and that works fine without runtime limits/requests.
> I wanted to provide the same (usage) conditions on the new cluster as
> well, but OK, if that's not possible...
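A minimal sketch of the single-queue setup described above (the scheduler values come from Reuti's reply; the parallel environment name "mpi", the slot count, the runtimes and the script names are only placeholders):

$ qconf -msconf              # edit the scheduler configuration
...
max_reservation              20
default_duration             8760:00:00
...

# Parallel job: request a reservation (-R y) and state a runtime (h_rt)
# so the scheduler can backfill shorter jobs around the reservation.
$ qsub -pe mpi 64 -R y -l h_rt=24:00:00 mpi_job.sh

# Serial job: a runtime limit lets it be backfilled into free slots.
$ qsub -l h_rt=04:00:00 serial_job.sh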
And are you submitting to queues then (that would be more of a Torque-style submission)? You could also use hostgroups to define different parts of the cluster which you can then address. What was the idea behind having different queues for different parts of the clusters? To me it sounds like the setup of the new cluster (shared usage per exechost) is different from the goal in the other clusters, where only a single queue was set up on each machine (resp. a single queue for each part of the cluster).

-- Reuti

> I'll make the configuration changes as you proposed and see how it works.
> Thank you, Reuti.
> Ursula
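A minimal sketch of the hostgroup idea mentioned above, assuming hypothetical hostgroup names @serialhosts and @mpihosts and a single cluster queue all.q (none of these names come from the thread):

$ qconf -ahgrp @serialhosts    # add a hostgroup; fill in its "hostlist" in the editor
$ qconf -ahgrp @mpihosts

# Address one part of the cluster at submission time via <queue>@@<hostgroup>:
$ qsub -q all.q@@serialhosts serial_job.sh
$ qsub -pe mpi 64 -q all.q@@mpihosts -l h_rt=24:00:00 mpi_job.sh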
