Re: [gridengine users] HowTo Configure large mem MPI jobs to have priority over short running serial / smp jobs?

Reuti Sat, 02 May 2015 03:27:55 -0700

Hi,

On 02.05.2015 at 04:39, Mike Hanby wrote:


> Howdy,
>  
> I'm wondering if anyone in the SGE community has any tips on how to 
> accomplish this on an SGE 6.2u5p2?
>  
> We would like to improve the balance of competing application profiles 
> currently active on the cluster, in particular, we need to balance queue wait 
> times between jobs that require many cores via MPI and those that use only a 
> single core.
>  
> Currently, MPI and SMP jobs tend to wait longer (the wait time goes up with the 
> number of slots requested) than single slot serial jobs. We have a fair share 
> policy in place, but even with that, serial users starve out the large # of 
> slot MPI users, especially large memory MPI jobs (ex: 64 slots at 13GB per 
> slot jobs).
>  
> We currently require users to request h_rt and vf as mandatory resource 
> requests. Based on our analysis of qacct, the vast majority of cluster jobs 
> are serial, run in less than an hour, and use less than 1GB of RAM per 
> core.  This means that these jobs, once identified, can be used very 
> effectively to back-fill resource gaps left by larger MPI jobs.

Are the parallel and/or large memory jobs submitted with resource reservation 
("-R y")? Is "max_reservation" in the scheduler set to a value greater than 
zero?
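
For reference, a reservation-enabled submission could be put together roughly as below. This is only a sketch: the helper `build_qsub_args`, the PE name `mpi`, and the script name `job.sh` are made up for illustration, while `-R y`, `-pe`, and `-l` are the standard qsub options. As far as I know, the request has no effect while "max_reservation" is 0, which can be checked with `qconf -ssconf`.

```python
def build_qsub_args(script, pe, slots, h_rt, vf, reserve=True):
    """Return a qsub argument list for a parallel job.

    Hypothetical helper: it only builds the command line and
    does not call qsub itself.
    """
    args = ["qsub"]
    if reserve:
        # -R y asks the scheduler to make a reservation for this job.
        args += ["-R", "y"]
    # Parallel environment name and slot count.
    args += ["-pe", pe, str(slots)]
    # Hard resource requests: run time limit and per-slot memory.
    args += ["-l", f"h_rt={h_rt},vf={vf}"]
    args.append(script)
    return args


if __name__ == "__main__":
    # PE name "mpi" and script "job.sh" are assumptions.
    print(" ".join(build_qsub_args("job.sh", "mpi", 64, "06:00:00", "13G")))
```

Setting "-R y" only on the large jobs keeps the number of reservations the scheduler has to track small.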


> We have a resource quota set in place to prevent slot oversubscription of 
> the compute nodes:
> {
>    name         slotcap
>    description  Keep slots equal to processor cores for all exec hosts
>    enabled      TRUE
>    limit        hosts {*} to slots=$num_proc
> }
>  
> Here's the proposal that was passed down to me and I'm looking for 
> suggestions on how to implement it:
>  
> • create a short.q to accept jobs with run times under 2 hours 
> and at most 2G of memory per slot.
> This one is easy enough: create a queue short.q and change h_rt in the queue 
> definition from INFINITY to 02:00:00 and vf from INFINITY to 2G. Limit the PE 
> list to smp
>  
> • create a largempi.q to accept 64-core, large-memory MPI jobs 
> that have a max runtime of 6 hours
> Similar to above: largempi.q, h_rt set to 06:00:00 and vf set to 13G. Limit 
> the PE list to MPI PEs
> But how do I make sure that jobs request a minimum of 8 slots, to keep out 
> serial jobs (i.e. no PE requested) and small parallel jobs?

You can use a JSV to add a queue request to the submitted jobs. Unfortunately, 
SGE has no built-in facility to require a minimum request before a job may end 
up in a particular queue or on a particular host.
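
The decision such a JSV would have to encode can be sketched as a plain function. This is Python for illustration only, not a working JSV: a real one would be written against the JSV libraries shipped with SGE and would set the hard queue request on the job. The thresholds are taken from the proposal quoted above; the MPI PE names and the function names are assumptions.

```python
MPI_PES = {"openmpi", "mpich"}   # assumed PE names, adjust to the site
MIN_MPI_SLOTS = 8                # minimum slots asked about in the proposal
SHORT_MAX_SECONDS = 2 * 3600     # short.q: up to 2 hours
SHORT_MAX_VF_GB = 2              # short.q: up to 2G per slot
LARGE_MAX_SECONDS = 6 * 3600     # largempi.q: up to 6 hours
LARGE_MAX_VF_GB = 13             # largempi.q: up to 13G per slot


def parse_h_rt(value):
    """Convert an h_rt request ('hh:mm:ss' or plain seconds) to seconds."""
    parts = [int(p) for p in value.split(":")]
    if len(parts) not in (1, 3):
        raise ValueError(f"unsupported h_rt format: {value!r}")
    seconds = 0
    for p in parts:
        seconds = seconds * 60 + p
    return seconds


def choose_queue(pe_name, slots, h_rt, vf_gb):
    """Return the queue to request for a job, or None to leave it alone."""
    runtime = parse_h_rt(h_rt)
    if (pe_name in MPI_PES and slots >= MIN_MPI_SLOTS
            and runtime <= LARGE_MAX_SECONDS and vf_gb <= LARGE_MAX_VF_GB):
        return "largempi.q"
    if (pe_name in (None, "smp")
            and runtime <= SHORT_MAX_SECONDS and vf_gb <= SHORT_MAX_VF_GB):
        return "short.q"
    return None


if __name__ == "__main__":
    print(choose_queue("openmpi", 64, "06:00:00", 13))  # largempi.q
    print(choose_queue(None, 1, "00:30:00", 1))         # short.q
    print(choose_queue("openmpi", 4, "01:00:00", 1))    # None
```

A small MPI job falls through to None here, so it would keep whatever queue the scheduler picks among the remaining queues.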

-- Reuti


>  
> • assign both queues to a common hardware pool that satisfies the 
> resource needs of both job types (in our case, we'll use 22 nodes that each 
> have 48GB and 12 slots)
> Create a hostgroup containing the 22 compute nodes and assign that hostgroup 
> to short.q and largempi.q using the "hostlist" option
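
A quick back-of-the-envelope check of that pool, using the numbers quoted above (22 nodes, 12 slots and 48GB each). It assumes vf is consumed per slot and that the full 48GB of a node can be requested, neither of which is confirmed in the thread; the function names are made up.

```python
import math

NODES = 22
SLOTS_PER_NODE = 12
MEM_PER_NODE_GB = 48


def slots_that_fit(vf_gb):
    """Slots usable per node when both cores and memory are respected."""
    return min(SLOTS_PER_NODE, int(MEM_PER_NODE_GB // vf_gb))


def nodes_needed(slots, vf_gb):
    """Minimum number of nodes a job with `slots` slots has to span."""
    return math.ceil(slots / slots_that_fit(vf_gb))


if __name__ == "__main__":
    print(NODES * SLOTS_PER_NODE)   # 264 slots in the pool
    print(slots_that_fit(13))       # 3 slots per node at 13G each
    print(nodes_needed(64, 13))     # 22 nodes for a 64-slot job
    print(slots_that_fit(2))        # 12, short jobs are core-bound
```

Under these assumptions a single 64-slot job at 13G per slot needs memory on all 22 nodes at the same time, so a steady stream of short jobs on any one node is enough to keep it waiting.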
>  
> • set a user limit of 100 slots in short.q to prevent a single 
> user from taking over the queue
> Create a RQS for this:
> {
>    name         short_queue_limits
>    description  Limit max slots for short.q
>    enabled      TRUE
>    limit        users {*} queues short.q to slots=100
> }
>  
> Now, what am I missing to have jobs submitted to largempi.q get priority and 
> to ensure that the serial jobs won't squeeze out the parallel large mem jobs?
>  
> Thanks,
>  
> Mike
> _______________________________________________
> users mailing list
> [email protected]
> https://gridengine.org/mailman/listinfo/users

