[gridengine users] HowTo Configure large mem MPI jobs to have priority over short running serial / smp jobs?

Mike Hanby Fri, 01 May 2015 19:44:04 -0700

Howdy,

I'm wondering if anyone in the SGE community has any tips on how to accomplish 
the following on SGE 6.2u5p2.


We would like to improve the balance between the competing application 
profiles currently active on the cluster. In particular, we need to balance 
queue wait times between jobs that require many cores via MPI and those that 
use only a single core.

Currently, MPI and SMP jobs tend to wait longer than single-slot serial jobs 
(the wait time goes up with the number of slots requested). We have a fair 
share policy in place, but even with that, serial users starve out users who 
request a large number of MPI slots, especially for large-memory MPI jobs 
(e.g. 64 slots at 13GB per slot).

We currently require users to request h_rt and vf as mandatory resource 
requests. Based on our analysis of qacct, the vast majority of cluster jobs are 
serial, run in less than an hour, and use less than 1GB of RAM per core. 
This means that these jobs, once identified, can be used very effectively to 
back-fill resource gaps left by larger MPI jobs.
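
For anyone wanting to repeat that kind of qacct analysis, here is a minimal 
sketch of the classification step. It assumes the text output of "qacct -j", 
where each job record is a block of "name value" lines separated by a line of 
"=" characters, that ru_wallclock is in seconds, and that maxvmem carries a 
K/M/G suffix. The thresholds (one hour, 1GB per slot) are the figures quoted 
above; the function names and the sample records are made up for illustration.

```python
import re

UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def parse_mem(text):
    """Convert a size such as '1.5G' or '0.000' to bytes."""
    m = re.fullmatch(r"([0-9.]+)([KMGT]?)", text.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unrecognised size: {text!r}")
    return float(m.group(1)) * UNITS[m.group(2).upper()]

def parse_records(text):
    """Split qacct-style output into one dict per job record."""
    records, current = [], {}
    for line in text.splitlines():
        if line.startswith("====="):      # record separator
            if current:
                records.append(current)
            current = {}
            continue
        parts = line.split(None, 1)       # "name   value"
        if len(parts) == 2:
            current[parts[0]] = parts[1].strip()
    if current:
        records.append(current)
    return records

def is_backfill_candidate(rec, max_seconds=3600, max_bytes_per_slot=1024**3):
    """True for serial jobs that ran under an hour with under 1GB per slot."""
    slots = int(rec.get("slots", 1))
    wallclock = float(rec["ru_wallclock"].rstrip("s"))
    mem_per_slot = parse_mem(rec["maxvmem"]) / slots
    return (slots == 1 and wallclock < max_seconds
            and mem_per_slot < max_bytes_per_slot)

# Hypothetical sample records, trimmed to the fields used here.
sample = """\
==============================================================
jobnumber    101
slots        1
ru_wallclock 1200
maxvmem      512.000M
==============================================================
jobnumber    102
slots        64
ru_wallclock 18000
maxvmem      800.000G
"""
jobs = parse_records(sample)
short = [j["jobnumber"] for j in jobs if is_backfill_candidate(j)]
print(short)
```

In practice the sample string would be replaced by the captured output of 
qacct, and the field layout should be checked against the local version first.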

We have a resource quota set in place to prevent slot oversubscription of the 
compute nodes:
{
   name         slotcap
   description  Keep slots equal to processor cores for all exec hosts
   enabled      TRUE
   limit        hosts {*} to slots=$num_proc
}

Here's the proposal that was passed down to me and I'm looking for suggestions 
on how to implement it:

* Create a short.q to accept jobs with run times under 2 hours and 2G per slot 
memory.
This one is easy enough: create a queue, short.q, and change h_rt in the queue 
definition from INFINITY to 02:00:00 (h_rt takes hh:mm:ss) and vf from INFINITY 
to 2G. Limit the PE list to smp.

* Create a largempi.q to accept 64-core, large-memory MPI jobs that have a max 
runtime of 6 hours.
Similar to above: largempi.q, with h_rt set to 06:00:00 and vf set to 13G. 
Limit the PE list to the MPI PEs.
But how do I make sure that jobs request a minimum of 8 slots, so that serial 
jobs (i.e. no PE requested) and small parallel jobs are kept out?

* Assign both queues to a common hardware pool that satisfies the resource 
needs of both job types (in our case, we'll use 22 nodes that each have 48GB 
and 12 slots).
Create a hostgroup containing the 22 compute nodes and assign that hostgroup to 
short.q and largempi.q using the "hostlist" option.
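
As a rough sanity check on that pool, the arithmetic can be sketched as 
follows. This assumes that all 48GB on a node is available to jobs and that vf 
is accounted per slot; the node, slot and memory figures are the ones quoted 
in this post, and the variable names are just for illustration.

```python
import math

NODES = 22
MEM_PER_NODE_GB = 48       # assumption: all 48GB is usable by jobs
SLOTS_PER_NODE = 12

MPI_SLOTS = 64             # slots requested by one large MPI job
MPI_GB_PER_SLOT = 13
SHORT_GB_PER_SLOT = 2      # proposed short.q memory limit per slot

# MPI slots that fit on one node, limited by memory and by cores.
mpi_slots_per_node = min(SLOTS_PER_NODE, MEM_PER_NODE_GB // MPI_GB_PER_SLOT)

# Nodes a single 64-slot job must span at that density.
nodes_for_mpi_job = math.ceil(MPI_SLOTS / mpi_slots_per_node)

# What remains on a node that is fully packed with MPI slots.
free_gb = MEM_PER_NODE_GB - mpi_slots_per_node * MPI_GB_PER_SLOT
free_slots = SLOTS_PER_NODE - mpi_slots_per_node
short_jobs_per_node = min(free_slots, free_gb // SHORT_GB_PER_SLOT)

print(mpi_slots_per_node, nodes_for_mpi_job, free_gb, free_slots,
      short_jobs_per_node)
```

Under those assumptions only 3 such MPI slots fit on a node, so the pool holds 
at most 66 of them and a single 64-slot job has to span all 22 nodes. That 
leaves about 9GB and 9 slots per node for short.q jobs.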

* Set a user limit of 100 slots in short.q to prevent a single user from 
taking over the queue.
Create an RQS for this:
{
   name         short_queue_limits
   description  Limit max slots for short.q
   enabled      TRUE
   limit        users {*} queues short.q to slots=100
}

Now, what am I missing to have jobs submitted to largempi.q get priority, and 
to ensure that the serial jobs won't squeeze out the parallel large-memory 
jobs?

Thanks,

Mike

