Unfortunately, Slurm would need to be modified to do what you ask. The work is straightforward (all of the changes would go into the function _run_now in the module src/plugins/select/cons_res), but I have no idea when it might happen.
Quoting Mike Donahue <[email protected]>:
> Hi,
>
> I'm relatively new to SLURM, but I'm trying to come up with a
> prototype SLURM configuration for our particular needs. One of our
> goals is to set up low and high priority partitions, such that jobs
> submitted to the high priority partition will preempt jobs running
> in the lower priority partition. I was able to get this basic
> functionality to work fairly easily, using
> PreemptType=preempt/partition_prio. This works fine when I have
> SelectType=select/cons_res and SelectTypeParameters=CR_CPU.
> Preemption occurs as expected when the number of available CPUs is
> the limiting resource.
>
> However, in our situation, our license pool is by far the most
> limiting resource. We have many more CPUs available than licenses.
> The simple License specification in the slurm.conf file seems to
> work well to model a central pool of arbitrary license resources,
> with jobs pending when license resources are fully utilized, until
> running jobs which have requested the license complete and
> relinquish the resource. However, our real goal is to have running
> jobs preempted by higher priority jobs when licenses are the
> limiting resource. If we use PreemptType=preempt/partition_prio,
> jobs submitted to the high priority partition will only preempt
> running jobs when CPU resources are fully utilized.
>
> I tried several experiments defining QOS specifications to try to
> model the license pool, rather than using the License
> specification. These included creating a high and low priority
> QOS, each with a limited number of total jobs available to users of
> the QOS, which would mimic the license pool, and changing to
> PreemptType=preempt/qos. Once everything was set up correctly, I
> could get jobs submitted to the high priority QOS to preempt running
> jobs submitted to the lower priority QOS. However, the artificial
> job count limit I'd set up for each QOS was not really a shared pool
> of job slots, but a separate count for each QOS. Trying to combine
> the License specification with the high and low priority QOS did
> not seem to help things.
>
> I also tried setting up a common job count limit in sacctmgr, equal
> to the number of license resources, for the one and only "account"
> we've defined. This seems to act effectively as a common limit for
> all jobs submitted to any queue. Still, with
> PreemptType=preempt/partition_prio, SelectType=select/cons_res, and
> SelectTypeParameters=CR_CPU, jobs submitted to the higher
> priority partition would only preempt other jobs when the total
> number of available CPUs was "consumed".
>
> No matter what I try, it seems that the License resource is sort
> of a second-class criterion, and is really taken into account last in
> the selection process, and not at all for the preemption process.
> Ideally, it would be desirable for licenses to be promoted to the
> level of a "consumable resource" that would be considered by the
> preemption algorithm.
>
> Any suggestions would be appreciated!
>
> One note: I'm using simple scripts which execute the "sleep"
> command as my test vehicle. As such, these jobs use hardly any CPU
> bandwidth or memory resources. Not sure if this could skew the
> behavior of the scheduler and/or preemption algorithms.
>
> We are currently using SLURM release 2.4.4.
>
> Thanks,
> Mike Donahue
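
For readers following the thread, the setup described in the quoted message might look roughly like the slurm.conf fragment below. This is only a sketch, not Mike's actual configuration. The SelectType, SelectTypeParameters, and PreemptType values come from the message; the node names, CPU count, partition names and priorities, the PreemptMode value, and the license name "simlic" with its count are all hypothetical, and some parameter spellings may differ between Slurm releases.

```conf
# Values taken from the quoted message
SelectType=select/cons_res
SelectTypeParameters=CR_CPU
PreemptType=preempt/partition_prio

# Assumption: how preempted jobs are handled (not stated in the message)
PreemptMode=REQUEUE

# Hypothetical license pool: 4 licenses, far fewer than the CPUs available
Licenses=simlic:4

# Hypothetical nodes and partitions (names, counts, and priorities are made up)
NodeName=node[01-04] CPUs=16 State=UNKNOWN
PartitionName=low  Nodes=node[01-04] Priority=1  Default=YES
PartitionName=high Nodes=node[01-04] Priority=10
```

Under these assumptions, a job would request a license with something like "sbatch -p low -L simlic:1 job.sh". As the message reports, a job in the high partition then preempts jobs in the low partition only when CPUs run out; when all licenses are in use, it simply pends until a license is released.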
