[slurm-dev] Re: QoS limit issues

Andy Wettstein Mon, 10 Sep 2012 07:38:09 -0700

On Sat, Sep 08, 2012 at 06:22:03PM -0600, Chris Scheller wrote:
> 
> Andy Wettstein wrote on Sep, 07 14:33:05:
> > 
> > Hi,
> > 
> > I'm seeing an issue with the QoS limits not being enforced. I am using
> > SLURM 2.4. On the normal QoS I've got MaxCPUsPerUser=1024 and
> > MaxNodesPerUser=64. Those are the only limits besides MaxWall. There is
> 
> I believe those are per job limits. You want to use the GrpCPUs and
> GrpNodes options instead.


That's not my understanding from the manual. From what I can tell:
MaxNodes and MaxCPUs are enforced per job,
MaxNodesPerUser and MaxCPUsPerUser are enforced per user, and
GrpNodes and GrpCPUs are enforced across the whole QoS.

AccountingStorageEnforce=limits,qos is set in slurm.conf.
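To make the three scopes concrete, here is a rough Python model of how each family of limits should be counted under the reading of the manual above. It is only an illustration, not SLURM code: the `Job` class and `violated_limits` function are hypothetical, and only the limit names mirror the QoS fields.

```python
from dataclasses import dataclass

UNLIMITED = float("inf")


@dataclass
class Job:
    user: str
    nodes: int
    cpus: int


def violated_limits(new_job, running, qos):
    """Return the names of QoS limits that starting new_job would exceed.

    Hypothetical model: `running` holds the jobs already running in this
    QoS, and `qos` maps limit names to values (a missing key = no limit).
    """
    violated = []

    # Per-job limits: only the new job's own request counts.
    if new_job.nodes > qos.get("MaxNodes", UNLIMITED):
        violated.append("MaxNodes")
    if new_job.cpus > qos.get("MaxCPUs", UNLIMITED):
        violated.append("MaxCPUs")

    # Per-user limits: this user's running jobs plus the new job.
    mine = [j for j in running if j.user == new_job.user]
    if sum(j.nodes for j in mine) + new_job.nodes > qos.get("MaxNodesPerUser", UNLIMITED):
        violated.append("MaxNodesPerUser")
    if sum(j.cpus for j in mine) + new_job.cpus > qos.get("MaxCPUsPerUser", UNLIMITED):
        violated.append("MaxCPUsPerUser")

    # Group limits: every running job in the QoS plus the new job.
    if sum(j.nodes for j in running) + new_job.nodes > qos.get("GrpNodes", UNLIMITED):
        violated.append("GrpNodes")
    if sum(j.cpus for j in running) + new_job.cpus > qos.get("GrpCPUs", UNLIMITED):
        violated.append("GrpCPUs")

    return violated
```

With MaxNodesPerUser=64, a user already running 63 nodes who asks for 2 more should be blocked (63 + 2 = 65), while a different user asking for the same 2 nodes should not; e.g. `violated_limits(Job("wettstein", 2, 2), [Job("wettstein", 63, 63)], {"MaxNodesPerUser": 64, "MaxCPUsPerUser": 1024})` gives `["MaxNodesPerUser"]`. The one-CPU-per-node figures are made up for the example.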

I have now worked out how to reproduce this. It looks like I can exceed
the per-user limits as long as my currently running jobs are under the
limits, even if the next job to start pushes me over them.

I think this will help explain the problem:

┌─[wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
└─[$] <> sbatch -N 63 hello1.sh
Submitted batch job 1732073
┌─[wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
└─[$] <> sbatch -N 2 hello1.sh
Submitted batch job 1732074
┌─[wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
└─[$] <> sbatch -N 2 hello1.sh
Submitted batch job 1732075
┌─[wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
└─[$] <> squeue -u wettstein
  JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
1732075    sandyb hello1.s wettstei  PD       0:00      2 (QOSResourceLimit)
1732073    sandyb hello1.s wettstei   R       0:08     63 midway[043-044,046-047,050,053-074,077-093,095,097,102-103,105-112,115,119-124]
1732074    sandyb hello1.s wettstei   R       0:04      2 midway[043-044]


The second job started, which put me at 65 nodes, over the
MaxNodesPerUser=64 limit. The third job didn't start because I was
already over the limit. It seems the limit check only counts the nodes
my running jobs already use and does not add the nodes requested by the
job that is being started.
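A small sketch of the suspected difference, replayed against the three submissions above. This is a guess from the observed behaviour, not SLURM's actual scheduler code, and `observed_check`, `expected_check` and `replay` are hypothetical names; it also assumes jobs are considered in submission order and none finish in between.

```python
def observed_check(used_nodes, requested_nodes, limit):
    """Hypothesised current behaviour: the request is ignored, so a job
    may start whenever the user is still under the limit beforehand."""
    return used_nodes < limit


def expected_check(used_nodes, requested_nodes, limit):
    """Intended behaviour: the job may start only if the user's usage
    after it starts stays within the limit."""
    return used_nodes + requested_nodes <= limit


def replay(check, requests, limit):
    """Replay node requests in order; return 'R' (running) or 'PD'
    (pending) for each job, as squeue would show them."""
    used = 0
    states = []
    for req in requests:
        if check(used, req, limit):
            used += req
            states.append("R")
        else:
            states.append("PD")
    return states
```

With the requests from the session (63, 2, 2) and a limit of 64, `replay(observed_check, [63, 2, 2], 64)` gives `["R", "R", "PD"]`, matching the squeue output, while `replay(expected_check, [63, 2, 2], 64)` gives `["R", "PD", "PD"]`, which is what MaxNodesPerUser=64 should produce.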
