On Sat, Sep 08, 2012 at 06:22:03PM -0600, Chris Scheller wrote:
>
> Andy Wettstein wrote on Sep, 07 14:33:05:
> >
> > Hi,
> >
> > I'm seeing an issue with the QoS limits not being enforced. I am using
> > slurm 2.4. On the normal QoS I've got MaxCPUsPerUser=1024 and
> > MaxNodesPerUser=64. Those are the only limits besides MaxWall. There is
>
> I believe those are per job limits. You want to use the GrpCPUs and
> GrpNodes options instead.

That's not my understanding from the manual. From what I can tell:

  MaxNodes and MaxCPUs are enforced per job
  MaxNodesPerUser and MaxCPUsPerUser are enforced for the user
  GrpNodes and GrpCPUs are enforced for the qos

AccountingStorageEnforce=limits,qos is set in the slurm.conf.

I was just now able to work out how to reproduce this. It looks like I
can exceed the per-user limits as long as my currently running jobs are
under the limits, even when the next job to start pushes me over them.

This will help explain the problem, I think:

┌─[wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
└─[$] <> sbatch -N 63 hello1.sh
Submitted batch job 1732073

┌─[wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
└─[$] <> sbatch -N 2 hello1.sh
Submitted batch job 1732074

┌─[wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
└─[$] <> sbatch -N 2 hello1.sh
Submitted batch job 1732075

┌─[wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
└─[$] <> squeue -u wettstein
  JOBID PARTITION     NAME     USER  ST   TIME  NODES NODELIST(REASON)
1732075    sandyb hello1.s wettstei  PD   0:00      2 (QOSResourceLimit)
1732073    sandyb hello1.s wettstei   R   0:08     63 midway[043-044,046-047,050,053-074,077-093,095,097,102-103,105-112,115,119-124]
1732074    sandyb hello1.s wettstei   R   0:04      2 midway[043-044]

The second job started, so with 63 + 2 = 65 nodes requested I was able
to exceed the MaxNodesPerUser=64 limit. The third job didn't start
because I was already over the limit. It seems like the limit checking
might not be taking into account the number of nodes requested by the
job that is being started.
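
To make the suspected behavior concrete, here is a minimal Python sketch
of the two ways a per-user node limit could be checked. This is not
Slurm's actual code; the function names (check_ignoring_request,
check_including_request, simulate) are made up for illustration, and the
only values taken from my setup are the limit of 64 and the three job
sizes submitted in the session. It assumes jobs are considered in
submission order and that nothing finishes in between.

```python
MAX_NODES_PER_USER = 64  # the MaxNodesPerUser value on the normal QoS


def check_ignoring_request(nodes_in_use, nodes_requested, limit=MAX_NODES_PER_USER):
    """Hypothetical check that only looks at what is already running."""
    return nodes_in_use < limit


def check_including_request(nodes_in_use, nodes_requested, limit=MAX_NODES_PER_USER):
    """Hypothetical check that also counts the job about to start."""
    return nodes_in_use + nodes_requested <= limit


def simulate(requests, check):
    """Start jobs in order; return their states and the nodes in use at the end."""
    nodes_in_use = 0
    states = []
    for nodes_requested in requests:
        if check(nodes_in_use, nodes_requested):
            nodes_in_use += nodes_requested
            states.append("R")   # job starts
        else:
            states.append("PD")  # job stays pending
    return states, nodes_in_use


if __name__ == "__main__":
    jobs = [63, 2, 2]  # node counts from the three sbatch commands
    print(simulate(jobs, check_ignoring_request))
    print(simulate(jobs, check_including_request))
```

With the first check the states come out as R, R, PD with 65 nodes in
use, which matches the squeue output. With the second check the 2-node
jobs would both stay pending and usage would stay at 63, which is what I
would expect from the manual.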
