Andy Wettstein wrote on Sep, 10 09:01:02:
>
> On Sat, Sep 08, 2012 at 06:22:03PM -0600, Chris Scheller wrote:
> >
> > Andy Wettstein wrote on Sep, 07 14:33:05:
> > >
> > > Hi,
> > >
> > > I'm seeing an issue with the QoS limits not being enforced. I am using
> > > slurm 2.4. On the normal QoS I've got MaxCPUsPerUser=1024 and
> > > MaxNodesPerUser=64. Those are the only limits besides MaxWall. There is
> >
> > I believe those are per-job limits. You want to use the GrpCPUs and
> > GrpNodes options instead.
>
> That's not my understanding from the manual. From what I can tell:
> MaxNodes and MaxCPUs are enforced per job
> MaxNodesPerUser and MaxCPUsPerUser are enforced for the user
> GrpNodes and GrpCPUs are enforced for the qos
True, unless you apply GrpCPUs/GrpNodes at the user association level. I do this to limit the total number of cores a single user can use across all of their jobs. It's kinda annoying to have to apply them at the user level, but it has the intended effect.

>
> AccountingStorageEnforce=limits,qos is set in the slurm.conf.
>
> I was just now able to understand how to reproduce this. It looks like I
> can exceed the per-user limits as long as my current jobs are under the
> limits and the next job to start exceeds them.
>
> I think this will help show the problem:
>
> [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> [$] <> sbatch -N 63 hello1.sh
> Submitted batch job 1732073
> [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> [$] <> sbatch -N 2 hello1.sh
> Submitted batch job 1732074
> [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> [$] <> sbatch -N 2 hello1.sh
> Submitted batch job 1732075
> [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> [$] <> squeue -u wettstein
>   JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
> 1732075    sandyb hello1.s wettstei PD  0:00     2 (QOSResourceLimit)
> 1732073    sandyb hello1.s wettstei  R  0:08    63 midway[043-044,046-047,050,053-074,077-093,095,097,102-103,105-112,115,119-124]
> 1732074    sandyb hello1.s wettstei  R  0:04     2 midway[043-044]
>
> The second job started and I was able to exceed the MaxNodesPerUser=64
> limit. The third job didn't start because I was already over the limit.
> It seems like the limit checking might not be taking into account the
> number of nodes requested for the job that is being started.

-- 
Chris Scheller
Unix System Administrator
Department of Biostatistics
School of Public Health
University of Michigan
Phone: (734) 615-7439
Office: M4218
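The hypothesis at the end of the quoted reproduction comes down to simple arithmetic: 63 running nodes plus 2 requested nodes is 65, which is over MaxNodesPerUser=64. The Python sketch that follows is a hypothetical model of the reported symptom, not Slurm's actual scheduler code. `may_start_observed` compares only the nodes already in use against the limit (what the squeue output suggests is happening), while `may_start_expected` adds the new job's request before comparing. The `Job` class and all function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    job_id: int
    nodes: int  # nodes requested with sbatch -N


# MaxNodesPerUser on the "normal" QoS, as quoted in the thread.
MAX_NODES_PER_USER = 64


def may_start_observed(running_nodes: int, job: Job, limit: int) -> bool:
    # Hypothetical model of the symptom: only nodes already in use are
    # compared with the limit; the new job's request is ignored.
    return running_nodes < limit


def may_start_expected(running_nodes: int, job: Job, limit: int) -> bool:
    # Expected behaviour: usage *after* the job starts must stay within the limit.
    return running_nodes + job.nodes <= limit


def replay(
    jobs: List[Job],
    check: Callable[[int, Job, int], bool],
    limit: int = MAX_NODES_PER_USER,
) -> Tuple[List[int], List[int]]:
    """Try to start jobs in submission order; return (started, pending) job IDs."""
    running = 0
    started: List[int] = []
    pending: List[int] = []
    for job in jobs:
        if check(running, job, limit):
            running += job.nodes
            started.append(job.job_id)
        else:
            pending.append(job.job_id)
    return started, pending


jobs = [Job(1732073, 63), Job(1732074, 2), Job(1732075, 2)]
print(replay(jobs, may_start_observed))  # ([1732073, 1732074], [1732075])
print(replay(jobs, may_start_expected))  # ([1732073], [1732074, 1732075])
```

Replaying the three submissions with the observed check reproduces the squeue output above (1732073 and 1732074 running, 1732075 pending with QOSResourceLimit). With the expected check, only 1732073 would start, because 63 + 2 = 65 exceeds the limit.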
