Danny Auble wrote on Sep 11, 16:23:08:

> Chris,
>
> If you follow Andy's example (Andy, thanks for reporting this, by the
> way), you would see this didn't work as you would expect.
>
> Andy was referring to the Per User limits, not the MaxNodes or MaxCpus
> generic limits.
>
> The sacctmgr man page says:
>
>     MaxNodesPerUser
>            Maximum number of nodes each user is able to use.
I didn't notice the *PerUser limits. How long have those been around?
Have I just completely overlooked them?

>     MaxNodes
>            Maximum number of nodes each job is able to use.
>
> The old code was not looking at the submitted job along with the
> resources already used by the user. The referred patch fixes this. If
> you think the documentation or code is not correct, please submit a
> patch.
>
> Danny
>
> On 09/10/12 07:42, Andy Wettstein wrote:
> > On Sat, Sep 08, 2012 at 06:22:03PM -0600, Chris Scheller wrote:
> >> Andy Wettstein wrote on Sep 07, 14:33:05:
> >>> Hi,
> >>>
> >>> I'm seeing an issue with the QoS limits not being enforced. I am
> >>> using slurm 2.4. On the normal QoS I've got MaxCPUsPerUser=1024 and
> >>> MaxNodesPerUser=64. Those are the only limits besides MaxWall. There is
> >> I believe those are per-job limits. You want to use the GrpCPUs and
> >> GrpNodes options instead.
> > That's not my understanding from the manual. From what I can tell:
> >
> >     MaxNodes and MaxCPUs are enforced per job
> >     MaxNodesPerUser and MaxCPUsPerUser are enforced per user
> >     GrpNodes and GrpCPUs are enforced per QoS
> >
> > AccountingStorageEnforce=limits,qos is set in the slurm.conf.
> >
> > I was just now able to understand how to reproduce this. It looks like
> > I can exceed the per-user limits as long as my current jobs are under
> > the limits and the next one to start exceeds them.
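The three limit scopes distinguished above can be sketched as a toy checker. This is a hypothetical model for illustration only, not Slurm's actual implementation; all function and variable names here are made up:

```python
# Toy model of the three QoS limit scopes discussed in the thread:
# per-job (MaxNodes), per-user (MaxNodesPerUser), per-QoS (GrpNodes).
# Illustrative only -- this is not how Slurm's scheduler is written.

def job_allowed(job_nodes, user_running_nodes, qos_running_nodes, limits):
    """Check one job request against per-job, per-user, and per-QoS caps.

    `limits` maps a limit name to its cap; a missing key means unlimited.
    Returns (ok, reason).
    """
    cap = limits.get("MaxNodes")
    if cap is not None and job_nodes > cap:
        return False, "MaxNodes (per job)"
    cap = limits.get("MaxNodesPerUser")
    if cap is not None and user_running_nodes + job_nodes > cap:
        return False, "MaxNodesPerUser (per user)"
    cap = limits.get("GrpNodes")
    if cap is not None and qos_running_nodes + job_nodes > cap:
        return False, "GrpNodes (per QoS)"
    return True, "ok"
```

Note that both user- and QoS-scoped checks must add the candidate job's own request to the nodes already in use, which is exactly the point at issue in this thread.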
> >
> > This will help understand the problem I think:
> >
> > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > [$] <> sbatch -N 63 hello1.sh
> > Submitted batch job 1732073
> > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > [$] <> sbatch -N 2 hello1.sh
> > Submitted batch job 1732074
> > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > [$] <> sbatch -N 2 hello1.sh
> > Submitted batch job 1732075
> > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > [$] <> squeue -u wettstein
> >   JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
> > 1732075    sandyb hello1.s wettstei PD  0:00     2 (QOSResourceLimit)
> > 1732073    sandyb hello1.s wettstei  R  0:08    63 midway[043-044,046-047,050,053-074,077-093,095,097,102-103,105-112,115,119-124]
> > 1732074    sandyb hello1.s wettstei  R  0:04     2 midway[043-044]
> >
> > The second job started and I was able to exceed the MaxNodesPerUser=64
> > limit. The third job didn't start because I was already over the
> > limit. It seems like the limit checking might not be taking into
> > account the number of nodes requested for the job that is being
> > started.

-- 
Chris Scheller
Unix System Administrator
Department of Biostatistics
School of Public Health
University of Michigan
Phone: (734) 615-7439
Office: M4218
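Andy's three-job sequence lines up with the fix Danny describes: the old check compared only the user's nodes already in use against the cap, while the fixed check also counts the candidate job's request. A minimal sketch of the difference, assuming that reading of the report (hypothetical code, not Slurm source):

```python
# Hypothetical before/after sketch of the per-user limit check described
# in the thread. Not Slurm's actual code; names are illustrative.

def may_start_old(user_running_nodes, limit):
    # Old behaviour per the report: only nodes the user already has in
    # use are compared against the cap; the candidate job's own request
    # is ignored.
    return user_running_nodes <= limit

def may_start_fixed(user_running_nodes, job_nodes, limit):
    # Fixed behaviour: the candidate job's request counts too.
    return user_running_nodes + job_nodes <= limit
```

With MaxNodesPerUser=64 and a 63-node job running, the old check lets a 2-node job start (63 <= 64) even though that brings the user to 65 nodes; only the next submission is then held with QOSResourceLimit, which is exactly the sequence in the squeue output above.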
