Danny Auble wrote on Sep 11 16:23:08:
> 
> Chris, if you follow Andy's example (Andy, thanks for reporting this, by 
> the way), you would see this doesn't work as you would expect.
> 
> Andy was referring to the Per User limits not the MaxNodes or MaxCpus 
> generic limits.
> 
> The sacctmgr man page says
> 
> MaxNodesPerUser
> Maximum number of nodes each user is able to use.

I didn't notice the *PerUser limits. How long have those been around?
Have I just completely overlooked them? 
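
If I'm reading Andy's reproduction below correctly, the whole issue comes
down to whether the scheduler counts the starting job's own request against
the per-user limit, or only the nodes already allocated. A minimal sketch of
that arithmetic using the numbers from his transcript (illustrative shell
only, not the actual Slurm code):

```shell
max_nodes_per_user=64   # MaxNodesPerUser on the normal QoS
used_nodes=63           # nodes held by the user's running job (1732073)
job_nodes=2             # nodes requested by the job about to start (1732074)

# Apparent old behaviour: only the already-allocated nodes are checked,
# so the 2-node job starts even though 63 + 2 > 64.
if [ "$used_nodes" -le "$max_nodes_per_user" ]; then
    echo "old check: job starts (user ends up at $((used_nodes + job_nodes)) nodes)"
fi

# Expected behaviour: the starting job's request is counted too.
if [ $((used_nodes + job_nodes)) -le "$max_nodes_per_user" ]; then
    echo "expected check: job starts"
else
    echo "expected check: job held (QOSResourceLimit)"
fi
```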

> 
> MaxNodes
> Maximum number of nodes each job is able to use.
> 
> The old code was not counting the submitted job along with the resources 
> already in use by the user.  The referenced patch fixes this.  If you think 
> the documentation or code is not correct, please submit a patch.
> 
> Danny
> 
> On 09/10/12 07:42, Andy Wettstein wrote:
> > On Sat, Sep 08, 2012 at 06:22:03PM -0600, Chris Scheller wrote:
> >> Andy Wettstein wrote on Sep 07 14:33:05:
> >>> Hi,
> >>>
> >>> I'm seeing an issue with the QoS limits not being enforced. I am using
> >>> slurm 2.4. On the normal QoS I've got MaxCPUsPerUser=1024 and
> >>> MaxNodesPerUser=64. Those are the only limits besides MaxWall. There is
> >> I believe those are per job limits. You want to use the GrpCPUs and
> >> GrpNodes options instead.
> > That's not my understanding from the manual. From what I can tell
> > MaxNodes and MaxCPUs are enforced per job,
> > MaxNodesPerUser and MaxCPUsPerUser are enforced per user, and
> > GrpNodes and GrpCPUs are enforced for the QoS as a whole.
> >
> > AccountingStorageEnforce=limits,qos is set in the slurm.conf.
> >
> > I just figured out how to reproduce this. It looks like I can exceed the
> > per-user limits as long as my currently running jobs are under the limit
> > and the next job to start pushes me over it.
> >
> > I think this will help illustrate the problem:
> >
> > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > [$] <> sbatch -N 63 hello1.sh
> > Submitted batch job 1732073
> > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > [$] <> sbatch -N 2 hello1.sh
> > Submitted batch job 1732074
> > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > [$] <> sbatch -N 2 hello1.sh
> > Submitted batch job 1732075
> > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > [$] <> squeue -u wettstein
> >    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
> > 1732075    sandyb hello1.s wettstei  PD       0:00      2 (QOSResourceLimit)
> > 1732073    sandyb hello1.s wettstei   R       0:08     63 
> > midway[043-044,046-047,050,053-074,077-093,095,097,102-103,105-112,115,119-124]
> > 1732074    sandyb hello1.s wettstei   R       0:04      2 midway[043-044]
> >
> >
> > The second job started and I was able to exceed the MaxNodesPerUser=64
> > limit. The third job didn't start because I was already over the limit.
> > It seems like the limit checking might not be taking into account the
> > number of nodes requested for the job that is being started.
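
For what it's worth, the usage at the moment the third job was held can be
totted up directly from the squeue output above. A quick way to check your
own running-node total (awk over the default squeue columns, where ST is
field 5 and NODES is field 7; in practice you'd pipe `squeue -u $USER -h`
into the awk instead of the here-string used for illustration):

```shell
# squeue output from the reproduction above (default column order:
# JOBID PARTITION NAME USER ST TIME NODES NODELIST)
squeue_out='1732075 sandyb hello1.s wettstei PD 0:00 2 (QOSResourceLimit)
1732073 sandyb hello1.s wettstei R 0:08 63 midway[043-044]
1732074 sandyb hello1.s wettstei R 0:04 2 midway[043-044]'

# Sum the NODES column for running (ST == "R") jobs only.
running=$(printf '%s\n' "$squeue_out" | awk '$5 == "R" { sum += $7 } END { print sum }')
echo "nodes in use: $running"   # 65 -- already past MaxNodesPerUser=64
```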

-- 
Chris Scheller
Unix System Administrator
Department of Biostatistics
School of Public Health
University of Michigan
Phone: (734) 615-7439
Office: M4218
