So MaxCpusPerUser works like I want, instead of using GrpCPUs on
associations, which will save me some work (my bad for missing that in
this thread). Why are MaxNodesPerUser and MaxCpusPerUser not in the
default output of 'sacctmgr show qos'?
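For what it's worth, you can ask sacctmgr for those columns explicitly
with a format= list. The field names below are taken from the man page
entries quoted later in this thread; I'm not certain of the exact
spelling accepted by 2.4, so adjust if sacctmgr rejects them:

```
sacctmgr show qos format=Name,MaxWall,MaxNodesPerUser,MaxCPUsPerUser
```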

Chris Scheller wrote on Sep, 12 05:57:04:
> 
> Danny Auble wrote on Sep, 11 16:23:08:
> > 
> > Chris, If you follow Andy's example (Andy thanks for reporting this by 
> > the way), you would see this didn't work as you would expect.
> > 
> > Andy was referring to the Per User limits not the MaxNodes or MaxCpus 
> > generic limits.
> > 
> > The sacctmgr man page says
> > 
> > MaxNodesPerUser
> > Maximum number of nodes each user is able to use.
> 
> I didn't notice the *PerUser limits. How long have those been around?
> Have I just completely overlooked them? 
> 
> > 
> > MaxNodes
> > Maximum number of nodes each job is able to use.
> > 
> > The old code was not looking at the submitted job along with used 
> > resources by the user.  The referred patch fixes this.  If you think the 
> > documentation or code is not correct please submit a patch.
> > 
> > Danny
> > 
> > On 09/10/12 07:42, Andy Wettstein wrote:
> > > On Sat, Sep 08, 2012 at 06:22:03PM -0600, Chris Scheller wrote:
> > >> Andy Wettstein wrote on Sep, 07 14:33:05:
> > >>> Hi,
> > >>>
> > >>> I'm seeing an issue with the QoS limits not being enforced. I am using
> > >>> slurm 2.4. On the normal QoS I've got MaxCPUsPerUser=1024 and
> > >>> MaxNodesPerUser=64. Those are the only limits besides MaxWall. There is
> > >> I believe those are per job limits. You want to use the GrpCPUs and
> > >> GrpNodes options instead.
> > > That's not my understanding from the manual. From what I can tell
> > > MaxNodes and MaxCPUs are enforced per job
> > > MaxNodesPerUser and MaxCPUsPerUser are enforced per user
> > > GrpNodes and GrpCPUs are enforced for the qos
> > >
> > > AccountingStorageEnforce=limits,qos is set in the slurm.conf.
> > >
> > > I was just now able to figure out how to reproduce this. It looks like I
> > > can exceed the per-user limits as long as my current jobs are under the
> > > limits and the next job to start exceeds them.
> > >
> > > I think this will help illustrate the problem:
> > >
> > > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > > [$] <> sbatch -N 63 hello1.sh
> > > Submitted batch job 1732073
> > > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > > [$] <> sbatch -N 2 hello1.sh
> > > Submitted batch job 1732074
> > > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > > [$] <> sbatch -N 2 hello1.sh
> > > Submitted batch job 1732075
> > > [wettstein@midway-login2] - [~/mpi] - [Mon Sep 10, 09:16]
> > > [$] <> squeue -u wettstein
> > >    JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
> > > 1732075    sandyb hello1.s wettstei  PD       0:00      2 (QOSResourceLimit)
> > > 1732073    sandyb hello1.s wettstei   R       0:08     63 midway[043-044,046-047,050,053-074,077-093,095,097,102-103,105-112,115,119-124]
> > > 1732074    sandyb hello1.s wettstei   R       0:04      2 midway[043-044]
> > >
> > >
> > > The second job started and I was able to exceed the MaxNodesPerUser=64
> > > limit. The third job didn't start because I was already over the limit.
> > > It seems like the limit checking might not be taking into account the
> > > number of nodes requested for the job that is being started.
> 
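
Andy's reproduction above boils down to which side of the limit the
scheduler compares. A toy sketch of the two checks, using the numbers
from his example (the variable names are mine, not Slurm's, and this is
only an illustration of the logic, not the actual Slurm code):

```shell
limit=64        # MaxNodesPerUser
used=63         # nodes held by running job 1732073
requested=2     # nodes asked for by the next job

# Old (buggy) check: only the already-allocated nodes are compared
# against the limit, so the 2-node job is allowed to start (63 <= 64).
if [ "$used" -le "$limit" ]; then
    echo "old check: job starts (user ends up at $((used + requested)) nodes)"
fi

# Fixed check: the pending job's request is counted too,
# so 63 + 2 = 65 > 64 and the job is held with QOSResourceLimit.
if [ $((used + requested)) -gt "$limit" ]; then
    echo "fixed check: job held"
fi
```

That matches the behavior in the transcript: the second job slips
through at 65 nodes, and only the third job is held.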
-- 
Chris Scheller
Unix System Administrator
Department of Biostatistics
School of Public Health
University of Michigan
Phone: (734) 615-7439
Office: M4218
