Re: [slurm-users] Slurm 19's "Changed the default fair share algorithm to "fair tree".": implications for slurm.conf PriorityFlags setting

2019-07-15 Thread Chris Samuel
On 15/7/19 1:40 am, Kevin Buckley wrote: Does that mean that having this setting in slurm.conf PriorityFlags=FAIR_TREE is now redundant, because it's the default ? Yes, I would believe so. Of course if you wanted to be explicit about your choice you could just leave it set to that. All
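For anyone who prefers to keep the choice visible, a minimal slurm.conf sketch (only the PriorityFlags values come from the release notes; the PriorityType line is the usual companion setting):

  PriorityType=priority/multifactor
  # the 19.05 default, stated explicitly:
  PriorityFlags=FAIR_TREE
  # or, to revert to the pre-19.05 behaviour:
  #PriorityFlags=NO_FAIR_TREE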

[slurm-users] MPI job fails with more than 1 node: "Failed to send temp kvs to compute nodes"

2019-07-15 Thread Cao, Lei
Hi, I am running Slurm version 19.05.0 and Open MPI version 3.1.4. Open MPI is configured with PMI2 from Slurm. Whenever I try to run an MPI job on more than one node, I get this error message: srun: error: mpi/pmi2: failed to send temp kvs to compute nodes srun: Job step aborted:
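For readers hitting the same error, a sketch of the setup being described (paths, node counts and the binary name are illustrative; --mpi=pmi2 selects Slurm's PMI2 plugin at launch time):

  # build Open MPI against Slurm's PMI2 library
  ./configure --with-slurm --with-pmi=/usr
  # launch a two-node job step through the PMI2 plugin
  srun --mpi=pmi2 -N 2 -n 2 ./mpi_hello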

Re: [slurm-users] [External] Re: Invalid qos specification

2019-07-15 Thread Prentice Bisbal
I found the problem. It was between the chair and keyboard: $ salloc -p general -q qos -t 00:30:00 When I type the qos right, it works: $ salloc -p general -q debug -t 00:30:00 -A unix salloc: Granted job allocation 529343 $ scontrol  show job 529343 | grep QOS    Priority=13736 Nice=0
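One way to avoid the same slip is to list the QOS names the cluster actually defines before submitting (a sketch; the format columns are optional):

  sacctmgr show qos format=Name,Priority,MaxWall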

Re: [slurm-users] [External] Re: Invalid qos specification

2019-07-15 Thread Prentice Bisbal
$ scontrol show part general PartitionName=general    AllowGroups=ALL AllowAccounts=ALL AllowQos=general,debug    AllocNodes=ALL Default=YES QoS=general    DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=300 Hidden=NO    MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO

Re: [slurm-users] Invalid qos specification

2019-07-15 Thread David Rhey
I ran into this recently. You need to make sure your user account has access to that QoS through sacctmgr. Right now I'd guess that if you run sacctmgr show user withassoc, the QoS you're attempting to use is NOT listed as part of the association. On Mon, Jul 15, 2019 at 2:53 PM Prentice Bisbal
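A sketch of that check, and of adding the QOS to the association if it really is missing (the user name is a placeholder):

  # list the user's associations, including the QOS column
  sacctmgr show user someuser withassoc format=User,Account,Partition,QOS
  # grant the missing QOS on the association
  sacctmgr modify user name=someuser set qos+=debug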

Re: [slurm-users] Invalid qos specification

2019-07-15 Thread Christopher Samuel
On 7/15/19 11:22 AM, Prentice Bisbal wrote: $ salloc -p general -q debug  -t 00:30:00 salloc: error: Job submit/allocate failed: Invalid qos specification what does: scontrol show part general say? Also, does the user you're testing as have access to that QOS? All the best, Chris --

Re: [slurm-users] Invalid qos specification

2019-07-15 Thread Prentice Bisbal
I should add that I still get this error even when I remove the "AllowQOS" attribute from the partition definition altogether: $ salloc -p general -q debug  -t 00:30:00 salloc: error: Job submit/allocate failed: Invalid qos specification Prentice On 7/15/19 2:22 PM, Prentice Bisbal wrote:

[slurm-users] Invalid qos specification

2019-07-15 Thread Prentice Bisbal
Slurm users, I have created a partition named general that should allow the QOSes 'general' and 'debug': PartitionName=general Default=YES AllowQOS=general,debug Nodes=. However, when I try to request that QOS, I get an error: $ salloc -p general -q debug  -t 00:30:00 salloc: error: Job
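For comparison, a sketch of a partition stanza along those lines and of checking whether the submitting user's association actually carries the QOS (node names and the user are placeholders):

  PartitionName=general Default=YES AllowQOS=general,debug Nodes=node[01-10]
  # the association must list the QOS as well:
  sacctmgr show assoc where user=someuser format=Account,User,QOS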

Re: [slurm-users] pam_slurm_adopt and memory constraints?

2019-07-15 Thread Mark Hahn
Could it be a RHEL7 specific issue? no - centos7 systems here, and pam_adopt works. [hahn@gra799 ~]$ cat /proc/self/cgroup 11:memory:/slurm/uid_3000566/job_17268219/step_extern 10:net_prio,net_cls:/ 9:pids:/ 8:perf_event:/ 7:hugetlb:/ 6:blkio:/
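A sketch of taking that check one step further and reading the limit the memory cgroup actually enforces (the path mirrors the /proc/self/cgroup output above; the cgroup mount point may differ per system):

  cat /sys/fs/cgroup/memory/slurm/uid_3000566/job_17268219/memory.limit_in_bytes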

Re: [slurm-users] pam_slurm_adopt and memory constraints?

2019-07-15 Thread Christopher Samuel
On 7/12/19 6:21 AM, Juergen Salk wrote: I suppose this is nevertheless the expected behavior and just the way it is when using pam_slurm_adopt to restrict access to the compute nodes? Is that right? Or did I miss something obvious? Could it be a RHEL7 specific issue? It looks like it's

Re: [slurm-users] pam_slurm_adopt and memory constraints?

2019-07-15 Thread Andy Georges
Hi Juergen, On Fri, Jul 12, 2019 at 03:21:31PM +0200, Juergen Salk wrote: > Dear all, > > I have configured pam_slurm_adopt in our Slurm test environment by > following the corresponding documentation: > > https://slurm.schedmd.com/pam_slurm_adopt.html > > I've set 'PrologFlags=contain' in
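For readers following the same document, a sketch of the two pieces of configuration it describes (the PAM file name and the module's position in the account stack can vary by distribution; the docs suggest placing pam_slurm_adopt.so last in that stack):

  # slurm.conf
  PrologFlags=contain

  # /etc/pam.d/sshd
  account    required    pam_slurm_adopt.so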

[slurm-users] Slurm 19's "Changed the default fair share algorithm to "fair tree".": implications for slurm.conf PriorityFlags setting

2019-07-15 Thread Kevin Buckley
In the RELEASE_NOTES for 19.05.1-2, we read HIGHLIGHTS == ... -- Changed the default fair share algorithm to "fair tree". To disable this and revert to "classic" fair share you must set PriorityFlags=NO_FAIR_TREE. ... Does that mean that having this setting in slurm.conf
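For context, the opt-out the release notes refer to is a single slurm.conf line (quoted from the notes above):

  PriorityFlags=NO_FAIR_TREE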

[slurm-users] ANNOUNCE: A new showpartitions tool

2019-07-15 Thread Ole Holm Nielsen
Getting an overview of available Slurm partitions and their current job load is a non-trivial task. The great "spart" tool, described as "A user-oriented partition info command for slurm" (https://github.com/mercanca/spart) and written by Ahmet Mercan, solves this problem. The "spart" tool is
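For comparison, a sketch of the closest stock command, whose rather terse output is what tools like spart and showpartitions set out to improve on:

  # one summary line per partition: availability, time limit, node counts (A/I/O/T), node list
  sinfo -s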