On 15/7/19 1:40 am, Kevin Buckley wrote:
Does that mean that having this setting in slurm.conf
PriorityFlags=FAIR_TREE
is now redundant, because it's the default ?
Yes, I would believe so. Of course if you wanted to be explicit about
your choice you could just leave it set to that.
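For reference, the two slurm.conf states being discussed here (fragment only; set at most one of these):

```
# Explicit statement of the 19.05 default (redundant, but harmless):
PriorityFlags=FAIR_TREE

# Or, to revert to the classic fair-share algorithm instead:
PriorityFlags=NO_FAIR_TREE
```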
Hi all,
I am running Slurm version 19.05.0 and OpenMPI version 3.1.4. OpenMPI is
configured with PMI2 from Slurm. Whenever I try to run an MPI job on more
than one node, I get this error message:
srun: error: mpi/pmi2: failed to send temp kvs to compute nodes
srun: Job step aborted:
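For anyone who lands here with the same message: a quick command sketch to confirm which MPI plugin types this srun build supports and to select PMI2 explicitly (to be run on the cluster; "./hello_mpi" is a placeholder for your own MPI binary):

```shell
# List the MPI plugin types this srun was built with (should include pmi2)
srun --mpi=list

# Launch a two-node, two-task job step with the PMI2 plugin selected explicitly
srun --mpi=pmi2 -N 2 -n 2 ./hello_mpi
```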
I found the problem. It was between the chair and keyboard:
$ salloc -p general -q qos -t 00:30:00
When I type the qos right, it works:
$ salloc -p general -q debug -t 00:30:00 -A unix
salloc: Granted job allocation 529343
$ scontrol show job 529343 | grep QOS
Priority=13736 Nice=0
$ scontrol show part general
PartitionName=general
AllowGroups=ALL AllowAccounts=ALL AllowQos=general,debug
AllocNodes=ALL Default=YES QoS=general
DefaultTime=NONE DisableRootJobs=NO ExclusiveUser=NO GraceTime=300
Hidden=NO
MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO
I ran into this recently. You need to make sure your user account has
access to that QOS through sacctmgr. Right now I'd say that if you ran
"sacctmgr show user withassoc", the QOS you're attempting to use would
NOT be listed as part of the association.
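A command sketch of that check (the user name "prentice" is a hypothetical stand-in, and the modify step requires SlurmDBD admin rights):

```shell
# Show the user's associations, including which QOSes each one allows
sacctmgr show user prentice withassoc format=user,account,qos

# If "debug" is missing, append it to the user's allowed QOS list
sacctmgr modify user prentice set qos+=debug
```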
On Mon, Jul 15, 2019 at 2:53 PM Prentice Bisbal
On 7/15/19 11:22 AM, Prentice Bisbal wrote:
$ salloc -p general -q debug -t 00:30:00
salloc: error: Job submit/allocate failed: Invalid qos specification
what does:
scontrol show part general
say?
Also, does the user you're testing as have access to that QOS?
All the best,
Chris
--
I should add that I still get this error even when I remove the
"AllowQOS" attribute from the partition definition altogether:
$ salloc -p general -q debug -t 00:30:00
salloc: error: Job submit/allocate failed: Invalid qos specification
Prentice
On 7/15/19 2:22 PM, Prentice Bisbal wrote:
Slurm users,
I have created a partition named "general" that should allow the QOSes
'general' and 'debug':
PartitionName=general Default=YES AllowQOS=general,debug Nodes=.
However, when I try to request that QOS, I get an error:
$ salloc -p general -q debug -t 00:30:00
salloc: error: Job
Could it be a RHEL7 specific issue?
No - CentOS 7 systems here, and pam_slurm_adopt works.
[hahn@gra799 ~]$ cat /proc/self/cgroup
11:memory:/slurm/uid_3000566/job_17268219/step_extern
10:net_prio,net_cls:/
9:pids:/
8:perf_event:/
7:hugetlb:/
6:blkio:/
On 7/12/19 6:21 AM, Juergen Salk wrote:
I suppose this is nevertheless the expected behavior and just the way
it is when using pam_slurm_adopt to restrict access to the compute
nodes? Is that right? Or did I miss something obvious?
Could it be a RHEL7 specific issue?
It looks like it's
Hi Juergen,
On Fri, Jul 12, 2019 at 03:21:31PM +0200, Juergen Salk wrote:
> Dear all,
>
> I have configured pam_slurm_adopt in our Slurm test environment by
> following the corresponding documentation:
>
> https://slurm.schedmd.com/pam_slurm_adopt.html
>
> I've set 'PrologFlags=contain' in
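For context, the setup from that guide boils down to two fragments (a sketch based on the pam_slurm_adopt documentation; the exact PAM service file varies by distribution):

```
# slurm.conf — needed so ssh sessions can be adopted into the job's "extern" step
PrologFlags=contain

# /etc/pam.d/sshd — appended to the account stack
account    required    pam_slurm_adopt.so
```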
In the RELEASE_NOTES for 19.05.1-2, we read
HIGHLIGHTS
==========
...
-- Changed the default fair share algorithm to "fair tree". To disable this
and revert to "classic" fair share you must set PriorityFlags=NO_FAIR_TREE.
...
Does that mean that having this setting in slurm.conf
Getting an overview of available Slurm partitions and their current job
load is a non-trivial task.
The great "spart" tool, described as "A user-oriented partition info
command for slurm" (https://github.com/mercanca/spart) and written by
Ahmet Mercan, solves this problem. The "spart" tool is