Re: [slurm-users] Strange error, submission denied

2019-02-20 Thread Marcus Wagner
ahh, ... one thing, I forgot. The following is working again ... --ntasks=24 --ntasks-per-node=24    NumNodes=1 NumCPUs=48 NumTasks=24 CPUs/Task=1 ReqB:S:C:T=0:0:*:*    TRES=cpu=48,mem=12M,energy=63,node=1,billing=48    Socks/Node=* NtasksPerN:B:S:C=24:0:*:1 CoreSpec=*    MinCPUsNode=24 MinM

Re: [slurm-users] Strange error, submission denied

2019-02-20 Thread Marcus Wagner
Hi Andreas, I'll try to sum this up ;) First of all, I have now used a Broadwell node, so there is no interference from Skylake and sub-NUMA clustering. We are using Slurm 18.08.5-2. I have configured the node as slurmd -C tells me: NodeName=lnm596  Sockets=2 CoresPerSocket=12 ThreadsPerCor

Re: [slurm-users] Strange error, submission denied

2019-02-20 Thread Prentice Bisbal
On 2/20/19 12:08 AM, Marcus Wagner wrote: Hi Prentice, On 2/19/19 2:58 PM, Prentice Bisbal wrote: --ntasks-per-node is meant to be used in conjunction with --nodes option. From https://slurm.schedmd.com/sbatch.html: *--ntasks-per-node*= Request that /ntasks/ be invoked on each node.

Re: [slurm-users] Strange error, submission denied

2019-02-20 Thread Henkel
Hi Chris, Hi Marcus, Just want to understand the cause, too. I'll try to sum it up. Chris you have CPUs=80 Boards=1 SocketsPerBoard=2 CoresPerSocket=20 ThreadsPerCore=2 and srun -C gpu -N 1 --ntasks-per-node=80 hostname works. Marcus has configured CPUs=48  Sockets=4 CoresPerSocket=12 Threa

Re: [slurm-users] Strange error, submission denied

2019-02-20 Thread Marcus Wagner
Dear all, I did a little bit more testing. * I have reenabled CR_ONE_TASK_PER_CORE. * My testnode is still configured, as slurmd -C tells me. * "--ntasks=24" or "--ntasks=24 --ntasks-per-node=24" can both be submitted, resulting in a job with the "free" hyperthread per task. Nearly perfect.

Re: [slurm-users] Strange error, submission denied

2019-02-20 Thread Marcus Wagner
Hi Chris, I assume, you have not set CR_ONE_TASK_PER_CORE CR_ONE_TASK_PER_CORE     Allocate one task per core by default. Without this option, by default one task will be allocated per thread on nodes with more than one ThreadsPerCore configured. 
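As a rough illustration of the documentation quote above, the sketch below computes how many tasks would fit on one node by default under each setting. The function name default_task_slots and its parameters are hypothetical, and the model is a simplified reading of the quoted text, not Slurm's actual allocation logic.

```python
def default_task_slots(cores: int, threads_per_core: int, one_task_per_core: bool) -> int:
    """Simplified model of the quoted description (an assumption, not Slurm code)."""
    if one_task_per_core:
        # One task per core: the extra hyperthreads are not counted as task slots.
        return cores
    # Otherwise one task per hardware thread.
    return cores * threads_per_core


# Node shape taken from the thread: 40 cores with 2 hyperthreads each.
print(default_task_slots(40, 2, one_task_per_core=False))
print(default_task_slots(40, 2, one_task_per_core=True))
```

Under this model, the 80 tasks per node that Chris reports on his 40-core nodes fit a configuration without CR_ONE_TASK_PER_CORE.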

Re: [slurm-users] Strange error, submission denied

2019-02-19 Thread Chris Samuel
On Tuesday, 19 February 2019 10:14:21 PM PST Marcus Wagner wrote: > sbatch -N 1 --ntasks-per-node=48 --wrap hostname > submission denied, got jobid 199805 On one of our 40 core nodes with 2 hyperthreads: $ srun -C gpu -N 1 --ntasks-per-node=80 hostname | uniq -c 80 nodename02 The spec is:

Re: [slurm-users] Strange error, submission denied

2019-02-19 Thread Marcus Wagner
I just did a little bit of debugging, setting the debug level to debug5 during submission. I submitted (or at least tried to) two jobs: sbatch -n 48 --wrap hostname got submitted, got jobid 199801 sbatch -N 1 --ntasks-per-node=48 --wrap hostname submission denied, got jobid 199805 The only diff

Re: [slurm-users] Strange error, submission denied

2019-02-19 Thread Marcus Wagner
Hi Prentice, On 2/19/19 2:58 PM, Prentice Bisbal wrote: --ntasks-per-node is meant to be used in conjunction with --nodes option. From https://slurm.schedmd.com/sbatch.html: *--ntasks-per-node*= Request that /ntasks/ be invoked on each node. If used with the *--ntasks* option, the

Re: [slurm-users] Strange error, submission denied

2019-02-19 Thread Prentice Bisbal
--ntasks-per-node is meant to be used in conjunction with --nodes option. From https://slurm.schedmd.com/sbatch.html: *--ntasks-per-node*= Request that /ntasks/ be invoked on each node. If used with the *--ntasks* option, the *--ntasks* option will take precedence and the *--ntasks-

Re: [slurm-users] Strange error, submission denied

2019-02-17 Thread Marcus Wagner
No, but that was expected ;) Thanks nonetheless. Best Marcus On 2/18/19 6:01 AM, Andreas Henkel wrote: Not the answer you hoped for there I guess... On 15.02.19 07:15, Marcus Wagner wrote: I have filed a bug: https://bugs.schedmd.com/show_bug.cgi?id=6522 Let's see what SchedMD has to tell

Re: [slurm-users] Strange error, submission denied

2019-02-17 Thread Andreas Henkel
Not the answer you hoped for there I guess... On 15.02.19 07:15, Marcus Wagner wrote: > I have filed a bug: > > https://bugs.schedmd.com/show_bug.cgi?id=6522 > > > Let's see what SchedMD has to tell us ;) > > > Best > Marcus > > On 2/15/19 6:25 AM, Marcus Wagner wrote: >> NumNodes=1 NumCPUs=48 NumT

Re: [slurm-users] Strange error, submission denied

2019-02-14 Thread Marcus Wagner
I have filed a bug: https://bugs.schedmd.com/show_bug.cgi?id=6522 Let's see what SchedMD has to tell us ;) Best Marcus On 2/15/19 6:25 AM, Marcus Wagner wrote: NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*    TRES=cpu=48,mem=182400M,node=1,billing=48 -- Marcus Wagner, Di

Re: [slurm-users] Strange error, submission denied

2019-02-14 Thread Marcus Wagner
Hi Chris, that can't be right, or there is some bug elsewhere: We have configured CR_ONE_TASK_PER_CORE, so two tasks won't get a core and its hyperthread. According to your  theory, I configured 48 threads. But then using just --ntasks=48 would give me two nodes, right? But Slurm schedules t

Re: [slurm-users] Strange error, submission denied

2019-02-14 Thread Christopher Samuel
On 2/14/19 12:22 AM, Marcus Wagner wrote: CPUs=96 Boards=1 SocketsPerBoard=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=191905 That's different to what you put in your config in the original email though. There you had: CPUs=48 Sockets=4 CoresPerSocket=12 ThreadsPerCore=2 This config
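The two node definitions quoted above can be compared with a little arithmetic. The sketch below uses a hypothetical helper, topology_counts, which is not part of Slurm; it only multiplies out the values given in the thread.

```python
def topology_counts(sockets: int, cores_per_socket: int, threads_per_core: int) -> tuple[int, int]:
    """Return (physical cores, hardware threads) for one node."""
    cores = sockets * cores_per_socket
    threads = cores * threads_per_core
    return cores, threads


# Values from the thread: Sockets=4 CoresPerSocket=12 ThreadsPerCore=2
cores, threads = topology_counts(4, 12, 2)
print(f"cores={cores} threads={threads}")
```

With these values the node has 48 cores and 96 threads, so CPUs=48 matches the core count while CPUs=96 matches the thread count.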

Re: [slurm-users] Strange error, submission denied

2019-02-14 Thread Marcus Wagner
Hi Andreas, might be that this is one of the bugs in Slurm 18. I think, I will open a bug report and see what they say. Thank you very much, nonetheless. Best Marcus On 2/14/19 2:36 PM, Andreas Henkel wrote: Hi Marcus, for us slurmd -C as well as numactl -H looked fine, too. But we're

Re: [slurm-users] Strange error, submission denied

2019-02-14 Thread Andreas Henkel
Hi Marcus, for us slurmd -C as well as numactl -H looked fine, too. But we're using task/cgroup only and every job starting on a skylake node gave us |error("task/cgroup: task[%u] infinite loop broken while trying " "to provision compute elements using %s (bitmap:%s)", | from src/plugins/task/cg

Re: [slurm-users] Strange error, submission denied

2019-02-14 Thread Marcus Wagner
Hi Andreas, as slurmd -C shows, it detects 4 numa-nodes taking these as sockets. This was also the way, we configured slurm. numactl -H clearly shows the four domains and which belongs to which socket: node distances: node   0   1   2   3   0:  10  11  21  21   1:  11  10  21  21   2:  21  2

Re: [slurm-users] Strange error, submission denied

2019-02-14 Thread Henkel, Andreas
Hi Marcus, We have Skylake too and it didn’t work for us. We used cgroups only and process binding went completely haywire with sub-NUMA enabled. While searching for solutions I found that hwloc supports sub-NUMA only with version > 2 (when looking for Skylake in hwloc you will get hits in versi

Re: [slurm-users] Strange error, submission denied

2019-02-14 Thread Marcus Wagner
Hi Andreas, On 2/14/19 8:56 AM, Henkel, Andreas wrote: Hi Marcus, More ideas: CPUs doesn’t always count as core but may take the meaning of one thread, hence makes different Maybe the behavior of CR_ONE_TASK is still not solid nor properly documented and ntasks and ntasks-per-node are honor

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Henkel, Andreas
Hi Marcus, More ideas: CPUs doesn’t always count as core but may take the meaning of one thread, hence makes different Maybe the behavior of CR_ONE_TASK is still not solid nor properly documented and ntasks and ntasks-per-node are honored differently internally. If so, solely using ntasks can mea

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Marcus Wagner
Hi Chris, these are 96-thread nodes with 48 cores. You are right that if we set it to 24, the job will get scheduled. But then only half of the node is used. On the other hand, if I only use --ntasks=48, Slurm schedules all tasks onto the same node. The hyperthread of each core is included i

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Marcus Wagner
Hi Andreas, I get the same result if I set --ntasks-per-node=48 and --ntasks=48, or 96, or whatever. What we wanted to achieve is, that exactly ntasks-per-node tasks get scheduled onto one host. Best Marcus On 2/14/19 7:09 AM, Henkel, Andreas wrote: Hi Marcus, What just came to my mind:

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Chris Samuel
On Wednesday, 13 February 2019 4:48:05 AM PST Marcus Wagner wrote: > #SBATCH --ntasks-per-node=48 I wouldn't mind betting that if you set that to 24 it will work, and each thread will be assigned a single core with the 2 thread units on it. All the best, Chris -- Chris Samuel : http://w

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Henkel, Andreas
Hi Marcus, What just came to my mind: if you don’t set --ntasks, isn’t the default just 1? All examples I know using ntasks-per-node also set ntasks with ntasks >= ntasks-per-node. Best, Andreas > On 14.02.2019 at 06:33, Marcus Wagner wrote: > > Hi all, > > I have narrowed this down a litt

Re: [slurm-users] Strange error, submission denied

2019-02-13 Thread Marcus Wagner
Hi all, I have narrowed this down a little bit. The really astonishing thing is that if I use --ntasks=48 I can submit the job, and it will be scheduled onto one host:    NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*    TRES=cpu=48,mem=182400M,node=1,billing=48 but as soon as

[slurm-users] Strange error, submission denied

2019-02-13 Thread Marcus Wagner
Hi all, I am seeing strange behaviour here. We are using Slurm 18.08.5-2 on CentOS 7.6. Let me first describe our compute nodes: NodeName=ncm[0001-1032]  CPUs=48  Sockets=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=185000 Feature=skx8160,hostok,hpcwork    Weight=10541 Stat