Hi Marcus,

We have Skylake too and it didn't work for us. We used cgroups only, and process 
binding went completely haywire with sub-NUMA clustering enabled.
While searching for solutions I found that hwloc only supports sub-NUMA from 
version 2 onwards (when looking for Skylake in hwloc you only get hits in the 
version 2 branches). At least hwloc 2.x made NUMA blocks child objects, whereas 
hwloc 1.x has NUMA blocks as parents only. I think that was the reason why there 
was a special branch in hwloc for handling the sub-NUMA layouts of Xeon Phi.
But I'll be happy if you prove me wrong.
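
In case it helps, these are the quick checks I would run on one of the nodes 
(only a sketch; lstopo and hwloc-info ship with the hwloc packages):

  lstopo --version                  # 1.x or 2.x?
  hwloc-info                        # summary of the object types hwloc detects
  lstopo-no-graphics --no-io        # full tree; a 2.x hwloc attaches the NUMA
                                    # blocks as children, a 1.x hwloc only shows
                                    # them as parent containers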

Best,
Andreas

> Am 14.02.2019 um 09:32 schrieb Marcus Wagner <wag...@itc.rwth-aachen.de>:
> 
> Hi Andreas,
> 
> 
> 
>> On 2/14/19 8:56 AM, Henkel, Andreas wrote:
>> Hi Marcus,
>> 
>> More ideas:
>> A CPU does not always mean a core; it can also mean a single hardware 
>> thread, which changes the meaning of the counts.
>> Maybe the behaviour of CR_ONE_TASK is still not solid or properly documented, 
>> and ntasks and ntasks-per-node are honoured differently internally. If so, 
>> using ntasks alone could mean Slurm counts all threads, even if the resulting 
>> binding is correct.
>> Obviously Slurm handles the two options differently in your results.
>> 
>> Have you tried configuring the node with cpus=96? What output do you get 
>> from slurmd -C?
> Not yet, as this is not the desired behaviour. We want to schedule by cores. 
> But I will try that. slurmd -C output is the following:
> 
> NodeName=ncm0708 slurmd: Considering each NUMA node as a socket
> CPUs=96 Boards=1 SocketsPerBoard=4 CoresPerSocket=12 ThreadsPerCore=2 
> RealMemory=191905
> UpTime=6-21:30:02
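> 
> For the record, what trying CPUs=96 would mean for the node definition, as far 
> as I understand it (only a sketch; keyword spelling as in slurm.conf(5), values 
> taken from the slurmd -C output above):
> 
>   # count every hardware thread as a schedulable CPU
>   NodeName=ncm0708 CPUs=96 Boards=1 SocketsPerBoard=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=191905
> 
>   # versus scheduling by cores (what we actually want): one CPU per core
>   NodeName=ncm0708 CPUs=48 Boards=1 SocketsPerBoard=4 CoresPerSocket=12 ThreadsPerCore=2 RealMemory=191905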
> 
>> Is this a new architecture like Skylake? In the case of sub-NUMA layouts, 
>> Slurm cannot handle it without hwloc 2.
> Yes, we have Skylake and, as you can see in the output above, we have 
> sub-NUMA clustering enabled. Still, we only use the hwloc that comes with 
> CentOS 7: hwloc-1.11.8-4.el7.x86_64
> Where did you get the information that hwloc 2 is needed?
>> Have you tried to use srun -v(vv) instead of sbatch? Maybe you can get a 
>> glimpse of what Slurm actually does with your options.
> The only strange thing I can observe is the following:
> srun: threads        : 60
> 
> What threads is srun talking about there?
> Nonetheless, here the full output:
> 
> $> srun --ntasks=48 --ntasks-per-node=48 -vvv hostname
> srun: defined options for program `srun'
> srun: --------------- ---------------------
> srun: user           : `mw445520'
> srun: uid            : 40574
> srun: gid            : 40574
> srun: cwd            : /rwthfs/rz/cluster/home/mw445520/tests/slurm/cgroup
> srun: ntasks         : 48 (set)
> srun: nodes          : 1 (default)
> srun: jobid          : 4294967294 (default)
> srun: partition      : default
> srun: profile        : `NotSet'
> srun: job name       : `hostname'
> srun: reservation    : `(null)'
> srun: burst_buffer   : `(null)'
> srun: wckey          : `(null)'
> srun: cpu_freq_min   : 4294967294
> srun: cpu_freq_max   : 4294967294
> srun: cpu_freq_gov   : 4294967294
> srun: switches       : -1
> srun: wait-for-switches : -1
> srun: distribution   : unknown
> srun: cpu-bind       : default (0)
> srun: mem-bind       : default (0)
> srun: verbose        : 3
> srun: slurmd_debug   : 0
> srun: immediate      : false
> srun: label output   : false
> srun: unbuffered IO  : false
> srun: overcommit     : false
> srun: threads        : 60
> srun: checkpoint_dir : /w0/slurm/checkpoint
> srun: wait           : 0
> srun: nice           : -2
> srun: account        : (null)
> srun: comment        : (null)
> srun: dependency     : (null)
> srun: exclusive      : false
> srun: bcast          : false
> srun: qos            : (null)
> srun: constraints    :
> srun: reboot         : yes
> srun: preserve_env   : false
> srun: network        : (null)
> srun: propagate      : NONE
> srun: prolog         : (null)
> srun: epilog         : (null)
> srun: mail_type      : NONE
> srun: mail_user      : (null)
> srun: task_prolog    : (null)
> srun: task_epilog    : (null)
> srun: multi_prog     : no
> srun: sockets-per-node  : -2
> srun: cores-per-socket  : -2
> srun: threads-per-core  : -2
> srun: ntasks-per-node   : 48
> srun: ntasks-per-socket : -2
> srun: ntasks-per-core   : -2
> srun: plane_size        : 4294967294
> srun: core-spec         : NA
> srun: power             :
> srun: cpus-per-gpu      : 0
> srun: gpus              : (null)
> srun: gpu-bind          : (null)
> srun: gpu-freq          : (null)
> srun: gpus-per-node     : (null)
> srun: gpus-per-socket   : (null)
> srun: gpus-per-task     : (null)
> srun: mem-per-gpu       : 0
> srun: remote command    : `hostname'
> srun: debug:  propagating SLURM_PRIO_PROCESS=0
> srun: debug:  propagating UMASK=0007
> srun: debug2: srun PMI messages to port=34521
> srun: debug:  Entering slurm_allocation_msg_thr_create()
> srun: debug:  port from net_stream_listen is 35465
> srun: debug:  Entering _msg_thr_internal
> srun: debug:  Munge authentication plugin loaded
> srun: error: CPU count per node can not be satisfied
> srun: error: Unable to allocate resources: Requested node configuration is 
> not available
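> 
> What I still want to cross-check is what the controller itself believes the 
> node looks like, e.g. (node name as above):
> 
>   scontrol show node ncm0708 | grep -E 'CPUTot|CoresPerSocket|ThreadsPerCore'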
> 
> 
> 
> Best
> Marcus
> 
> 
>> 
>> Best,
>> Andreas
>> 
>> 
>>> Am 14.02.2019 um 08:34 schrieb Marcus Wagner <wag...@itc.rwth-aachen.de>:
>>> 
>>> Hi Chris,
>>> 
>>> 
>>> these are 96-thread nodes with 48 cores. You are right that if we set it to 
>>> 24, the job will get scheduled. But then only half of the node is used. On 
>>> the other hand, if I only use --ntasks=48, Slurm schedules all tasks onto 
>>> the same node. The hyperthread of each core is included in the cgroup, and 
>>> the task/affinity plugin also correctly binds the hyperthread together with 
>>> its core (output of a small, ugly test script of ours; the last two numbers 
>>> are the core and its hyperthread):
>>> 
>>> ncm0728.hpc.itc.rwth-aachen.de <0> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 0,48
>>> ncm0728.hpc.itc.rwth-aachen.de <10> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 26,74
>>> ncm0728.hpc.itc.rwth-aachen.de <11> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 29,77
>>> ncm0728.hpc.itc.rwth-aachen.de <12> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 6,54
>>> ncm0728.hpc.itc.rwth-aachen.de <13> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 9,57
>>> ncm0728.hpc.itc.rwth-aachen.de <14> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 30,78
>>> ncm0728.hpc.itc.rwth-aachen.de <15> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 33,81
>>> ncm0728.hpc.itc.rwth-aachen.de <16> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 7,55
>>> ncm0728.hpc.itc.rwth-aachen.de <17> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 10,58
>>> ncm0728.hpc.itc.rwth-aachen.de <18> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 31,79
>>> ncm0728.hpc.itc.rwth-aachen.de <19> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 34,82
>>> ncm0728.hpc.itc.rwth-aachen.de <1> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 3,51
>>> ncm0728.hpc.itc.rwth-aachen.de <20> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 8,56
>>> ncm0728.hpc.itc.rwth-aachen.de <21> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 11,59
>>> ncm0728.hpc.itc.rwth-aachen.de <22> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 32,80
>>> ncm0728.hpc.itc.rwth-aachen.de <23> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 35,83
>>> ncm0728.hpc.itc.rwth-aachen.de <24> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 12,60
>>> ncm0728.hpc.itc.rwth-aachen.de <25> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 15,63
>>> ncm0728.hpc.itc.rwth-aachen.de <26> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 36,84
>>> ncm0728.hpc.itc.rwth-aachen.de <27> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 39,87
>>> ncm0728.hpc.itc.rwth-aachen.de <28> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 13,61
>>> ncm0728.hpc.itc.rwth-aachen.de <29> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 16,64
>>> ncm0728.hpc.itc.rwth-aachen.de <2> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 24,72
>>> ncm0728.hpc.itc.rwth-aachen.de <30> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 37,85
>>> ncm0728.hpc.itc.rwth-aachen.de <31> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 40,88
>>> ncm0728.hpc.itc.rwth-aachen.de <32> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 14,62
>>> ncm0728.hpc.itc.rwth-aachen.de <33> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 17,65
>>> ncm0728.hpc.itc.rwth-aachen.de <34> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 38,86
>>> ncm0728.hpc.itc.rwth-aachen.de <35> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 41,89
>>> ncm0728.hpc.itc.rwth-aachen.de <36> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 18,66
>>> ncm0728.hpc.itc.rwth-aachen.de <37> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 21,69
>>> ncm0728.hpc.itc.rwth-aachen.de <38> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 42,90
>>> ncm0728.hpc.itc.rwth-aachen.de <39> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 45,93
>>> ncm0728.hpc.itc.rwth-aachen.de <3> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 27,75
>>> ncm0728.hpc.itc.rwth-aachen.de <40> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 19,67
>>> ncm0728.hpc.itc.rwth-aachen.de <41> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 22,70
>>> ncm0728.hpc.itc.rwth-aachen.de <42> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 43,91
>>> ncm0728.hpc.itc.rwth-aachen.de <43> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 46,94
>>> ncm0728.hpc.itc.rwth-aachen.de <44> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 20,68
>>> ncm0728.hpc.itc.rwth-aachen.de <45> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 23,71
>>> ncm0728.hpc.itc.rwth-aachen.de <46> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 44,92
>>> ncm0728.hpc.itc.rwth-aachen.de <47> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 47,95
>>> ncm0728.hpc.itc.rwth-aachen.de <4> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 1,49
>>> ncm0728.hpc.itc.rwth-aachen.de <5> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 4,52
>>> ncm0728.hpc.itc.rwth-aachen.de <6> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 25,73
>>> ncm0728.hpc.itc.rwth-aachen.de <7> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 28,76
>>> ncm0728.hpc.itc.rwth-aachen.de <8> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 2,50
>>> ncm0728.hpc.itc.rwth-aachen.de <9> OMP_STACKSIZE: <#> unlimited+p2 +pemap 
>>> 5,53
>>> 
>>> 
>>> --ntasks=48:
>>> 
>>>    NodeList=ncm0728
>>>    BatchHost=ncm0728
>>>    NumNodes=1 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>>    TRES=cpu=48,mem=182400M,node=1,billing=48
>>> 
>>> 
>>> --ntasks=48
>>> --ntasks-per-node=24:
>>> 
>>>    NodeList=ncm[0438-0439]
>>>    BatchHost=ncm0438
>>>    NumNodes=2 NumCPUs=48 NumTasks=48 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
>>>    TRES=cpu=48,mem=182400M,node=2,billing=48
>>> 
>>> 
>>> --ntasks=48
>>> --ntasks-per-node=48:
>>> 
>>> sbatch: error: CPU count per node can not be satisfied
>>> sbatch: error: Batch job submission failed: Requested node configuration is 
>>> not available
>>> 
>>> 
>>> Isn't the first essentially the same as the last, with the difference that 
>>> I want to force Slurm to put all tasks onto one node?
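>>> 
>>> To spell out the two spellings I am comparing (jobscript.sh is just a 
>>> placeholder here):
>>> 
>>>   # works: 48 tasks, and Slurm happens to pack them all onto one node
>>>   sbatch --ntasks=48 jobscript.sh
>>> 
>>>   # fails: the same 48 tasks, but explicitly all required on a single node
>>>   sbatch --ntasks=48 --ntasks-per-node=48 jobscript.sh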
>>> 
>>> 
>>> 
>>> Best
>>> Marcus
>>> 
>>> 
>>>>> On 2/14/19 7:15 AM, Chris Samuel wrote:
>>>>> On Wednesday, 13 February 2019 4:48:05 AM PST Marcus Wagner wrote:
>>>>> 
>>>>> #SBATCH --ntasks-per-node=48
>>>> I wouldn't mind betting that if you set that to 24 it will work, and each
>>>> thread will be assigned a single core with the 2 thread units on it.
>>>> 
>>>> All the best,
>>>> Chris
>>> -- 
>>> Marcus Wagner, Dipl.-Inf.
>>> 
>>> IT Center
>>> Abteilung: Systeme und Betrieb
>>> RWTH Aachen University
>>> Seffenter Weg 23
>>> 52074 Aachen
>>> Tel: +49 241 80-24383
>>> Fax: +49 241 80-624383
>>> wag...@itc.rwth-aachen.de
>>> www.itc.rwth-aachen.de
>>> 
>>> 
> 
> -- 
> Marcus Wagner, Dipl.-Inf.
> 
> IT Center
> Abteilung: Systeme und Betrieb
> RWTH Aachen University
> Seffenter Weg 23
> 52074 Aachen
> Tel: +49 241 80-24383
> Fax: +49 241 80-624383
> wag...@itc.rwth-aachen.de
> www.itc.rwth-aachen.de
> 
> 
