I'm running small HPL benchmarks for testing, using 2 nodes with 32 cores each. 
I've compiled HPL against MVAPICH2 (built with --with-pm=no 
--with-pmi=slurm) and OpenBLAS.  I've noticed that when I run a job that is 
supposed to have 1 task per node and 32 CPUs per task, only 1 CPU has any 
load.  The other 31 CPUs are idle (as seen in top, mpstat, etc.).
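For what it's worth, a quick way I've been checking the effective binding is to 
read the affinity mask straight from /proc rather than eyeballing top:

```shell
# Print the CPUs the current process is allowed to run on (Linux).
# On an unconstrained 32-core node this shows the full 0-31 range;
# a single CPU number here would confirm the task is pinned to one core.
grep Cpus_allowed_list /proc/self/status

# The same check per task under Slurm would be along the lines of:
#   srun -N2 --ntasks-per-node=1 grep Cpus_allowed_list /proc/self/status
```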

My sbatch script:

#!/bin/bash
#SBATCH -J HPL_2x32_openblas_mvapich2
#SBATCH -o logs/HPL_2x32_openblas_mvapich2-%J.out
#SBATCH -p mpi-core32
#SBATCH --time=48:00:00
#SBATCH -N2
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=32
#SBATCH --hint=multithread

export OPENBLAS_NUM_THREADS=32
export PATH=$PATH:$HOME/hpl
srun --cpu_bind=none xhpl_openblas_mvapich2

Below are relevant configs.  I've tried removing "--cpu_bind" as well as 
setting it to "threads".  Each node has the correct number of threads running, 
but only 1 of the 32 cores shows a load above 0%.  It appears as though 
something in the use of srun is binding that process to a single core.  The 
sbatch and srun docs mention multithreaded tasks inheriting the CPU binding of 
the parent process, but I'm unsure how to bind the parent process to all CPUs.

Thanks,
- Trey

slurm.conf:

JobAcctGatherType=jobacct_gather/linux
MaxMemPerCPU=1960
MpiDefault=pmi2
MpiParams=ports=30000-39999
PreemptMode=SUSPEND,GANG
PreemptType=preempt/partition_prio
ProctrackType=proctrack/cgroup
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory,CR_CORE_DEFAULT_DIST_BLOCK
TaskPlugin=task/cgroup
TaskPluginParam=Sched
VSizeFactor=101

NodeName=c0237 NodeAddr=192.168.200.87 CPUs=32 Sockets=4 CoresPerSocket=8 
ThreadsPerCore=1 RealMemory=129000 TmpDisk=16000 
Feature=core32,mem128gb,ib_ddr,bulldozer,interlagos State=UNKNOWN
NodeName=c0238 NodeAddr=192.168.200.88 CPUs=32 Sockets=4 CoresPerSocket=8 
ThreadsPerCore=1 RealMemory=129000 TmpDisk=16000 
Feature=core32,mem128gb,ib_ddr,bulldozer,interlagos State=UNKNOWN
NodeName=c0133 NodeAddr=192.168.200.42 CPUs=32 Sockets=4 CoresPerSocket=8 
ThreadsPerCore=1 RealMemory=129000 TmpDisk=16000 
Feature=core32,mem128gb,ib_ddr,piledriver,abu_dhabi State=UNKNOWN
NodeName=c0134 NodeAddr=192.168.200.43 CPUs=32 Sockets=4 CoresPerSocket=8 
ThreadsPerCore=1 RealMemory=129000 TmpDisk=16000 
Feature=core32,mem128gb,ib_ddr,piledriver,abu_dhabi State=UNKNOWN
NodeName=c0933 NodeAddr=192.168.201.95 CPUs=32 Sockets=4 CoresPerSocket=8 
ThreadsPerCore=1 RealMemory=64300 TmpDisk=16000 
Feature=core32,mem64gb,ib_ddr,piledriver,abu_dhabi State=UNKNOWN
NodeName=c0934 NodeAddr=192.168.201.96 CPUs=32 Sockets=4 CoresPerSocket=8 
ThreadsPerCore=1 RealMemory=64300 TmpDisk=16000 
Feature=core32,mem64gb,ib_ddr,piledriver,abu_dhabi State=UNKNOWN
NodeName=c0935 NodeAddr=192.168.201.97 CPUs=32 Sockets=4 CoresPerSocket=8 
ThreadsPerCore=1 RealMemory=64300 TmpDisk=16000 
Feature=core32,mem64gb,ib_ddr,piledriver,abu_dhabi State=UNKNOWN
NodeName=c0936 NodeAddr=192.168.201.98 CPUs=32 Sockets=4 CoresPerSocket=8 
ThreadsPerCore=1 RealMemory=64300 TmpDisk=16000 
Feature=core32,mem64gb,ib_ddr,piledriver,abu_dhabi State=UNKNOWN

PartitionName=mpi-core32 Nodes=c[0133-0134],c[0237-0238],c[0933-0936] 
Priority=100 AllowQOS=mpi MinNodes=2 MaxTime=48:00:00 State=UP

cgroup.conf:
CgroupMountpoint=/cgroup
CgroupAutomount=yes
CgroupReleaseAgentDir="/home/slurm/cgroup"
ConstrainCores=yes
TaskAffinity=yes
AllowedRAMSpace=100
AllowedSwapSpace=0
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30
ConstrainDevices=no
AllowedDevicesFile=/home/slurm/conf/cgroup_allowed_devices_file.conf
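In case it helps diagnose, I've also been peeking at the cpuset that the cgroup 
plugin actually hands the job.  The path below is an assumption based on 
CgroupMountpoint=/cgroup above and the usual uid_*/job_* layout Slurm uses for 
cgroup v1; adjust if your hierarchy differs:

```shell
# From inside a job step: show which CPUs the job's cgroup cpuset allows.
# (uid_*/job_* naming assumed from Slurm's cgroup v1 conventions.)
cat /cgroup/cpuset/slurm/uid_$(id -u)/job_${SLURM_JOB_ID}/cpuset.cpus
```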



=============================

Trey Dockendorf 
Systems Analyst I 
Texas A&M University 
Academy for Advanced Telecommunications and Learning Technologies 
Phone: (979)458-2396 
Email: [email protected] 
Jabber: [email protected]
