Hi all,

I am getting some unexpected behavior with SLURM on a CPU with simultaneous multithreading (an AMD Ryzen 7950X), in combination with a job that uses multiple job steps and a program that prefers to run without hyperthreading.

My job consists of a simple shell script that does multiple srun executions, and normally (on non-multithreaded nodes) the srun commands will only start when resources are available inside my allocation. Example:

sbatch -N 1 -n 16 mytestjob.sh

mytestjob.sh contains:
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
srun -n 8 someMPIprog &
wait

srun 1 and 2 will start immediately, srun 3 will start as soon as one of the first two jobsteps is finished, and srun 4 will again wait until some cores are available.

Now I would like this same behavior (no multithreading, one task per core) on a node with 16 multithreaded cores (32 CPUs in SLURM, ThreadsPerCore=2), so I submit with the following command:
sbatch -N 1 --hint=nomultithread -n 16 mytestjob.sh

Slurm correctly reserves the whole node for this, and srun without additional directions would launch someMPIprog with 16 MPI ranks. Unfortunately, in the multi-jobstep situation all four srun invocations start immediately, resulting in 4x 8 MPI ranks running at the same time, and thus hyperthreading. As I specified --hint=nomultithread, I would have expected the same behavior as on the non-multithreaded node: srun 1 and 2 launch directly, and srun 3 and 4 wait for CPU resources to become available.
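To make the oversubscription visible, I can replace the MPI program with a small diagnostic that prints each task's CPU affinity (a sketch, assuming taskset is installed on the compute node; SLURM_PROCID is the standard per-task rank variable):

```shell
#!/bin/bash
# Launch the same four 8-task steps, but have every task report which
# CPUs it is bound to instead of doing real work.
for i in 1 2 3 4; do
    srun -n 8 bash -c 'echo "step '"$i"' task $SLURM_PROCID: $(taskset -cp $$)"' &
done
wait
```

If the job steps were serialized as on the non-multithreaded nodes, only two steps' worth of bindings would appear at a time; instead the output should show all 32 tasks running at once, with bindings overlapping onto sibling hyperthreads.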


So far I've been able to find two hacky ways of getting around this problem:
- do not use --hint=nomultithread, and instead limit via memory (--mem-per-cpu=4000). This is a bit ugly: it only reserves half the compute node and seems to bind to the wrong CPU cores.
- set --cpus-per-task=2 instead of --hint=nomultithread, but this causes OpenMP to kick in if the MPI program supports it.
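For reference, the two workarounds as concrete commands (the 4000 MB value is just what I tried against this node's 64 GB of RealMemory, and the OMP_NUM_THREADS=1 line is my untested guess at suppressing the OpenMP side effect of the second workaround):

```shell
# Workaround 1: constrain by memory so only 16 tasks fit on the node
sbatch -N 1 -n 16 --mem-per-cpu=4000 mytestjob.sh

# Workaround 2: give each task both hyperthreads of a core,
# and try to pin OpenMP to a single thread inside mytestjob.sh
export OMP_NUM_THREADS=1
srun -n 8 --cpus-per-task=2 someMPIprog &
```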


To me this feels like a bit of a bug in SLURM: I tell it not to multithread, but it still schedules job steps in a way that makes the CPU multithread.

Is there another way of getting the non-multithreaded behavior without disabling multithreading in the BIOS?

Best regards and many thanks in advance!
Hans van Schoot


Some additional information:
- I'm running Slurm 18.08.4
- This is my node configuration in scontrol:
scontrol show nodes compute-7-0
NodeName=compute-7-0 Arch=x86_64 CoresPerSocket=16
   CPUAlloc=0 CPUTot=32 CPULoad=1.00
   AvailableFeatures=rack-7,32CPUs
   ActiveFeatures=rack-7,32CPUs
   Gres=(null)
   NodeAddr=10.1.1.210 NodeHostName=compute-7-0 Version=18.08
   OS=Linux 6.1.8-1.el7.elrepo.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jan 23 12:57:27 EST 2023
   RealMemory=64051 AllocMem=0 FreeMem=62681 Sockets=1 Boards=1
   State=IDLE ThreadsPerCore=2 TmpDisk=0 Weight=20511200 Owner=N/A MCS_label=N/A
   Partitions=zen4
   BootTime=2023-04-04T14:14:55 SlurmdStartTime=2023-04-17T12:32:38
   CfgTRES=cpu=32,mem=64051M,billing=47
   AllocTRES=
   CapWatts=n/a
   CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
   ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s
