Hi,
Please review the OverSubscribe settings in slurm.conf for the CPU cores,
and check whether any jobs request oversubscription in sbatch. I don't know
if it is still true, but try deleting Boards=1 from the node definition. It
used to mess up the math.
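
Roughly what I have in mind, just as a sketch using your node03/htc names
(adjust the values to your hardware): a node line with no Boards= on it,
and the partition's OverSubscribe spelled out explicitly (NO is already the
default):

NodeName=node03 CPUs=40 RealMemory=150000 Feature=htc MemSpecLimit=5000
PartitionName=htc Nodes=htc Default=YES OverSubscribe=NO State=UP

and, only for jobs that are actually meant to share cores:

#SBATCH --oversubscribe
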
Doug

On Wed, Mar 27, 2024, 7:09 AM Guillaume COCHARD via slurm-users <
slurm-users@lists.schedmd.com> wrote:

> Hello,
>
> We upgraded our cluster to Slurm 23.11.1 and then, a few weeks later, to
> 23.11.4. Since then, Slurm does not detect hyperthreaded CPUs. We
> downgraded our test cluster, and the issue is not present with Slurm 22.05
> (we had skipped Slurm 23.02).
>
> For example, we are working with this node:
>
> $ slurmd -C
> NodeName=node03 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10
> ThreadsPerCore=2 RealMemory=128215
>
> It is defined like this in slurm.conf:
>
> SelectTypeParameters=CR_CPU_Memory
> TaskPlugin=task/cgroup,task/affinity
> NodeName=node03 CPUs=40 RealMemory=150000 Feature=htc MemSpecLimit=5000
> NodeSet=htc Feature=htc
> PartitionName=htc Default=YES MinNodes=0 MaxNodes=1 Nodes=htc
> DefMemPerCPU=1000 State=UP LLN=Yes MaxMemPerNode=142000
>
> So no oversubscribing: 20 cores and 40 CPUs thanks to hyperthreading.
> Until the upgrade, Slurm was allocating all 40 of those CPUs: when
> launching 40 jobs of 1 CPU each, each of those jobs would use one CPU.
> This is the expected behavior.
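>
> For illustration, the submission is roughly equivalent to the following
> (simplified; stress.py is our small CPU-burning test script):
>
> for i in $(seq 40); do
>     sbatch --ntasks=1 --cpus-per-task=1 --wrap="./stress.py"
> done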
>
> Since the upgrade, we can still launch those 40 jobs, but only the first
> half of the CPUs is used (CPUs 0 to 19 according to htop). Each of those
> CPUs is used by 2 jobs, while the second half of the CPUs (#20 to 39)
> stays completely idle. When launching 40 stress processes directly on the
> node, without using Slurm, all the CPUs are used.
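>
> (That direct test is essentially
>
> for i in $(seq 40); do ./stress.py & done; wait
>
> run as a regular user on the node, outside of Slurm.)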
>
> When binding to a specific CPU with srun, it works up to CPU #19, and
> then an error occurs even though the allocation includes all the CPUs of
> the node:
>
> #SBATCH --ntasks=1
> #SBATCH --cpus-per-task=40
> # Works for 0 to 19
> srun --cpu-bind=v,map_cpu:19 stress.py
>
> # Doesn't work (20 to 39)
> srun --cpu-bind=v,map_cpu:20 stress.py
> # Output:
> srun: error: CPU binding outside of job step allocation, allocated CPUs
> are: 0x00000FFFFF.
> srun: error: Task launch for StepId=57194.0 failed on node node03: Unable
> to satisfy cpu bind request
> srun: error: Application launch failed: Unable to satisfy cpu bind request
> srun: Job step aborted
>
> This behaviour affects all our nodes, some of which have been restarted
> recently and others not. It causes the jobs to be frequently interrupted,
> increasing the gap between the real (wall-clock) time and the user+system
> time and making the jobs slower. We have been poring over the
> documentation but, from what we understand, our configuration seems
> correct. In particular, as advised by the documentation [1], we do not set
> ThreadsPerCore in slurm.conf.
>
> Are we missing something, or is there a regression or a configuration
> change in Slurm since version 23.11?
>
> Thank you,
> Guillaume
>
> [1] : https://slurm.schedmd.com/slurm.conf.html#OPT_ThreadsPerCore
>
-- 
slurm-users mailing list -- slurm-users@lists.schedmd.com
To unsubscribe send an email to slurm-users-le...@lists.schedmd.com
