Hi,

Please review the OverSubscribe settings in slurm.conf for CPU cores, and check whether your jobs are requesting oversubscription in sbatch. Also, I don't know if it is still true, but try deleting Boards=1 from the node definition; it used to mess up the CPU math.

Doug
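P.S. To make that concrete, here is roughly what I mean (only a sketch; the partition and node lines are taken from your config quoted below, adjust to your setup):

# slurm.conf: be explicit about core sharing on the partition
# (OverSubscribe=NO is the default and forbids sharing; YES or FORCE allows it)
PartitionName=htc Default=YES MinNodes=0 MaxNodes=1 Nodes=htc OverSubscribe=NO ...

# node definition without Boards=1
NodeName=node03 CPUs=40 RealMemory=150000 Feature=htc MemSpecLimit=5000

# and in a job script, sharing is only requested when you ask for it:
#SBATCH --oversubscribe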
On Wed, Mar 27, 2024, 7:09 AM Guillaume COCHARD via slurm-users <slurm-users@lists.schedmd.com> wrote:

> Hello,
>
> We have upgraded our cluster to Slurm 23.11.1 and then, a few weeks later,
> to 23.11.4. Since then, Slurm doesn't detect hyperthreaded CPUs. We have
> downgraded our test cluster, and the issue is not present with Slurm 22.05
> (we had skipped Slurm 23.02).
>
> For example, we are working with this node:
>
> $ slurmd -C
> NodeName=node03 CPUs=40 Boards=1 SocketsPerBoard=2 CoresPerSocket=10
> ThreadsPerCore=2 RealMemory=128215
>
> It is defined like this in slurm.conf:
>
> SelectTypeParameters=CR_CPU_Memory
> TaskPlugin=task/cgroup,task/affinity
> NodeName=node03 CPUs=40 RealMemory=150000 Feature=htc MemSpecLimit=5000
> NodeSet=htc Feature=htc
> PartitionName=htc Default=YES MinNodes=0 MaxNodes=1 Nodes=htc
> DefMemPerCPU=1000 State=UP LLN=Yes MaxMemPerNode=142000
>
> So no oversubscribing, 20 cores and 40 CPUs thanks to hyperthreading.
> Until the upgrade, Slurm was allocating all 40 CPUs: when launching 40
> jobs of 1 CPU each, every job would use its own CPU. This is the expected
> behavior.
>
> Since the upgrade, we can still launch those 40 jobs, but only the first
> half of the CPUs are used (CPUs 0 to 19 according to htop). Each of those
> CPUs is used by 2 jobs, and the second half of the CPUs (#20 to 39) stays
> completely idle. When launching 40 stress processes directly on the node
> without using Slurm, all the CPUs are used.
>
> When binding to a specific CPU with srun, it works up to CPU #19; beyond
> that an error occurs, even though the allocation includes all the CPUs of
> the node:
>
> #SBATCH --ntasks=1
> #SBATCH --cpus-per-task=40
> # Works for 0 to 19
> srun --cpu-bind=v,map_cpu:19 stress.py
>
> # Doesn't work (20 to 39)
> srun --cpu-bind=v,map_cpu:20 stress.py
> # Output:
> srun: error: CPU binding outside of job step allocation, allocated CPUs
> are: 0x00000FFFFF.
> srun: error: Task launch for StepId=57194.0 failed on node node03: Unable
> to satisfy cpu bind request
> srun: error: Application launch failed: Unable to satisfy cpu bind request
> srun: Job step aborted
>
> This behaviour affects all our nodes, some of which have been restarted
> recently and others not. It causes the jobs to be frequently interrupted,
> increasing the gap between real (wall-clock) time and user+system time and
> making the jobs slower. We have been poring over the documentation but,
> from what we understand, our configuration seems correct. In particular,
> as advised by the documentation [1], we don't set ThreadsPerCore in
> slurm.conf.
>
> Are we missing something, or is there a regression or a change in Slurm
> configuration since version 23.11?
>
> Thank you,
> Guillaume
>
> [1]: https://slurm.schedmd.com/slurm.conf.html#OPT_ThreadsPerCore