Dear slurm-users,

I built a Slurm system consisting of two nodes (Ubuntu 20.04.1, Slurm 20.02.5):

# COMPUTE NODES
GresTypes=gpu
NodeName=lsm[216-217] Gres=gpu:tesla:1 CPUs=64 RealMemory=192073 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN
PartitionName=admin Nodes=lsm[216-217] Default=YES MaxTime=INFINITE State=UP

slurmctld is running on a separate Ubuntu machine on which no slurmd is installed.

If a user executes this script (sbatch srun2.bash):

#!/bin/bash
#SBATCH -N 2 -n9
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-10 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-11 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-12 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-13 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-14 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-15 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-16 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-17 -parallel > /dev/null &
wait

eight job steps, each with nine tasks, are launched and distributed across the two nodes.

If several such scripts are submitted at the same time, all of their srun commands are executed even though no free cores are left, so the nodes end up overallocated. How can this be prevented?

Thx :)

Greetings
max
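
P.S. By "several such scripts submitted at the same time" I mean submitting the same script several times in a row (three here is just an example):

sbatch srun2.bash
sbatch srun2.bash
sbatch srun2.bash

All of these jobs then start their srun steps right away instead of the later ones waiting in the queue for cores to free up.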