Dear slurm-users,

I built a Slurm system consisting of two nodes (Ubuntu 20.04.1, Slurm 20.02.5):

# COMPUTE NODES
GresTypes=gpu
NodeName=lsm[216-217] Gres=gpu:tesla:1 CPUs=64 RealMemory=192073 Sockets=2 CoresPerSocket=16 ThreadsPerCore=2 State=UNKNOWN
PartitionName=admin Nodes=lsm[216-217] Default=YES MaxTime=INFINITE State=UP

slurmctld is running on a separate Ubuntu machine on which no slurmd is installed.

If a user executes this script (sbatch srun2.bash):

#!/bin/bash
#SBATCH -N 2 -n9
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-10 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-11 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-12 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-13 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-14 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-15 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-16 -parallel > /dev/null &
srun pimpleFoam -case /mnt/NFS/users/quast/channel395-17 -parallel > /dev/null &
wait

eight job steps, each with nine tasks, are launched and distributed across the two nodes.

If several such scripts are submitted at the same time, all of their srun commands are executed even though no free cores are left, so the nodes end up overallocated. How can this be prevented?

Thx :)

Greetings
max
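
P.S. By "several such scripts submitted at the same time" I mean submitting the same script several times in a row (three here is just an example):

sbatch srun2.bash
sbatch srun2.bash
sbatch srun2.bash

All of these jobs then start their srun steps right away instead of the later ones waiting in the queue for cores to free up.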