On 06/15/2022 03:53 PM, Frank Lenaerts wrote:
> On Wed, Jun 15, 2022 at 02:20:56PM +0200, Guillaume De Nayer wrote:
>> One colleague has to run 20,000 jobs on this machine. Every job starts
>> his program with mpirun on 12 cores. The standard Slurm behavior means
>> that the node running this job is blocked (and 28 cores are idle).
>> The small cluster has only 8 nodes, so only 8 jobs can run in parallel.
>
> If your colleague also uses sbatch(1)'s --exclusive option, only one
> job can run on a node...
>
Perhaps I misunderstand the Slurm documentation... I thought that the --exclusive option, used in combination with sbatch, reserves the whole node (40 cores) for the job submitted with sbatch. This part is working fine; I can check it with sacct. Then, this job starts subtasks on the reserved 40 cores with srun. Therefore I'm using "-n1 -c1" in combination with "srun". I thought that it was possible to use the reserved cores inside this job using srun.

The following slightly modified job, without --exclusive and with --ntasks=2, leads to a similar problem: only one srun is running at a time. The second starts directly after the first one finishes.

#!/bin/bash
#SBATCH --job-name=test_multi_prog_srun
#SBATCH --ntasks=2
#SBATCH --partition=short
#SBATCH --time=02:00:00

srun -vvv --exact -n1 -c1 sleep 20 > srun1.log 2>&1 &
srun -vvv --exact -n1 -c1 sleep 30 > srun2.log 2>&1 &
wait

Kind regards
Guillaume
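
P.S. For clarity, here is a minimal sketch of the --exclusive variant I described above. The sleep commands stand in for the real subtasks, and the job name, partition, and time limit are only placeholders:

#!/bin/bash
#SBATCH --job-name=test_exclusive_srun
#SBATCH --exclusive
#SBATCH --partition=short
#SBATCH --time=02:00:00

# Launch one single-core step per reserved core (40 on these nodes)
# and wait for all of them to finish.
for i in $(seq 1 40); do
    srun --exact -n1 -c1 sleep 20 > srun${i}.log 2>&1 &
done
wait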