On 06/15/2022 03:53 PM, Frank Lenaerts wrote:
> On Wed, Jun 15, 2022 at 02:20:56PM +0200, Guillaume De Nayer wrote:
>> One colleague has to run 20,000 jobs on this machine. Every job starts
>> his program with mpirun on 12 cores. The standard Slurm behavior is
>> that the node running this job is blocked (and 28 cores are idle).
>> The small cluster has only 8 nodes, so only 8 jobs can run in parallel.
> 
> If your colleague also uses sbatch(1)'s --exclusive option, only one
> job can run on a node...
> 

Perhaps I misunderstand the Slurm documentation...

I thought that the --exclusive option, used in combination with sbatch,
reserves the whole node (40 cores) for the job submitted with sbatch.
This part is working fine; I can check it with sacct.
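(For completeness, this is roughly how I check the allocation; <jobid>
is a placeholder for the real job id:

sacct -j <jobid> --format=JobID,JobName,AllocCPUS,NodeList

which should report 40 allocated CPUs for the exclusive job.)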

Then, this job starts subtasks on the 40 reserved cores with srun;
for that I'm using "-n1 -c1" in combination with srun. I thought it
was possible to use the reserved cores inside this job this way.
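For reference, the pattern I have in mind looks roughly like this (the
payload command my_task and the loop count of 40 are placeholders for
the real program and the core count of the node):

#!/bin/bash
#SBATCH --job-name=test_exclusive_srun
#SBATCH --exclusive
#SBATCH --partition=short
#SBATCH --time=02:00:00

# one single-core step per reserved core, all started in the background
for i in $(seq 1 40); do
    srun --exact -n1 -c1 ./my_task "$i" > "task_${i}.log" 2>&1 &
done
wait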


The following slightly modified job, without --exclusive and with
--ntasks=2, leads to a similar problem: only one srun is running at a
time. The second one starts directly after the first one has finished.

#!/bin/bash
#SBATCH --job-name=test_multi_prog_srun
#SBATCH --ntasks=2
#SBATCH --partition=short
#SBATCH --time=02:00:00

# start both single-core steps in the background and wait for them
srun -vvv --exact -n1 -c1 sleep 20 > srun1.log 2>&1 &
srun -vvv --exact -n1 -c1 sleep 30 > srun2.log 2>&1 &
wait
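
To see that the two steps really run one after the other, I compare
their start and end times with sacct (again, <jobid> is a placeholder):

sacct -j <jobid> --format=JobID,JobName,Start,End,Elapsed

The step for the second srun only starts after the step for the first
srun has ended.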


Kind regards
Guillaume

