Hi all,

I have a small MPI test program that just prints the rank ID of each process in a parallel job. The output looks like this:

    >mpirun -n 2 ./mpitest
    Hello world: rank 0 of 2 running on cddlogin
    Hello world: rank 1 of 2 running on cddlogin

I ran this test program under salloc. It produces similar output:

    >salloc -n 2
    salloc: Granted job allocation 3605
    >mpirun -n 2 ./mpitest
    Hello world: rank 0 of 2 running on cdd001
    Hello world: rank 1 of 2 running on cdd001

I also put this one-line command into a bash script and ran it with sbatch. It gives the same result, as expected.

However, the result is completely different when it is run with srun:

    >srun -n 2 mpirun -n 2 ./mpitest
    Hello world: rank 0 of 2 running on cdd001
    Hello world: rank 1 of 2 running on cdd001
    Hello world: rank 0 of 2 running on cdd001
    Hello world: rank 1 of 2 running on cdd001

The test program was invoked twice ($SLURM_NTASKS times), each time asking for 2 ($SLURM_NTASKS) CPUs for the MPI program!

The problem with srun is actually not specific to MPI:

    >srun -n 2 echo "Hello"
    Hello
    Hello

How can I resolve this problem with srun and make it behave like sbatch or salloc, where the program is executed only once?

The Slurm version is 16.05.3. Any help is highly appreciated!

Regards,
Junjun
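
P.S. To make the behaviour described above concrete, here is a minimal toy model in plain bash that needs no Slurm or MPI installation. The function name `launch` is hypothetical, a stand-in for "a launcher that starts N copies of a command", which is how srun -n N appears to behave in the output above. It is only a sketch: it runs the copies one after another, whereas a real launcher starts them in parallel, and it does not assign MPI ranks.

```shell
#!/usr/bin/env bash
# Toy model only: "launch" is a hypothetical stand-in for a launcher
# such as "srun -n N". It runs N copies of the given command
# sequentially; a real launcher would start them in parallel.
launch() {
    local ntasks=$1
    shift
    local i
    for ((i = 0; i < ntasks; i++)); do
        "$@"
    done
}

# One launcher level: 2 copies, analogous to `srun -n 2 echo "Hello"`.
launch 2 echo "Hello"

# Two nested launcher levels: 2 x 2 = 4 lines, analogous to the four
# lines printed by `srun -n 2 mpirun -n 2 ./mpitest`.
launch 2 launch 2 echo "Hello from nested launch"
```

In this model, wrapping one launcher inside another multiplies the number of copies, which matches the four lines of output I see when mpirun is started through srun.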