Hi all,

I have a small MPI test program that just prints the rank id of each task in a parallel job.
The output is like this:
>mpirun -n 2 ./mpitest
Hello world: rank 0 of 2 running on cddlogin
Hello world: rank 1 of 2 running on cddlogin

I ran this test program with salloc, and it produced similar output:
>salloc -n 2
salloc: Granted job allocation 3605
>mpirun -n 2 ./mpitest
Hello world: rank 0 of 2 running on cdd001
Hello world: rank 1 of 2 running on cdd001

I put this one-line mpirun command into a bash script and submitted it with sbatch. That also gives
the same result, as expected.
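The batch script is essentially just this (a simplified sketch; the exact #SBATCH options may differ):

#!/bin/bash
#SBATCH -n 2            # request 2 tasks for the job
mpirun -n 2 ./mpitest
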
However, the result is totally different when the same command is run with srun:
>srun -n 2 mpirun -n 2 ./mpitest
Hello world: rank 0 of 2 running on cdd001
Hello world: rank 1 of 2 running on cdd001
Hello world: rank 0 of 2 running on cdd001
Hello world: rank 1 of 2 running on cdd001

The test program was invoked twice ($SLURM_NTASKS times), and each invocation asked for 2
($SLURM_NTASKS) CPUs for the MPI program!

The problem with srun is actually not specific to MPI:
>srun -n 2 echo "Hello"
Hello
Hello

How can I resolve this problem with srun so that it behaves like sbatch or salloc, where the
program is executed only once?

The version of Slurm is 16.05.3.

Any help is highly appreciated!

Regards,

Junjun