Hi Junjun,

On Mon, Jan 23, 2017 at 12:04:17AM -0800, liu junjun wrote:
> Hi all,
>
> I have a small MPI test program that just prints the rank id of a
> parallel job. The output is like this:
>
> >mpirun -n 2 ./mpitest
> Hello world: rank 0 of 2 running on cddlogin
> Hello world: rank 1 of 2 running on cddlogin
>
> I ran this test program with salloc. It produces similar output:
>
> >salloc -n 2
> salloc: Granted job allocation 3605
> >mpirun -n 2 ./mpitest
> Hello world: rank 0 of 2 running on cdd001
> Hello world: rank 1 of 2 running on cdd001
>
> I put this one-line command into a bash script for running with
> sbatch. It also gets the same result, as expected. However, it is
> totally different if it is run with srun:
>
> >srun -n 2 mpirun -n 2 ./mpitest
> Hello world: rank 0 of 2 running on cdd001
> Hello world: rank 1 of 2 running on cdd001
> Hello world: rank 0 of 2 running on cdd001
> Hello world: rank 1 of 2 running on cdd001

That looks like the expected behaviour from calling both srun and
mpirun. I've never tried it myself, but it looks like what would happen
when you combine the two: srun launches two copies of mpirun, and each
mpirun then launches two MPI ranks. Running your code like that isn't
recommended; basically, don't call both srun and mpirun.

In your sbatch script, put either:

#SBATCH -n 2
....
mpirun ./mpitest

..or:

#SBATCH -n 2
....
srun ./mpitest

You don't need both. It's also simpler not to repeat the '-n 2' in the
mpirun/srun line, as that will lead to copy/paste errors when you change
it in the '#SBATCH' line but not below.

> The test program was invoked twice ($SLURM_NTASKS), with each
> invocation asking for 2 ($SLURM_NTASKS) CPUs for the mpi program!!

Yes.

> The problem with srun is actually not about mpi:
>
> >srun -n 2 echo "Hello"
> Hello
> Hello
>
> How can I resolve the problem with srun, and have it behave like
> sbatch or salloc, where the program is executed only one time?
>
> The version of slurm is 16.05.3, and

Thanks,
Paddy

-- 
Paddy Doyle
Trinity Centre for High Performance Computing,
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
Phone: +353-1-896-3725
http://www.tchpc.tcd.ie/
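As a follow-up, a minimal complete sbatch script for the first form might look like the sketch below (the script name and job name are assumptions; adapt them to your site, and load whatever MPI module your cluster requires first):

```shell
#!/bin/bash
#SBATCH -n 2              # total number of MPI tasks
#SBATCH -J mpitest        # job name (assumed; pick your own)

# Launch with mpirun alone. A Slurm-aware MPI picks up the task
# count from the allocation (SLURM_NTASKS=2), so '-n 2' is not
# repeated here and lives only in the #SBATCH line above.
mpirun ./mpitest
```

Submit it with 'sbatch mpitest.sh'; swapping 'mpirun ./mpitest' for 'srun ./mpitest' gives the equivalent second form. Either way the program runs once per task, not once per task per launcher.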