I think this is what Paddy was getting at above: the `-n` argument to `srun` tells it to launch that many copies of the program given on the command line. That's how `srun` is designed to work; it isn't a bug or a problem.
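To make that concrete, a quick hypothetical illustration (any command works in place of `hostname`): inside a 2-task allocation,

    srun -n 2 hostname

prints a hostname twice, once per task, just like the `srun -n 2 echo "Hello"` case in your original mail below. srun replicates the command across the tasks; it doesn't parallelise anything by itself.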
I guess the question is what you're trying to accomplish. From appearances, you want to run an MPI program interactively from the command line. For that I'd say `salloc`, as you've done it, is the way to go. As rhc indicates above, if you have MPI built with PMI you don't need to tell `mpirun` what to do; it will detect the number and location of the assigned nodes:

$ salloc -n 4 -N 4
salloc: Pending job allocation 46557547
salloc: job 46557547 queued and waiting for resources
salloc: job 46557547 has been allocated resources
salloc: Granted job allocation 46557547
$ srun hostname
node114
node311
node310
node312
$ mpirun bin/helloMPI
Hello world from processor node114, rank 0 out of 4 processors
Hello world from processor node312, rank 3 out of 4 processors
Hello world from processor node310, rank 1 out of 4 processors
Hello world from processor node311, rank 2 out of 4 processors

HTH

On Mon, Jan 23, 2017 at 1:36 PM, r...@open-mpi.org <r...@open-mpi.org> wrote:
>
> Note that 16.05 contains support for PMIx, so if you are using OMPI 2.0 or
> above, you should ensure that the slurm PMIx support is configured “on” and
> use that for srun (I believe you have to tell srun the pmi version to use,
> so perhaps “srun --mpi=pmix”?)
>
>
> > On Jan 23, 2017, at 7:10 AM, TO_Webmaster <luftha...@gmail.com> wrote:
> >
> >
> > Is this OpenMPI? We experienced similar behaviour with OpenMPI. This
> > was fixed after recompiling OpenMPI with PMI, i.e.
> >
> > ./configure [...] --with-pmi=/path/to/slurm [...]
> >
> > 2017-01-23 14:22 GMT+01:00 liu junjun <ljjl...@gmail.com>:
> >> Hi Paddy,
> >>
> >> Thanks a lot for your kind help!
> >>
> >> Replacing mpirun with srun still doesn't seem to work. Here's how I did it:
> >>> cat a.sh
> >> #!/bin/bash
> >> srun ./mpitest
> >>> sbatch -n 2 ./a.sh
> >> Submitted batch job 3611
> >>> cat slurm-3611.out
> >> Hello world: rank 0 of 1 running on cdd001
> >> Hello world: rank 0 of 1 running on cdd001
> >>
> >> So, srun just executed the program twice, instead of running it in parallel.
> >>
> >> If I change srun back to mpirun, the good output is produced:
> >>> cat a.sh
> >> #!/bin/bash
> >> mpirun ./mpitest
> >>> sbatch a.sh
> >> Submitted batch job 3612
> >>> cat slurm-3612.out
> >> Hello world: rank 0 of 2 running on cdd001
> >> Hello world: rank 1 of 2 running on cdd001
> >>
> >> I also tried using srun inside the bash script with a serial program:
> >>> cat a.sh
> >> #!/bin/bash
> >> srun echo Hello
> >>> sbatch -n 2 ./a.sh
> >> Submitted batch job 3614
> >>> cat slurm-3614.out
> >> Hello
> >> Hello
> >>
> >> Any idea?
> >>
> >> Thanks in advance!
> >>
> >> Junjun
> >>
> >>
> >>
> >> On Mon, Jan 23, 2017 at 6:16 PM, Paddy Doyle <pa...@tchpc.tcd.ie> wrote:
> >>>
> >>>
> >>> Hi Junjun,
> >>>
> >>> On Mon, Jan 23, 2017 at 12:04:17AM -0800, liu junjun wrote:
> >>>
> >>>> Hi all,
> >>>>
> >>>> I have a small MPI test program that just prints the rank id of a
> >>>> parallel job. The output is like this:
> >>>>> mpirun -n 2 ./mpitest
> >>>> Hello world: rank 0 of 2 running on cddlogin
> >>>> Hello world: rank 1 of 2 running on cddlogin
> >>>>
> >>>> I ran this test program with salloc. It produces similar output:
> >>>>> salloc -n 2
> >>>> salloc: Granted job allocation 3605
> >>>>> mpirun -n 2 ./mpitest
> >>>> Hello world: rank 0 of 2 running on cdd001
> >>>> Hello world: rank 1 of 2 running on cdd001
> >>>>
> >>>> I put this one-line command into a bash script for running with sbatch.
> >>>> It also gets the same result, as expected.
> >>>> However, it is totally different if it is run with srun:
> >>>>> srun -n 2 mpirun -n 2 ./mpitest
> >>>> Hello world: rank 0 of 2 running on cdd001
> >>>> Hello world: rank 1 of 2 running on cdd001
> >>>> Hello world: rank 0 of 2 running on cdd001
> >>>> Hello world: rank 1 of 2 running on cdd001
> >>>
> >>> That looks like expected behaviour from calling both srun and mpirun; I have
> >>> never tried it, but it looks like what might happen if you call them both.
> >>>
> >>> But it's not recommended to run your code like that.
> >>>
> >>> I think basically: don't call both srun and mpirun! In your sbatch script
> >>> either put:
> >>>
> >>> #SBATCH -n 2
> >>> ....
> >>> mpirun ./mpitest
> >>>
> >>> ..or:
> >>>
> >>> #SBATCH -n 2
> >>> ....
> >>> srun ./mpitest
> >>>
> >>> You don't need both. And it's simpler not to repeat the '-n 2' again on the
> >>> mpirun/srun line, as that will lead to copy/paste errors when you change it
> >>> in the '#SBATCH' line but not below.
> >>>
> >>>> The test program was invoked twice ($SLURM_NTASKS), and each invocation
> >>>> asked for 2 ($SLURM_NTASKS) CPUs for the mpi program!!
> >>>
> >>> Yes.
> >>>
> >>>> The problem with srun is actually not about mpi:
> >>>>> srun -n 2 echo "Hello"
> >>>> Hello
> >>>> Hello
> >>>>
> >>>> How can I resolve the problem with srun, and make it behave like sbatch or
> >>>> salloc, where the program is executed only once?
> >>>>
> >>>> The version of slurm is 16.05.3, and
> >>>
> >>> Thanks,
> >>> Paddy
> >>>
> >>> --
> >>> Paddy Doyle
> >>> Trinity Centre for High Performance Computing,
> >>> Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
> >>> Phone: +353-1-896-3725
> >>> http://www.tchpc.tcd.ie/
> >>
> >>
>
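P.S. Putting Paddy's and rhc's suggestions together, this is the kind of batch script I'd try (just a sketch; it assumes OMPI 2.0+ and that your site has Slurm's PMIx plugin enabled, and uses your ./mpitest binary from above):

#!/bin/bash
#SBATCH -n 2
# Launch the MPI tasks with srun alone (no mpirun in front of it) and tell
# srun to use the PMIx plugin so the ranks wire up correctly.
srun --mpi=pmix ./mpitest

Or, if your OpenMPI was rebuilt --with-pmi=/path/to/slurm as TO_Webmaster describes, a plain `mpirun ./mpitest` in the same script should also pick up the task count on its own.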