Hi! I am doing it like this:
    sbatch -N 2 -n 5 ./job.sh

where job.sh is:

    #!/bin/bash -l
    module load openmpi/2.0.1-icc
    mpirun -np 1 ./manager 4

On 15 February 2017 at 17:58, r...@open-mpi.org <r...@open-mpi.org> wrote:

> The cmd line looks fine - when you do your “sbatch” request, what is in
> the shell script you give it? Or are you saying you just “sbatch” the
> mpirun cmd directly?
>
> On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina <
> nastja.kruchin...@gmail.com> wrote:
>
> Hi,
>
> I am running it like this:
>
>     mpirun -np 1 ./manager
>
> Should I do it differently?
>
> I also thought that all sbatch does is create an allocation and then run
> my script in it. But it seems it does not, since I am getting these
> results...
>
> I would like to upgrade to Open MPI 2.0.2, but no clusters near me have
> it yet :( So I cannot even check whether it works with Open MPI 2.0.2.
>
> On 15 February 2017 at 16:04, Howard Pritchard <hpprit...@gmail.com>
> wrote:
>
>> Hi Anastasia,
>>
>> Definitely check which mpirun is used in the batch environment, but you
>> may also want to upgrade to Open MPI 2.0.2.
>>
>> Howard
>>
>> On Wed, 15 Feb 2017 at 07:49, r...@open-mpi.org <r...@open-mpi.org>
>> wrote:
>>
>>> Nothing immediate comes to mind - all sbatch does is create an
>>> allocation and then run your script in it. Perhaps your script is
>>> using a different “mpirun” command than when you type it
>>> interactively?
>>>
>>> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina <
>>> nastja.kruchin...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to use the MPI_Comm_spawn function in my code. I am having
>>> trouble with Open MPI 2.0.x + sbatch (batch system Slurm).
>>> My test program is located here:
>>> http://user.it.uu.se/~anakr367/files/MPI_test/
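(For context, the manager side of such a test might look roughly like the sketch below. This is not the actual program from the URL above; the worker executable name "./worker" and the argument handling are assumptions. Spawning from MPI_COMM_SELF with a single manager rank matches the "mpirun -np 1 ./manager 4" invocation shown earlier.)

    /* manager.c -- minimal MPI_Comm_spawn sketch (hypothetical).
     * Build: mpicc manager.c -o manager
     * Run:   mpirun -np 1 ./manager 4 */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        MPI_Comm intercomm;
        int nworkers;

        MPI_Init(&argc, &argv);
        nworkers = (argc > 1) ? atoi(argv[1]) : 1;  /* e.g. the "4" above */

        /* Spawn the workers; "./worker" is an assumed executable name. */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers,
                       MPI_INFO_NULL, 0, MPI_COMM_SELF,
                       &intercomm, MPI_ERRCODES_IGNORE);

        printf("manager: spawned %d workers\n", nworkers);
        MPI_Comm_disconnect(&intercomm);
        MPI_Finalize();
        return 0;
    }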
>>> When I am running my code I am getting this error:
>>>
>>> OPAL ERROR: Timeout in file
>>> ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193
>>> *** An error occurred in MPI_Init_thread
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> *** and potentially your MPI job)
>>> --------------------------------------------------------------------------
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort. There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment problems. This failure appears to be an internal failure;
>>> here's some additional information (which may only be relevant to an
>>> Open MPI developer):
>>>
>>>   ompi_dpm_dyn_init() failed
>>>   --> Returned "Timeout" (-15) instead of "Success" (0)
>>> --------------------------------------------------------------------------
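(Since ompi_dpm_dyn_init() runs during MPI_Init of dynamically spawned processes, the timeout above most likely surfaces in the workers as they try to connect back to the manager's job. A correspondingly minimal worker, again only a sketch under the same assumptions, would be:)

    /* worker.c -- the spawned side (hypothetical). Its MPI_Init is
     * where the timeout above would surface under sbatch. */
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        MPI_Comm parent;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&parent);   /* intercomm back to the manager */
        if (parent != MPI_COMM_NULL)
            MPI_Comm_disconnect(&parent);
        MPI_Finalize();
        return 0;
    }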
>>> The interesting thing is that there is no error when I first allocate
>>> nodes with salloc and then run my program. So I noticed that the
>>> program works fine with Open MPI 1.x + sbatch/salloc and with Open MPI
>>> 2.0.x + salloc, but not with Open MPI 2.0.x + sbatch.
>>>
>>> The error was reproduced on three different computer clusters.
>>>
>>> Best regards,
>>> Anastasia

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users