The cmd line looks fine - when you do your “sbatch” request, what is in the shell script you give it? Or are you saying you just “sbatch” the mpirun cmd directly?
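For reference, the kind of job script being asked about usually boils down to a few lines. This is a minimal sketch assuming a generic Slurm setup; the `#SBATCH` options and the `./manager` path are placeholders, not taken from this thread:

```shell
#!/bin/bash
#SBATCH --job-name=spawn-test   # hypothetical job name
#SBATCH --nodes=2               # placeholder node count
#SBATCH --time=00:10:00

# Start a single manager rank; MPI_Comm_spawn then launches the
# workers inside the same Slurm allocation, so the allocation must
# contain enough slots for the manager plus any spawned processes.
mpirun -np 1 ./manager
```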
> On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina <nastja.kruchin...@gmail.com> wrote:
>
> Hi,
>
> I am running it like this:
>
>     mpirun -np 1 ./manager
>
> Should I do it differently?
>
> I also thought that all sbatch does is create an allocation and then run my script in it. But it seems that is not the case, since I am getting these results...
>
> I would like to upgrade to Open MPI, but no clusters near me have it yet :( So I cannot even check whether it works with Open MPI 2.0.2.
>
> On 15 February 2017 at 16:04, Howard Pritchard <hpprit...@gmail.com> wrote:
>
> Hi Anastasia,
>
> Definitely check the mpirun command in the batch environment, but you may also want to upgrade to Open MPI 2.0.2.
>
> Howard
>
> r...@open-mpi.org wrote on Wed., 15 Feb. 2017 at 07:49:
>
> Nothing immediate comes to mind - all sbatch does is create an allocation and then run your script in it. Perhaps your script is using a different “mpirun” command than when you type it interactively?
>
>> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina <nastja.kruchin...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am trying to use the MPI_Comm_spawn function in my code. I am having trouble with Open MPI 2.0.x + sbatch (batch system Slurm).
>> My test program is located here: http://user.it.uu.se/~anakr367/files/MPI_test/
>>
>> When I run my code I get the following error:
>>
>>     OPAL ERROR: Timeout in file ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193
>>     *** An error occurred in MPI_Init_thread
>>     *** on a NULL communicator
>>     *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>     *** and potentially your MPI job)
>>     --------------------------------------------------------------------------
>>     It looks like MPI_INIT failed for some reason; your parallel process is
>>     likely to abort. There are many reasons that a parallel process can
>>     fail during MPI_INIT; some of which are due to configuration or environment
>>     problems. This failure appears to be an internal failure; here's some
>>     additional information (which may only be relevant to an Open MPI
>>     developer):
>>
>>     ompi_dpm_dyn_init() failed
>>     --> Returned "Timeout" (-15) instead of "Success" (0)
>>     --------------------------------------------------------------------------
>>
>> Interestingly, there is no error when I first allocate nodes with salloc and then run my program. So the program works fine with Open MPI 1.x + sbatch/salloc, or with Open MPI 2.0.x + salloc, but not with Open MPI 2.0.x + sbatch.
>>
>> The error was reproduced on three different computer clusters.
>>
>> Best regards,
>> Anastasia
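For context, the manager side of such a spawn test typically reduces to a handful of calls. The sketch below is not the poster's actual program (that is at the URL above); the `./worker` executable name is a placeholder. It requires an MPI installation to compile (`mpicc`) and run:

```c
#include <mpi.h>
#include <stdio.h>

/* Minimal manager sketch: spawns one worker process and reports
 * whether the spawn succeeded. "./worker" is a hypothetical
 * executable name, not taken from the thread. */
int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    int errcode;

    MPI_Init(&argc, &argv);

    /* MPI_Comm_spawn is collective over the parent communicator;
     * this is the step that hits the PMIx timeout described above
     * under Open MPI 2.0.x + sbatch. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   0, MPI_COMM_WORLD, &intercomm, &errcode);

    if (errcode == MPI_SUCCESS)
        printf("worker spawned successfully\n");

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}
```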
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users