Just to throw this out there: to me, that doesn't seem to be a problem with Slurm alone. I'm guessing the exact same error would be thrown interactively (unless I didn't read the above messages carefully enough). I had a lot of problems running spawned jobs on 2.0.x a few months ago, so I switched back to 1.10.2 and everything worked. Just in case that helps someone.
Jason

On Wed, Feb 15, 2017 at 1:09 PM, Anastasia Kruchinina <nastja.kruchin...@gmail.com> wrote:

> Hi!
>
> I am doing it like this:
>
>     sbatch -N 2 -n 5 ./job.sh
>
> where job.sh is:
>
>     #!/bin/bash -l
>     module load openmpi/2.0.1-icc
>     mpirun -np 1 ./manager 4
>
> On 15 February 2017 at 17:58, r...@open-mpi.org <r...@open-mpi.org> wrote:
>
>> The cmd line looks fine - when you do your "sbatch" request, what is in
>> the shell script you give it? Or are you saying you just "sbatch" the
>> mpirun cmd directly?
>>
>> On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina
>> <nastja.kruchin...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am running it like this:
>>
>>     mpirun -np 1 ./manager
>>
>> Should I do it differently?
>>
>> I also thought that all sbatch does is create an allocation and then run
>> my script in it. But it seems that is not the case, since I am getting
>> these results...
>>
>> I would like to upgrade, but no clusters near me have Open MPI 2.0.2 yet
>> :( So I cannot even check whether it works with 2.0.2.
>>
>> On 15 February 2017 at 16:04, Howard Pritchard <hpprit...@gmail.com>
>> wrote:
>>
>>> Hi Anastasia,
>>>
>>> Definitely check the mpirun when in the batch environment, but you may
>>> also want to upgrade to Open MPI 2.0.2.
>>>
>>> Howard
>>>
>>> r...@open-mpi.org <r...@open-mpi.org> wrote on Wed, Feb 15, 2017 at
>>> 07:49:
>>>
>>>> Nothing immediate comes to mind - all sbatch does is create an
>>>> allocation and then run your script in it. Perhaps your script is using
>>>> a different "mpirun" command than when you type it interactively?
>>>>
>>>> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina
>>>> <nastja.kruchin...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am trying to use the MPI_Comm_spawn function in my code. I am having
>>>> trouble with Open MPI 2.0.x + sbatch (batch system Slurm).
>>>> My test program is located here:
>>>> http://user.it.uu.se/~anakr367/files/MPI_test/
>>>>
>>>> When I run my code I get an error:
>>>>
>>>>     OPAL ERROR: Timeout in file
>>>>     ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193
>>>>     *** An error occurred in MPI_Init_thread
>>>>     *** on a NULL communicator
>>>>     *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>     *** and potentially your MPI job)
>>>>     --------------------------------------------------------------------------
>>>>     It looks like MPI_INIT failed for some reason; your parallel process is
>>>>     likely to abort. There are many reasons that a parallel process can
>>>>     fail during MPI_INIT; some of which are due to configuration or environment
>>>>     problems. This failure appears to be an internal failure; here's some
>>>>     additional information (which may only be relevant to an Open MPI
>>>>     developer):
>>>>
>>>>       ompi_dpm_dyn_init() failed
>>>>       --> Returned "Timeout" (-15) instead of "Success" (0)
>>>>     --------------------------------------------------------------------------
>>>>
>>>> The interesting thing is that there is no error when I first allocate
>>>> nodes with salloc and then run my program. So I noticed that the program
>>>> works fine with openmpi 1.x + sbatch/salloc, and with openmpi 2.0.x +
>>>> salloc, but not with openmpi 2.0.x + sbatch.
>>>>
>>>> The error was reproduced on three different computer clusters.
>>>> Best regards,
>>>> Anastasia
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
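[Editor's note: for readers unfamiliar with the pattern being discussed, a minimal manager along the lines of the `./manager 4` invocation above might look roughly like the sketch below. The actual test program is at the linked URL; this sketch, the `./worker` executable name, and the argument handling are assumptions for illustration, not the original source.]

```c
/* manager.c - hypothetical sketch of an MPI_Comm_spawn manager.
 * Build:  mpicc -o manager manager.c
 * Run:    mpirun -np 1 ./manager 4
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* Number of workers to spawn, e.g. "./manager 4" spawns 4. */
    int nworkers = (argc > 1) ? atoi(argv[1]) : 1;

    MPI_Comm intercomm;
    int *errcodes = malloc(nworkers * sizeof(int));

    /* Under openmpi 2.0.x + sbatch, this dynamic-process step is where
     * the thread's timeout appears: the spawned processes fail inside
     * MPI_Init (ompi_dpm_dyn_init() returns "Timeout"). */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, errcodes);

    int remote_size;
    MPI_Comm_remote_size(intercomm, &remote_size);
    printf("manager: spawned %d workers\n", remote_size);

    MPI_Comm_disconnect(&intercomm);
    free(errcodes);
    MPI_Finalize();
    return 0;
}
```

The same code path is exercised whether the job is launched via salloc or sbatch, which is what makes the sbatch-only failure reported above notable.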