Yes, 2.0.1 has a known spawn issue. We believe that 2.0.2 is okay, if you want to give it a try.
Sent from my iPad

> On Feb 15, 2017, at 1:14 PM, Jason Maldonis <maldo...@wisc.edu> wrote:
>
> Just to throw this out there -- to me, that doesn't seem to be just a
> problem with SLURM. I'm guessing the exact same error would be thrown
> interactively (unless I didn't read the above messages carefully enough).
> I had a lot of problems running spawned jobs on 2.0.x a few months ago,
> so I switched back to 1.10.2 and everything worked. Just in case that
> helps someone.
>
> Jason
>
>> On Wed, Feb 15, 2017 at 1:09 PM, Anastasia Kruchinina
>> <nastja.kruchin...@gmail.com> wrote:
>> Hi!
>>
>> I am running it like this:
>>
>> sbatch -N 2 -n 5 ./job.sh
>>
>> where job.sh is:
>>
>> #!/bin/bash -l
>> module load openmpi/2.0.1-icc
>> mpirun -np 1 ./manager 4
>>
>>> On 15 February 2017 at 17:58, r...@open-mpi.org <r...@open-mpi.org> wrote:
>>> The cmd line looks fine - when you do your “sbatch” request, what is in
>>> the shell script you give it? Or are you saying you just “sbatch” the
>>> mpirun cmd directly?
>>>
>>>> On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina
>>>> <nastja.kruchin...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am running it like this:
>>>> mpirun -np 1 ./manager
>>>>
>>>> Should I do it differently?
>>>>
>>>> I also thought that all sbatch does is create an allocation and then
>>>> run my script in it. But it seems that is not the case, since I am
>>>> getting these results...
>>>>
>>>> I would like to upgrade, but no clusters near me have Open MPI 2.0.2
>>>> yet :( So I cannot even check whether it works with 2.0.2.
>>>>
>>>>> On 15 February 2017 at 16:04, Howard Pritchard <hpprit...@gmail.com> wrote:
>>>>> Hi Anastasia,
>>>>>
>>>>> Definitely check the mpirun when in the batch environment, but you may
>>>>> also want to upgrade to Open MPI 2.0.2.
>>>>>
>>>>> Howard
>>>>>
>>>>> r...@open-mpi.org <r...@open-mpi.org> schrieb am Mi. 15. Feb. 2017 um 07:49:
>>>>>> Nothing immediate comes to mind - all sbatch does is create an
>>>>>> allocation and then run your script in it. Perhaps your script is
>>>>>> using a different “mpirun” command than when you type it interactively?
>>>>>>
>>>>>>> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina
>>>>>>> <nastja.kruchin...@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I am trying to use the MPI_Comm_spawn function in my code, and I am
>>>>>>> having trouble with Open MPI 2.0.x + sbatch (batch system Slurm).
>>>>>>> My test program is located here:
>>>>>>> http://user.it.uu.se/~anakr367/files/MPI_test/
>>>>>>>
>>>>>>> When I run my code I get the following error:
>>>>>>>
>>>>>>> OPAL ERROR: Timeout in file
>>>>>>> ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193
>>>>>>> *** An error occurred in MPI_Init_thread
>>>>>>> *** on a NULL communicator
>>>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>>>> *** and potentially your MPI job)
>>>>>>> --------------------------------------------------------------------------
>>>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>>>> likely to abort. There are many reasons that a parallel process can
>>>>>>> fail during MPI_INIT; some of which are due to configuration or
>>>>>>> environment problems. This failure appears to be an internal failure;
>>>>>>> here's some additional information (which may only be relevant to an
>>>>>>> Open MPI developer):
>>>>>>>
>>>>>>> ompi_dpm_dyn_init() failed
>>>>>>> --> Returned "Timeout" (-15) instead of "Success" (0)
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> The interesting thing is that there is no error when I first allocate
>>>>>>> nodes with salloc and then run my program. So the program works fine
>>>>>>> with Open MPI 1.x + sbatch/salloc and with Open MPI 2.0.x + salloc,
>>>>>>> but not with Open MPI 2.0.x + sbatch.
>>>>>>>
>>>>>>> The error was reproduced on three different computer clusters.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Anastasia
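For reference, the pattern under discussion is a manager process that launches extra workers at runtime with MPI_Comm_spawn. The real test program is the one at the URL quoted above; what follows is only a minimal sketch of that pattern, with assumed file names (manager.c, worker.c) and a made-up broadcast handshake, so it is clear which call is involved. With Open MPI 2.0.1 under sbatch, it is this spawn step (MPI_Init in the spawned workers, via ompi_dpm_dyn_init) that reportedly times out.

manager.c:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* Number of workers to spawn, e.g. "mpirun -np 1 ./manager 4". */
    int nworkers = (argc > 1) ? atoi(argv[1]) : 1;

    /* Spawn the workers; this is the dynamic-process step that fails
       with the "Timeout" / ompi_dpm_dyn_init() error in the report. */
    MPI_Comm intercomm;
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    /* Trivial handshake over the intercommunicator: the manager is the
       broadcast root on its side of the intercomm. */
    int token = 42;
    MPI_Bcast(&token, 1, MPI_INT, MPI_ROOT, intercomm);
    printf("manager: spawned %d workers\n", nworkers);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

worker.c:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);  /* the spawned side hangs/times out here in the report */

    /* Spawned processes reach the manager through the parent intercomm. */
    MPI_Comm parent;
    MPI_Comm_get_parent(&parent);
    if (parent == MPI_COMM_NULL) {
        fprintf(stderr, "worker: not started via MPI_Comm_spawn\n");
        MPI_Finalize();
        return 1;
    }

    int token;
    MPI_Bcast(&token, 1, MPI_INT, 0, parent);  /* root is rank 0 of the manager group */

    MPI_Comm_disconnect(&parent);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and launched as in the quoted job script (mpirun -np 1 ./manager 4 inside an sbatch allocation), a sketch like this should exercise the same spawn path as the reported failure; on 1.10.x, or on 2.0.2 as suggested in the reply above, the same run is expected to complete.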
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users