Ok, thanks for your answers! I was not aware that this is a known issue.

I guess I will just try to find a machine with OpenMPI/2.0.2 and try it there.

On 16 February 2017 at 00:01, r...@open-mpi.org <r...@open-mpi.org> wrote:

> Yes, 2.0.1 has a spawn issue. We believe that 2.0.2 is okay if you want to
> give it a try.
>
> On Feb 15, 2017, at 1:14 PM, Jason Maldonis <maldo...@wisc.edu> wrote:
>
> Just to throw this out there -- to me, that doesn't seem to be just a
> problem with SLURM. I'm guessing the exact same error would be thrown
> interactively (unless I didn't read the above messages carefully enough).
> I had a lot of problems running spawned jobs on 2.0.x a few months ago, so
> I switched back to 1.10.2 and everything worked. Just in case that helps
> someone.
>
> Jason
>
> On Wed, Feb 15, 2017 at 1:09 PM, Anastasia Kruchinina <
> nastja.kruchin...@gmail.com> wrote:
>
>> Hi!
>>
>> I am doing it like this:
>>
>> sbatch -N 2 -n 5 ./job.sh
>>
>> where job.sh is:
>>
>> #!/bin/bash -l
>> module load openmpi/2.0.1-icc
>> mpirun -np 1 ./manager 4
>>
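>> In case the flag combination looks odd, here is the same script annotated (a
>> sketch, assuming the "4" argument is the number of workers that ./manager
>> spawns with MPI_Comm_spawn):
>>
>> #!/bin/bash -l
>> # load the site-specific Open MPI 2.0.1 build used for this test
>> module load openmpi/2.0.1-icc
>> # launch only the manager rank; the -N 2 -n 5 given to sbatch reserves
>> # slots for this one rank plus the 4 workers it spawns at runtime
>> mpirun -np 1 ./manager 4
>>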
>> On 15 February 2017 at 17:58, r...@open-mpi.org <r...@open-mpi.org> wrote:
>>
>>> The cmd line looks fine - when you do your “sbatch” request, what is in
>>> the shell script you give it? Or are you saying you just “sbatch” the
>>> mpirun cmd directly?
>>>
>>>
>>> On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina <
>>> nastja.kruchin...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I am running it like this:
>>> mpirun -np 1 ./manager
>>>
>>> Should I do it differently?
>>>
>>> I also thought that all sbatch does is create an allocation and then run
>>> my script in it. But it seems that is not the case, since I am getting these results...
>>>
>>> I would like to upgrade OpenMPI, but no clusters near me have 2.0.2 yet :(
>>> So I cannot even check whether it works with OpenMPI 2.0.2.
>>>
>>> On 15 February 2017 at 16:04, Howard Pritchard <hpprit...@gmail.com>
>>> wrote:
>>>
>>>> Hi Anastasia,
>>>>
>>>> Definitely check which mpirun is used in the batch environment, but you may
>>>> also want to upgrade to Open MPI 2.0.2.
>>>>
>>>> Howard
>>>>
>>>> r...@open-mpi.org <r...@open-mpi.org> schrieb am Mi. 15. Feb. 2017 um
>>>> 07:49:
>>>>
>>>>> Nothing immediate comes to mind - all sbatch does is create an
>>>>> allocation and then run your script in it. Perhaps your script is using a
>>>>> different “mpirun” command than when you type it interactively?
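>>>>> One quick way to compare (a rough sketch; the module command only applies
>>>>> if your cluster uses environment modules) is to run the same checks both
>>>>> interactively and from inside the batch script, then compare the output:
>>>>>
>>>>> # which mpirun the shell resolves, and which Open MPI it belongs to
>>>>> command -v mpirun
>>>>> mpirun --version
>>>>> # if environment modules are in use, see what is loaded in each case
>>>>> module list
>>>>>
>>>>> If the batch run and the interactive run report different paths or
>>>>> versions, that would explain the difference in behavior.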
>>>>>
>>>>> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina <
>>>>> nastja.kruchin...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to use the MPI_Comm_spawn function in my code. I am having
>>>>> trouble with openmpi 2.0.x + sbatch (batch system Slurm).
>>>>> My test program is located here: http://user.it.uu.se/~anakr367/files/MPI_test/
>>>>>
>>>>> When I am running my code I am getting an error:
>>>>>
>>>>> OPAL ERROR: Timeout in file ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193
>>>>> *** An error occurred in MPI_Init_thread
>>>>> *** on a NULL communicator
>>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>>> ***    and potentially your MPI job)
>>>>> --------------------------------------------------------------------------
>>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>>> likely to abort.  There are many reasons that a parallel process can
>>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>>> problems.  This failure appears to be an internal failure; here's some
>>>>> additional information (which may only be relevant to an Open MPI
>>>>> developer):
>>>>>
>>>>>    ompi_dpm_dyn_init() failed
>>>>>    --> Returned "Timeout" (-15) instead of "Success" (0)
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>>
>>>>> The interesting thing is that there is no error when I first allocate nodes
>>>>> with salloc and then run my program. So, I noticed that the program works
>>>>> fine using openmpi 1.x+sbatch/salloc or openmpi 2.0.x+salloc, but not with
>>>>> openmpi 2.0.x+sbatch.
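>>>>>
>>>>> For concreteness, the two workflows being compared look roughly like this
>>>>> (a sketch using the same example sizes mentioned elsewhere in this thread):
>>>>>
>>>>> # interactive: allocate nodes first, then launch inside the allocation -- spawn works
>>>>> salloc -N 2 -n 5
>>>>> mpirun -np 1 ./manager 4
>>>>>
>>>>> # batch: submit job.sh, which runs the same mpirun line -- spawn times out on 2.0.x
>>>>> sbatch -N 2 -n 5 ./job.sh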
>>>>>
>>>>> The error was reproduced on three different computer clusters.
>>>>>
>>>>> Best regards,
>>>>> Anastasia
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
