Just to throw this out there -- to me, that doesn't seem to be just a
problem with SLURM. I'm guessing the exact same error would be thrown
interactively (unless I didn't read the above messages carefully enough).
I had a lot of problems running spawned jobs on 2.0.x a few months ago, so
I switched back to 1.10.2 and everything worked. Just in case that helps
someone.

Jason

On Wed, Feb 15, 2017 at 1:09 PM, Anastasia Kruchinina <nastja.kruchin...@gmail.com> wrote:

> Hi!
>
> I am doing it like this:
>
> sbatch  -N 2 -n 5 ./job.sh
>
> where job.sh is:
>
> #!/bin/bash -l
> module load openmpi/2.0.1-icc
> mpirun -np 1 ./manager 4
>
> On 15 February 2017 at 17:58, r...@open-mpi.org <r...@open-mpi.org> wrote:
>
>> The cmd line looks fine - when you do your “sbatch” request, what is in
>> the shell script you give it? Or are you saying you just “sbatch” the
>> mpirun cmd directly?
>>
>>
>> On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina <nastja.kruchin...@gmail.com> wrote:
>>
>> Hi,
>>
>> I am running it like this:
>> mpirun -np 1 ./manager
>>
>> Should I do it differently?
>>
>> I also thought that all sbatch does is create an allocation and then run
>> my script in it. But that does not seem to be the case, since I am getting
>> these results...
>>
>> I would like to upgrade, but no clusters near me have Open MPI 2.0.2
>> yet :( so I can't even check whether it works with that version.
>>
>> On 15 February 2017 at 16:04, Howard Pritchard <hpprit...@gmail.com> wrote:
>>
>>> Hi Anastasia,
>>>
>>> Definitely check which mpirun is used in the batch environment, but you
>>> may also want to upgrade to Open MPI 2.0.2.
>>>
>>> Howard
>>>
>>> r...@open-mpi.org <r...@open-mpi.org> wrote on Wed, 15 Feb 2017 at 07:49:
>>>
>>>> Nothing immediate comes to mind - all sbatch does is create an
>>>> allocation and then run your script in it. Perhaps your script is using a
>>>> different “mpirun” command than when you type it interactively?
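One way to test that suggestion is to have job.sh report which mpirun it actually resolves and compare the result with the same check in an interactive shell. The sketch below is a minimal POSIX shell helper; the function name report_mpirun is hypothetical and is not part of Open MPI or Slurm.

```shell
# Hypothetical helper (not part of Open MPI or Slurm): print the mpirun
# that this shell would execute, so a batch job's output can be compared
# with the same check run interactively.
report_mpirun() {
    # command -v prints the resolved path, or fails if mpirun is not on PATH;
    # "|| true" keeps the assignment from aborting a script run with "set -e".
    mpirun_path=$(command -v mpirun || true)
    if [ -n "$mpirun_path" ]; then
        echo "mpirun: $mpirun_path"
    else
        echo "mpirun: not found"
    fi
}
```

For example, call report_mpirun in job.sh right after the module load line, run it again interactively after loading the same module, and compare the two paths. Running mpirun --version in both places would additionally show whether the same Open MPI release is picked up.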
>>>>
>>>> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina <nastja.kruchin...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I am trying to use the MPI_Comm_spawn function in my code, and I am
>>>> having trouble with openmpi 2.0.x + sbatch (Slurm batch system).
>>>> My test program is located here:
>>>> http://user.it.uu.se/~anakr367/files/MPI_test/
>>>>
>>>> When I run my code, I get this error:
>>>>
>>>> OPAL ERROR: Timeout in file ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line 193
>>>> *** An error occurred in MPI_Init_thread
>>>> *** on a NULL communicator
>>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>>> ***    and potentially your MPI job)
>>>> --------------------------------------------------------------------------
>>>>
>>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>>> likely to abort.  There are many reasons that a parallel process can
>>>> fail during MPI_INIT; some of which are due to configuration or environment
>>>> problems.  This failure appears to be an internal failure; here's some
>>>> additional information (which may only be relevant to an Open MPI
>>>> developer):
>>>>
>>>>    ompi_dpm_dyn_init() failed
>>>>    --> Returned "Timeout" (-15) instead of "Success" (0)
>>>> --------------------------------------------------------------------------
>>>>
>>>>
>>>> Interestingly, there is no error when I first allocate nodes with salloc
>>>> and then run my program. In other words, the program works fine with
>>>> openmpi 1.x + sbatch/salloc or openmpi 2.0.x + salloc, but not with
>>>> openmpi 2.0.x + sbatch.
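Since the same program behaves differently under salloc and sbatch, a natural next step is to compare what the two environments actually contain. The sketch below is a hypothetical helper: snapshot_env is an illustrative name, and the SLURM_, OMPI_ and PMIX_ prefixes are an assumption about which variables are relevant, not a documented list.

```shell
# Hypothetical helper: write the Slurm-, Open MPI- and PMIx-related
# environment variables (assumed prefixes SLURM_, OMPI_, PMIX_) to the
# file named by $1, one VAR=value per line, sorted so that two snapshots
# can be compared with diff.
snapshot_env() {
    if [ -z "$1" ]; then
        echo "usage: snapshot_env OUTPUT_FILE" >&2
        return 1
    fi
    # grep exits non-zero when nothing matches; "|| true" keeps an empty
    # snapshot from being treated as an error (e.g. under "set -o pipefail").
    env | grep -E '^(SLURM|OMPI|PMIX)_' | sort > "$1" || true
}
```

For instance, after making the function available in both shells (say, by sourcing a file that contains it), run snapshot_env env.salloc.txt inside an salloc session, add snapshot_env env.sbatch.txt to job.sh before the mpirun line, and then run diff env.salloc.txt env.sbatch.txt. A variable that appears in only one file may point at what differs between the two launch modes.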
>>>>
>>>> The error was reproduced on three different computer clusters.
>>>>
>>>> Best regards,
>>>> Anastasia
>>>> _______________________________________________
>>>> users mailing list
>>>> users@lists.open-mpi.org
>>>> https://rfd.newmexicoconsortium.org/mailman/listinfo/users