Hi!

I am running it like this:

sbatch -N 2 -n 5 ./job.sh

where job.sh is:

#!/bin/bash -l
# load the Open MPI installation to use
module load openmpi/2.0.1-icc
# launch a single manager process; it spawns the workers itself at run time
mpirun -np 1 ./manager 4
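
For reference, the manager essentially follows the standard MPI_Comm_spawn
pattern; a minimal sketch is below (the worker binary name and the argument
handling are simplified here, the full test code is at the link in the
quoted message below):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal MPI_Comm_spawn manager sketch: spawns <n> workers, where <n>
 * is given on the command line, e.g. "mpirun -np 1 ./manager 4".
 * The "./worker" binary name is just an example. */
int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    int nworkers;

    MPI_Init(&argc, &argv);
    nworkers = (argc > 1) ? atoi(argv[1]) : 1;

    /* Ask the runtime to start nworkers copies of ./worker and return
     * an intercommunicator connecting the manager to them. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, nworkers, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    printf("manager: spawned %d workers\n", nworkers);

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

The sketch would be compiled with the mpicc from the same module and
launched exactly as in job.sh above.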

On 15 February 2017 at 17:58, r...@open-mpi.org <r...@open-mpi.org> wrote:

> The cmd line looks fine - when you do your “sbatch” request, what is in
> the shell script you give it? Or are you saying you just “sbatch” the
> mpirun cmd directly?
>
>
> On Feb 15, 2017, at 8:07 AM, Anastasia Kruchinina <
> nastja.kruchin...@gmail.com> wrote:
>
> Hi,
>
> I am running it like this:
> mpirun -np 1 ./manager
>
> Should I do it differently?
>
> I also thought that all sbatch does is create an allocation and then run
> my script in it. But it seems that is not the case, since I am getting
> these results...
>
> I would like to upgrade Open MPI, but no clusters near me have it yet :(
> So I cannot even check if it works with Open MPI 2.0.2.
>
> On 15 February 2017 at 16:04, Howard Pritchard <hpprit...@gmail.com>
> wrote:
>
>> Hi Anastasia,
>>
>> Definitely check the mpirun used in the batch environment, but you may
>> also want to upgrade to Open MPI 2.0.2.
>>
>> Howard
>>
>> r...@open-mpi.org <r...@open-mpi.org> wrote on Wed., 15 Feb. 2017 at
>> 07:49:
>>
>>> Nothing immediate comes to mind - all sbatch does is create an
>>> allocation and then run your script in it. Perhaps your script is using a
>>> different “mpirun” command than when you type it interactively?
>>>
>>> On Feb 14, 2017, at 5:11 AM, Anastasia Kruchinina <
>>> nastja.kruchin...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I am trying to use the MPI_Comm_spawn function in my code, and I am
>>> having trouble with Open MPI 2.0.x + sbatch (the Slurm batch system).
>>> My test program is located here:
>>> http://user.it.uu.se/~anakr367/files/MPI_test/
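>>>
>>> The worker side is just the standard child-process pattern; a minimal
>>> sketch is below (a simplified sketch of what it does; the real code is
>>> in the linked directory):
>>>
>>> #include <mpi.h>
>>>
>>> /* Minimal worker sketch: initialize MPI, attach to the parent
>>>  * intercommunicator created by the manager's MPI_Comm_spawn,
>>>  * then disconnect and finalize. */
>>> int main(int argc, char *argv[])
>>> {
>>>     MPI_Comm parent;
>>>
>>>     MPI_Init(&argc, &argv);
>>>     MPI_Comm_get_parent(&parent);
>>>     if (parent != MPI_COMM_NULL)
>>>         MPI_Comm_disconnect(&parent);
>>>     MPI_Finalize();
>>>     return 0;
>>> }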
>>>
>>> When I run my code I get this error:
>>>
>>> OPAL ERROR: Timeout in file
>>> ../../../../openmpi-2.0.1/opal/mca/pmix/base/pmix_base_fns.c at line
>>> 193
>>> *** An error occurred in MPI_Init_thread
>>> *** on a NULL communicator
>>> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
>>> ***    and potentially your MPI job)
>>> --------------------------------------------------------------------------
>>>
>>> It looks like MPI_INIT failed for some reason; your parallel process is
>>> likely to abort.  There are many reasons that a parallel process can
>>> fail during MPI_INIT; some of which are due to configuration or
>>> environment
>>> problems.  This failure appears to be an internal failure; here's some
>>> additional information (which may only be relevant to an Open MPI
>>> developer):
>>>
>>>    ompi_dpm_dyn_init() failed
>>>    --> Returned "Timeout" (-15) instead of "Success" (0)
>>> --------------------------------------------------------------------------
>>>
>>>
>>> The interesting thing is that there is no error when I first allocate
>>> nodes with salloc and then run my program. So the program works fine
>>> with Open MPI 1.x + sbatch/salloc and with Open MPI 2.0.x + salloc, but
>>> not with Open MPI 2.0.x + sbatch.
>>>
>>> The error was reproduced on three different computer clusters.
>>>
>>> Best regards,
>>> Anastasia
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
