Hello all. Does anyone have a clue about this issue? Thanks.

---------- Forwarded message ---------
From: L Lutret <lu.lut...@gmail.com>
Date: Tue, Oct 8, 2019 at 12:59 PM
Subject: Dynamic process allocation hangs
To: <users@lists.open-mpi.org>


Hello all. I have started running some tests with Open MPI 4.0.1. I have two
machines, one local and one remote, connected over ssh. A basic test
(hello.c) runs fine both locally and remotely with mpirun. But I need to run
a program without mpirun and have it create processes with MPI_Comm_spawn().
Here are some examples of what I get.


My hostfile:

cat hostfile


    localhost slots=4

    slave1 slots=4


If I set this:


    MPI_Info_set( info, "add-hostfile", "hostfile" );

    MPI_Info_set( info, "npernode", "3" );


and spawn 6 processes (i.e. MPI_Comm_spawn() is asked to start 6 processes):


     ./dyamic.o


it runs OK: 4 processes local and 3 remote.


Now, if I set this instead (without add-hostfile and npernode):


      MPI_Info_set( info, "add-host", "slave1,slave1,slave1,slave1" );
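
For reference, the add-host setup above can be sketched roughly as follows. This is only a minimal reproduction of the configuration, not a fix for the hang. The helper build_add_host(), the WITH_MPI macro and the "./worker" executable are hypothetical names made up for illustration; the MPI calls are the standard ones. The spawned program is assumed to call MPI_Init() itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `host` repeated `count` times, separated by
 * commas, into buf (e.g. "slave1,slave1"). Returns 0 on success and -1 if
 * the arguments are invalid or buf is too small. */
int build_add_host(const char *host, int count, char *buf, size_t len)
{
    size_t used = 0;

    if (host == NULL || buf == NULL || len == 0 || count < 1)
        return -1;
    buf[0] = '\0';
    for (int i = 0; i < count; i++) {
        int n = snprintf(buf + used, len - used, "%s%s", i ? "," : "", host);
        if (n < 0 || (size_t)n >= len - used)
            return -1;              /* output would have been truncated */
        used += (size_t)n;
    }
    return 0;
}

#ifdef WITH_MPI   /* assumed build line: mpicc -DWITH_MPI sketch.c */
#include <mpi.h>

int main(int argc, char **argv)
{
    char hosts[256];
    MPI_Comm children;
    MPI_Info info;

    MPI_Init(&argc, &argv);         /* started directly, without mpirun */

    int rc = build_add_host("slave1", 4, hosts, sizeof hosts);
    assert(rc == 0);                /* 256 bytes is ample for this list */

    MPI_Info_create(&info);
    MPI_Info_set(info, "add-host", hosts);

    /* "./worker" is a placeholder for the executable to spawn. */
    MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 4, info, 0,
                   MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
#endif
```

Without -DWITH_MPI only the string helper is compiled, so that part can be checked on a machine with no MPI installation.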


and spawn 4 processes, it hangs. With top I can see one process running
locally and 4 on the remote machine (slave1), which I think is correct.
After a while it throws this:


“A request has timed out and will therefore fail:

Operation: LOOKUP: orted/pmix/pmix_server_pub.c:345

Your job may terminate as a result of this problem. You may want to
adjust the MCA parameter pmix_server_max_wait and try again. If this
occurred during a connect/accept operation, you can adjust that time
using the pmix_base_exchange_timeout parameter.
--------------------------------------------------------------------------
[master:22881] *** An error occurred in MPI_Comm_spawn
[master:22881] *** reported by process [63766529,0]
[master:22881] *** on communicator MPI_COMM_WORLD
[master:22881] *** MPI_ERR_UNKNOWN: unknown error
[master:22881] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[master:22881] *** and potentially your MPI job)”


When I check with top now, there are no processes running.

I really need this type of allocation. Any help would be very much
appreciated.
Thanks in advance.
