Hmmm...disturbing. The changes I made have somehow been lost. I'll have to redo 
it - will get back to you when it is restored.


On Mar 25, 2021, at 2:54 PM, L Lutret <lu.lut...@gmail.com> wrote:

Hi Ralph,

Thanks for your response. Using the master branch, I tried a very simple spawn 
from a singleton in three ways: 

a) running with a worker host, using the add-host key in MPI_Info_set()
b) running with a worker host, using the new PMIX_ADD_HOSTFILE key in 
MPI_Info_set()
c) running on localhost only, i.e. MPI_Comm_spawn( ... MPI_INFO_NULL ... )

In all three cases the output is the same error message:

prte: Error: unknown option "--singleton"
Type 'prte --help' for usage.
[osboxes:06532] OPAL ERROR: Error in file dpm/dpm.c at line 2168
[osboxes:00000] *** An error occurred in MPI_Comm_spawn
[osboxes:00000] *** reported by process [1835008000,0]
[osboxes:00000] *** on communicator MPI_COMM_SELF
[osboxes:00000] *** MPI_ERR_UNKNOWN: unknown error
[osboxes:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will 
now abort,
[osboxes:00000] ***    and MPI will try to terminate your MPI job as well)

Thank you very much for your help. 
Regards.

 
On Wed, Mar 24, 2021 at 10:07 AM Ralph Castain <r...@open-mpi.org> wrote:
Apologies for the very long delay in response. This has been verified fixed in 
OMPI's master branch that is to be released as v5.0 in the near future. 
Unfortunately, there are no plans to backport that fix to earlier release 
series. We therefore recommend that you upgrade to v5.0 if you retain interest 
in this feature.

Again, our apologies for the delayed response. You are welcome to use the 
nightly tarballs (https://www.open-mpi.org/nightly/v5.0.x/ or 
https://www.open-mpi.org/nightly/master/) in the interim, and please do let us 
know if the problem persists.
Ralph


On Dec 13, 2019, at 11:34 AM, L Lutret via users <users@lists.open-mpi.org> 
wrote:

Hello all. Does anyone have a clue about this issue? Thanks.

---------- Forwarded message ---------
From: L Lutret <lu.lut...@gmail.com>
Date: Tue, Oct 8, 2019 at 12:59 PM
Subject: Dynamic process allocation hangs
To: <users@lists.open-mpi.org>


Hello all. I have started running some tests with Open MPI 4.0.1. I have two 
machines, one local and one remote, connected over ssh. A basic test (hello.c) 
runs fine both locally and remotely with mpirun. But I need to run a program 
without mpirun and have it create some processes with spawn. Here are some 
examples of what I get.



My hostfile:

cat hostfile



    localhost slots=4

    slave1 slots=4



If I set this: 



    MPI_Info_set( info, "add-hostfile", "hostfile" );

    MPI_Info_set( info, "npernode", "3" );



And I run 6 processes (i.e. MPI_Comm_spawn() is asked to start 6 processes):



     ./dyamic.o



It runs OK: 4 processes local and 3 remote.
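
The slot arithmetic behind this run can be sketched in plain C. This is only an illustration, not Open MPI's actual mapping logic: the function names slots_on_line and spawn_capacity are hypothetical, a line without a slots= field is assumed to count as one slot, and each host is assumed to contribute at most min(slots, npernode) spawned processes.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one hostfile line of the form "name slots=N".
 * Returns the slot count, 1 if no slots= field is present (an assumption
 * of this sketch), or 0 for blank and comment lines. */
int slots_on_line(const char *line)
{
    /* Skip leading whitespace. */
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\0' || *line == '\n' || *line == '#')
        return 0;

    const char *field = strstr(line, "slots=");
    int slots = 1;
    if (field != NULL && sscanf(field, "slots=%d", &slots) != 1)
        slots = 1;
    return slots;
}

/* Hypothetical helper: upper bound on the number of spawned processes,
 * assuming each host contributes at most min(slots, npernode).
 * npernode <= 0 means "no per-node limit". */
int spawn_capacity(const char *hostfile_text, int npernode)
{
    int total = 0;
    const char *p = hostfile_text;

    while (*p != '\0') {
        char line[256];
        size_t len = strcspn(p, "\n");
        size_t copy = len < sizeof line - 1 ? len : sizeof line - 1;

        /* Copy the current line into a NUL-terminated buffer. */
        memcpy(line, p, copy);
        line[copy] = '\0';

        int slots = slots_on_line(line);
        if (npernode > 0 && slots > npernode)
            slots = npernode;
        total += slots;

        /* Advance past this line and its newline, if any. */
        p += len;
        if (*p == '\n')
            p++;
    }
    return total;
}
```

Under these assumptions, the hostfile above with npernode set to 3 gives a capacity of 6 spawned processes, 3 per host. If the parent process on the local machine is counted as well, that would be consistent with the 4 local and 3 remote processes observed.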



Now, if I set this instead (without add-hostfile and npernode):



      MPI_Info_set( info, "add-host", "slave1,slave1,slave1,slave1" );



And I run 4 processes, it hangs. With top I can see one process running on the 
local machine and 4 on the remote one (slave1), which I think is correct. After 
a while it throws this: 



“A request has timed out and will therefore fail:

 Operation: LOOKUP: orted/pmix/pmix_server_pub.c:345

Your job may terminate as a result of this problem. You may want to
adjust the MCA parameter pmix_server_max_wait and try again. If this
occurred during a connect/accept operation, you can adjust that time
using the pmix_base_exchange_timeout parameter.
--------------------------------------------------------------------------
[master:22881] *** An error occurred in MPI_Comm_spawn
[master:22881] *** reported by process [63766529,0]
[master:22881] *** on communicator MPI_COMM_WORLD
[master:22881] *** MPI_ERR_UNKNOWN: unknown error
[master:22881] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[master:22881] *** and potentially your MPI job)”



When I check with top now, there are no processes running.

I really need this type of allocation. Any help would be very much 
appreciated. Thanks in advance.


