I'm trying to use MPI_Comm_spawn_multiple and it doesn't always work the way 
I'd expect.

The simple test code I have starts a couple of manager processes and then tries 
to spawn a couple of worker processes, one on each of the nodes running the 
manager processes.
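
For anyone reading without the attachment, the manager side looks roughly like 
the sketch below. This is a simplified illustration, not the attached code: the 
host names are hard-coded from the output shown further down, the variable 
names are made up for the example, and it assumes the reserved "host" info key 
is what asks for one worker on each node. It needs an MPI installation and has 
to be launched with mpirun.

```c
#include <mpi.h>
#include <stdio.h>

#define NHOSTS 2  /* assumption: one worker per host, two hosts */

int main(int argc, char **argv)
{
    /* Hypothetical hard-coded values taken from the output in this post. */
    char *hosts[NHOSTS]    = { "cu2n29", "cu2n30" };
    char *commands[NHOSTS] = { "./mpi2_worker", "./mpi2_worker" };
    int   maxprocs[NHOSTS] = { 1, 1 };
    MPI_Info infos[NHOSTS];
    int errcodes[NHOSTS];   /* one entry per spawned process (1 + 1) */
    MPI_Comm workers;       /* intercommunicator to the spawned workers */
    int rank, i;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One info object per command, each requesting a specific host. */
    for (i = 0; i < NHOSTS; i++) {
        MPI_Info_create(&infos[i]);
        MPI_Info_set(infos[i], "host", hosts[i]);
    }

    /* Collective over MPI_COMM_WORLD; the spawn arguments are only
       significant on the root (rank 0). */
    MPI_Comm_spawn_multiple(NHOSTS, commands, MPI_ARGVS_NULL, maxprocs,
                            infos, 0, MPI_COMM_WORLD, &workers, errcodes);

    /* Report any per-process spawn failures from the root. */
    if (rank == 0) {
        for (i = 0; i < NHOSTS; i++) {
            if (errcodes[i] != MPI_SUCCESS)
                printf("spawn of worker %d failed\n", i);
        }
    }

    for (i = 0; i < NHOSTS; i++)
        MPI_Info_free(&infos[i]);

    MPI_Finalize();
    return 0;
}
```

Because the spawn is collective over both managers, a worker that calls 
MPI_Comm_get_parent and then MPI_Comm_remote_size on the result sees two 
parents, which matches the "number of parents = 2" lines in the output.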

I was using Open MPI 1.5.1, but gave 1.5.2rc2 a try too.
 
If I do:
[skouson@cu2n29 mpi2_example]$ mpirun -hostfile hostfile -n 2 -bynode ./mpi2_manager
MPI Initialized=0, Finalized=0
MPI Initialized=0, Finalized=0
I'm manager 0 of 2 on cu2n29 running MPI 2.1
setting up host cu2n29 - ./mpi2_worker
setting up host cu2n30 - ./mpi2_worker
Spawning 2 worker processes running ./mpi2_worker
Sleeping for a bit...
I'm manager 1 of 2 on cu2n30 running MPI 2.1
**** I'm worker 0 of 2 on cu2n29 running MPI 2.1
**** Worker 0: number of parents = 2
**** Worker 0: Success!
**** I'm worker 1 of 2 on cu2n30 running MPI 2.1
**** Worker 1: number of parents = 2
**** Worker 1: Success!
**** Worker 0: Value recd = 25
1: MPI Initialized=1, Finalized=1
0: MPI Initialized=1, Finalized=1

That works as expected. However, if I use -loadbalance or -npernode 1 rather 
than the -bynode flag, I get an obscure error and things hang until I ctrl-c 
out of it.

[skouson@cu2n29 mpi2_example]$ mpirun -hostfile hostfile -n 2 -loadbalance ./mpi2_manager
MPI Initialized=0, Finalized=0
MPI Initialized=0, Finalized=0
I'm manager 1 of 2 on cu2n30 running MPI 2.1
I'm manager 0 of 2 on cu2n29 running MPI 2.1
setting up host cu2n29 - ./mpi2_worker
setting up host cu2n30 - ./mpi2_worker
Spawning 2 worker processes running ./mpi2_worker
Sleeping for a bit...
[cu2n29:03088] [[62875,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 906
mpirun: abort is already in progress...hit ctrl-c again to forcibly terminate

My environment has:
OMPI_MCA_btl_openib_ib_retry_count=7
OMPI_MCA_mpi_keep_peer_hostnames=1
OMPI_MCA_btl_openib_ib_timeout=31

I've included the sample code, along with config.log etc.

If anyone can point out what I'm missing to be able to run with the 
-loadbalance flag, I'd appreciate it.

-----
Gary Skouson

Attachment: mpi2_example.tar.bz2