Just to clarify: the test code I wrote does *not* use MPI_Comm_spawn in the mpirun case. The problem may or may not exist under mpirun.

Ralph Castain wrote:
As your own tests have shown, it works fine if you just "mpirun -n 1 ./spawner". It is only singleton comm_spawn that appears to be having a problem in the latest 1.2 release. So I don't think comm_spawn is "useless". ;-)

I'm checking this morning to ensure that singletons properly spawn on other nodes in the 1.3 release. I sincerely doubt we will backport a fix to 1.2.

On Jul 30, 2008, at 6:49 AM, Mark Borgerding wrote:

I keep checking my email in hopes that someone will come up with something that Matt or I might've missed. I'm just having a hard time accepting that something so fundamental would be so broken. The MPI_Comm_spawn command is essentially useless without the ability to spawn processes on other nodes.

If this is true, then my personal scorecard reads:
  # Days spent using openmpi: 4 (off and on)
  # identified bugs in openmpi: 2
  # useful programs built: 0

Please prove me wrong. I'm eager to be shown my ignorance -- to find out where I've been stupid and what documentation I should've read.

Matt Hughes wrote:
I've found that I always have to use mpirun to start my spawner
process, due to the exact problem you are having:  the need to give
OMPI a hosts file!  It seems the singleton functionality is lacking
somehow... it won't allow you to spawn on arbitrary hosts.  I have not
tested if this is fixed in the 1.3 series.

mpiexec -np 1 -H op2-1,op2-2 spawner op2-2

mpiexec should start the first process on op2-1, and the spawn call
should start the second on op2-2.  If you don't use the Info object to
set the hostname specifically, then on 1.2.x it will automatically
start on op2-2.  With 1.3, the spawn call will start processes
starting with the first item in the host list.
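
For anyone scripting the workaround above, here is a minimal sketch that builds the same mpiexec command line from a host list. It only uses the -np and -H options shown in the command above, and it assumes (as that command suggests) that spawner takes the target host as its single argument. The function name mpiexec_spawner_command is hypothetical, not part of Open MPI.

```python
import shlex


def mpiexec_spawner_command(hosts, target_host, spawner="spawner"):
    """Return the argv list for starting one spawner process via mpiexec.

    hosts       -- hosts made available to the job (passed to -H)
    target_host -- host the spawner is asked to spawn on; must be in hosts
    spawner     -- name of the spawner executable (assumed, as in the thread)
    """
    if not hosts:
        raise ValueError("at least one host is required")
    if target_host not in hosts:
        # The spawn can only land on a host that mpiexec was told about.
        raise ValueError(f"{target_host!r} is not in the host list")
    return ["mpiexec", "-np", "1", "-H", ",".join(hosts), spawner, target_host]


if __name__ == "__main__":
    cmd = mpiexec_spawner_command(["op2-1", "op2-2"], "op2-2")
    # Print the command in shell-quoted form; run it with subprocess.run(cmd)
    # on a machine where mpiexec and the spawner binary are available.
    print(shlex.join(cmd))
```

With the hosts from this thread, the sketch prints the same command as above: mpiexec -np 1 -H op2-1,op2-2 spawner op2-2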


users mailing list
