On Jul 30, 2008, at 11:12 AM, Mark Borgerding wrote:

I appreciate the suggestion about running a daemon on each of the remote nodes, but wouldn't I kind of be reinventing the wheel there? Process management is one of the things I'd like to be able to count on ORTE for.

Keep in mind that the daemons here are not for process management -- they're for name service.

Would the following work to give the parent process an intercomm with each child?

The parent (i.e., my non-mpirun-started process) calls MPI_Init, then MPI_Open_port.
The parent spawns an mpirun command via system/exec to create the remote children. The name from MPI_Open_port is placed in the environment.
The parent calls MPI_Comm_accept (once for each child?).
All children call MPI_Comm_connect with that name.
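
The environment hand-off in these steps could look roughly like the sketch below. It covers only the plain C part and leaves the MPI calls out. The variable name PARENT_PORT_NAME and both function names are made up for illustration. It assumes Open MPI's mpirun, whose "-x VAR" option exports an existing environment variable to the launched processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable name; the original message does not name one. */
#define PORT_ENV_VAR "PARENT_PORT_NAME"

/*
 * Parent side: put the port name (as returned by MPI_Open_port) into the
 * environment and build the mpirun command line for the children.
 * "-x VAR" asks Open MPI's mpirun to export VAR to the launched processes.
 * Returns 0 on success, -1 on bad input, setenv failure, or truncation.
 */
int build_launch_command(const char *port_name, int nprocs,
                         const char *child_exe, char *cmd, size_t cmd_len)
{
    assert(port_name && child_exe && cmd);
    if (nprocs < 1 || strlen(port_name) == 0)
        return -1;
    if (setenv(PORT_ENV_VAR, port_name, 1) != 0)
        return -1;
    int n = snprintf(cmd, cmd_len, "mpirun -np %d -x %s %s",
                     nprocs, PORT_ENV_VAR, child_exe);
    return (n < 0 || (size_t)n >= cmd_len) ? -1 : 0;
}

/*
 * Child side: fetch the port name to hand to MPI_Comm_connect.
 * Returns NULL if the variable was not exported.
 */
const char *port_name_from_env(void)
{
    return getenv(PORT_ENV_VAR);
}
```

The parent would pass the resulting string to system() after MPI_Open_port, and each child would call port_name_from_env() after MPI_Init. Exporting the variable by name, rather than pasting its value into the command line, keeps the port string away from shell quoting.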

It may be problematic to call system/exec in some environments (e.g., if using OpenFabrics networks). Bad Things can happen.

I think this would give one intercommunicator back to the parent for each remote process (not ideal, but I can worry about broadcast data later). The remote processes can communicate with each other through MPI_COMM_WORLD.


Actually, when I think through the details, much of this is pretty similar to the daemon MPI_Publish_name + MPI_Lookup_name approach; the main difference is which processes come first.

Instead of having the framework call MPI_Init in your plugin, can your plugin system/exec "mpirun -np 1 my_parent_app"? And perhaps use a pipe (or socket or some other IPC) to communicate between the framework process and my_parent_app? I realize it's a kludgey workaround, but it looks like we clearly have a bug in the 1.2 series with singletons in this area...
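
A minimal sketch of that workaround, assuming a POSIX system and using popen() as the pipe: the framework process never calls MPI_Init, it only launches a command and reads one line back. The function name launch_and_read_line is hypothetical, and the one-line reply is just an illustrative protocol. Two-way traffic would need pipe()/fork()/exec or a socket instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Launch `command` through the shell and read the first line it prints.
 * Returns 0 on success, or -1 if the launch failed, no line was produced,
 * or the command exited with a non-zero status.
 */
int launch_and_read_line(const char *command, char *line, size_t line_len)
{
    assert(command && line && line_len > 0);

    FILE *from_child = popen(command, "r");
    if (!from_child)
        return -1;

    int got_line = fgets(line, (int)line_len, from_child) != NULL;

    /* Drain any remaining output so the child can finish writing. */
    char scratch[256];
    while (fgets(scratch, sizeof scratch, from_child) != NULL)
        ;

    /* pclose() waits for the child and returns its exit status. */
    int status = pclose(from_child);
    if (!got_line || status != 0)
        return -1;

    line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```

In the plugin, the command would be something like "mpirun -np 1 my_parent_app", with my_parent_app printing its result on stdout. Note that popen() blocks in pclose() until mpirun exits, so a long-running parent app would need a different IPC arrangement.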

--
Jeff Squyres
Cisco Systems
