Interesting idea.

One obvious solution would be to mpirun your controller tasks and, as you mentioned, use MPI to communicate between them. Then you can use MPI_COMM_SPAWN to launch the actual MPI job that you want to monitor.
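Roughly, the controller side could look something like this (the target binary name and the process count are just placeholders; all controllers call MPI_Comm_spawn collectively):

#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm children;
    int errcodes[4];                 /* one slot per spawned process */

    MPI_Init(&argc, &argv);

    /* Collective over the controllers' MPI_COMM_WORLD: spawn 4 copies of
       the target application and get back an intercommunicator to them. */
    MPI_Comm_spawn("target_app", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                   0 /* root */, MPI_COMM_WORLD, &children, errcodes);

    /* ...controllers talk to each other over MPI_COMM_WORLD and to the
       target job over the "children" intercommunicator... */

    MPI_Comm_free(&children);
    MPI_Finalize();
    return 0;
}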

However, this will only more-or-less work. OMPI currently polls aggressively to make message passing progress, so if you end up over-subscribing nodes (because you filled up the cores on one node with all the target MPI processes but also have 1 or more controller processes running on the same node), they'll thrash each other and you'll get -- at best -- unreliable/unrepeatable performance fraught with lots of race conditions.

Another issue is that OMPI's MPI_COMM_SPAWN does not give you good options for specific process placement, so it might be a little dicey to get processes to land exactly where you want them.

Alternatively, you could simply fork()/exec() your target process locally from the controller. But the MPI spec does state that the behavior of fork() within an MPI process is undefined. Indeed, if you are using a high-speed network such as InfiniBand or Myrinet and you call fork() after MPI_INIT, Bad Things(tm) will happen (we can explain more if you care). But if you're only using TCP, you should be fine.
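A bare-bones sketch of that local fork()/exec() route, just to illustrate (the binary path is a placeholder, and I'm omitting the DynInst attach and any MPI calls in the controller):

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {
        /* Child: becomes the target MPI process (it calls MPI_Init itself).
           "./target_app" is just a placeholder path. */
        execl("./target_app", "target_app", (char *) NULL);
        perror("execl");             /* only reached if the exec fails */
        _exit(EXIT_FAILURE);
    }

    /* Parent (controller): attach with DynInst, instrument, etc., and
       eventually reap the child. */
    waitpid(pid, NULL, 0);
    return EXIT_SUCCESS;
}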

Another option might be to mpirun your target MPI app, have it wait in some kind of local barrier, and then mpirun your controllers on the same machines. The controllers find/attach to your target processes, release them from the local barrier, and then you're good to go -- both your controllers and your target app are fully up and running under MPI. You'll still have the spinning/performance issue, though -- so you won't want to oversubscribe nodes.
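One way to implement that local barrier on the target side -- just a sketch, the flag name is made up -- is to have the target spin on a flag right after MPI_Init that the controller flips (e.g., via DynInst) once it has attached:

#include <mpi.h>
#include <unistd.h>

/* The controller (e.g. via DynInst) sets this to 1 once it has attached. */
volatile int controller_released = 0;

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* "Local barrier": park here until the controller releases us. */
    while (!controller_released) {
        usleep(100000);              /* sleep rather than busy-spin */
    }

    /* ...normal application code... */

    MPI_Finalize();
    return 0;
}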

Does this help?


On Oct 1, 2007, at 10:49 PM, Oleg Morajko wrote:

Hello,

In the context of my PhD research, I have been developing a run-time performance analyzer for MPI-based applications. My tool provides a controller process for each MPI task. In particular, when an MPI job is started, a special wrapper script is generated that first starts my controller processes, and then each controller spawns an actual MPI task (which performs MPI_Init, etc.). I use the dynamic instrumentation API (DynInst) to control and instrument MPI tasks.

The point is that my controller processes need to communicate with each other; in particular, I need point-to-point communication between arbitrary pairs of controllers. So it seems reasonable to take advantage of MPI itself and use it for this communication. However, I am not sure what the impact of calling MPI_Init and communicating from the controller processes would be, given that both the controllers and the actual MPI processes were started with the same mpirun invocation. Actually, I would need to ensure that the controllers have a separate MPI execution environment, while the application has another one.

Any suggestions on how to achieve that? Obviously, another option is to use sockets for the controllers to communicate, but with MPI available that seems like overkill.

Thank you in advance for your help.

Regards,
--Oleg

PhD student, Universitat Autonoma de Barcelona, Spain



--
Jeff Squyres
Cisco Systems
