On Jun 4, 2009, at 3:53 AM, Lars Andersson wrote:

In my second test, I simply put a sleep(3) at point 2), and expected
the MPI_Wait() call at 3) to finish almost instantly, since I assumed
that the message would have been transferred during the sleep. To my
disappointment though, it took more or less the same time to finish the
MPI_Wait as without any sleep.


As you found by googling, and as Bogdan infers, Open MPI doesn't currently make much progress over TCP-based networks "in the background." And you're right that putting an MPI_WAIT in a progress thread would cause that thread to spin heavily, effectively taking many of your CPU cycles away from you, and possibly even having other bad effects (e.g., cache thrashing, context switching, etc.).

I'd say that your best workaround here is to intersperse MPI_TEST calls periodically. This will trigger OMPI's pipelined protocol for large messages, and should allow partial bursts of progress while you're presumably off doing useful work. If this is difficult because the work is being done in library code that you can't change, then perhaps a pre-spawned "work" thread could be used to call MPI_TEST periodically. That way, it won't steal huge amounts of CPU cycles (like MPI_WAIT would). You still might get some cache thrashing, context switching, etc. -- YMMV.
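
To make the first suggestion concrete, here's a minimal sketch (not from the original mail) of interspersing MPI_Test calls with computation so that the pipelined transfer keeps moving; the message size, the loop count, and do_some_work() are hypothetical placeholders:

/* Sketch: overlap a large nonblocking transfer with computation by
 * poking MPI_Test between work steps, so Open MPI can keep pushing the
 * pipelined TCP transfer along.  N, the loop count, and do_some_work()
 * are placeholders -- adapt to your application. */
#include <mpi.h>
#include <stdlib.h>

#define N (8 * 1024 * 1024)        /* assumed large message: 8M doubles */

static void do_some_work(int step) { (void)step; /* real computation here */ }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double *buf = malloc(N * sizeof(double));
    MPI_Request req = MPI_REQUEST_NULL;
    int done = 0;

    if (rank == 0)
        MPI_Isend(buf, N, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &req);
    else if (rank == 1)
        MPI_Irecv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &req);

    for (int step = 0; step < 1000; ++step) {
        do_some_work(step);                            /* useful work */
        if (!done && req != MPI_REQUEST_NULL)
            MPI_Test(&req, &done, MPI_STATUS_IGNORE);  /* nudge progress */
    }

    if (req != MPI_REQUEST_NULL)
        MPI_Wait(&req, MPI_STATUS_IGNORE);             /* finish the rest */

    free(buf);
    MPI_Finalize();
    return 0;
}

Run it with at least two processes (e.g., mpirun -np 2 ./a.out); rank 0 sends, rank 1 receives, and each MPI_Test gives the library a chance to move another chunk of the message.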

As for exactly how many times / how often you should call MPI_TEST, that is going to be up to you. It's going to depend on a lot of factors -- how big the message is, how well synchronized you are with the receiver, what strategy you use to call MPI_TEST, etc.
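
And here's a hedged sketch of the pre-spawned "work"-thread variant: a helper thread that polls the request with MPI_Test and sleeps in between, rather than spinning in MPI_Wait. It assumes pthreads, an MPI library initialized via MPI_Init_thread with thread support, and an arbitrary 1 ms poll interval -- tune that for your message sizes:

/* Sketch of the progress-thread idea: poll with MPI_Test, sleep between
 * polls, so the thread doesn't burn a whole core the way MPI_Wait would.
 * overlap_with_work() and the 1 ms interval are hypothetical. */
#include <mpi.h>
#include <pthread.h>
#include <unistd.h>

struct progress_arg {
    MPI_Request *req;         /* the pending MPI_Isend/MPI_Irecv request */
    unsigned int poll_usec;   /* sleep between MPI_Test calls, in usec   */
};

static void *progress_thread(void *p)
{
    struct progress_arg *arg = p;
    int done = 0;
    while (!done) {
        MPI_Test(arg->req, &done, MPI_STATUS_IGNORE);  /* drive progress */
        if (!done)
            usleep(arg->poll_usec);  /* yield the CPU instead of spinning */
    }
    return NULL;
}

/* Hypothetical helper: post your nonblocking call into *req, then let the
 * library code you can't change run while the thread keeps things moving. */
static void overlap_with_work(MPI_Request *req, void (*work)(void))
{
    struct progress_arg arg = { req, 1000 /* 1 ms, assumed */ };
    pthread_t tid;
    pthread_create(&tid, NULL, progress_thread, &arg);
    work();                    /* long-running work you can't change */
    pthread_join(&tid, NULL);  /* transfer is complete once this returns */
}

Since a second thread is making MPI calls, you'd want to request at least MPI_THREAD_SERIALIZED (MPI_THREAD_MULTIPLE to be safe) from MPI_Init_thread, and Open MPI's thread support has its own caveats -- again, YMMV.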

Open MPI may someday handle this better, either by having a blocking form of MPI_WAIT (i.e., not spinning, or spinning considerably less) or by making true TCP progress in the background. But if I had to guess, I'd say that we'll likely do the former before the latter.

--
Jeff Squyres
Cisco Systems