On Tue, 2009-06-09 at 07:28 -0400, Terry Dontje wrote:
> The biggest issue is coming up with a
> way to have blocks on the SM btl converted to the system poll call
> without requiring a socket write for every packet.
For what it's worth, you don't need a socket write for every (local) packet,
all you
Hi Jeff,
Thanks for jumping in.
On Tue, 9 Jun 2009, Jeff Squyres wrote:
2. Note that your solution presupposes that one MPI process can detect that
the entire job is deadlocked. This is not quite correct. What exactly do
you want to detect -- that one process may be imbalanced on its receiv
On Jun 9, 2009, at 8:31 AM, Jeff Squyres (jsquyres) wrote:
4. Note, too, that opal_progress() doesn't see *all* progress - the
openib BTL doesn't use opal_progress to know when OpenFabrics messages
arrive, for example.
Wait, I lied -- sorry.
opal_progress will call the bml progress, which t
I'll throw in my random $0.02. I'm at the Forum this week, so my
latency on replies here will likely be large.
1. Ashley is correct that we shouldn't sleep. A better solution would
be to block waiting for something to happen (rather than spin). As
Terry mentioned, we pretty much know how
On Tue, 9 Jun 2009, Ralph Castain wrote:
2. Instead of putting things to sleep or even adjusting the loop rate, you
might want to consider using the orte_notifier capability and notify the
system that the job may be stalled. Or perhaps adding an API to the
orte_errmgr framework to notify it th
Couple of other things to help stimulate the thinking:
1. it isn't that OMPI -couldn't- receive a message, but rather that it
-didn't- receive a message. This may or may not indicate that there is a
problem. Could just be an application that doesn't need to communicate for
a while, as per my exampl
On Mon, 2009-06-08 at 17:50 +0200, Sylvain Jeaugey wrote:
> Principle
> =========
>
> opal_progress() ensures the progression of MPI communication. The current
> algorithm is a loop calling progress on all registered components. If the
> program is blocked, the loop will busy-poll indefinitely.
I understand your point of view, and mostly share it.
I think the biggest point in my example is that sleep occurs only after (I
was wrong in my previous e-mail) 10 minutes of inactivity, and this value
is fully configurable. I didn't intend to call sleep after 2 seconds.
Plus, as said before,
My concern with any form of sleep is with the impact on the proc -
since opal_progress might not be running in a separate thread, won't
the sleep apply to the process as a whole? In that case, the process
isn't free to continue computing.
I can envision applications that might call down int
Sylvain Jeaugey wrote:
Hi Ralph,
I'm entirely convinced that MPI doesn't have to save power in a normal
scenario. The idea is just that if an MPI process is blocked (i.e. has
not performed progress for, say, 5 minutes, the default in my
implementation), we stop busy polling and have the process d
Hi Ralph,
I'm entirely convinced that MPI doesn't have to save power in a normal
scenario. The idea is just that if an MPI process is blocked (i.e. has not
performed progress for, say, 5 minutes, the default in my implementation), we
stop busy polling and have the process drop from 100% CPU usage
I'm not entirely convinced this actually achieves your goals, but I
can see some potential benefits. I'm also not sure that power
consumption is that big of an issue that MPI needs to begin chasing
"power saver" modes of operation, but that can be a separate debate
some day.
I'm assuming
What: when nothing has been received for a very long time (e.g. 5
minutes), stop busy polling in opal_progress and switch to usleep-based
polling.
Why: when we have long waits, and especially when an application is
deadlocked, detecting it is not easy and a lot of power is wasted until
the e
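
To make the "What" above concrete, here is a minimal sketch in C of a
progress pass that keeps busy polling while events arrive and only starts
calling usleep once nothing has completed for a configurable idle period.
This is not the proposed patch and not Open MPI code: idle_backoff_t,
progress_once, countdown_poll and the callback signature are hypothetical
names chosen for illustration, and the real opal_progress loop is
structured differently.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes usleep() and clock_gettime() on glibc */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback: polls the transports once and returns the number
 * of events completed (0 means nothing happened). */
typedef int (*poll_fn_t)(void *ctx);

/* Hypothetical state for the idle back-off; not an Open MPI structure. */
typedef struct {
    double idle_timeout_s;  /* idle time before we stop spinning, e.g. 300.0 */
    unsigned int sleep_us;  /* pause between polls once we are idle */
    double last_event_s;    /* monotonic time of the last completed event */
    bool sleeping;          /* true while in the usleep-based mode */
} idle_backoff_t;

/* Current monotonic time in seconds. */
static double now_s(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

static void idle_backoff_init(idle_backoff_t *b, double idle_timeout_s,
                              unsigned int sleep_us)
{
    b->idle_timeout_s = idle_timeout_s;
    b->sleep_us = sleep_us;
    b->last_event_s = now_s();
    b->sleeping = false;
}

/* One progress pass: poll, and sleep only if we have been idle too long.
 * Any completed event resets the idle timer and returns to busy polling. */
static int progress_once(idle_backoff_t *b, poll_fn_t poll_fn, void *ctx)
{
    assert(b != NULL && poll_fn != NULL);
    int events = poll_fn(ctx);
    double t = now_s();
    if (events > 0) {
        b->last_event_s = t;
        b->sleeping = false;
    } else if (t - b->last_event_s >= b->idle_timeout_s) {
        b->sleeping = true;
        usleep(b->sleep_us); /* note: this pauses the whole calling thread */
    }
    return events;
}

/* Example callback for experimentation: reports no events until the
 * counter pointed to by ctx reaches zero, then reports one event. */
static int countdown_poll(void *ctx)
{
    int *remaining = ctx;
    if (*remaining > 0) {
        (*remaining)--;
        return 0;
    }
    return 1;
}
```

A caller would run progress_once in its wait loop, for instance with
idle_backoff_init(&b, 300.0, 1000) for a 5 minute threshold. As raised
elsewhere in this thread, the usleep call blocks the calling thread, so
this sketch only suits cases where that thread has nothing else to do.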