Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-17 Thread Ashley Pittman
On Tue, 2009-06-09 at 07:28 -0400, Terry Dontje wrote:
> The biggest issue is coming up with a
> way to have blocks on the SM btl converted to the system poll call
> without requiring a socket write for every packet.

For what it's worth you don't need a socket write every (local) packet, all you

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-10 Thread Sylvain Jeaugey
Hi Jeff, Thanks for jumping in. On Tue, 9 Jun 2009, Jeff Squyres wrote: 2. Note that your solution presupposes that one MPI process can detect that the entire job is deadlocked. This is not quite correct. What exactly do you want to detect -- that one process may be imbalanced on its receiv

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Jeff Squyres
On Jun 9, 2009, at 8:31 AM, Jeff Squyres (jsquyres) wrote: 4. Note, too, that opal_progress() doesn't see *all* progress - the openib BTL doesn't use opal_progress to know when OpenFabrics messages arrive, for example. Wait, I lied -- sorry. opal_progress will call the bml progress, which t

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Jeff Squyres
I'll throw in my random $0.02. I'm at the Forum this week, so my latency on replies here will likely be large. 1. Ashley is correct that we shouldn't sleep. A better solution would be to block waiting for something to happen (rather than spin). As Terry mentioned, we pretty much know how
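
As a rough illustration of "block waiting for something to happen (rather than spin)", here is a minimal sketch using the POSIX poll() call on a file descriptor. The helper names wait_for_activity and wake_waiter are hypothetical and are not Open MPI functions; the sketch assumes the event source can be represented as a readable descriptor, which is exactly the hard part for the SM btl that the thread discusses.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helpers for illustration; these are not Open MPI APIs. */

/* Block until fd becomes readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error (including EINTR). */
static int wait_for_activity(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0) {
        return -1;              /* poll() itself failed */
    }
    if (rc == 0) {
        return 0;               /* nothing happened before the timeout */
    }
    return (pfd.revents & POLLIN) ? 1 : -1;
}

/* Wake a blocked waiter by writing a single byte to the other end.
 * Returns 0 on success, -1 otherwise. */
static int wake_waiter(int fd)
{
    char byte = 1;
    return write(fd, &byte, 1) == 1 ? 0 : -1;
}
```

The process consumes no CPU while poll() is blocked, at the cost of one write per wake-up on the sender side.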

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Sylvain Jeaugey
On Tue, 9 Jun 2009, Ralph Castain wrote: 2. instead of putting things to sleep or even adjusting the loop rate, you might want to consider using the orte_notifier capability and notify the system that the job may be stalled. Or perhaps adding an API to the orte_errmgr framework to notify it th

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Ralph Castain
Couple of other things to help stimulate the thinking: 1. it isn't that OMPI -couldn't- receive a message, but rather that it -didn't- receive a message. This may or may not indicate that there is a problem. Could just be an application that doesn't need to communicate for awhile, as per my exampl

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Ashley Pittman
On Mon, 2009-06-08 at 17:50 +0200, Sylvain Jeaugey wrote:
> Principle
> =========
>
> opal_progress() ensures the progression of MPI communication. The current
> algorithm is a loop calling progress on all registered components. If the
> program is blocked, the loop will busy-poll indefinitely.

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Sylvain Jeaugey
I understand your point of view, and mostly share it. I think the biggest point in my example is that sleep occurs only after (I was wrong in my previous e-mail) 10 minutes of inactivity, and this value is fully configurable. I didn't intend to call sleep after 2 seconds. Plus, as said before,

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Ralph Castain
My concern with any form of sleep is with the impact on the proc - since opal_progress might not be running in a separate thread, won't the sleep apply to the process as a whole? In that case, the process isn't free to continue computing. I can envision applications that might call down int

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Terry Dontje
Sylvain Jeaugey wrote: Hi Ralph, I'm entirely convinced that MPI doesn't have to save power in a normal scenario. The idea is just that if an MPI process is blocked (i.e. has not performed progress for -say- 5 minutes (default in my implementation), we stop busy polling and have the process d

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Sylvain Jeaugey
Hi Ralph, I'm entirely convinced that MPI doesn't have to save power in a normal scenario. The idea is just that if an MPI process is blocked (i.e. has not performed progress for -say- 5 minutes (default in my implementation), we stop busy polling and have the process drop from 100% CPU usage

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-08 Thread Ralph Castain
I'm not entirely convinced this actually achieves your goals, but I can see some potential benefits. I'm also not sure that power consumption is that big of an issue that MPI needs to begin chasing "power saver" modes of operation, but that can be a separate debate some day. I'm assuming

[OMPI devel] [RFC] Low pressure OPAL progress

2009-06-08 Thread Sylvain Jeaugey
What: when nothing has been received for a very long time (e.g. 5 minutes), stop busy polling in opal_progress and switch to a usleep-based one.
Why: when we have long waits, and especially when an application is deadlocked, detecting it is not easy and a lot of power is wasted until the e
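
The idea in the "What" line can be sketched roughly as follows. This is not the RFC's patch and not the real opal_progress(): every name here (lp_state, lp_init, lp_progress, the stand-in components) is hypothetical, and nanosleep() stands in for usleep(), which POSIX.1-2008 removed. The sketch only assumes that each registered component returns the number of events it completed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for nanosleep() */
#endif
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* All names below are hypothetical; none are taken from the Open MPI tree. */
typedef int (*lp_progress_fn)(void);  /* returns the number of events completed */

struct lp_state {
    time_t last_event;      /* when progress was last observed */
    double idle_threshold;  /* seconds of inactivity tolerated, e.g. 300.0 */
    long   nap_nsec;        /* length of each nap once idle, below 1e9 */
};

static void lp_init(struct lp_state *s, double idle_threshold, long nap_nsec)
{
    assert(nap_nsec >= 0 && nap_nsec < 1000000000L);
    s->last_event = time(NULL);
    s->idle_threshold = idle_threshold;
    s->nap_nsec = nap_nsec;
}

/* One pass over the components. Returns true if the pass ended in a nap. */
static bool lp_progress(struct lp_state *s, lp_progress_fn *fns, int nfns)
{
    int events = 0;
    for (int i = 0; i < nfns; i++) {
        events += fns[i]();
    }

    time_t now = time(NULL);
    if (events > 0) {
        s->last_event = now;    /* activity seen: stay in busy-poll mode */
        return false;
    }
    if (difftime(now, s->last_event) >= s->idle_threshold) {
        struct timespec nap = { 0, s->nap_nsec };
        nanosleep(&nap, NULL);  /* idle too long: yield the CPU briefly */
        return true;
    }
    return false;               /* idle, but not for long enough yet */
}

/* Stand-in components for illustration only. */
static int lp_idle_component(void) { return 0; }
static int lp_busy_component(void) { return 1; }
```

Any completed event resets the idle timer, so a process that is merely slow keeps busy polling; only a process that has seen nothing for the whole threshold starts napping between passes.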