Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Sylvain Jeaugey
Hi Ralph, I'm entirely convinced that MPI doesn't have to save power in a normal scenario. The idea is just that if an MPI process is blocked (i.e. has not performed progress for, say, 5 minutes, the default in my implementation), we stop busy polling and have the process drop from 100% CPU usage

Re: [OMPI devel] Multi-rail on openib

2009-06-09 Thread Pavel Shamis (Pasha)
Most of the IB protocols used by MPI target a LID. There is no existing notification path I know of that can replace LID-xyz with LID-123. The subnet manager might be able to do this, but that raises security issues. Interesting problem. That is not exactly correct. For migration between port

Re: [OMPI devel] Multi-rail on openib

2009-06-09 Thread Sylvain Jeaugey
On Mon, 8 Jun 2009, NiftyOMPI Tom Mitchell wrote: ??? dual rail does double the number of switch ports. If you want to address switch failure each rail must connect to a different switch. If you do not want to have isolated fabrics you must have some additional ports on all switches to connect

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Terry Dontje
Sylvain Jeaugey wrote: Hi Ralph, I'm entirely convinced that MPI doesn't have to save power in a normal scenario. The idea is just that if an MPI process is blocked (i.e. has not performed progress for, say, 5 minutes, the default in my implementation), we stop busy polling and have the process d

[OMPI devel] Fwd: [Open MPI] #1927: v1.3 COMM_SPAWN loop test fails after ~120 spawns

2009-06-09 Thread Jeff Squyres
I'd be in favor of bringing this to v1.3. Are there other dependencies / would it be difficult? Begin forwarded message: From: "Open MPI" Date: June 8, 2009 11:31:20 AM PDT Cc: Subject: Re: [Open MPI] #1927: v1.3 COMM_SPAWN loop test fails after ~120 spawns #1927: v1.3 COMM_SPAWN loop

Re: [OMPI devel] Fwd: [Open MPI] #1927: v1.3 COMM_SPAWN loop test fails after ~120 spawns

2009-06-09 Thread Ralph Castain
I don't think it would be very hard - I would have to create a patch for it, but the fix is completely contained in one file and location. I would like to have someone else test it, though, before we move it across. It worked for me, but since it is a race condition, that isn't entirely con

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Ralph Castain
My concern with any form of sleep is with the impact on the proc - since opal_progress might not be running in a separate thread, won't the sleep apply to the process as a whole? In that case, the process isn't free to continue computing. I can envision applications that might call down int

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Sylvain Jeaugey
I understand your point of view, and mostly share it. I think the biggest point in my example is that sleep occurs only after (I was wrong in my previous e-mail) 10 minutes of inactivity, and this value is fully configurable. I didn't intend to call sleep after 2 seconds. Plus, as said before,

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Ashley Pittman
On Mon, 2009-06-08 at 17:50 +0200, Sylvain Jeaugey wrote: > Principle > = > > opal_progress() ensures the progression of MPI communication. The current > algorithm is a loop calling progress on all registered components. If the > program is blocked, the loop will busy-poll indefinitely.
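
The loop described in the quoted RFC, together with the idle threshold discussed elsewhere in this thread, can be sketched roughly as follows. This is only an illustration under stated assumptions, not Open MPI's code: `progress_fn`, `struct progress_state`, `progress_once`, and the two stub components are hypothetical names invented for this sketch, and the idle limit is passed in as a plain number of seconds.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: each registered component reports how many
 * events it completed during this call. */
typedef int (*progress_fn)(void);

/* Hypothetical bookkeeping for an idle-aware progress loop. */
struct progress_state {
    const progress_fn *callbacks; /* registered component progress functions */
    size_t count;                 /* number of registered callbacks */
    double idle_limit;            /* seconds without progress before backing off */
    time_t last_event;            /* time of the most recent completed event */
};

/* Stub components used only for illustration. */
static int idle_component(void) { return 0; } /* never completes anything */
static int busy_component(void) { return 1; } /* completes one event per call */

/* Run one pass over all registered callbacks.
 * Returns true when no event has completed for at least idle_limit seconds,
 * i.e. when the caller may stop busy-polling and yield the CPU instead. */
static bool progress_once(struct progress_state *st, time_t now)
{
    int events = 0;

    for (size_t i = 0; i < st->count; i++) {
        events += st->callbacks[i]();
    }

    if (events > 0) {
        st->last_event = now; /* progress was made: reset the idle clock */
        return false;
    }

    return difftime(now, st->last_event) >= st->idle_limit;
}
```

A caller would invoke `progress_once(&st, time(NULL))` in its polling loop and only sleep or yield when it returns true. Passing the timestamp in, rather than reading the clock inside the function, is a choice made here so the threshold logic can be exercised without waiting.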

Re: [OMPI devel] Fwd: [Open MPI] #1927: v1.3 COMM_SPAWN loop test fails after ~120 spawns

2009-06-09 Thread Jeff Squyres
Tested -- seems to work for me. I say we now let MTT sort it out (i.e., see if others hit this race condition) and apply to v1.3. On Jun 9, 2009, at 4:46 AM, Ralph Castain wrote: I don't think it would be very hard - I would have to create a patch for it, but the fix is completely contained i

Re: [OMPI devel] Multi-rail on openib

2009-06-09 Thread Pavel Shamis (Pasha)
Open MPI currently needs to have connected fabrics, but maybe that's something we would like to change in the future, having two separate rails. (Btw Pasha, will your current work enable this?) I do not completely understand what you mean here by two separate rails ... Already today you

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Ralph Castain
Couple of other things to help stimulate the thinking: 1. it isn't that OMPI -couldn't- receive a message, but rather that it -didn't- receive a message. This may or may not indicate that there is a problem. Could just be an application that doesn't need to communicate for a while, as per my exampl

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Sylvain Jeaugey
On Tue, 9 Jun 2009, Ralph Castain wrote: 2. instead of putting things to sleep or even adjusting the loop rate, you might want to consider using the orte_notifier capability and notify the system that the job may be stalled. Or perhaps adding an API to the orte_errmgr framework to notify it th

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Jeff Squyres
I'll throw in my random $0.02. I'm at the Forum this week, so my latency on replies here will likely be large. 1. Ashley is correct that we shouldn't sleep. A better solution would be to block waiting for something to happen (rather than spin). As Terry mentioned, we pretty much know how

Re: [OMPI devel] [RFC] Low pressure OPAL progress

2009-06-09 Thread Jeff Squyres
On Jun 9, 2009, at 8:31 AM, Jeff Squyres (jsquyres) wrote: 4. Note, too, that opal_progress() doesn't see *all* progress - the openib BTL doesn't use opal_progress to know when OpenFabrics messages arrive, for example. Wait, I lied -- sorry. opal_progress will call the bml progress, which t

[OMPI devel] Hang in collectives involving shared memory

2009-06-09 Thread Ralph Castain
Hi folks As mentioned in today's telecon, we at LANL are continuing to see hangs when running even small jobs that involve shared memory in collective operations. This has been the topic of discussion before, but I bring it up again because (a) the problem is beginning to become epidemic across ou