Re: [OMPI devel] sm BTL flow management

2009-06-26 Thread George Bosilca
As Terry described, and based on the patch attached to the ticket on trac, the extra goto slipped into the commit by mistake. It belongs to a totally different shared-memory patch I'm working on. I'll remove it. george. On Jun 26, 2009, at 06:52, Terry Dontje wrote: Eugene Loh wr

Re: [OMPI devel] sm BTL flow management

2009-06-26 Thread Terry Dontje
Eugene Loh wrote: Brian W. Barrett wrote: All - Jeff, Eugene, and I had a long discussion this morning on the sm BTL flow management issues and came to a couple of conclusions. * Jeff, Eugene, and I are all convinced that Eugene's addition of polling the receive queue to drain acks when se

Re: [OMPI devel] sm BTL flow management

2009-06-26 Thread Eugene Loh
Brian W. Barrett wrote: All - Jeff, Eugene, and I had a long discussion this morning on the sm BTL flow management issues and came to a couple of conclusions. * Jeff, Eugene, and I are all convinced that Eugene's addition of polling the receive queue to drain acks when sends start backing u

Re: [OMPI devel] sm BTL flow management

2009-06-25 Thread Paul H. Hargrove
Brian W. Barrett wrote: All - Jeff, Eugene, and I had a long discussion this morning on the sm BTL flow management issues and came to a couple of conclusions. * Jeff, Eugene, and I are all convinced that Eugene's addition of polling the receive queue to drain acks when sends start backing up

Re: [OMPI devel] sm BTL flow management

2009-06-25 Thread Brian W. Barrett
On Thu, 25 Jun 2009, Eugene Loh wrote: I spoke with Brian and Jeff about this earlier today. Presumably, up through 1.2, mca_btl_sm_component_progress would poll and, if it received a message fragment, would return. Then, presumably in 1.3.0, behavior was changed to keep polling until the FIFO wa

Re: [OMPI devel] sm BTL flow management

2009-06-25 Thread Brian W. Barrett
All - Jeff, Eugene, and I had a long discussion this morning on the sm BTL flow management issues and came to a couple of conclusions. * Jeff, Eugene, and I are all convinced that Eugene's addition of polling the receive queue to drain acks when sends start backing up is required for deadloc

Re: [OMPI devel] sm BTL flow management

2009-06-25 Thread Eugene Loh
Eugene Loh wrote: If you look in mca_btl_sm_component_progress, when a process receives a message fragment and returns it to the sender, it executes code like this:

    goto recheck_peer;
    break;

Okay, the reason I show you that code is because a static code checker should easily ident
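
To make the goto/break remark concrete, here is a minimal sketch in C. It is not the Open MPI source: drain_queue, the FRAG_* constants, and the toy queue are hypothetical names invented for illustration. It only shows why a break placed directly after a goto can never execute, and how the goto turns "handle one fragment and return" into "keep polling".

```c
#include <stddef.h>

/* Hypothetical fragment kinds for a toy queue (not Open MPI types). */
enum { FRAG_DATA = 0, FRAG_ACK = 1 };

/* Handle entries of a toy queue and return how many were handled.
 * A data fragment jumps back and polls again; an ack ends the call. */
int drain_queue(const int *queue, size_t len)
{
    size_t i = 0;
    int handled = 0;

recheck_peer:
    if (i >= len) {
        return handled;         /* queue is empty */
    }
    switch (queue[i++]) {
    case FRAG_DATA:
        handled++;
        goto recheck_peer;      /* control always leaves here ... */
        break;                  /* ... so this break is unreachable */
    case FRAG_ACK:
        handled++;
        break;                  /* reachable: falls out to the return */
    default:
        break;
    }
    return handled;
}
```

In this sketch, a queue holding data, data, ack, data is processed up to and including the ack, because only the data case loops back. Removing the goto would make the data case return after a single fragment.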

Re: [OMPI devel] sm BTL flow management

2009-06-25 Thread Jeff Squyres
FWIW, Ralph and I have generally moved away from hosting hg's on www.open-mpi.org -- we've been using bitbucket.org for hosting public and shared hg repos. It's free to get an account. We love bitbucket.org! :-) On Jun 25, 2009, at 10:23 AM, Eugene Loh wrote: Might be fixed now. Ralph

Re: [OMPI devel] sm BTL flow management

2009-06-25 Thread Eugene Loh
Might be fixed now. Ralph Castain wrote: Unfortunately, we cannot access this - permissions are denied. In poking around, I found that your hg directory has permission 700. Afraid you'll have to grant us permission to access this. :-/ On Jun 25, 2009, at 1:06 AM, Eugene Loh wrote: Bryan

Re: [OMPI devel] sm BTL flow management

2009-06-25 Thread Ralph Castain
Unfortunately, we cannot access this - permissions are denied. In poking around, I found that your hg directory has permission 700. Afraid you'll have to grant us permission to access this. :-/ Ralph On Jun 25, 2009, at 1:06 AM, Eugene Loh wrote: Bryan Lally wrote: Ralph Castain wrote:

Re: [OMPI devel] sm BTL flow management

2009-06-25 Thread Eugene Loh
Bryan Lally wrote: Ralph Castain wrote: Be happy to put it through the wringer... :-) My wringer is available, too. 'kay. Try hg clone ssh://www.open-mpi.org/~eloh/hg/pending_sends which is r21498 but with changes to poll one's own FIFO more regularly (e.g., even when just performing s

Re: [OMPI devel] sm BTL flow management

2009-06-24 Thread Bryan Lally
Ralph Castain wrote: Be happy to put it through the wringer... :-) My wringer is available, too. - Bryan -- Bryan Lally, la...@lanl.gov 505.667.9954 CCS-2 Los Alamos National Laboratory Los Alamos, New Mexico

Re: [OMPI devel] sm BTL flow management

2009-06-23 Thread Ralph Castain
Not sure I can address that broader issue, but would you like us to acid test your fix? Be happy to put it through the wringer... :-) Ralph On Jun 23, 2009, at 7:40 PM, Eugene Loh wrote: I have a fix for ticket 1944 working, but the broader problem is unpleasant. E.g., let's say we have z

[OMPI devel] sm BTL flow management

2009-06-23 Thread Eugene Loh
I have a fix for ticket 1944 working, but the broader problem is unpleasant. E.g., let's say we have zillions of uncountered Bcasts or something. Say, the root is repeatedly emitting sends, but never polling its incoming FIFO. Return fragments will be accumulating, the FIFO will be congeste
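
The scenario described here can be sketched with a toy single-threaded model in C. Everything in it is an assumption made for illustration (POOL_SIZE, sender_t, poll_incoming, run_sends); it does not reproduce the sm BTL data structures. It assumes a sender with a fixed pool of fragments, a receiver that returns each fragment immediately, and a sender that gets a fragment back only when it polls its own incoming FIFO.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pool size: fragments the sender owns in this toy model. */
enum { POOL_SIZE = 8 };

typedef struct {
    int free_frags;      /* fragments the sender can use right now */
    int returned_frags;  /* fragments sitting in the sender's incoming FIFO */
} sender_t;

/* Drain the incoming FIFO: every returned fragment becomes usable again. */
void poll_incoming(sender_t *s)
{
    s->free_frags += s->returned_frags;
    s->returned_frags = 0;
}

/* Try to send `want` messages and return how many were actually sent.
 * With poll_when_stuck == false the sender never looks at its FIFO. */
int run_sends(int want, bool poll_when_stuck)
{
    sender_t s = { POOL_SIZE, 0 };
    int sent = 0;

    while (sent < want) {
        if (s.free_frags == 0) {
            if (!poll_when_stuck) {
                break;              /* stalled: nothing left to send with */
            }
            poll_incoming(&s);      /* reclaim returned fragments */
            if (s.free_frags == 0) {
                break;              /* nothing came back either */
            }
        }
        s.free_frags--;             /* fragment goes out with the message */
        s.returned_frags++;         /* receiver hands it straight back */
        sent++;
        assert(s.free_frags + s.returned_frags == POOL_SIZE);
    }
    return sent;
}
```

Under these assumptions, run_sends(100, false) stops once the pool is exhausted, while run_sends(100, true) completes all sends because the sender drains its own FIFO whenever it runs out of fragments.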