Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-24 Thread Eugene Loh
George Bosilca wrote: Here is a simple fix for both problems. Enforce a reasonable limit on the number of fragments in the BTL free list (1K should be more than enough), and make sure the fifo has a size equal to p * number_of_allowed_fragments_in_the_free_list, where p is the number of l
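
George's sizing rule can be checked with a worst-case count: if each of p senders can own at most F fragments, a receiver's FIFO never needs more than p × F slots. The sketch below is hypothetical. `frag_fifo_t`, `fifo_size_for` and `overflow_count` are illustrative names, the ring buffer is single-threaded rather than the sm BTL's shared-memory FIFO, and it assumes a fragment occupies one slot in the receiver's FIFO from the moment it is sent until the receiver pops it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical single-threaded ring buffer of fragment pointers; the
 * real sm BTL FIFO lives in shared memory and is not modeled here. */
typedef struct {
    void **slots;
    size_t capacity;
    size_t head;   /* index of the oldest entry */
    size_t count;  /* number of occupied slots */
} frag_fifo_t;

static bool fifo_init(frag_fifo_t *f, size_t capacity) {
    f->slots = calloc(capacity, sizeof *f->slots);
    f->capacity = capacity;
    f->head = 0;
    f->count = 0;
    return f->slots != NULL;
}

static bool fifo_push(frag_fifo_t *f, void *frag) {
    if (f->count == f->capacity)
        return false;                     /* FIFO full */
    f->slots[(f->head + f->count) % f->capacity] = frag;
    f->count++;
    return true;
}

static void *fifo_pop(frag_fifo_t *f) {
    if (f->count == 0)
        return NULL;
    void *frag = f->slots[f->head];
    f->head = (f->head + 1) % f->capacity;
    f->count--;
    return frag;
}

/* Sizing rule from the thread: p senders, each capped at max_frags. */
static size_t fifo_size_for(size_t p, size_t max_frags) {
    return p * max_frags;
}

/* Worst case for one receiver: all p senders push every fragment they
 * own before the receiver pops anything. Returns the number of pushes
 * that found the FIFO full, or (size_t)-1 if allocation fails. */
static size_t overflow_count(size_t p, size_t max_frags, size_t capacity) {
    frag_fifo_t fifo;
    if (!fifo_init(&fifo, capacity))
        return (size_t)-1;
    int token;                            /* its address stands in for a fragment */
    size_t failures = 0;
    for (size_t s = 0; s < p; s++)
        for (size_t i = 0; i < max_frags; i++)
            if (!fifo_push(&fifo, &token))
                failures++;
    size_t drained = 0;
    while (fifo_pop(&fifo) != NULL)
        drained++;
    assert(drained + failures == p * max_frags);  /* every fragment accounted for */
    free(fifo.slots);
    return failures;
}
```

With the suggested cap of 1K fragments and, for example, 64 local processes, each receiver's FIFO needs 64 × 1024 = 65,536 slots, and 64 such FIFOs need 4,194,304 slots in total, so under this rule FIFO storage grows as p² × F.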

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-24 Thread Eugene Loh
George Bosilca wrote: In other words, as long as a queue is peer based (peer not peers), the management of the pending send list was doing what it was supposed to, and there was no possibility of deadlock. I disagree. It is true that I can fill up a remote FIFO with sends. In such a case

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-24 Thread Brian W. Barrett
On Wed, 24 Jun 2009, Eugene Loh wrote: Brian Barrett wrote: Or go to what I proposed and USE A LINKED LIST! (as I said before, not an original idea, but one I think has merit) Then you don't have to size the fifo, because there isn't a fifo. Limit the number of send fragments any one p

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-24 Thread Ralph Castain
I'm not sure the two questions in your second item are separable, Eugene. I fear that the only real solution will be to re-architect the sm BTL, which was originally a flawed design. I think you did a great job of building on it, but we are now finding that the foundation is just too shaky, so no matter

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-24 Thread Eugene Loh
Brian Barrett wrote: Or go to what I proposed and USE A LINKED LIST! (as I said before, not an original idea, but one I think has merit) Then you don't have to size the fifo, because there isn't a fifo. Limit the number of send fragments any one proc can allocate and the only place memor

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-24 Thread Ralph Castain
rysigh.
> From: George Bosilca
> Date: June 24, 2009 12:46:28 AM MDT
> To: Open MPI Developers
> Subject: Re: [OMPI devel] trac ticket 1944 and pending sends
> Reply-To: Open MPI Developers
>
> In other words, as long as a queue is peer based (peer not peers), the

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-24 Thread Brian Barrett
Or go to what I proposed and USE A LINKED LIST! (as I said before, not an original idea, but one I think has merit) Then you don't have to size the fifo, because there isn't a fifo. Limit the number of send fragments any one proc can allocate and the only place memory can grow without bo
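
Brian's proposal can be sketched in the same style. All names are hypothetical, `malloc` stands in for the BTL free list, and everything is single-threaded. The point it illustrates: with an intrusive `next` pointer in each fragment, the receive queue needs no storage of its own, so the only limit that has to be chosen is the per-process fragment cap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fragment carrying its own link: no FIFO array to size. */
typedef struct frag {
    struct frag *next;
    int payload;
} frag_t;

/* Per-process allocation cap: bounds how many fragments this process
 * can have queued anywhere at once. */
typedef struct {
    size_t max_frags;
    size_t allocated;
} frag_pool_t;

static frag_t *frag_alloc(frag_pool_t *pool) {
    if (pool->allocated == pool->max_frags)
        return NULL;                 /* cap reached: caller must progress and retry */
    frag_t *f = malloc(sizeof *f);
    if (f != NULL)
        pool->allocated++;
    return f;
}

/* In shared memory the fragment would go back to its owner's free list;
 * here the owner simply frees it and the count drops. */
static void frag_free(frag_pool_t *pool, frag_t *f) {
    assert(pool->allocated > 0);
    free(f);
    pool->allocated--;
}

/* Receiver-side queue: singly linked list with head and tail pointers. */
typedef struct {
    frag_t *head;
    frag_t *tail;
} frag_list_t;

static void list_append(frag_list_t *q, frag_t *f) {
    f->next = NULL;
    if (q->tail != NULL)
        q->tail->next = f;
    else
        q->head = f;
    q->tail = f;
}

static frag_t *list_remove_head(frag_list_t *q) {
    frag_t *f = q->head;
    if (f != NULL) {
        q->head = f->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return f;
}
```

Because a queue can only ever hold fragments that some process has allocated, capping allocation per process also bounds every queue; no queue length has to be fixed up front.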

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-24 Thread George Bosilca
In other words, as long as a queue is peer based (peer not peers), the management of the pending send list was doing what it was supposed to, and there was no possibility of deadlock. With the new code, as a third party can fill up a remote queue, getting a fragment back [as you stated] bec

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-23 Thread Eugene Loh
George Bosilca wrote: On Jun 23, 2009, at 11:04 , Eugene Loh wrote: The sm BTL used to have two mechanisms for dealing with congested FIFOs. One was to grow the FIFOs. Another was to queue pending sends locally (on the sender's side). I think the grow-FIFO mechanism was typically invok

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-23 Thread Brian W. Barrett
I think that sounds like a rational path forward. Another, more long-term, option would be to move from the FIFOs to a linked list (which can even be atomic), which is what MPICH does with nemesis. In that case, there's never a queue to get backed up (although the receive queue for collective
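
An atomic linked list of the kind mentioned here can be sketched with C11 atomics. This is a simple multi-producer, single-consumer list chosen for brevity (producers push with compare-and-swap, the consumer detaches the whole list with one exchange and reverses it); it is not necessarily the queue algorithm MPICH's nemesis uses. It also assumes all parties share one address space; between processes, shared-memory code would likely store offsets rather than raw pointers. The names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    int value;
} node_t;

/* Hypothetical MPSC list: any number of producers, one consumer. */
typedef struct {
    _Atomic(node_t *) top;
} mpsc_list_t;

static void mpsc_init(mpsc_list_t *l) {
    atomic_init(&l->top, NULL);
}

/* Producer side: link the node in front of the current top and publish
 * it with a release CAS; on failure `old` is refreshed and we retry. */
static void mpsc_push(mpsc_list_t *l, node_t *n) {
    assert(n != NULL);
    node_t *old = atomic_load_explicit(&l->top, memory_order_relaxed);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak_explicit(
                 &l->top, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Consumer side: detach everything pushed so far, then reverse it so
 * nodes come out in the order their pushes succeeded (oldest first). */
static node_t *mpsc_take_all(mpsc_list_t *l) {
    node_t *n = atomic_exchange_explicit(&l->top, NULL, memory_order_acquire);
    node_t *reversed = NULL;
    while (n != NULL) {
        node_t *next = n->next;
        n->next = reversed;
        reversed = n;
        n = next;
    }
    return reversed;
}
```

Producers never remove nodes, so the push loop does not suffer from the ABA problem that a concurrent pop would introduce; taking the whole list at once keeps the consumer side to a single atomic operation.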

Re: [OMPI devel] trac ticket 1944 and pending sends

2009-06-23 Thread George Bosilca
On Jun 23, 2009, at 11:04 , Eugene Loh wrote: The sm BTL used to have two mechanisms for dealing with congested FIFOs. One was to grow the FIFOs. Another was to queue pending sends locally (on the sender's side). I think the grow-FIFO mechanism was typically invoked and the pending-send

[OMPI devel] trac ticket 1944 and pending sends

2009-06-23 Thread Eugene Loh
The sm BTL used to have two mechanisms for dealing with congested FIFOs. One was to grow the FIFOs. Another was to queue pending sends locally (on the sender's side). I think the grow-FIFO mechanism was typically invoked and the pending-send mechanism used only under extreme circumstances (n
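
The sender-side pending-send mechanism described above can be sketched as follows. Everything here is hypothetical and single-threaded: ints stand in for fragments, the FIFO is deliberately tiny, and the grow-FIFO alternative is not modeled. One design choice worth noting: once anything is pending, new sends are queued behind it rather than tried directly, so fragments reach the receiver in the order they were sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define FIFO_CAP 4   /* hypothetical, deliberately tiny */

/* Receiver's bounded FIFO (single-threaded stand-in for shared memory). */
typedef struct {
    int slots[FIFO_CAP];
    size_t head;
    size_t count;
} fifo_t;

static bool fifo_try_push(fifo_t *f, int frag) {
    if (f->count == FIFO_CAP)
        return false;                       /* congested */
    f->slots[(f->head + f->count) % FIFO_CAP] = frag;
    f->count++;
    return true;
}

static bool fifo_try_pop(fifo_t *f, int *frag) {
    if (f->count == 0)
        return false;
    *frag = f->slots[f->head];
    f->head = (f->head + 1) % FIFO_CAP;
    f->count--;
    return true;
}

/* Pending-send list, local to the sending process. */
typedef struct pending_send {
    struct pending_send *next;
    int frag;
} pending_send_t;

typedef struct {
    fifo_t *remote_fifo;
    pending_send_t *pending_head;
    pending_send_t *pending_tail;
} sender_t;

/* Retry queued sends in order until the list empties or the FIFO fills. */
static void sender_progress(sender_t *s) {
    while (s->pending_head != NULL &&
           fifo_try_push(s->remote_fifo, s->pending_head->frag)) {
        pending_send_t *done = s->pending_head;
        s->pending_head = done->next;
        if (s->pending_head == NULL)
            s->pending_tail = NULL;
        free(done);
    }
    assert(s->pending_head != NULL || s->pending_tail == NULL);
}

/* Returns true if the fragment went straight into the FIFO, false if it
 * was queued locally behind earlier sends. */
static bool sender_send(sender_t *s, int frag) {
    sender_progress(s);
    if (s->pending_head == NULL && fifo_try_push(s->remote_fifo, frag))
        return true;
    pending_send_t *p = malloc(sizeof *p);
    if (p == NULL)
        abort();
    p->next = NULL;
    p->frag = frag;
    if (s->pending_tail != NULL)
        s->pending_tail->next = p;
    else
        s->pending_head = p;
    s->pending_tail = p;
    return false;
}
```

The pending list has no bound of its own: its memory grows with the number of sends the receiver has not yet made room for.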