As Terry described, and based on the patch attached to the Trac ticket,
the extra goto slipped into the commit by mistake. It belongs to a
completely different shared-memory patch I'm working on. I'll
remove it.
george.
On Jun 26, 2009, at 06:52 , Terry Dontje wrote:
Eugene Loh wrote:
Brian W. Barrett wrote:
All -
Jeff, Eugene, and I had a long discussion this morning on the sm BTL
flow management issues and came to a couple of conclusions.
* Jeff, Eugene, and I are all convinced that Eugene's addition of
polling the receive queue to drain acks when sends start backing up
On Thu, 25 Jun 2009, Eugene Loh wrote:
I spoke with Brian and Jeff about this earlier today. Presumably, up through
1.2, mca_btl_component_progress would poll and if it received a message
fragment would return. Then, presumably in 1.3.0, behavior was changed to
keep polling until the FIFO wa
All -
Jeff, Eugene, and I had a long discussion this morning on the sm BTL flow
management issues and came to a couple of conclusions.
* Jeff, Eugene, and I are all convinced that Eugene's addition of polling
the receive queue to drain acks when sends start backing up is required
for deadloc
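The idea named above, polling the receive queue to drain acks when sends start backing up, can be sketched in a few lines of C. This is a toy model under stated assumptions, not the sm BTL code: `endpoint_t`, `poll_own_fifo`, `try_send`, and `peer_returns_fragment` are hypothetical names, and the FIFO is reduced to a counter of returned fragments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one sender: a pool of free fragments plus a
 * count of fragments the peer has handed back into our own FIFO. */
typedef struct {
    int free_frags; /* fragments available for new sends */
    int incoming;   /* returned fragments waiting in our own FIFO */
} endpoint_t;

/* Reclaim every returned fragment currently sitting in our own FIFO. */
int poll_own_fifo(endpoint_t *ep)
{
    assert(ep != NULL);
    int reclaimed = ep->incoming;
    ep->free_frags += reclaimed;
    ep->incoming = 0;
    return reclaimed;
}

/* Try to send one fragment. If the pool is empty (sends are backing
 * up), drain our own FIFO once before giving up. */
bool try_send(endpoint_t *ep)
{
    assert(ep != NULL);
    if (ep->free_frags == 0) {
        poll_own_fifo(ep);
    }
    if (ep->free_frags == 0) {
        return false; /* still nothing to send with; caller retries later */
    }
    ep->free_frags--;
    return true;
}

/* Model the peer consuming a fragment and returning it to the sender. */
void peer_returns_fragment(endpoint_t *ep)
{
    assert(ep != NULL);
    ep->incoming++;
}
```

Without the `poll_own_fifo` call inside `try_send`, a sender that never polls would keep failing even after the peer had returned fragments, which is the stall the discussion is about.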
Eugene Loh wrote:
If you look in mca_btl_sm_component_progress, when a process receives
a message fragment and returns it to the sender, it executes code like
this:
    goto recheck_peer;
    break;
Okay, the reason I show you that code is because a static code checker
should easily ident
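To illustrate why a checker would flag that pair of statements, here is a minimal, self-contained C sketch. It is not the actual mca_btl_sm_component_progress code: `peer_fifo_t`, `fifo_pop`, and `drain_peer` are hypothetical names invented for this example. The `break` after the `goto` can never execute, because control has already jumped back to the label.

```c
#include <assert.h>
#include <stddef.h>

enum frag_type { FRAG_NONE, FRAG_DATA, FRAG_ACK };

/* Hypothetical stand-in for a peer's FIFO: a small array read in order. */
typedef struct {
    enum frag_type frags[8];
    int head;
    int count;
} peer_fifo_t;

/* Pop the next fragment type, or FRAG_NONE when the FIFO is empty. */
static enum frag_type fifo_pop(peer_fifo_t *f)
{
    if (f->head >= f->count) {
        return FRAG_NONE;
    }
    return f->frags[f->head++];
}

/* Drain one peer's FIFO and return how many fragments were handled. */
int drain_peer(peer_fifo_t *f)
{
    assert(f != NULL);
    int handled = 0;

recheck_peer:
    switch (fifo_pop(f)) {
    case FRAG_DATA:
    case FRAG_ACK:
        handled++;
        goto recheck_peer;
        break; /* unreachable: the goto above always transfers control */
    case FRAG_NONE:
    default:
        break;
    }
    return handled;
}
```

The function behaves the same with or without the dead `break`; the statement is harmless at run time, but it is the kind of thing an unreachable-code check is meant to report.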
FWIW, Ralph and I have generally moved away from hosting hg repositories on
www.open-mpi.org; we've been using bitbucket.org for hosting public and shared hg
repos. It's free to get an account. We love bitbucket.org! :-)
On Jun 25, 2009, at 10:23 AM, Eugene Loh wrote:
Might be fixed now.
Ralph Castain wrote:
Unfortunately, we cannot access this - permissions are denied. In
poking around, I found that your hg directory has permission 700.
Afraid you'll have to grant us permission to access this. :-/
On Jun 25, 2009, at 1:06 AM, Eugene Loh wrote:
Bryan Lally wrote:
Ralph Castain wrote:
Be happy to put it through the wringer... :-)
My wringer is available, too.
'kay. Try
hg clone ssh://www.open-mpi.org/~eloh/hg/pending_sends
which is r21498 but with changes to poll one's own FIFO more regularly
(e.g., even when just performing s
Ralph Castain wrote:
Be happy to put it through the wringer... :-)
My wringer is available, too.
- Bryan
--
Bryan Lally, la...@lanl.gov
505.667.9954
CCS-2
Los Alamos National Laboratory
Los Alamos, New Mexico
Not sure I can address that broader issue, but would you like us to
acid test your fix?
Be happy to put it through the wringer... :-)
Ralph
On Jun 23, 2009, at 7:40 PM, Eugene Loh wrote:
I have a fix for ticket 1944 working, but the broader problem is
unpleasant. E.g., let's say we have zillions of uncountered Bcasts or
something. Say, the root is repeatedly emitting sends, but never
polling its incoming FIFO. Return fragments will be accumulating, the
FIFO will be congeste