Is it time to "svn rm ompi/mca/pml/dr"?
On Aug 4, 2009, at 6:50 AM, Ralph Castain wrote:
Rolf/Mouhamed
Could you get together off-list to discuss the different approaches
and see if/where there is common ground. It would be nice to see an
integrated solution - personally, I would rather not s
From my perspective, the assumption that the low-level is reliable is completely
consistent with the assumptions that went into the ob1 design, so I don't see changes
you may propose as a problem in principle.
Thanks a lot for the clarification,
Rich
On 8/3/09 9:39 AM, "Mouhamed Gueye" wr
Rolf/Mouhamed
Could you get together off-list to discuss the different approaches
and see if/where there is common ground. It would be nice to see an
integrated solution - personally, I would rather not see two
orthogonal approaches unless they can be cleanly separated. Much
better if the
I have not, but there should be no difference. The failover code only
gets triggered when an error happens. Otherwise, there are no
differences in the code paths while everything is functioning normally.
Sounds good. I still did not have time to review the code. I will try to
do it during t
I have not, but there should be no difference. The failover code only
gets triggered when an error happens. Otherwise, there are no
differences in the code paths while everything is functioning normally.
Rolf
On 08/03/09 11:14, Pavel Shamis (Pasha) wrote:
Rolf,
Did you compare latency/bw fo
On Sun, 2 Aug 2009, Ralph Castain wrote:
Perhaps a bigger question needs to be addressed - namely, does the ob1 code
need to be refactored?
Having been involved a little in the early discussion with Bull when we
debated over where to put this, I know the primary concern was that the code
not
Rolf,
Did you compare latency/bw for failover-enabled code vs. trunk?
Pasha.
Rolf Vandevaart wrote:
Hi folks:
As some of you know, I have also been looking into implementing
failover as well. I took a different approach as I am solving the
problem within the openib BTL itself. This of cour
Hi folks:
As some of you know, I have also been looking into implementing failover
as well. I took a different approach as I am solving the problem within
the openib BTL itself. This of course means that this only works for
failing from one openib BTL to another but that was our area of
int
Hi list,
I'll try to answer the main concerns so far.
We chose to work on ob1 for mainly 2 reasons:
- we focused first on fixing dr but were quite disappointed by its
performance in comparison with ob1. Then, we oriented our work on ob1 to
provide failover while keeping good performance.
Okay - here's a thought. Why not do what the original message asked?
Checkout their changes and look at what they did.
Then we can have the discussion about how intrusive it is. Otherwise,
all we're doing is debating what they -might- have done, or what
someone thinks they -should- have don
The point here is very different, and is not being made because of objections for
fail-over support. Previous work took precisely this sort of approach, and in that
particular case the desire to support reliability, but be able to compile out this
support still had a negative performance imp
The objections being cited are somewhat unfair - perhaps people do not
understand the proposal being made? The developers have gone out of
their way to ensure that all changes are configured out unless you
specifically select to use that functionality. This has been our
policy from day one
On 8/2/09 12:55 AM, "Brian Barrett" wrote:
While I agree that performance impact (latency in this case) is
important, I disagree that this necessarily belongs somewhere other
than ob1. For example, a zero-performance impact solution would be to
provide two versions of all the interface functi
While I agree that performance impact (latency in this case) is
important, I disagree that this necessarily belongs somewhere other
than ob1. For example, a zero-performance impact solution would be to
provide two versions of all the interface functions, one with failover
turned on and one
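
One way to read the "two versions of the interface functions" suggestion is to pick the send routine once at startup, so the normal path never tests a failover flag. The sketch below is only a Python stand-in for that idea, not the ob1 code; `send_fast`, `send_failover`, and `select_send` are hypothetical names, and the `log` list merely stands in for whatever work each path would really do.

```python
def send_fast(log, payload):
    """Plain send path: no failover bookkeeping at all (hypothetical)."""
    log.append(("send", payload))
    return len(payload)


def send_failover(log, payload):
    """Failover send path: record the payload before sending (hypothetical)."""
    log.append(("track", payload))
    log.append(("send", payload))
    return len(payload)


def select_send(failover_enabled):
    """Choose the send routine once, at initialization time."""
    return send_failover if failover_enabled else send_fast
```

Because the choice is made once, callers of the returned function pay for the tracking step only when failover was requested.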
What is the impact on sm, which is by far the most sensitive to latency? This
really belongs in a place other than ob1. Ob1 is supposed to provide the
lowest latency possible, and other PMLs are supposed to be used for
heavier-weight protocols.
On the technical side, how do you distinguish be
Hi list,
Here is an update on our work concerning device failover.
As many of you suggested, we reoriented our work on ob1 rather than dr
and we now have a working prototype on top of ob1. The approach is to
store btl descriptors sent to peers and delete them when we receive
proof of delivery
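
The store-until-acknowledged idea in this update can be sketched roughly as follows. This is a toy Python model under stated assumptions, not the prototype itself: `PendingStore`, `record_send`, `ack`, and `resend_candidates` are hypothetical names, a descriptor is modeled as an opaque payload, and delivery proof is modeled as a sequence number handed back to the sender.

```python
from collections import OrderedDict


class PendingStore:
    """Toy model: keep sent descriptors until delivery is confirmed."""

    def __init__(self):
        # Maps sequence number -> (peer, payload), kept in send order.
        self._pending = OrderedDict()
        self._next_seq = 0

    def record_send(self, peer, payload):
        """Remember a descriptor that was just sent; return its sequence number."""
        seq = self._next_seq
        self._next_seq += 1
        self._pending[seq] = (peer, payload)
        return seq

    def ack(self, seq):
        """Drop a descriptor once delivery is proven; True if it was pending."""
        return self._pending.pop(seq, None) is not None

    def resend_candidates(self, peer):
        """Descriptors still unconfirmed for a peer, oldest first."""
        return [(seq, payload)
                for seq, (p, payload) in self._pending.items()
                if p == peer]
```

If a device fails, the entries returned by `resend_candidates` are the ones that would have to be replayed over another path; everything already acknowledged has been dropped from the store.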