On May 27, 2010, at 10:32 AM, Sylvain Jeaugey wrote:
> That's pretty much my first proposition: abort when an error arises,
> because if we don't, we'll crash soon afterwards. That's my original
> concern and this should really be fixed.
>
> Now, if you want to fix the openib BTL so that an erro
That's pretty much my first proposition: abort when an error arises,
because if we don't, we'll crash soon afterwards. That's my original
concern and this should really be fixed.
Now, if you want to fix the openib BTL so that an error in IB results in
an elegant fallback on TCP (elegant = not
Sylvain -
I have to agree with Ralph. The situation you describe (IB failing) may or may
not be what the user wants. And, in fact, we will print a warning message to
the user that such a situation (falling back to TCP) has occurred. However, it
also does not fall under the category of "fail
On May 27, 2010, at 1:47 AM, Sylvain Jeaugey wrote:
> I don't think what the openib BTL is doing is that bad. It is returning an
> error because something really went bad in IB. So yes, it could blank the
> bitmask and return success, but would you really want IB to fail and fall back
> on TCP
I don't think what the openib BTL is doing is that bad. It is returning an
error because something really went bad in IB. So yes, it could blank the
bitmask and return success, but would you really want IB to fail and
fall back on TCP once in a while without any notice? I wouldn't.
So, as it s