Re: [OMPI devel] BTL add procs errors

2010-05-27 Thread Jeff Squyres
On May 27, 2010, at 10:32 AM, Sylvain Jeaugey wrote:
> That's pretty much my first proposition: abort when an error arises,
> because if we don't, we'll crash soon afterwards. That's my original
> concern and this should really be fixed.
>
> Now, if you want to fix the openib BTL so that an erro

2010-05-27 Thread Sylvain Jeaugey
That's pretty much my first proposition: abort when an error arises, because if we don't, we'll crash soon afterwards. That's my original concern and this should really be fixed. Now, if you want to fix the openib BTL so that an error in IB results in an elegant fallback on TCP (elegant = not
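The abort-on-error policy Sylvain proposes can be sketched from the caller's side. This is a hypothetical illustration only: the type `sketch_btl_t`, the functions `add_procs_abort_on_error`, `tcp_add_procs` and `failing_add_procs`, and the `SKETCH_*` return codes are invented here and are not Open MPI's actual BTL/BML interfaces. The idea shown is that the first `add_procs` failure is propagated immediately (so the job can be aborted) rather than continuing with a half-initialized transport.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical return codes (not Open MPI's real constants). */
enum { SKETCH_SUCCESS = 0, SKETCH_ERROR = -1 };

#define MAX_PROCS 64

/* Hypothetical, simplified BTL: add_procs() flags the peers it can reach. */
typedef struct {
    const char *name;
    int (*add_procs)(size_t nprocs, bool reachable[]);
} sketch_btl_t;

/* Example BTL that reaches every peer (stands in for TCP). */
static int tcp_add_procs(size_t nprocs, bool reachable[]) {
    for (size_t p = 0; p < nprocs; ++p) {
        reachable[p] = true;
    }
    return SKETCH_SUCCESS;
}

/* Example BTL whose setup fails (stands in for "something went bad in IB"). */
static int failing_add_procs(size_t nprocs, bool reachable[]) {
    (void)nprocs;
    (void)reachable;
    return SKETCH_ERROR;
}

/* Caller-side policy: the first add_procs() error is propagated at once,
 * so the caller can abort instead of crashing later. Peers reachable by
 * any successful BTL are OR-ed into reachable[]. */
static int add_procs_abort_on_error(const sketch_btl_t *btls, size_t nbtls,
                                    size_t nprocs, bool reachable[]) {
    assert(nprocs <= MAX_PROCS);
    for (size_t i = 0; i < nbtls; ++i) {
        bool mine[MAX_PROCS] = {false};
        int rc = btls[i].add_procs(nprocs, mine);
        if (rc != SKETCH_SUCCESS) {
            fprintf(stderr, "BTL %s: add_procs failed (rc=%d), aborting\n",
                    btls[i].name, rc);
            return rc;
        }
        for (size_t p = 0; p < nprocs; ++p) {
            reachable[p] = reachable[p] || mine[p];
        }
    }
    return SKETCH_SUCCESS;
}
```

With an `{openib, tcp}` list where openib fails, this returns the error before TCP is even consulted, which is exactly the trade-off the others object to: a recoverable IB problem becomes a fatal one.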

2010-05-27 Thread Barrett, Brian W
Sylvain - I have to agree with Ralph. The situation you describe (IB failing) may or may not be what the user wants. And, in fact, we will print a warning message to the user that such a situation (falling back to TCP) has occurred. However, it also does not fall under the category of "fail

2010-05-27 Thread Ralph Castain
On May 27, 2010, at 1:47 AM, Sylvain Jeaugey wrote:
> I don't think what the openib BTL is doing is that bad. It is returning an
> error because something really went bad in IB. So yes, it could blank the
> bitmask and return success, but would you really want IB to fail and fall back
> on TCP

2010-05-27 Thread Sylvain Jeaugey
I don't think what the openib BTL is doing is that bad. It is returning an error because something really went bad in IB. So yes, it could blank the bitmask and return success, but would you really want IB to fail and fall back on TCP once in a while without any notice? I wouldn't. So, as it s
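The "blank the bitmask and return success" alternative can be sketched the same way. Again this is hypothetical: `ib_add_procs`, `tcp_add_procs`, `select_with_fallback`, `all_peers`, the `ib_setup_failed` switch and a 64-bit reachability mask are stand-ins invented for illustration, not Open MPI code. On failure the IB side reports that it reaches nobody, the caller routes those peers to TCP, and (as Brian notes Open MPI does) a warning is the only visible sign of the fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical return code (not Open MPI's real constant). */
enum { SKETCH_SUCCESS = 0 };

/* Hypothetical switch standing in for "something really went bad in IB". */
static bool ib_setup_failed = true;

/* Bitmask with the low nprocs bits set: "every peer is reachable". */
static uint64_t all_peers(size_t nprocs) {
    assert(nprocs <= 64);
    return nprocs == 64 ? UINT64_MAX : (UINT64_C(1) << nprocs) - 1;
}

/* On failure, blank the reachability bitmask but still return success:
 * the caller never sees an error, only a BTL that reaches nobody. */
static int ib_add_procs(size_t nprocs, uint64_t *reachable) {
    *reachable = ib_setup_failed ? 0 : all_peers(nprocs);
    return SKETCH_SUCCESS;
}

/* Lower-priority BTL that reaches every peer (stands in for TCP). */
static int tcp_add_procs(size_t nprocs, uint64_t *reachable) {
    *reachable = all_peers(nprocs);
    return SKETCH_SUCCESS;
}

/* Caller side: peers not claimed by IB fall back to TCP, and a warning is
 * printed. Returns the mask of peers that ended up on the fallback. */
static uint64_t select_with_fallback(size_t nprocs) {
    uint64_t ib_mask = 0, tcp_mask = 0;
    (void)ib_add_procs(nprocs, &ib_mask);
    (void)tcp_add_procs(nprocs, &tcp_mask);
    uint64_t fallback = tcp_mask & ~ib_mask;
    size_t count = 0;
    for (uint64_t m = fallback; m != 0; m &= m - 1) {
        ++count; /* each iteration clears the lowest set bit */
    }
    if (count > 0) {
        fprintf(stderr, "warning: %zu peer(s) unreachable via openib, using tcp\n",
                count);
    }
    return fallback;
}
```

The job keeps running in both cases; the disagreement in the thread is whether that silent-but-warned downgrade is acceptable or should instead be an error.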