Hi,

From reading the FAQ and this list, it seems OpenMPI can use multiple InfiniBand rails by round-robining across the ports out of each node, as long as they're configured to be on separate subnets (I think).

Can OpenMPI also deal with one of the subnets failing? I.e., will OpenMPI automatically fall back to using the last remaining working IB port out of a node, or even fall back to GigE if all the IB fails?

The reason I ask is that we're worried about switches failing in the IB network, and whether OpenMPI can solve all our problems for us if we configure 2 or more independent IB networks out of each node.

Possibly this sort of failover in the MPI isn't needed with ConnectX, as long as its adaptive routing works as advertised? If so, then I guess it's not that important, and I wouldn't want to make you guys do a lot of unnecessary work :-)

The FAQ entry here:
  http://www.open-mpi.org/faq/?category=ft#ft-future
says:
  "Data Reliability and network fault tolerance. Similar to those implemented in LA-MPI"
but I don't actually know what LA-MPI implemented in this area, so that doesn't really help me.

cheers,
robin
