Hi,

I can think of a few scenarios where interoperability would be helpful,
but I guess in most cases you can live without it.

1. Some university departments buy tiny clusters (4-8 nodes) and, when more projects/funding become available, buy the next one. They thus end up with 2-4 different CPU generations or steppings and probably different HCA versions. If your MPI program does load balancing, you probably don't care about slightly different CPU speeds and are glad if you can use all machines.

2. You operate a medium to large cluster (300+ nodes) and after, e.g., a year a few HCAs might break and you have to replace them. I can imagine that it is hard to get an HCA with exactly the same chipset. If you end up with a few nodes that can't run MPI programs with the rest, that would be unfortunate.

best regards,
Samuel

Don Kerr wrote:
Jeff,

Did the IWG say anything about there being a chipset issue? For example, what if a vendor, say Sun, wraps Mellanox chips on its own HCAs: would a Mellanox HCA and a Sun HCA work together?

-DON

On 01/26/09 14:19, Jeff Squyres wrote:
The Interop Working Group (IWG) of the OpenFabrics Alliance asked me to bring a question to the Open MPI user and developer communities: is anyone interested in having a single MPI job span HCAs or RNICs from multiple vendors? (Pardon the cross-posting, but I did want to ask each group separately, because the answers may be different.)

The interop testing lab at the University of New Hampshire (http://www.iol.unh.edu/services/testing/ofa/) discovered that most (all?) MPI implementations fail when having a single MPI job span HCAs from multiple vendors and/or span RNICs from multiple vendors. I don't remember the exact details (and they may not be public, anyway), but I'm pretty sure that OMPI failed when used with QLogic and Mellanox HCAs in a single MPI job. This is fairly unsurprising, given how we tune Open MPI's use of OpenFabrics-capable hardware based on our .ini file.

So my question is: does anyone want/need to support jobs that span HCAs from multiple vendors and/or RNICs from multiple vendors?


