Re: [OMPI users] Heterogeneous OpenFabrics hardware

Jeff Squyres Tue, 27 Jan 2009 08:18:36 -0500

It is worth clarifying a point in this discussion that I neglected to mention in my initial post: although Open MPI may not work *by default* with heterogeneous HCAs/RNICs, it is quite possible/likely that MPI jobs spanning multiple different kinds of HCAs or RNICs will work fine if you manually configure Open MPI to use the same verbs/hardware settings across all your HCAs/RNICs (assuming that you use a set of values that is compatible with all your hardware).
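As a rough illustration of what "a set of values that is compatible with all your hardware" could mean, here is a minimal Python sketch. It is not Open MPI code. The device names, parameter names, and values are hypothetical stand-ins; Open MPI's real per-device settings come from its .ini file and MCA parameters. The sketch assumes two rules. First, a smaller numeric limit (such as MTU or inline data size) is always safe. Second, an optional feature is enabled only if every device supports it.

```python
# Hypothetical per-device settings. The names and values are illustrative
# only; they are NOT copied from Open MPI's actual .ini file.
DEVICE_PARAMS = {
    "vendor_a_hca": {"mtu": 2048, "max_inline_data": 128, "use_eager_rdma": True},
    "vendor_b_hca": {"mtu": 4096, "max_inline_data": 64, "use_eager_rdma": False},
}


def common_settings(devices):
    """Return one set of settings that every listed device can accept.

    Assumption: lower numeric limits are always safe, and a boolean
    feature stays enabled only if every device enables it.
    """
    params = [DEVICE_PARAMS[d] for d in devices]
    return {
        # Use the smallest MTU so that no device receives a too-large packet.
        "mtu": min(p["mtu"] for p in params),
        # Use the smallest inline-data size that every device supports.
        "max_inline_data": min(p["max_inline_data"] for p in params),
        # Keep the feature only if all devices agree on it.
        "use_eager_rdma": all(p["use_eager_rdma"] for p in params),
    }


if __name__ == "__main__":
    print(common_settings(["vendor_a_hca", "vendor_b_hca"]))
```

The underlying idea is that every process in the job is given identical values, rather than each process picking vendor-specific defaults independently.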


See this post on the devel list for a few more details:


    http://www.open-mpi.org/community/lists/devel/2009/01/5314.php



On Jan 27, 2009, at 6:08 AM, Peter Kjellstrom wrote:

On Monday 26 January 2009, Jeff Squyres wrote:
The Interop Working Group (IWG) of the OpenFabrics Alliance asked me
to bring a question to the Open MPI user and developer communities: is
anyone interested in having a single MPI job span HCAs or RNICs from
multiple vendors?  (pardon the cross-posting, but I did want to ask
each group separately -- because the answers may be different)

The interop testing lab at the University of New Hampshire
(http://www.iol.unh.edu/services/testing/ofa/ ) discovered that most (all?)
MPI implementations fail when having a single MPI job span HCAs from
multiple vendors and/or span RNICs from multiple vendors. I don't remember
the exact details (and they may not be public, anyway), but I'm pretty sure
that OMPI failed when used with QLogic and Mellanox HCAs in a single MPI
job. This is fairly unsurprising, given how we tune Open MPI's use of
OpenFabrics-capable hardware based on our .ini file.

So my question is: does anyone want/need to support jobs that span
HCAs from multiple vendors and/or RNICs from multiple vendors?
For these three cases:

1) Different vendor id but same OFED driver and basic chip
2) Same chip vendor, different OFED driver (mthca vs mlx4)
3) Any OFED supported IB HCA

IMHO:
Number one should just work. We may at times have some nodes with HCAs that
have been flashed with non-standard/non-vendor firmware.
Number two is something I would kind of expect to work. A possible situation
where I'd need it is if I temporarily use an older HCA (mthca) to get a node
going on a cluster with ConnectX (mlx4). Another case could be a cluster with
two partitions with different HCAs.
Number three would be nice to have. I think many users would assume it to
work. Why not? They have symmetric software, all nodes run OFED, all have
working IB... It would have worked if their nodes had had different kinds of
Ethernet NICs...

/Peter
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Jeff Squyres
Cisco Systems
