Re: [OMPI users] all2all algorithms

Jeff Squyres Tue, 14 Apr 2009 13:56:21 -0400

George can speak more definitively about this.

In general, our "tuned" coll component (plugin) does exactly thesekinds of determinations to figure out which algorithm to use atruntime. Not only are communicator process counts involved, but alsosize of message is considered. I count 5 different all2all algorithmsin our tuned module (but George will have to speak about how each oneis chosen). U. Tennessee published some papers on their method; theybasically hard-coded minimized selection tables based on oodles ofruns and empirical data.


If you'd like to look at the code, it's here in the tree:

    
https://svn.open-mpi.org/source/xref/ompi_1.3/ompi/mca/coll/tuned/coll_tuned_alltoall.c
    (v1.3 release branch)

    
https://svn.open-mpi.org/source/xref/ompi-trunk/ompi/mca/coll/tuned/coll_tuned_alltoall.c
    (development trunk)

I don't think there's much difference between the two, but they candrift since v1.3 is the release branch and the trunk is activedevelopment.



On Apr 12, 2009, at 3:20 PM, Tom Rosmond wrote:

I am curious about the algorithm(s) used in the OpenMPIimplementations
of the all2all and all2allv.  As many of you know, there are alternate
algorithms for all2all type operations, such as that of Plimpton, etal(2006), that basically exchange latency costs for bandwidth costs,which
pays big dividends for large processor numbers, e.g. 100's or 1000's.
Does OpenMPI, or any other MPI distributions, test for processor count
and switch to such an all2all algorithm at some point?  I realize the
switchover point would be very much a function of the architecture,and
so could be a risky decision in some cases.  Nevertheless, has it been
considered?



--
Jeff Squyres
Cisco Systems

Re: [OMPI users] all2all algorithms

Reply via email to