George can speak more definitively about this.
In general, our "tuned" coll component (plugin) makes exactly these
kinds of determinations to figure out which algorithm to use at
runtime. It considers not only the communicator's process count but
also the message size. I count 5 d
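
To give a feel for that kind of runtime decision, here is a minimal
sketch. It is not the actual tuned code: the algorithm labels, the
cutoff values, and the function name choose_alltoall are all made up
for illustration, and the real component's rules may look quite
different.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical algorithm labels; these are NOT Open MPI identifiers. */
typedef enum {
    ALLTOALL_LINEAR,     /* post all sends and receives at once */
    ALLTOALL_PAIRWISE,   /* p-1 rounds, one partner per round */
    ALLTOALL_LOG_ROUNDS  /* about log2(p) rounds, data forwarded via intermediates */
} alltoall_algo_t;

/* Made-up cutoffs, chosen only to make the example concrete. */
#define SMALL_MSG_BYTES 256
#define LARGE_MSG_BYTES 32768
#define SMALL_COMM_SIZE 8

/* Pick an algorithm from the communicator size and per-peer message size. */
alltoall_algo_t choose_alltoall(int comm_size, size_t msg_bytes)
{
    assert(comm_size > 0);

    /* Many ranks, small messages: latency dominates, so prefer fewer rounds. */
    if (comm_size > SMALL_COMM_SIZE && msg_bytes <= SMALL_MSG_BYTES)
        return ALLTOALL_LOG_ROUNDS;

    /* Large messages: bandwidth dominates, so avoid forwarding data twice. */
    if (msg_bytes >= LARGE_MSG_BYTES)
        return ALLTOALL_PAIRWISE;

    /* Everything else falls back to the simple scheme. */
    return ALLTOALL_LINEAR;
}
```

With these made-up cutoffs, choose_alltoall(64, 64) picks the
log-rounds scheme, while choose_alltoall(64, 1 << 20) picks pairwise.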
I am curious about the algorithm(s) used in the Open MPI implementations
of all2all and all2allv. As many of you know, there are alternate
algorithms for all2all-type operations, such as that of Plimpton et al.
(2006), that basically trade latency costs for bandwidth costs, which
pays big di