At the risk of banging on too much about collectives: I came across a writeup of the "SLOAVx" algorithm for alltoallv <http://www.auburn.edu/~zzl0014/pubs/ccgrid13.pdf>. It was implemented in OMPI with apparently good results, but I can't find any code.
Does anyone know the story on that? Was it never contributed, or did it turn out not to be worthwhile? If it was simply never contributed, might it be worth investigating?