On Sat, 2008-08-16 at 08:03 -0400, Jeff Squyres wrote:

> - large all to all operations are very stressful on the network, even
>   if you have very low latency / high bandwidth networking such as DDR IB
>
> - if you only have 1 IB HCA in a machine with 8 cores, the problem
>   becomes even more difficult because all 8 of your MPI processes will
>   be hammering the HCA with read and write requests; it's a simple I/O
>   resource contention issue
That alone doesn't explain the sudden drop in performance figures.

> - there are several different algorithms in Open MPI for performing
>   alltoall, but they were not tuned for ppn>4 (honestly, they were tuned
>   for ppn=1, but they still usually work "well enough" for ppn<=4). In
>   Open MPI v1.3, we introduce the "hierarch" collective module, which
>   should greatly help with ppn>4 kinds of scenarios for collectives
>   (including, at least to some degree, all to all)

Is there a way to tell or influence which algorithm is used in the
current case? Looking through the code I can see several but cannot see
how to tune the thresholds.

Ashley.
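For context, a hedged sketch of how algorithm selection can typically be influenced in Open MPI: the "tuned" collective component exposes MCA parameters such as coll_tuned_use_dynamic_rules and coll_tuned_alltoall_algorithm. The exact parameter names, valid algorithm numbers, and their availability depend on the installed Open MPI version, so verify them locally with ompi_info before relying on this invocation:

```shell
# List the tunable parameters of the "tuned" collective component;
# the output shows each parameter's valid values and defaults.
ompi_info --param coll tuned

# Sketch: enable dynamic rules and force a specific alltoall algorithm.
# The algorithm number (here 2) is an assumption for illustration only;
# check ompi_info for the meaning of each value on your installation.
mpirun -np 8 \
    --mca coll_tuned_use_dynamic_rules 1 \
    --mca coll_tuned_alltoall_algorithm 2 \
    ./alltoall_benchmark
```

Crossover thresholds between algorithms can reportedly also be supplied via a dynamic-rules file (coll_tuned_dynamic_rules_filename), which may address the threshold-tuning part of the question.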