On Sat, 2008-08-16 at 08:03 -0400, Jeff Squyres wrote:
> - large all to all operations are very stressful on the network, even  
> if you have very low latency / high bandwidth networking such as DDR IB
> 
> - if you only have 1 IB HCA in a machine with 8 cores, the problem  
> becomes even more difficult because all 8 of your MPI processes will  
> be hammering the HCA with read and write requests; it's a simple I/O  
> resource contention issue

That alone doesn't explain the sudden jump (i.e. drop) in the
performance figures, though.

> - there are several different algorithms in Open MPI for performing  
> alltoall, but they were not tuned for ppn>4 (honestly, they were tuned  
> for ppn=1, but they still usually work "well enough" for ppn<=4).  In  
> Open MPI v1.3, we introduce the "hierarch" collective module, which  
> should greatly help with ppn>4 kinds of scenarios for collectives  
> (including, at least to some degree, all to all)

Is there a way to tell, or to influence, which algorithm is used in the
current case?  Looking through the code I can see several candidate
algorithms, but I cannot see how to tune the thresholds that select
between them.
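
For concreteness, below is the sort of control I have in mind.  The
parameter names and values are only my guess from skimming the
coll/tuned source and ompi_info output, and the benchmark binary is a
placeholder, so treat this as an assumption rather than a confirmed
recipe:

    # List whatever tunables the "tuned" collective component exposes
    ompi_info --param coll tuned

    # Force a specific alltoall algorithm instead of the default
    # size-based selection (parameter names assumed, not confirmed;
    # 2 would be pairwise exchange if I am reading the code right)
    mpirun --mca coll_tuned_use_dynamic_rules 1 \
           --mca coll_tuned_alltoall_algorithm 2 \
           -np 64 ./my_alltoall_benchmark

Even if the exact names differ, something along those lines, plus a way
to move the message-size cut-over points between algorithms, is what I
am after.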

Ashley.
