On Apr 1, 2010, at 4:17 PM, Oliver Geisler wrote:

> > However, reading through your initial description on Tuesday, none of these
> > fit: You want to actually measure the kernel time on TCP communication 
> > costs.
> 
> Since the problem also occurs in a single-node-only configuration, where the
> MCA option btl = self,sm,tcp is used, I doubt it has to do with TCP
> communication.

I'm not sure what to make of this remark.  Why would the raw performance of TCP 
be irrelevant?  Open MPI uses TCP over Ethernet, so it can't be faster than 
TCP.  More specifically: if something is making TCP slow, MPI will be slow 
as well.  From the times you've listed, it almost sounds like you're getting a 
lot of TCP drops and retransmits (the lengthy times could be timeouts).  Can 
you check your NIC/switch hardware to see if you're getting drops?
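For example, something along these lines will show whether the kernel is seeing 
retransmits or drops (the exact counter names vary by driver and OS, and "eth0" 
is just a placeholder for your interface):

   netstat -s | grep -i retrans            # TCP-level retransmit counters
   ethtool -S eth0 | grep -iE 'drop|err'   # per-NIC drop/error counters (if the driver exposes them)

Checking the drop/error counters on the switch ports before and after a run is 
also worthwhile.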

Also, you should probably test raw ping-pong performance:

a) between 2 MPI processes on the same node.  E.g.:

   mpirun -np 2 --mca btl sm,self _your_favorite_benchmark_

   This will test shared memory latency/bandwidth/whatever of MPI on that node.

b) between 2 MPI processes on different nodes

   mpirun -np 2 --host cluster-06,cluster-07 --mca btl tcp,self _your_favorite_benchmark_

   This will test TCP latency/bandwidth/whatever of MPI between those two nodes.

Try NetPIPE -- it has both MPI communication benchmarking and TCP benchmarking. 
Then you can see if there is a noticeable difference between TCP and MPI (there 
shouldn't be).  There's also a "memcpy" mode in NetPIPE, but it's not quite the 
same thing as shared memory message passing.
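
As a rough sketch, assuming a standard NetPIPE build (binary names and options 
can differ slightly between NetPIPE versions):

   # raw TCP between the two nodes
   cluster-07$ NPtcp                    # receiver side
   cluster-06$ NPtcp -h cluster-07      # transmitter side

   # same pair of nodes, but MPI over TCP
   mpirun -np 2 --host cluster-06,cluster-07 --mca btl tcp,self NPmpi

Comparing the two sets of numbers should tell you whether the slowdown is in 
TCP itself or somewhere above it.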

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

