Re: [OMPI devel] NP64 _gather_ problem

2010-09-20 Steve Wise
Just an update for folks: the connection setup latency was caused by a bug in my iw_cxgb3 RDMA driver. It wasn't turning off RX coalescing for the iWARP connections. This added 100-200 ms of latency, because iWARP connection setup uses TCP streaming-mode messages to negotiate the iWARP connec…

2010-09-17 Steve Wise
I'll look into Solaris Studio. I think the connections are somehow getting single-threaded or funneled by the gather algorithm. Since each one takes ~160 ms to set up, and ~3600 connections are being set up, we end up with a 7-minute run time. Now, 160 ms seem…
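
As a rough check of the arithmetic above, here is a back-of-the-envelope Python sketch. It assumes the worst case, that every connection setup runs strictly one after another, and plugs in the approximate figures from this post (~3600 connections, ~160 ms each); the helper name is hypothetical.

```python
def serialized_setup_time(num_connections: int, per_connection_s: float) -> float:
    """Hypothetical helper: worst-case seconds if connection setups never overlap."""
    if num_connections < 0 or per_connection_s < 0:
        raise ValueError("inputs must be non-negative")
    return num_connections * per_connection_s


if __name__ == "__main__":
    total_s = serialized_setup_time(3600, 0.160)
    print(f"{total_s:.0f} s ({total_s / 60:.1f} min)")  # prints "576 s (9.6 min)"
```

Because both inputs are approximate and some setups probably do overlap, this only shows that serialized setup alone can plausibly account for a run time measured in minutes.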

2010-09-17 Terry Dontje
Right, by default all connections are set up on the fly. So when an MPI_Send is executed to a process that there is no connection to yet, a dance happens between the sender and the receiver. Why this happens with np > 60 may have to do with how many connections are happening at the s…
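
To make the on-the-fly behaviour concrete, here is a toy Python model. It is not Open MPI code, and the class and method names are hypothetical: the first send to a peer triggers a handshake, later sends to the same peer reuse the connection, so the number of handshakes equals the number of distinct peers contacted.

```python
class LazyEndpoint:
    """Toy model of on-demand connection setup (hypothetical; not Open MPI's implementation)."""

    def __init__(self, rank: int) -> None:
        self.rank = rank
        self.connected: set[int] = set()
        self.handshakes = 0

    def _connect(self, peer: int) -> None:
        # Stand-in for the sender/receiver "dance"; in a real transport this is a
        # multi-message exchange that also needs the peer to make progress.
        self.handshakes += 1
        self.connected.add(peer)

    def send(self, peer: int, payload: bytes) -> tuple[int, int, bytes]:
        if peer not in self.connected:
            self._connect(peer)  # only the first message to this peer pays the setup cost
        return (self.rank, peer, payload)


ep = LazyEndpoint(rank=0)
for peer in [1, 1, 2, 1, 3]:
    ep.send(peer, b"x")
print(ep.handshakes)  # 3 distinct peers -> 3 handshakes
```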

2010-09-17 Steve Wise
Does anyone have an NP64 IB cluster handy? I'd be interested to know whether IB behaves this way when running with the rdmacm connect method, i.e. with: --mca btl_openib_cpc_include rdmacm --mca btl openib,sm,self Steve. On 9/17/2010 10:41 AM, Steve Wise wrote: Yes it does. With mpi_preconnect_mpi to 1…
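
For anyone trying this, the Python sketch below assembles the suggested command line and, as an alternative, the equivalent environment variables (Open MPI also reads MCA parameters from OMPI_MCA_<name> variables). The process count, the program name ./gather_test and the helper names are hypothetical placeholders.

```python
import shlex

# MCA settings quoted in the post: openib BTL with the rdmacm connection method.
RDMACM_MCA = {
    "btl_openib_cpc_include": "rdmacm",
    "btl": "openib,sm,self",
}


def mpirun_argv(nprocs: int, program: str, mca: dict[str, str]) -> list[str]:
    """Hypothetical helper: build mpirun argv with one '--mca name value' triple per setting."""
    argv = ["mpirun", "-np", str(nprocs)]
    for name, value in mca.items():
        argv += ["--mca", name, value]
    return argv + [program]


def mca_env(mca: dict[str, str]) -> dict[str, str]:
    """Same settings expressed as OMPI_MCA_<name>=<value> environment variables."""
    return {f"OMPI_MCA_{name}": value for name, value in mca.items()}


# "./gather_test" is a placeholder for whatever gather reproducer is being run.
print(shlex.join(mpirun_argv(64, "./gather_test", RDMACM_MCA)))
```

This prints the full mpirun invocation with both --mca settings; the argv list could equally be passed to subprocess.run, or the mca_env() result merged into the environment of a plain mpirun call.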

2010-09-17 Steve Wise
Yes it does. With mpi_preconnect_mpi set to 1, NP64 doesn't stall. So it's not the algorithm in and of itself but rather, I guess, some interplay between the algorithm and connection setup. On 9/17/2010 5:24 AM, Terry Dontje wrote: Does setting the MCA parameter mpi_preconnect_mpi to 1 help at all?
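
The effect of mpi_preconnect_mpi can be pictured with another toy Python model. The function name is hypothetical, and the model assumes one connection per unordered pair of processes: with preconnection every pair is wired up once during initialization, so sends in the timed phase never wait on a handshake; with lazy setup, each first contact between a pair pays the setup cost inside the collective.

```python
from itertools import combinations


def count_handshakes(nprocs: int, sends: list[tuple[int, int]], preconnect: bool) -> tuple[int, int]:
    """Hypothetical model: return (handshakes during init, handshakes during the run).

    Assumes one bidirectional connection per unordered process pair (a modeling choice).
    """
    connected: set[frozenset[int]] = set()
    init = 0
    if preconnect:
        # Full wire-up before the application runs: every pair connects exactly once.
        for a, b in combinations(range(nprocs), 2):
            connected.add(frozenset((a, b)))
            init += 1
    during_run = 0
    for src, dst in sends:
        pair = frozenset((src, dst))
        if pair not in connected:
            connected.add(pair)  # first contact: setup happens inside the timed phase
            during_run += 1
    return init, during_run


trace = [(1, 0), (3, 2), (2, 0), (1, 0)]
print(count_handshakes(64, trace, preconnect=False))  # (0, 3)
print(count_handshakes(64, trace, preconnect=True))   # (2016, 0)
```

Preconnecting does not make each setup cheaper; in this model it only moves all of them out of the gather, which fits the observation that NP64 no longer stalls.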

2010-09-17 Terry Dontje
Does setting the MCA parameter mpi_preconnect_mpi to 1 help at all? This might help determine whether it is the actual connection setup between processes that is out of sync, as opposed to something in the gather algorithm itself. --td Steve Wise wrote: Here's a clue: ompi_coll_tuned_ga…

2010-09-16 Steve Wise
Here's a clue: ompi_coll_tuned_gather_intra_dec_fixed() switches to a binomial algorithm for job sizes > 60. I changed the threshold to 100, and my NP64 jobs run fine. Now to try to understand what about ompi_coll_tuned_gather_intra_binomial() is causing these connect delays...
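
For reference, a generic binomial-tree gather can be sketched in Python as below. This is the textbook scheme (each non-root rank sends to the rank obtained by clearing the lowest set bit of its root-relative rank), not necessarily the exact tree that ompi_coll_tuned_gather_intra_binomial() builds. The cutoff function only mirrors the > 60 threshold described above; the function names, and the label "linear" for the small-job case, are assumptions, and the real decision function may also look at message size.

```python
def gather_algorithm(comm_size: int, threshold: int = 60) -> str:
    """Hypothetical decision rule mirroring only the size cutoff discussed in the thread."""
    return "binomial" if comm_size > threshold else "linear"


def _lowbit(x: int) -> int:
    return x & -x  # lowest set bit of a positive integer


def binomial_gather_tree(size: int, root: int = 0) -> dict[int, list[int]]:
    """Map each rank to the children it receives from, in receive order (textbook binomial tree)."""
    children: dict[int, list[int]] = {r: [] for r in range(size)}
    for rank in range(size):
        vrank = (rank - root) % size  # rank relative to the root
        if vrank == 0:
            continue  # the root only receives
        parent = ((vrank - _lowbit(vrank)) + root) % size
        children[parent].append(rank)
    # A parent receives from its smallest subtree first (child with the smallest lowest bit).
    for kids in children.values():
        kids.sort(key=lambda r: _lowbit((r - root) % size))
    return children


tree = binomial_gather_tree(64)
print(gather_algorithm(64), tree[0])  # binomial [1, 2, 4, 8, 16, 32]
depth = (64 - 1).bit_length()  # 6 levels on the longest path to the root
```

With lazy setup, every edge of this tree is a first contact, and a parent cannot receive from its next child until that child has gathered its own subtree, so setup delays can add up along a path of up to 6 levels for 64 ranks. That is one possible mechanism for the funneling discussed in this thread, not a confirmed diagnosis.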