Okay, some things I am already discovering. If you do an "svn up", there is
some file cleanup you'll need to do to get this to build again. Specifically,
you need to:
rm config/mca_m4_config_include.m4
as this is a stale file that will linger and screw things up.
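For example, a minimal sketch of the cleanup (the rebuild step afterwards is
assumed, not part of the original note):
  svn up
  rm config/mca_m4_config_include.m4
  # then re-run your usual autogen / configure / make cycle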
On Sep 17, 2010, at 4:57 PM,
After chatting with Jeff, we decided it would be good to introduce this into
the trunk over the weekend so it can settle before people start beating on
it. Please note:
WARNING: Work on the temp branch being merged here encountered problems with
bugs in subversion. Considerable effort has gon
I'll look into Solaris Studio. I think the connections are somehow getting
single-threaded or funneled due to the gather algorithm. And since each one
takes ~160 ms to set up, and there are ~3600 connections being set up, we end
up with a 7-minute run time. Now, 160 ms seem
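For rough scale, taking the numbers above at face value and assuming the
setups are fully serialized:
  ~3600 connections x ~160 ms/connection ~= 576 s, i.e. on the order of 10 minutes,
which is the same ballpark as the observed ~7-minute runs.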
Right, by default all connections are handled on the fly. So when an MPI_Send
is executed to a process to which there is not yet a connection, a
connection-setup dance happens between the sender and the receiver. So why
this happens with np > 60 may have to do with how many connections are
happening at the s
Does anyone have an NP64 IB cluster handy? I'd be interested to know whether
IB behaves this way when running with the rdmacm connect method, i.e. with:
--mca btl_openib_cpc_include rdmacm --mca btl openib,sm,self
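A full invocation might look something like this (the hostfile and program
name are placeholders, not from the original message):
  mpirun -np 64 --hostfile hosts \
      --mca btl_openib_cpc_include rdmacm --mca btl openib,sm,self ./gather_test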
Steve.
On 9/17/2010 10:41 AM, Steve Wise wrote:
Yes it does. With mpi_preconnect_mpi set to 1, NP64 doesn't stall. So
it's not the algorithm in and of itself, but rather some interplay
between the algorithm and connection setup, I guess.
On 9/17/2010 5:24 AM, Terry Dontje wrote:
Does setting the mca parameter mpi_preconnect_mpi to 1 help at all?
I downloaded the nightly build of the trunk (r23756) and found that the
checkpoint functionality is broken. My MPI program is a simple hello-world
program that increments and prints a counter every few seconds.
The steps are as follows:
1. mpirun with NP set to 32
2. call ompi-checkpoint with
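For reference, the usual checkpoint/restart sequence looks roughly like this
(assumes a build with checkpoint/restart support enabled; the program name and
PID are placeholders):
  mpirun -np 32 -am ft-enable-cr ./counter_hello &
  ompi-checkpoint <PID of mpirun>      # prints a global snapshot reference
  ompi-restart <global snapshot reference>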
Hi all,
As the person in charge of ticket 1888 (see
https://svn.open-mpi.org/trac/ompi/ticket/1888),
I have put the resulting code on bitbucket at:
http://bitbucket.org/devezep/new-romio-for-openmpi/
The work in this repo consisted of refreshing ROMIO to a newer
version: the one from the very last MPICH
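To browse it locally, the repo can be cloned with Mercurial (bitbucket hosted
Mercurial repositories at the time; assuming this one is no exception):
  hg clone http://bitbucket.org/devezep/new-romio-for-openmpi/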
Does setting the mca parameter mpi_preconnect_mpi to 1 help at all? This
might help determine whether it is actually the connection setup
between processes that is out of sync, as opposed to something in the
actual gather algorithm.
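For example, something along these lines (the np value and program name are
just placeholders):
  mpirun -np 64 --mca mpi_preconnect_mpi 1 ./gather_test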
--td
Steve Wise wrote:
Here's a clue: ompi_coll_tuned_ga