Re: [OMPI devel] Autogen improvements: ready for blast off

2010-09-17 Thread Ralph Castain
Okay, some things I am already discovering. If you do an "svn up", there is some file cleanup you'll need to do to get this to build again. Specifically, you need to run: rm config/mca_m4_config_include.m4 because this stale file will linger and screw things up. On Sep 17, 2010, at 4:57 PM,

Re: [OMPI devel] Autogen improvements: ready for blast off

2010-09-17 Thread Ralph Castain
After chatting with Jeff, we decided it would be good to introduce this into the trunk over the weekend so it can settle before people start beating on it. Please note: WARNING: Work on the temp branch being merged here encountered problems with bugs in Subversion. Considerable effort has gon

Re: [OMPI devel] NP64 _gather_ problem

2010-09-17 Thread Steve Wise
I'll look into Solaris Studio. I think the connections are somehow getting single-threaded or funneled due to the gather algorithm. And since each one takes ~160ms to set up, and there are ~3600 connections being set up, we end up with a 7-minute run time. Now, 160ms seem
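A quick sanity check of these figures, assuming the worst case where every connection setup runs strictly one after another: ~3600 × ~160 ms ≈ 576 s, or about 9.6 minutes. That is the same order of magnitude as the observed ~7-minute run, which fits the idea that setup is being largely funneled. The sketch below only does this arithmetic; the constants are the approximate values quoted in the message.

```python
# Back-of-envelope estimate: total time if every connection setup ran
# strictly one after another (fully serialized, i.e. the worst case).
# Both constants are the approximate figures quoted in the message.
SETUP_SECONDS = 0.160      # ~160 ms per connection setup
NUM_CONNECTIONS = 3600     # ~3600 connections being set up


def serialized_setup_time(per_setup_s: float, count: int) -> float:
    """Return total wall-clock seconds if setups never overlap."""
    return per_setup_s * count


total_s = serialized_setup_time(SETUP_SECONDS, NUM_CONNECTIONS)
print(f"{total_s:.0f} s = {total_s / 60:.1f} min")
```

Any overlap between setups would pull the real figure below this bound, so a run time close to it suggests little concurrency in connection establishment.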

Re: [OMPI devel] NP64 _gather_ problem

2010-09-17 Thread Terry Dontje
Right, by default all connections will be handled on the fly. So when an MPI_Send is executed to a process that there is no connection to yet, a dance happens between the sender and the receiver to set one up. So why this happens with np > 60 may have to do with how many connections are happening at the s
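To illustrate the on-the-fly behavior described above, here is a minimal, hypothetical model (the class and method names are invented for illustration and are not Open MPI internals): a connection to a peer is created only the first time a process sends to it, so the first message to each new peer pays the setup cost and later messages reuse the connection.

```python
# Hypothetical model of on-the-fly ("lazy") connection setup: a
# connection to a peer is created only the first time we send to it.
# Names are illustrative only, not Open MPI internals.


class LazyEndpoint:
    def __init__(self, rank: int):
        self.rank = rank
        self.connected: set[int] = set()  # peers we already have a connection to
        self.handshakes = 0               # number of setup "dances" performed

    def _handshake(self, peer: int) -> None:
        # Stand-in for the sender/receiver exchange that establishes
        # the connection; in a real transport this is the slow part.
        self.handshakes += 1
        self.connected.add(peer)

    def send(self, peer: int, payload: bytes) -> None:
        if peer not in self.connected:
            self._handshake(peer)  # first send to this peer pays the setup cost
        # ... the actual data transfer of `payload` would happen here ...


ep = LazyEndpoint(rank=0)
for peer in [1, 2, 1, 3, 2]:
    ep.send(peer, b"x")
print(ep.handshakes)  # 3: only the first send to each peer triggers setup
```

In a collective such as gather, many processes may hit their "first send" at about the same moment, which is where the number of concurrent setups starts to matter.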

Re: [OMPI devel] NP64 _gather_ problem

2010-09-17 Thread Steve Wise
Does anyone have an NP64 IB cluster handy? I'd be interested to know if IB behaves this way when running with the rdmacm connect method, i.e., with: --mca btl_openib_cpc_include rdmacm --mca btl openib,sm,self Steve. On 9/17/2010 10:41 AM, Steve Wise wrote: Yes it does. With mpi_preconnect_mpi to 1

Re: [OMPI devel] NP64 _gather_ problem

2010-09-17 Thread Steve Wise
Yes it does. With mpi_preconnect_mpi set to 1, NP64 doesn't stall. So it's not the algorithm in and of itself, but rather some interplay between the algorithm and connection setup, I guess. On 9/17/2010 5:24 AM, Terry Dontje wrote: Does setting mca parameter mpi_preconnect_mpi to 1 help at all?

[OMPI devel] Checkpoint is broken in trunk

2010-09-17 Thread ananda.mudar
I downloaded the nightly build of the trunk (r23756) and found that the checkpoint functionality is broken. My MPI program is a simple helloworld program that increments and prints a number once every few seconds. Following are the steps: 1. mpirun with NP set to 32; 2. call ompi-checkpoint with

[OMPI devel] New Romio for OpenMPI available in bitbucket

2010-09-17 Thread Pascal Deveze
Hi all, As the person in charge of ticket 1888 (see https://svn.open-mpi.org/trac/ompi/ticket/1888), I have put the resulting code in bitbucket at: http://bitbucket.org/devezep/new-romio-for-openmpi/ The work in this repo consisted of refreshing ROMIO to a newer version: the one from the very last MPICH

Re: [OMPI devel] NP64 _gather_ problem

2010-09-17 Thread Terry Dontje
Does setting the MCA parameter mpi_preconnect_mpi to 1 help at all? This might help determine whether it is the actual connection setup between processes that is out of sync, as opposed to something in the gather algorithm itself. --td Steve Wise wrote: Here's a clue: ompi_coll_tuned_ga
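As a rough illustration of what an eager preconnect pass buys, the sketch below assumes a pairwise-exchange schedule in which, for each offset i from 1 to N/2, every rank r exchanges a message with ranks (r + i) mod N and (r - i) mod N. This schedule is an assumption chosen for illustration, not a description of how mpi_preconnect_mpi is actually implemented; the point is only that such a pass touches every pair of ranks up front, so a later collective like gather no longer triggers connection setup.

```python
from itertools import combinations


def preconnect_pairs(nprocs: int) -> set[frozenset[int]]:
    """Return the rank pairs touched by an ASSUMED pairwise-exchange
    preconnect schedule (illustrative only)."""
    pairs: set[frozenset[int]] = set()
    for rank in range(nprocs):
        for offset in range(1, nprocs // 2 + 1):
            nxt = (rank + offset) % nprocs  # peer "ahead" of us at this offset
            prv = (rank - offset) % nprocs  # peer "behind" us at this offset
            for peer in (nxt, prv):
                if peer != rank:
                    pairs.add(frozenset((rank, peer)))
    return pairs


n = 64
touched = preconnect_pairs(n)
all_pairs = {frozenset(p) for p in combinations(range(n), 2)}
print(touched == all_pairs, len(touched))  # every pair of ranks is covered
```

The design choice here is simply to pay all setup costs during initialization instead of on first use, trading a slower startup for a collective that no longer stalls on connection establishment.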