Re: [OMPI devel] Fwd: [ROMIO Req #947] New version of ROMIO?

2009-04-30 Thread Jeff Squyres
On Apr 30, 2009, at 12:33 PM, Ralph Castain wrote: Do we really want to go with a release candidate instead of an official release? That sounds pretty risky to me... Probably so. Is a new ROMIO worth a 1.3.4 or pushing 1.3.3 for a while? (I don't know the release schedule -- I suspect it'

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Brian W. Barrett
On Thu, 30 Apr 2009, Ralph Castain wrote: well, that's only because the code's doing something it shouldn't.  Have a look at comm_cid.c:185 - there's the check we added to the multi-threaded case (which was the only case when we added it).  The cid generation should never generate a number larger

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Ralph Castain
I'll file a ticket against it... oh joy!!! You all know how much I *love* tickets! On Thu, Apr 30, 2009 at 1:11 PM, Ralph Castain wrote: > > On Thu, Apr 30, 2009 at 12:55 PM, Brian W. Barrett > wrote: > >> On Thu, 30 Apr 2009, Edgar Gabriel wrote: >> >> Brian W. Barrett wrote: >>> When w

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Ralph Castain
On Thu, Apr 30, 2009 at 12:55 PM, Brian W. Barrett wrote: > On Thu, 30 Apr 2009, Edgar Gabriel wrote: > > Brian W. Barrett wrote: >> >>> When we added the CM PML, we added a pml_max_contextid field to the PML >>> structure, which is the max size cid the PML can handle (because the >>> matching in

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Edgar Gabriel
so I agree that we need to fix that, and we'll get a fix for that as soon as possible. It still strikes me as wrong, however, that we have fundamentally different types on two layers for the same 'item'. I still think that going back to the original algorithm would be bad - especially for an appli

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread David Gunter
Here is the test code reproducer:

      program test2
      implicit none
      include 'mpif.h'
      integer ierr, myid, numprocs,i1,i2,n,local_comm,
     $     icolor,ikey,rank,root
c
c...  MPI set-up
      ierr = 0
      call MPI_INIT(IERR)
      ierr = 1
      CALL MPI_COMM_SIZE(MPI_COMM_WO

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Brian W. Barrett
On Thu, 30 Apr 2009, Edgar Gabriel wrote: Brian W. Barrett wrote: When we added the CM PML, we added a pml_max_contextid field to the PML structure, which is the max size cid the PML can handle (because the matching interfaces don't allow 32 bits to be used for the cid). At the same time, the

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Edgar Gabriel
Brian W. Barrett wrote: When we added the CM PML, we added a pml_max_contextid field to the PML structure, which is the max size cid the PML can handle (because the matching interfaces don't allow 32 bits to be used for the cid). At the same time, the max cid for OB1 was shrunk significantly, s

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Ralph Castain
As an FYI, the code runs just fine using OMPI 1.2.x - it is only 1.3.x where the problem arises. So it is definitely something that changed in the 1.3 series. On Thu, Apr 30, 2009 at 12:36 PM, Brian W. Barrett wrote: > When we added the CM PML, we added a pml_max_contextid field to the PML > s

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread David Gunter
Just to throw out more info on this, the test code runs fine on previous versions of OMPI. It only hangs on the 1.3 line when the cid reaches 65536. -david -- David Gunter HPC-3: Parallel Tools Team Los Alamos National Laboratory On Apr 30, 2009, at 12:28 PM, Edgar Gabriel wrote: cid's a

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Brian W. Barrett
When we added the CM PML, we added a pml_max_contextid field to the PML structure, which is the max size cid the PML can handle (because the matching interfaces don't allow 32 bits to be used for the cid). At the same time, the max cid for OB1 was shrunk significantly, so that the header on a s
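
To illustrate the kind of mismatch this message describes, here is a minimal C sketch. It is not Open MPI source. It assumes the context id travels in a 16-bit header field, which is only suggested by the hang at cid 65536 reported elsewhere in this thread. The names `sketch_match_hdr_t`, `SKETCH_MAX_CONTEXTID`, `sketch_cid_fits`, `sketch_pack_cid`, and `sketch_truncate_cid` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical match header: the context id is carried in 16 bits
 * (an assumption made for this sketch). */
typedef struct {
    uint16_t ctx;
} sketch_match_hdr_t;

/* Largest cid the hypothetical header can carry. */
#define SKETCH_MAX_CONTEXTID ((uint32_t)UINT16_MAX)

/* True if a 32-bit cid can be sent without losing bits. */
static bool sketch_cid_fits(uint32_t cid)
{
    return cid <= SKETCH_MAX_CONTEXTID;
}

/* Store a cid in the header. Returns false and leaves the header
 * untouched if the cid is too large for the field. */
static bool sketch_pack_cid(sketch_match_hdr_t *hdr, uint32_t cid)
{
    assert(hdr != NULL);
    if (!sketch_cid_fits(cid)) {
        return false;
    }
    hdr->ctx = (uint16_t)cid;
    return true;
}

/* What an unchecked store would do: keep only the low 16 bits. */
static uint16_t sketch_truncate_cid(uint32_t cid)
{
    return (uint16_t)cid;
}
```

Under these assumptions, an unchecked store of cid 65536 would yield 0 in the header, so it would be indistinguishable from cid 0 during matching. A range check at cid generation time, as in `sketch_pack_cid`, turns that silent aliasing into an explicit failure.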

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Edgar Gabriel
cid's are in fact not recycled in the block algorithm. The problem is that comm_free is not collective, so you can not make any assumptions whether other procs have also released that communicator. But nevertheless, a cid in the communicator structure is a uint32_t, so it should not hit the 1
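
The consequence of not recycling cids can be sketched with a toy allocator in C. This is a simplified model, not the block algorithm itself. The 16-bit limit is an assumption taken from the 65536 figure reported in this thread, and all names (`sketch_cid_alloc_t`, `sketch_cid_next`, `sketch_cid_free`, `sketch_count_iterations`, `SKETCH_CID_LIMIT`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical largest cid the transport can match on (assumed 16 bits). */
#define SKETCH_CID_LIMIT 65535u

/* Toy allocator that hands out increasing cids and never reuses one. */
typedef struct {
    uint32_t next_cid;
} sketch_cid_alloc_t;

/* Hand out the next cid; fails once the limit has been passed. */
static bool sketch_cid_next(sketch_cid_alloc_t *a, uint32_t *cid_out)
{
    assert(a != NULL && cid_out != NULL);
    if (a->next_cid > SKETCH_CID_LIMIT) {
        return false;
    }
    *cid_out = a->next_cid++;
    return true;
}

/* Freeing is a no-op in this model: a local free says nothing about
 * whether other processes have released the same communicator. */
static void sketch_cid_free(sketch_cid_alloc_t *a, uint32_t cid)
{
    (void)a;
    (void)cid;
}

/* Create and immediately free a communicator id in a loop.
 * Returns the number of iterations that succeed. */
static uint32_t sketch_count_iterations(uint32_t first_cid, uint32_t max_iters)
{
    sketch_cid_alloc_t a = { first_cid };
    uint32_t done = 0;
    uint32_t cid;
    while (done < max_iters && sketch_cid_next(&a, &cid)) {
        sketch_cid_free(&a, cid);
        done++;
    }
    return done;
}
```

In this model, a loop that creates and frees one communicator per iteration still runs out of ids after a bounded number of iterations, even though at most one communicator is alive at any time.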

Re: [OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Brian W. Barrett
On Thu, 30 Apr 2009, Ralph Castain wrote: We seem to have hit a problem here - it looks like we are seeing a built-in limit on the number of communicators one can create in a program. The program basically does a loop, calling MPI_Comm_split each time through the loop to create a sub-communicato

[OMPI devel] Inherent limit on #communicators?

2009-04-30 Thread Ralph Castain
Hi folks We seem to have hit a problem here - it looks like we are seeing a built-in limit on the number of communicators one can create in a program. The program basically does a loop, calling MPI_Comm_split each time through the loop to create a sub-communicator, does a reduce operation on the m

Re: [OMPI devel] Fwd: [ROMIO Req #947] New version of ROMIO?

2009-04-30 Thread Ralph Castain
Do we really want to go with a release candidate instead of an official release? That sounds pretty risky to me... On Thu, Apr 30, 2009 at 10:04 AM, Jeff Squyres wrote: > How long do we want to wait for 1.3.3? It looks like they'll have a new RC > for ROMIO on May 15th -- can we wait that long?

[OMPI devel] Fwd: [ROMIO Req #947] New version of ROMIO?

2009-04-30 Thread Jeff Squyres
How long do we want to wait for 1.3.3? It looks like they'll have a new RC for ROMIO on May 15th -- can we wait that long? At a minimum, it'll take a day or three to integrate the new ROMIO into OMPI. Begin forwarded message: From: Rob Latham Date: April 30, 2009 12:00:22 PM EDT To: Jef

Re: [OMPI devel] vampirtrace on v1.3 branch

2009-04-30 Thread Terry Dontje
Andreas Knüpfer wrote: On Tuesday 28 April 2009, Terry Dontje wrote: Has anyone tested running a simple program compiled with mpicc-vt that was built on RHEL 5.1 or SLES-10 with gcc under 32 bits? I am seeing the following errors when running compiled code: VampirTrace: BFD: bfd_get_file_f

Re: [OMPI devel] predefined ompi_t types break strict-aliasing rules

2009-04-30 Thread Jeff Squyres
On Apr 30, 2009, at 8:07 AM, Number Cruncher wrote: Following the discussion about ABI compatibility and type-punning of non client-visible types, I've attached a patch against 1.3.2 which casts to an opaque (void *) when OMPI_BUILDING is 0. This will prevent the compiler from trying to do

Re: [OMPI devel] vampirtrace on v1.3 branch

2009-04-30 Thread Andreas Knüpfer
On Tuesday 28 April 2009, Terry Dontje wrote: > Has anyone tested running a simple program compiled with mpicc-vt that > was built on RHEL 5.1 or SLES-10 with gcc under 32 bits? > > I am seeing the following errors when running compiled code: > VampirTrace: BFD: bfd_get_file_flags(): failed > >

Re: [OMPI devel] predefined ompi_t types break strict-aliasing rules

2009-04-30 Thread Number Cruncher
Following the discussion about ABI compatibility and type-punning of non client-visible types, I've attached a patch against 1.3.2 which casts to an opaque (void *) when OMPI_BUILDING is 0. This will prevent the compiler from trying to do any strict-aliasing based optimizations when the defini

Re: [OMPI devel] Fwd: Purify found bugs inside open-mpi library

2009-04-30 Thread Terry Dontje
Jeff Squyres wrote: On Apr 29, 2009, at 5:03 PM, Brian Blank wrote: Purify did find some other UMR (uninitialized memory read) errors though, but they don't seem to be negatively impacting my application right now. Nonetheless, I'll post them later today in case anyone is interested in them.