Re: [OMPI devel] SM init failures

2009-03-26 Thread Ralph Castain
You are correct - the Sun errors are in a version prior to the insertion of the SM changes. We didn't relabel the version to 1.3.2 until -after- those changes went in, so you have to look for anything with an r number >= 20839. The sif errors are all in that group - I would suggest starting

Re: [OMPI devel] SM init failures

2009-03-26 Thread Eugene Loh
Ralph Castain wrote: It looks like the SM revisions we inserted into 1.3.2 are a great detector for shared memory init failures - it segfaulted 143 times last night on IU's sif computer, 34 times on Sun/Linux, and 3 times on Sun/SunOS...almost every single time due to "Address not mapped"

Re: [OMPI devel] SM init failures

2009-03-26 Thread Eugene Loh
Ralph Castain wrote:
> Hi folks
Er, perhaps pronounced "Eugene". :^(
> It looks like the SM revisions we inserted into 1.3.2 are a great detector for shared memory init failures
How delicately put! I appreciate the gentleness.
> - it segfaulted 143 times last night on IU's sif computer, 34

Re: [OMPI devel] Infinite Loop: ompi_free_list_wait

2009-03-26 Thread Timothy Hayes
It was just a few kinks actually. I think the bitmap type moved from orte to opal, then I think the opal_hash_table functions changed slightly, and I also think the modex stuff was called something like pml_modex where it's now ompi_modex. There were a few extra functions in the module descri

Re: [OMPI devel] Infinite Loop: ompi_free_list_wait

2009-03-26 Thread Lenny Verkhovsky
What error are you getting from the compilation failure? Lenny. On 3/23/09, Timothy Hayes wrote: > > That's a relief to know, although I'm still a bit concerned. I'm looking at > the code for the OpenMPI 1.3 trunk and in the ob1 component I can see the > following sequence: > > mca_pml_o

[OMPI devel] SM init failures

2009-03-26 Thread Ralph Castain
Hi folks. It looks like the SM revisions we inserted into 1.3.2 are a great detector for shared memory init failures - it segfaulted 143 times last night on IU's sif computer, 34 times on Sun/Linux, and 3 times on Sun/SunOS...almost every single time due to "Address not mapped" errors in t