Re: [OMPI devel] MTT tests: segv's with sm on large messages

2009-05-06 Thread Josh Hursey
If it would help in tracking this problem to give someone access to Sif, I can probably make that happen. Just let me know.
Cheers, Josh
On May 5, 2009, at 8:08 PM, Eugene Loh wrote:
Jeff Squyres wrote:
On May 5, 2009, at 6:01 PM, Eugene Loh wrote:
You and Terry saw something that was occ

Re: [OMPI devel] MTT tests: segv's with sm on large messages

2009-05-05 Thread Eugene Loh
Jeff Squyres wrote:
On May 5, 2009, at 6:01 PM, Eugene Loh wrote:
You and Terry saw something that was occurring about 0.01% of the time during MPI_Init during add_procs. That does not seem to be what we are seeing here.
Right -- that's what I'm saying. It's different than the MPI_INIT

Re: [OMPI devel] MTT tests: segv's with sm on large messages

2009-05-05 Thread Jeff Squyres
On May 5, 2009, at 6:01 PM, Eugene Loh wrote:
You and Terry saw something that was occurring about 0.01% of the time during MPI_Init during add_procs. That does not seem to be what we are seeing here.
Right -- that's what I'm saying. It's different than the MPI_INIT errors. But we h

Re: [OMPI devel] MTT tests: segv's with sm on large messages

2009-05-05 Thread Eugene Loh
Different from what? You and Terry saw something that was occurring about 0.01% of the time during MPI_Init during add_procs. That does not seem to be what we are seeing here. But we have seen failures in 1.3.1 and 1.3.2 that look like the one here. They occur more like 1% of the time and

Re: [OMPI devel] MTT tests: segv's with sm on large messages

2009-05-05 Thread Jeff Squyres
Hmm -- this looks like a different error to me. The <1% error rate sm error we were seeing was in MPI_INIT. This looks like it is beyond MPI_INIT and in the sending path...?
On May 4, 2009, at 11:00 AM, Eugene Loh wrote:
Ralph Castain wrote:
> In reviewing last night's MTT tests for the 1

Re: [OMPI devel] MTT tests: segv's with sm on large messages

2009-05-04 Thread Eugene Loh
Ralph Castain wrote:
In reviewing last night's MTT tests for the 1.3 branch, I am seeing several segfault failures in the shared memory BTL when using large messages. This occurred on both IU's sif machine and on Sun's tests. Here is a typical stack from MTT:
MPITEST info (0): Starting MP

[OMPI devel] MTT tests: segv's with sm on large messages

2009-05-04 Thread Ralph Castain
Hi folks,
In reviewing last night's MTT tests for the 1.3 branch, I am seeing several segfault failures in the shared memory BTL when using large messages. This occurred on both IU's sif machine and on Sun's tests. Here is a typical stack from MTT:
MPITEST info (0): Starting MPI_Sendrecv: