Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Ralph Castain
That's what we needed to know - i.e., that setting num_sockets=1 generates an error instead of segfaulting down the road. I can submit a CMR to do so. thx! On Feb 22, 2012, at 4:12 PM, Eugene Loh wrote: > On 02/22/12 14:54, Ralph Castain wrote: >> That doesn't really address the issue, though.
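Here is a minimal C sketch of the outcome Ralph asks for: when socket-level binding is requested but no socket level was detected, report an error up front instead of dividing by zero later. The helper `socket_for_rank()` and the `SKETCH_*` return codes are hypothetical and made up for illustration. They are not Open MPI's ORTE mapping code or its real error constants.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical return codes for this sketch only. */
#define SKETCH_SUCCESS         0
#define SKETCH_ERR_BAD_RANK   (-1)
#define SKETCH_ERR_NO_SOCKETS (-2)

/* Hypothetical helper: pick the socket for a node-local rank under a
 * round-robin "bind to socket" policy. If the topology reported no socket
 * level (num_sockets <= 0), return an error the caller can turn into a
 * clean abort message, instead of taking a modulus by zero. */
int socket_for_rank(int local_rank, int num_sockets, int *socket_out)
{
    if (local_rank < 0) {
        return SKETCH_ERR_BAD_RANK;
    }
    if (num_sockets <= 0) {
        fprintf(stderr,
                "bind-to-socket requested, but no socket level was detected\n");
        return SKETCH_ERR_NO_SOCKETS;
    }
    *socket_out = local_rank % num_sockets;
    assert(*socket_out >= 0 && *socket_out < num_sockets);
    return SKETCH_SUCCESS;
}
```

For example, `socket_for_rank(5, 2, &s)` sets `s` to 1. `socket_for_rank(3, 0, &s)` returns `SKETCH_ERR_NO_SOCKETS` and leaves `s` untouched.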

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Eugene Loh
On 02/22/12 14:54, Ralph Castain wrote: That doesn't really address the issue, though. What I want to know is: what happens when you try to bind processes? What about -bind-to-socket, and -persocket options? Etc. Reason I'm concerned: I'm not sure what happens if the socket layer isn't present.

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Brice Goglin
On 22/02/2012 20:24, Eugene Loh wrote: > On 2/22/2012 11:08 AM, Ralph Castain wrote: >> On Feb 22, 2012, at 11:59 AM, Brice Goglin wrote: >>> On 22/02/2012 17:48, Ralph Castain wrote: On Feb 22, 2012, at 9:39 AM, Eugene Loh wrote: > On 2/21/2012 10:31 PM, Eugene Loh wrote: >> ...

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Ralph Castain
On Feb 22, 2012, at 12:24 PM, Eugene Loh wrote: > On 2/22/2012 11:08 AM, Ralph Castain wrote: >> On Feb 22, 2012, at 11:59 AM, Brice Goglin wrote: >>> On 22/02/2012 17:48, Ralph Castain wrote: On Feb 22, 2012, at 9:39 AM, Eugene Loh wrote: > On 2/21/2012 10:31 PM, Eugene Loh wrote: >>>

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Eugene Loh
On 2/22/2012 11:08 AM, Ralph Castain wrote: On Feb 22, 2012, at 11:59 AM, Brice Goglin wrote: On 22/02/2012 17:48, Ralph Castain wrote: On Feb 22, 2012, at 9:39 AM, Eugene Loh wrote: On 2/21/2012 10:31 PM, Eugene Loh wrote: ... "sockets" is unknown and hwloc returns 0 for num_sockets and O

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Ralph Castain
On Feb 22, 2012, at 11:59 AM, Brice Goglin wrote: > On 22/02/2012 17:48, Ralph Castain wrote: >> On Feb 22, 2012, at 9:39 AM, Eugene Loh wrote: >> >>> On 2/21/2012 10:31 PM, Eugene Loh wrote: ... "sockets" is unknown and hwloc returns 0 for num_sockets and OMPI pukes on divide by

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Brice Goglin
On 22/02/2012 17:48, Ralph Castain wrote: > On Feb 22, 2012, at 9:39 AM, Eugene Loh wrote: > >> On 2/21/2012 10:31 PM, Eugene Loh wrote: >>> ... "sockets" is unknown and hwloc returns 0 for num_sockets and OMPI >>> pukes on divide by zero. OS info was listed in the original message >>> (belo

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Ralph Castain
On Feb 22, 2012, at 9:39 AM, Eugene Loh wrote: > On 2/21/2012 10:31 PM, Eugene Loh wrote: >> ... "sockets" is unknown and hwloc returns 0 for num_sockets and OMPI pukes >> on divide by zero. OS info was listed in the original message (below). >> Might we want to do something else? E.g., ass

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Ralph Castain
Much simpler solution - on that platform, you should add "orte_num_sockets=1" to your default mca param file. Problem solved. It's why that param exists, and we added it specifically at Terry's request for an earlier, similar problem. On Feb 22, 2012, at 8:55 AM, Brice Goglin wrote: > On 22/02
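For illustration, the entry Ralph describes would be one line in Open MPI's system-wide MCA parameter file. That file is usually `$prefix/etc/openmpi-mca-params.conf`, but the exact path depends on the install prefix, so treat it as an assumption. The value 1 is the one discussed in this thread for the platform where hwloc reports no socket level.

```
# $prefix/etc/openmpi-mca-params.conf (path depends on the install prefix)
# Assume one socket per node instead of relying on hwloc's count of 0.
orte_num_sockets = 1
```

As with other MCA parameters, the same setting can usually be supplied per run instead, either through the environment (`OMPI_MCA_orte_num_sockets=1`) or on the command line (`mpirun -mca orte_num_sockets 1 ...`).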

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Eugene Loh
On 2/21/2012 10:31 PM, Eugene Loh wrote: ... "sockets" is unknown and hwloc returns 0 for num_sockets and OMPI pukes on divide by zero. OS info was listed in the original message (below). Might we want to do something else? E.g., assume num_sockets==1 when num_sockets==0 (if you know what I
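Eugene's suggested fallback can be sketched in C as follows. The function names are hypothetical and do not come from Open MPI. In hwloc 1.x, the reported count would typically come from `hwloc_get_nbobjs_by_type(topology, HWLOC_OBJ_SOCKET)`, which returns 0 when the topology has no object of that type. The sketch treats any non-positive count as "one socket per node", so later per-socket arithmetic cannot divide by zero.

```c
#include <assert.h>

/* Hypothetical helper: treat a missing socket level (a reported count of 0,
 * or any other non-positive value) as a single socket spanning the node. */
int effective_num_sockets(int reported)
{
    int n = (reported > 0) ? reported : 1;
    assert(n >= 1); /* later divisions by n are therefore safe */
    return n;
}

/* Hypothetical helper: cores per socket without risking a divide by zero. */
int cores_per_socket(int num_cores, int reported_sockets)
{
    return num_cores / effective_num_sockets(reported_sockets);
}
```

With this fallback, a 16-core node whose socket count comes back as 0 is treated as one 16-core socket, so `cores_per_socket(16, 0)` returns 16. The trade-off is that socket-level binding on such a node silently assumes one socket instead of reporting that the level is missing.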

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Brice Goglin
On 22/02/2012 07:36, Eugene Loh wrote: > On 2/21/2012 5:40 PM, Paul H. Hargrove wrote: >> Here are the first of the results of the testing I promised. >> I am not 100% sure how to reach the code that Eugene reported as >> problematic, > I don't think you're going to see it. Somehow, hwloc on th

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Eugene Loh
On 2/21/2012 5:40 PM, Paul H. Hargrove wrote: Here are the first of the results of the testing I promised. I am not 100% sure how to reach the code that Eugene reported as problematic, I don't think you're going to see it. Somehow, hwloc on the config in question thinks there is no socket leve

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-22 Thread Eugene Loh
On 02/21/12 19:29, Jeffrey Squyres wrote: What's the output of running lstopo from hwloc 1.3.2? (this is the version that's in the OMPI trunk and v1.5 branches) http://www.open-mpi.org/software/hwloc/v1.3/ Is there any difference from v1.4 hwloc? http://www.open-mpi.org/software/hw

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Paul H. Hargrove
My build with the "2011_sp1.8.273" Intel compilers passes the same tests as I detailed below for "2011_sp1.7.256". I don't suspect any longer that the compiler is at fault, but am willing to try additional/alternate tests to help confirm. -Paul On 2/21/2012 5:40 PM, Paul H. Hargrove wrote: He

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Paul H. Hargrove
Here are the first of the results of the testing I promised. I am not 100% sure how to reach the code that Eugene reported as problematic, so I tried just running the ring test with various -bind-to-* options. I am quite willing to run additional test cases. All runs are w/ OMPI_MCA_btl=sm,s

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Paul H. Hargrove
I have been testing v1.5 with slightly older Intel "composerxe-2011.5.220" compilers. I see a "make check" failure in opal_datatype_test which is not present with any other compiler (such as gcc on the same node). This has been seen most recently on the 1.5.5rc2r25990 tarball generated earlier t

Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Jeffrey Squyres
What's the output of running lstopo from hwloc 1.3.2? (this is the version that's in the OMPI trunk and v1.5 branches) http://www.open-mpi.org/software/hwloc/v1.3/ Is there any difference from v1.4 hwloc? http://www.open-mpi.org/software/hwloc/v1.4/ On Feb 21, 2012, at 7:20 PM, Eugen