[OMPI devel] Segfault in odls_fork_local_procs() for some values of npersocket

2011-11-08 Thread nadia.derbey
Hi, In v1.5, when mpirun is called with both the "-bind-to-core" and "-npersocket" options, and the npersocket value leads to fewer procs than sockets allocated on one node, we get a segfault. Testing environment: openmpi v1.5; 2 nodes with four 8-core sockets each; mpirun -n 10 -bind-to-core -npersock

Re: [OMPI devel] debugger confusion

2011-11-08 Thread Ashley Pittman
On 8 Nov 2011, at 00:59, George Bosilca wrote: > A started process is defined as being our mpirun. In Open MPI > MPIR_partial_attach_ok is defined, so the tool will suppose that we provide a > means to synchronize the processes not based on MPIR_debug_gate. Therefore > only one behavior if acc

Re: [OMPI devel] Segfault in odls_fork_local_procs() for some values of npersocket

2011-11-08 Thread Ralph Castain
Looks fine to me - CMR filed. Thanks! On Nov 8, 2011, at 1:01 AM, nadia.derbey wrote: > Hi, > > In v1.5, when mpirun is called with both the "-bind-to-core" and > "-npersocket" options, and the npersocket value leads to less procs than > sockets allocated on one node, we get a segfault > > Test

Re: [OMPI devel] debugger confusion

2011-11-08 Thread Ralph Castain
On Nov 8, 2011, at 4:48 AM, Ashley Pittman wrote: > I agree that it's not clear this, I don't think this spec is well understood > by anyone, indeed it wasn't originally written with the intention of becoming > a specification at all. I've looked at it a couple of times but never used > this

Re: [OMPI devel] debugger confusion

2011-11-08 Thread Jeff Squyres
On Nov 7, 2011, at 8:34 PM, Ralph Castain wrote: > Best guess: from what I've seen, most debuggers don't seem to conform to what > the MPI Forum has "accepted". It doesn't appear that the vendors and debugger > developers pay too much attention to that document, possibly because it (a) > came a

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread Jeff Squyres
On Nov 7, 2011, at 9:48 PM, Nathan T. Hjelm wrote: > In retrospect I should have done a RFC for the 3rd change with a short > timeout. At the time (operating on little sleep) it seemed like the commits > would have minimal impact. Please let me know if the commits have any > negative impact. FWIW

[OMPI devel] debugger changes

2011-11-08 Thread Jeff Squyres
I think the only possibly controversial change in this commit is changing MPIR_Breakpoint() to return (void) instead of (void*). Oddly, I see that MPICH2 has 2 different prototypes for MPIR_Breakpoint -- one returns (void*), another returns (int). Assuming that MPICH2 works fine with the debug

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread Nathan T. Hjelm
Sure, I can do that. My only concern is with sending between hosts of different endianness. For example, if seg_key is 128 bits wide and the key32 is 64 bits then we might run into this: Host 1: (big endian) Set seg_key.key32[0] = 0x would result in seg_key: 0x 0x 0x1

[OMPI devel] Remote key sizes

2011-11-08 Thread Rolf vandeVaart
> george. > >PS: Regarding the hand-copy instead of the memcpy, we tried to avoid using >memcpy in performance critical codes, especially when we know the size of >the data and the alignment. This relieves the compiler of adding ugly >intrinsics, >allowing it to nicely pipeline to load/stores. An
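To make the hand-copy remark concrete, here is a minimal sketch that copies a fixed-size key member by member and, for comparison, with memcpy. The type demo_key128_t and all function names are hypothetical and are not taken from the Open MPI sources; the sketch only shows that both approaches produce the same bytes and makes no claim about which one a given compiler handles faster.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit key; the real Open MPI key type may differ. */
typedef struct {
    uint64_t word[2];
} demo_key128_t;

/* Hand copy: size and alignment are known, so two plain assignments do. */
static void demo_copy_by_hand(demo_key128_t *dst, const demo_key128_t *src)
{
    dst->word[0] = src->word[0];
    dst->word[1] = src->word[1];
}

/* Generic copy through memcpy, shown only for comparison. */
static void demo_copy_memcpy(demo_key128_t *dst, const demo_key128_t *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/* Returns 1 when both copy routines yield identical contents. */
static int demo_copies_match(uint64_t a, uint64_t b)
{
    demo_key128_t src, by_hand, by_memcpy;
    src.word[0] = a;
    src.word[1] = b;
    demo_copy_by_hand(&by_hand, &src);
    demo_copy_memcpy(&by_memcpy, &src);
    return memcmp(&by_hand, &by_memcpy, sizeof(src)) == 0;
}
```

Whether the hand copy actually pipelines better depends on the compiler and target, so it would need to be measured rather than assumed.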

Re: [OMPI devel] debugger confusion

2011-11-08 Thread George Bosilca
On Nov 8, 2011, at 07:52 , Jeff Squyres wrote: > To be clear: that document simply standardizes what MPI implementations are > supposed to provide in their MPIR implementation (prior to this, MPI > implementations tended to have subtle differences between their MPIR > implementations, which we

Re: [OMPI devel] debugger confusion

2011-11-08 Thread Ralph Castain
On Nov 8, 2011, at 8:25 AM, George Bosilca wrote: > > On Nov 8, 2011, at 07:52 , Jeff Squyres wrote: > >> To be clear: that document simply standardizes what MPI implementations are >> supposed to provide in their MPIR implementation (prior to this, MPI >> implementations tended to have subtl

Re: [OMPI devel] Remote key sizes

2011-11-08 Thread Nathan T. Hjelm
On Tue, 8 Nov 2011 06:36:03 -0800, Rolf vandeVaart wrote: >> george. >> >>PS: Regarding the hand-copy instead of the memcpy, we tried to avoid > using >>memcpy in performance critical codes, especially when we know the size of >>the data and the alignment. This relieves the compiler of adding u

Re: [OMPI devel] debugger confusion

2011-11-08 Thread Jeff Squyres
On Nov 8, 2011, at 10:25 AM, George Bosilca wrote: > However, based on what we have in the trunk today, Open MPI doesn't follow > that document. As Ralph pinpointed it, the current version work with several > tools (tv, stat, padb) as is, so that means the tools do not really follow > that docu

Re: [OMPI devel] debugger confusion

2011-11-08 Thread Ralph Castain
On Nov 8, 2011, at 8:37 AM, Jeff Squyres wrote: > On Nov 8, 2011, at 10:25 AM, George Bosilca wrote: > >> However, based on what we have in the trunk today, Open MPI doesn't follow >> that document. As Ralph pinpointed it, the current version work with several >> tools (tv, stat, padb) as is,

Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-11-08 Thread Larry Baker
The good news is that the issue reported in R25290 is fixed in the latest Intel compilers release (2011.7.256).  The bad news is that both the 2011.6.233 and 2011.7.256 releases identify themselves as V12.1.0 from the command line.  (I reported this bug to Intel already.)  They can only be reliably

[OMPI devel] Open MPI BOF

2011-11-08 Thread George Bosilca
Folks, Wednesday November 15th at 12:15 PST, we will have an Open MPI BOF. We will have two guest speakers: Rolf vandeVaart from NVIDIA and Shinji Sumimoto from the K-computer. If you are at SC, you are all invited to participate in this annual event. Blend for a moment with our user community,

Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
MPIR_Breakpoint, as the name indicates, is just a breakpoint used by the startup process or the MPI application to signal changes to the debugger. No return value, nothing more than a breakpoint. I wonder how the volatile got there; there is no such requirement on variables that cannot be ch

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread George Bosilca
Elements in an array are always stored in the expected [increasing] order, regardless of the endianness of the architecture. Moreover, due to the alignment rules, all members in a union will start at the same address. It turns out there is no endianness conversion on the keys, so I suppose both p
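The layout claim can be checked with a small sketch. The union demo_key_t below is a hypothetical stand-in, not the actual seg_key definition, and the helper names are made up for illustration. It assumes a plain big-endian or little-endian host. The addresses of the members do not depend on byte order, but which half of a 64-bit value shows up in key32[0] does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the key union; the real definition may differ. */
typedef union {
    uint32_t key32[2];
    uint64_t key64;
} demo_key_t;

/* Returns 1 if every member starts at the union's address and key32[1]
 * directly follows key32[0]; this holds on any byte order. */
static int demo_layout_is_ordered(void)
{
    demo_key_t k;
    assert(sizeof(demo_key_t) == 8);
    return (void *)&k.key32[0] == (void *)&k
        && (void *)&k.key64 == (void *)&k
        && (unsigned char *)&k.key32[1]
               == (unsigned char *)&k + sizeof(uint32_t);
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int demo_is_little_endian(void)
{
    uint32_t one = 1;
    return *(unsigned char *)&one == 1;
}

/* Stores a 64-bit value and reads back the first 32-bit array element. */
static uint32_t demo_first_word_of(uint64_t v)
{
    demo_key_t k;
    k.key64 = v;
    return k.key32[0];
}
```

On a little-endian host demo_first_word_of returns the low half of the value, on a big-endian host the high half, which is the kind of mismatch that matters only if keys written through one member are read through another on a host of different byte order.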

Re: [OMPI devel] make check fails for Intel 2011.6.233 (OpenMPI 1.4.3)

2011-11-08 Thread George Bosilca
Larry, Thanks for following up with us on this. I think your patch is cleaner than what we currently have in the trunk, so I went ahead and pushed it in the trunk (25461). I will request a push in 1.5 and 1.4 as well. Regards, george. On Nov 8, 2011, at 13:57 , Larry Baker wrote: > The good

Re: [OMPI devel] debugger changes

2011-11-08 Thread Ashley Pittman
I think the volatiles are there to ensure the compiler doesn't optimise away reads or function calls which has been a problem with this interface in the past. On 8 Nov 2011, at 22:18, George Bosilca wrote: > MPIR_Breakpoint, as the name indicates, it is just a breakpoint used by the > startup
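As a rough illustration of the concern about optimised-away reads, the sketch below polls a flag that something outside the program's normal control flow (for example a debugger) is expected to set. The names demo_debug_gate and demo_wait_for_gate are hypothetical and are not part of the MPIR interface or of Open MPI. With volatile, the compiler has to reload the flag on every iteration; without it, the load could legally be hoisted out of the loop.

```c
#include <assert.h>

/* Hypothetical gate variable; assumed to be written by an external tool. */
volatile int demo_debug_gate = 0;

/* Polls the gate up to max_spins times.
 * Returns 1 if the gate was seen open, 0 if the spin budget ran out. */
static int demo_wait_for_gate(long max_spins)
{
    long i;
    for (i = 0; i < max_spins; ++i) {
        if (demo_debug_gate != 0) {   /* re-read from memory each time */
            return 1;
        }
    }
    return 0;
}
```

The bounded loop is only there to keep the sketch terminating; a real gate would typically wait without a spin limit.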

Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
I guess people should check the commit before … No way the volatile will do any good here: -ORTE_DECLSPEC extern volatile char MPIR_executable_path[MPIR_MAX_PATH_LENGTH]; -ORTE_DECLSPEC extern volatile char MPIR_server_arguments[MPIR_MAX_ARG_LENGTH]; +ORTE_DECLSPEC extern char MPIR_executable_path

Re: [OMPI devel] debugger changes

2011-11-08 Thread Paul H. Hargrove
In theory, might a sufficiently smart compiler and linker eliminate some MPIR_* variables after optimization? If that could potentially be true, then perhaps the volatile qualifier would prevent such a removal, which would break the existence check(s) by the debugger? Just a thought. -Paul

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread Nathan T. Hjelm
Ok, that makes sense. Is there a reason why the members were all set to be the same size? Maybe seg_key should be: union { uint8_t key8; uint16_t key16; uint32_t key32; uint64_t key64; struct { uint64_t value[2]; } key128; }; -Nathan On Tue, 8 Nov 2011 17:22:48 -0500, George Bosilca
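Written out as compilable C, the union proposed above might look like the sketch below. The typedef name demo_seg_key_t and the helper functions are hypothetical additions for illustration; only the member list comes from the message. The sketch shows that the union is as large as its largest member and that every member starts at offset 0.

```c
#include <stddef.h>
#include <stdint.h>

/* The proposed layout, with a hypothetical typedef name for illustration. */
typedef union {
    uint8_t  key8;
    uint16_t key16;
    uint32_t key32;
    uint64_t key64;
    struct { uint64_t value[2]; } key128;
} demo_seg_key_t;

/* Size of the whole union: driven by the 128-bit member. */
static size_t demo_seg_key_size(void)
{
    return sizeof(demo_seg_key_t);
}

/* Offset of the 32-bit member: 0, like every other union member. */
static size_t demo_key32_offset(void)
{
    return offsetof(demo_seg_key_t, key32);
}
```

With scalar members of different widths, a key written as key32 on one host and read as key64 on another would see different bytes depending on byte order, so any such change would need the endianness question from earlier in the thread settled first.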

Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
On Nov 8, 2011, at 17:56 , Paul H. Hargrove wrote: > In theory, might a sufficiently smart compiler and linker eliminate some > MPIR_* variables after optimization? Even if a compiler can optimize out symbols from an application, I doubt they are allowed to apply the same optimization on librar

Re: [OMPI devel] Remote key sizes

2011-11-08 Thread Kenneth Lloyd
That makes sense to me. -Original Message- From: devel-boun...@open-mpi.org [mailto:devel-boun...@open-mpi.org] On Behalf Of Nathan T. Hjelm Sent: Tuesday, November 08, 2011 8:36 AM To: Open MPI Developers Subject: Re: [OMPI devel] Remote key sizes On Tue, 8 Nov 2011 06:36:03 -0800, Rol

Re: [OMPI devel] debugger changes

2011-11-08 Thread Ralph Castain
On Nov 8, 2011, at 3:56 PM, Paul H. Hargrove wrote: > In theory, might a sufficiently smart compiler and linker eliminate some > MPIR_* variables after optimization? If that could potentially be true, then > perhaps the volatile qualifier would prevent such a removal, which would > break the

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r25445

2011-11-08 Thread George Bosilca
I do not recall, and from the code there is no obvious reason. However, being able to store multiple smaller members might be a good enough reason. Btw, we don't use the key8 at all. I guess we can clean that code up to only keep key32 and key64, possibly with the count to match up the right s

Re: [OMPI devel] debugger changes

2011-11-08 Thread George Bosilca
On Nov 8, 2011, at 18:32 , Ralph Castain wrote: > That was the experience - after thrashing for quite some time, we finally > found that the volatile qualifiers fixed the problem. Hence my request that > people check to see if anything is broken. I will therefore propose to forever ban all com

Re: [OMPI devel] debugger changes

2011-11-08 Thread Paul H. Hargrove
Now this thread is starting to read like an episode of The Big Bang Theory. One possible guess as to how/why MPICH has managed w/o "volatile" would be that they may pass less aggressive optimization flags to the compilers. It is then a question of which MPI implementation is supporting a cho

Re: [OMPI devel] Remote key sizes

2011-11-08 Thread George Bosilca
On Nov 8, 2011, at 10:36 , Nathan T. Hjelm wrote: > On Tue, 8 Nov 2011 06:36:03 -0800, Rolf vandeVaart > wrote: >>> george. >>> >>> PS: Regarding the hand-copy instead of the memcpy, we tried to avoid >> using >>> memcpy in performance critical codes, especially when we know the size of >>> the

Re: [OMPI devel] Remote key sizes

2011-11-08 Thread Barrett, Brian W
On 11/8/11 5:25 PM, "George Bosilca" wrote: >2. one sided: A quick look in the OSC seems to indicate there are some >special handling to be done in the RDMA one. Look at >ompi_osc_rdma_sendreq_t in osc_rdma_sendreq.h, it is using a trick to >store the remote segments. First, the mca_btl_base_segm