Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Ralph Castain
Yes, I know - but the problem comes from nidmap pushing data down into the opal_db/dstore level, which then creates a copy of the data. That's where the alignment error is generated. On Aug 8, 2014, at 11:17 AM, George Bosilca wrote: > On Fri, Aug 8, 2014 at 5:21 AM, Ralph Castain wrote: > So

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread George Bosilca
On Fri, Aug 8, 2014 at 5:21 AM, Ralph Castain wrote: > Sorry to chime in a little late. George is likely correct about using > ORTE_NAME, only you can't do that as the OPAL layer has no idea what that > datatype looks like. This was the original reason for creating the > opal_identifier_t type -

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Ralph Castain
Committed a fix for this in r32459 - please check and see if this resolves the issue. On Aug 8, 2014, at 2:21 AM, Ralph Castain wrote: > Sorry to chime in a little late. George is likely correct about using > ORTE_NAME, only you can't do that as the OPAL layer has no idea what that > datatyp

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Ralph Castain
Sorry to chime in a little late. George is likely correct about using ORTE_NAME, only you can't do that as the OPAL layer has no idea what that datatype looks like. This was the original reason for creating the opal_identifier_t type - I had no other choice when we moved the db framework (now d

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Gilles Gouaillardet
George, (one of the) faulty lines was: if (ORTE_SUCCESS != (rc = opal_db.store((opal_identifier_t*)ORTE_PROC_MY_NAME, OPAL_SCOPE_INTERNAL, OPAL_DB_LOCALLDR, (opal_identifier_t*)&proc, OPAL_ID_T))) { So if proc is not 64-bit aligned, a SIGBUS will occur on SPARC. As you point
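A common way to sidestep this kind of fault is to copy the bytes instead of dereferencing a cast pointer. The sketch below is not the patch discussed in this thread; `example_name_t`, `name_to_id`, and `id_to_name` are hypothetical names that only stand in for a two-integer process name and a 64-bit identifier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a process name made of two 32-bit integers. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} example_name_t;

/* The byte copy below is only meaningful if both types have the same size. */
static_assert(sizeof(example_name_t) == sizeof(uint64_t),
              "name and 64-bit identifier must have the same size");

/* Build a 64-bit identifier from a name without assuming that the name
 * is 64-bit aligned: memcpy has no alignment requirement on its source. */
uint64_t name_to_id(const example_name_t *name)
{
    uint64_t id;
    memcpy(&id, name, sizeof id);
    return id;
}

/* Inverse conversion, again through a byte copy. */
example_name_t id_to_name(uint64_t id)
{
    example_name_t name;
    memcpy(&name, &id, sizeof name);
    return name;
}
```

Because the copy goes into a local `uint64_t`, the compiler guarantees the destination alignment, so it does not matter where the source name was placed on the stack.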

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread George Bosilca
This is a gigantic patch for an almost trivial issue. The current problem is purely related to the fact that in a single location (nidmap.c) the orte_process_name_t (which is a structure of 2 integers) is supposed to be aligned based on the uint64_t requirements. Bad assumption! Looking at the cod
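The mismatch described here can be made visible by comparing the alignment requirements of the two types. This is only an illustrative sketch: `example_name_t` is a hypothetical stand-in for a structure of two 32-bit integers, and the helper function names are invented for the example.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a process name: a structure of 2 integers. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} example_name_t;

/* Same size as a 64-bit integer, which is what makes the cast tempting. */
static_assert(sizeof(example_name_t) == sizeof(uint64_t),
              "two 32-bit fields occupy 64 bits");

/* Alignment the compiler guarantees for the structure. */
size_t name_alignment(void)
{
    return alignof(example_name_t);
}

/* Alignment the compiler requires for a 64-bit integer. */
size_t u64_alignment(void)
{
    return alignof(uint64_t);
}

/* Nonzero when casting a name pointer to uint64_t* is not guaranteed
 * to yield a suitably aligned pointer on this platform. */
int cast_may_be_misaligned(void)
{
    return name_alignment() < u64_alignment();
}
```

On ABIs where `uint64_t` requires 8-byte alignment, the structure typically requires only 4, so `cast_may_be_misaligned()` returns nonzero and whether a given instance works depends on where it happens to be placed.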

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Kawashima, Takahiro
Gilles, I applied your patch to v1.8 and it ran successfully on my SPARC machines. Takahiro Kawashima, MPI development team, Fujitsu > Kawashima-san and all, > > Here is attached a one off patch for v1.8. > /* it does not use the __attribute__ modifier that might not be > supported by all compi

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Gilles Gouaillardet
Kawashima-san and all, Attached is a one-off patch for v1.8. /* it does not use the __attribute__ modifier that might not be supported by all compilers */ As far as I am concerned, the same issue is also in the trunk, and if you do not hit it, it just means you are lucky :-) The same issue

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Kawashima, Takahiro
Gilles, George, The problem is the one Gilles pointed out. I temporarily modified the code as shown below and the bus error disappeared. --- orte/util/nidmap.c (revision 32447) +++ orte/util/nidmap.c (working copy) @@ -885,7 +885,7 @@ orte_proc_state_t state; orte_app_idx_t app_idx; int32_t

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Kawashima, Takahiro
Hi George, > Takahiro you can confirm this by printing the value of data when signal is > raised. It's in the trace. 0x07fede74 #2 0x0282aff4 (store + 0x540) (uid=(unsigned long *) 0x0118a128,scope=8:'\b',key=(char *) 0x0106a0a8 "opal.local.ldr",data=(void *) 0x

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Gilles Gouaillardet
Kawashima-san, This is interesting :-) proc is in the stack and has type orte_process_name_t with typedef uint32_t orte_jobid_t; typedef uint32_t orte_vpid_t; struct orte_process_name_t { orte_jobid_t jobid; /**< Job number */ orte_vpid_t vpid; /**< Process id - equivalent to

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread George Bosilca
I have an extremely vague recollection about a similar issue in the datatype engine: on the SPARC architecture, 64-bit integers must be aligned on a 64-bit boundary or you get a bus error. Takahiro, you can confirm this by printing the value of data when the signal is raised. George. On Fri, Au
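The check suggested here comes down to testing whether an address is a multiple of the required alignment. A minimal sketch follows, with `is_aligned` as a hypothetical helper name; it is not part of Open MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if the address held in ptr is a multiple of alignment.
 * alignment must be a nonzero power of two (for example 8 for a
 * 64-bit boundary). */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

Applied to a pointer value printed from a debugger trace, an address whose low three bits are not all zero is not on a 64-bit boundary, which is the condition that raises a bus error for 64-bit loads on strict-alignment architectures.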

Re: [OMPI devel] [OMPI users] bus error with openmpi-1.8.2rc2 on Solaris 10 Sparc

2014-08-08 Thread Kawashima, Takahiro
Hi, > > >>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris > > >>> 10 Sparc and I receive a bus error, if I run a small program. I've finally reproduced the bus error in my SPARC environment. #0 0x00db4740 (__waitpid_nocancel + 0x44) (0x200,0x0,0x0,0xa0,0xf80100064af0,0