Yes, I know - but the problem comes from nidmap pushing data down into the
opal_db/dstore level, which then creates a copy of the data. That's where the
alignment error is generated.
On Aug 8, 2014, at 11:17 AM, George Bosilca wrote:
> On Fri, Aug 8, 2014 at 5:21 AM, Ralph Castain wrote:
> So
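The sketch below illustrates the copy Ralph describes. It is a minimal example that assumes the dstore copy reads the caller's buffer as a 64-bit value. `name_t` and `copy_id` are hypothetical stand-ins, not the actual opal_db/dstore code. The point is that copying through memcpy never asks the CPU for an 8-byte-aligned load, so a source that is only 4-byte aligned is safe.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} name_t;   /* hypothetical stand-in: only 4-byte alignment is guaranteed */

/* Hypothetical dstore-style copy of a 64-bit identifier.
   The pattern that faults is   *dst = *(const uint64_t *)src;
   an 8-byte load that traps with SIGBUS on SPARC when src is only
   4-byte aligned. memcpy makes no assumption about src's alignment. */
static void copy_id(uint64_t *dst, const void *src)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, sizeof *dst);
}
```

The destination `dst` is a real uint64_t, so the compiler aligns it correctly. Only the source side needs the alignment-agnostic copy.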
On Fri, Aug 8, 2014 at 5:21 AM, Ralph Castain wrote:
> Sorry to chime in a little late. George is likely correct about using
> ORTE_NAME, only you can't do that as the OPAL layer has no idea what that
> datatype looks like. This was the original reason for creating the
> opal_identifier_t type -
Committed a fix for this in r32459 - please check and see if this resolves the
issue.
On Aug 8, 2014, at 2:21 AM, Ralph Castain wrote:
Sorry to chime in a little late. George is likely correct about using
ORTE_NAME, only you can't do that as the OPAL layer has no idea what that
datatype looks like. This was the original reason for creating the
opal_identifier_t type - I had no other choice when we moved the db framework
(now d
George,
(One of the) faulty lines was:
if (ORTE_SUCCESS != (rc =
opal_db.store((opal_identifier_t*)ORTE_PROC_MY_NAME, OPAL_SCOPE_INTERNAL,
OPAL_DB_LOCALLDR, (opal_identifier_t*)&proc, OPAL_ID_T))) {
So if proc is not 64-bit aligned, a SIGBUS will occur on SPARC.
as you point
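One possible caller-side workaround is sketched below. It assumes that store() dereferences its data argument as a 64-bit opal_identifier_t, which is what the SIGBUS suggests. Instead of casting &proc, the sketch copies proc into a genuine 64-bit local and passes that. All names here (`id64_t`, `name_t`, `store_id`, `store_name`) are hypothetical, and this is not claimed to be what r32459 does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t id64_t;   /* hypothetical stand-in for opal_identifier_t */

typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} name_t;                  /* hypothetical stand-in for orte_process_name_t */

static id64_t last_stored;

/* Hypothetical store(): reads its argument as a 64-bit value, so it is
   only safe when data really points at a properly aligned id64_t. */
static void store_id(const id64_t *data)
{
    last_stored = *data;
}

/* Instead of store_id((id64_t *)&proc), copy proc into a real id64_t
   local, which the compiler aligns for 64-bit access, and pass that. */
static void store_name(const name_t *proc)
{
    id64_t tmp;
    assert(sizeof tmp == sizeof *proc);
    memcpy(&tmp, proc, sizeof tmp);
    store_id(&tmp);
}
```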
This is a gigantic patch for an almost trivial issue. The current problem
is purely related to the fact that in a single location (nidmap.c) the
orte_process_name_t (which is a structure of 2 integers) is supposed to be
aligned based on the uint64_t requirements. Bad assumption!
Looking at the cod
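George's point can be checked directly. A struct of two 32-bit fields is 8 bytes long, but it only needs the alignment of uint32_t. The sketch below uses C11 alignof with a hypothetical `name_t` that mirrors the two-integer layout.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} name_t;   /* hypothetical mirror of orte_process_name_t */

/* sizeof and alignof answer different questions: the struct fills 8 bytes,
   but the compiler only promises uint32_t alignment for it. Accessing it
   as a uint64_t assumes alignof(uint64_t), which is 8 on SPARC. */
static void check_name_layout(void)
{
    assert(sizeof(name_t) == sizeof(uint64_t));
    assert(alignof(name_t) == alignof(uint32_t));
}
```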
Gilles,
I applied your patch to v1.8 and it ran successfully
on my SPARC machines.
Takahiro Kawashima,
MPI development team,
Fujitsu
Kawashima-san and all,
Here is attached a one off patch for v1.8.
/* it does not use the __attribute__ modifier that might not be
supported by all compilers */
As far as I am concerned, the same issue is also in the trunk,
and if you do not hit it, it just means you are lucky :-)
the same issue
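Gilles doesn't show how the patch obtains the alignment without `__attribute__`. One standard-C technique is to wrap the name in a union with a uint64_t member, which forces the whole object to uint64_t alignment. This is offered only as an assumption about the general approach, not as the contents of the patch. `name_t`, `aligned_name_t` and `name_as_id` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} name_t;   /* hypothetical stand-in for orte_process_name_t */

/* A union is aligned for its most strictly aligned member, so objects of
   this type are suitable for 64-bit loads and stores, with no need for
   __attribute__((aligned)) or C11 _Alignas. */
typedef union {
    name_t   name;
    uint64_t id;
} aligned_name_t;

/* Hypothetical conversion: copy the (4-byte aligned) name into the union,
   then read it back as one 64-bit value from a suitably aligned object. */
static uint64_t name_as_id(const name_t *proc)
{
    aligned_name_t u;
    assert(sizeof u.name == sizeof u.id);
    u.name = *proc;
    return u.id;
}
```

Reading `u.id` after writing `u.name` is type punning through a union, which C99 and C11 permit.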
Gilles, George,
The problem is the one Gilles pointed.
I temporarily modified the code below and the bus error disappeared.
--- orte/util/nidmap.c (revision 32447)
+++ orte/util/nidmap.c (working copy)
@@ -885,7 +885,7 @@
orte_proc_state_t state;
orte_app_idx_t app_idx;
int32_t
Hi George,
> Takahiro, you can confirm this by printing the value of data when the signal is
> raised.
It's in the trace.
0x07fede74
#2 0x0282aff4 (store + 0x540) (uid=(unsigned long *)
0x0118a128,scope=8:'\b',key=(char *) 0x0106a0a8
"opal.local.ldr",data=(void *) 0x
Kawashima-san,
This is interesting :-)
proc is on the stack and has type orte_process_name_t,
with
typedef uint32_t orte_jobid_t;
typedef uint32_t orte_vpid_t;
struct orte_process_name_t {
orte_jobid_t jobid; /**< Job number */
orte_vpid_t vpid; /**< Process id - equivalent to
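A variable of this type can be misaligned for 64-bit access because the compiler only has to place it on a 4-byte boundary. The thread does not give the exact stack layout of proc in nidmap.c. As a stand-in, the hypothetical `holder_t` below shows the same effect inside a struct, where offsetof makes the placement visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t jobid_t;
typedef uint32_t vpid_t;

typedef struct {
    jobid_t jobid;
    vpid_t  vpid;
} name_t;   /* hypothetical mirror of orte_process_name_t */

/* Hypothetical enclosing layout: one 32-bit field, then a name.
   Because name_t only needs 4-byte alignment, no padding is inserted
   and 'proc' starts at offset 4, i.e. not on an 8-byte boundary
   even when the enclosing object is. */
typedef struct {
    int32_t state;
    name_t  proc;
} holder_t;

static size_t proc_offset(void)
{
    /* 4 bytes of 'state' plus 8 bytes of name, with no padding between */
    assert(sizeof(holder_t) == sizeof(int32_t) + sizeof(name_t));
    return offsetof(holder_t, proc);
}
```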
I have an extremely vague recollection about a similar issue in the
datatype engine: on the SPARC architecture, 64-bit integers must be
aligned on a 64-bit boundary or you get a bus error.
Takahiro, you can confirm this by printing the value of data when the signal is
raised.
George.
On Fri, Au
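Following George's suggestion, a small debugging helper can print data and report whether it is 8-byte aligned just before the 64-bit access. `is_aligned` and `report_data` are hypothetical helpers, not existing Open MPI functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: nonzero if p is a multiple of align (a power of two). */
static int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (uintptr_t)(align - 1)) == 0;
}

/* Hypothetical debug print to drop in just before the 64-bit access. */
static void report_data(const char *where, const void *data)
{
    fprintf(stderr, "%s: data=%p, 8-byte aligned: %s\n",
            where, (void *)data, is_aligned(data, 8) ? "yes" : "no");
}
```

For example, calling `report_data("store", data)` at the top of store() would show whether the pointer that reaches the faulting copy is only 4-byte aligned.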
Hi,
> > >>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
> > >>> 10 Sparc and I receive a bus error, if I run a small program.
I've finally reproduced the bus error in my SPARC environment.
#0 0x00db4740 (__waitpid_nocancel + 0x44)
(0x200,0x0,0x0,0xa0,0xf80100064af0,0