On Thu, Aug 7, 2014 at 10:55 AM, Ralph Castain wrote:
> * fixes to coll/ml that expanded to fixing page alignment in general -
> someone needs to review/approve it:
> https://svn.open-mpi.org/trac/ompi/ticket/4826
>
I've been able to confirm that the nightly tarball (1.8.2rc4r32480) work
On Thu, Aug 7, 2014 at 10:55 AM, Ralph Castain wrote:
> * static linking failure - Gilles has posted a proposed fix, but somebody
> needs to approve and CMR it. Please see:
> https://svn.open-mpi.org/trac/ompi/ticket/4834
>
Jeff moved the fix to v1.8 in r32471.
I have tested tonight's t
I will try to take a look this week and see what I can do.
-Nathan
From: devel [devel-boun...@open-mpi.org] on behalf of George Bosilca
[bosi...@icl.utk.edu]
Sent: Thursday, August 07, 2014 10:37 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] RFC: ad
These are harmless. They are only used when FT is enabled which should
rarely be the case.
George.
On Fri, Aug 8, 2014 at 4:36 PM, Jeff Squyres (jsquyres)
wrote:
> Here are a few ORTE headers in OPAL source -- can the respective owners clean
> these up? Thanks.
>
> -
> mca/btl/smcuda/btl_smc
Here are a few ORTE headers in OPAL source -- can the respective owners clean these
up? Thanks.
-
mca/btl/smcuda/btl_smcuda.c
63:#include "orte/mca/sstore/sstore.h"
mca/btl/sm/btl_sm.c
62:#include "orte/mca/sstore/sstore.h"
mca/mpool/sm/mpool_sm_module.c
34:#include "orte/mca/sstore/sstore.h"
--
I found a few more OMPI header files included in OPAL source code. Can the
respective owners clean this stuff up?
Thanks!
-
mca/btl/openib/btl_openib_component.c
87:#include "ompi/mca/rte/rte.h"
mca/btl/ugni/btl_ugni_component.c
20:#include "ompi/runtime/params.h"
mca/btl/ugni/btl_ugni_ad
Yes, I know - but the problem comes from nidmap pushing data down into the
opal_db/dstore level, which then creates a copy of the data. That's where the
alignment error is generated
On Aug 8, 2014, at 11:17 AM, George Bosilca wrote:
> On Fri, Aug 8, 2014 at 5:21 AM, Ralph Castain wrote:
> So
I will attempt to confirm on my Solaris-10 system ASAP.
That will allow me to finally be certain that the other static linking
issue has been resolved.
-Paul
On Fri, Aug 8, 2014 at 11:39 AM, Jeff Squyres (jsquyres) wrote:
> Thanks!
>
> On Aug 8, 2014, at 2:30 PM, George Bosilca wrote:
>
> > r
Thanks!
On Aug 8, 2014, at 2:30 PM, George Bosilca wrote:
> r32467 should fix the problem.
>
> George.
>
>
> On Fri, Aug 8, 2014 at 1:20 PM, Jeff Squyres (jsquyres)
> wrote:
> That'll do it...
>
> George: can you fix?
>
>
> On Aug 8, 2014, at 1:11 PM, Ralph Castain wrote:
>
> > I thi
r32467 should fix the problem.
George.
On Fri, Aug 8, 2014 at 1:20 PM, Jeff Squyres (jsquyres)
wrote:
> That'll do it...
>
> George: can you fix?
>
>
> On Aug 8, 2014, at 1:11 PM, Ralph Castain wrote:
>
> > I think it might be getting pulled in from this include:
> >
> > opal/mca/common/sm/
On Fri, Aug 8, 2014 at 5:21 AM, Ralph Castain wrote:
> Sorry to chime in a little late. George is likely correct about using
> ORTE_NAME, only you can't do that as the OPAL layer has no idea what that
> datatype looks like. This was the original reason for creating the
> opal_identifier_t type -
That'll do it...
George: can you fix?
On Aug 8, 2014, at 1:11 PM, Ralph Castain wrote:
> I think it might be getting pulled in from this include:
>
> opal/mca/common/sm/common_sm.h:37:#include "ompi/group/group.h"
>
>
> On Aug 8, 2014, at 5:33 AM, Jeff Squyres (jsquyres)
> wrote:
>
>> We
I think it might be getting pulled in from this include:
opal/mca/common/sm/common_sm.h:37:#include "ompi/group/group.h"
On Aug 8, 2014, at 5:33 AM, Jeff Squyres (jsquyres) wrote:
> Weirdness; I don't see any name like that in the SM BTL.
>
> I see it used in the OMPI layer... not sure how it
Done; thanks.
On Aug 8, 2014, at 11:05 AM, Tim Mattox wrote:
> Jeff,
> I may someday again be working for an organization that is an Open MPI
> contributor... so could you
> update my e-mail address in the authors.txt file to be "timattox = Tim Mattox
> "
> Thanks!
>
>
> On Fri, Aug 8, 2014
Committed a fix for this in r32460 - see if I got it!
On Aug 8, 2014, at 4:02 AM, Gilles Gouaillardet
wrote:
> Folks,
>
> here is the description of a hang i briefly mentioned a few days ago.
>
> with the trunk (i did not check 1.8 ...) simply run on one node :
> mpirun -np 2 --mca btl sm,se
Committed a fix for this in r32459 - please check and see if this resolves the
issue.
On Aug 8, 2014, at 2:21 AM, Ralph Castain wrote:
> Sorry to chime in a little late. George is likely correct about using
> ORTE_NAME, only you can't do that as the OPAL layer has no idea what that
> datatyp
Fixed in r32462
On Aug 8, 2014, at 8:13 AM, Mike Dubman wrote:
>
> Josh,Devendar - could you please take a look?
> Thanks
>
> 15:45:00 Making install in mca/coll/fca
> 15:45:00 make[2]: Entering directory
> `/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi/mca/coll/fca'
Josh, Devendar - could you please take a look?
Thanks
15:45:00 Making install in mca/coll/fca
15:45:00 make[2]: Entering directory
`/scrap/jenkins/scrap/workspace/hpc-ompi-shmem/label/hpc-test-node/ompi/mca/coll/fca'
15:45:00 CC coll_fca_module.lo
15:45:00 coll_fca_module.c: In
Jeff,
I may someday again be working for an organization that is an Open MPI
contributor... so could you
update my e-mail address in the authors.txt file to be "timattox = Tim
Mattox "
Thanks!
On Fri, Aug 8, 2014 at 11:00 AM, Jeff Squyres (jsquyres) wrote:
> SHORT VERSION
> =============
>
> Pl
SHORT VERSION
=============
Please verify/update the email address that you'd like me to use for your Open
MPI commits when we do the git conversion:
https://github.com/open-mpi/authors
Updates are due by COB Friday, 15 Aug, 2014 (1 week from today).
MORE DETAIL
===========
Dave and I are
SHORT VERSION
=============
The ./contrib/check-help-strings.pl script is showing ***47 coding errors***
with regard to using show_help() in components. Here's a summary of the
offenders:
- ORTE (lumped together because there's a single maintainer :-) )
- smcuda and cuda
- common/verbs
- bcol
Weirdness; I don't see any name like that in the SM BTL.
I see it used in the OMPI layer... not sure how it's being used down in the
btl SM component file...?
On Aug 7, 2014, at 11:25 PM, Paul Hargrove wrote:
> Testing r32448 on trunk for trac issue #4834, I encounter the following which
>
Folks,
here is the description of a hang i briefly mentioned a few days ago.
with the trunk (i did not check 1.8 ...) simply run on one node:
mpirun -np 2 --mca btl sm,self ./abort
(the abort test is taken from the IBM test suite: process 0 calls
MPI_Abort while process 1 enters an infinite lo
Sorry to chime in a little late. George is likely correct about using
ORTE_NAME, only you can't do that as the OPAL layer has no idea what that
datatype looks like. This was the original reason for creating the
opal_identifier_t type - I had no other choice when we moved the db framework
(now d
George,
(one of the) faulty lines was:
if (ORTE_SUCCESS != (rc =
opal_db.store((opal_identifier_t*)ORTE_PROC_MY_NAME, OPAL_SCOPE_INTERNAL,
OPAL_DB_LOCALLDR, (opal_identifier_t*)&proc, OPAL_ID_T))) {
so if proc is not 64-bit aligned, a SIGBUS will occur on SPARC.
as you point
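For illustration, here is a minimal sketch of one way to avoid the unaligned access described above: copy the 8 bytes of the name into a properly aligned 64-bit variable with memcpy instead of casting the pointer. The types example_name_t and example_id_t and both helper functions are hypothetical stand-ins, not the actual ORTE/OPAL definitions, and this is not necessarily the fix that was committed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for orte_process_name_t: two 32-bit fields,
 * so the compiler only guarantees 32-bit alignment for it. */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} example_name_t;

/* Hypothetical stand-in for a 64-bit identifier type. */
typedef uint64_t example_id_t;

/* Copy the name into an aligned 64-bit value. memcpy makes no alignment
 * assumption about its source, unlike dereferencing (example_id_t *)name. */
static example_id_t name_to_id(const example_name_t *name)
{
    example_id_t id;
    assert(sizeof(id) == sizeof(*name));
    memcpy(&id, name, sizeof(id));
    return id;
}

/* Inverse copy: recover the two 32-bit fields from the 64-bit value. */
static example_name_t id_to_name(example_id_t id)
{
    example_name_t name;
    memcpy(&name, &id, sizeof(name));
    return name;
}
```

A caller would pass the address of the aligned copy (for example `example_id_t id = name_to_id(&proc);` followed by `&id`) rather than the address of the original struct.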
This is a gigantic patch for an almost trivial issue. The current problem
is purely related to the fact that in a single location (nidmap.c) the
orte_process_name_t (which is a structure of 2 integers) is assumed to be
aligned based on the uint64_t requirements. Bad assumption!
Looking at the cod
Gilles,
I applied your patch to v1.8 and it ran successfully
on my SPARC machines.
Takahiro Kawashima,
MPI development team,
Fujitsu
> Kawashima-san and all,
>
> Attached is a one-off patch for v1.8.
> /* it does not use the __attribute__ modifier that might not be
> supported by all compi
Kawashima-san and all,
Attached is a one-off patch for v1.8.
/* it does not use the __attribute__ modifier that might not be
supported by all compilers */
as far as i am concerned, the same issue is also in the trunk,
and if you do not hit it, it just means you are lucky :-)
the same issue
Gilles, George,
The problem is the one Gilles pointed out.
I temporarily modified the code below and the bus error disappeared.
--- orte/util/nidmap.c (revision 32447)
+++ orte/util/nidmap.c (working copy)
@@ -885,7 +885,7 @@
orte_proc_state_t state;
orte_app_idx_t app_idx;
int32_t
Paul's tests identified a small issue with the previous patch (a real
corner case for ARM v5). The patch below fixes all known issues.
Btw, there is still room for volunteers for the .asm work.
George.
On Tue, Aug 5, 2014 at 2:23 PM, George Bosilca wrote:
> Thanks to Paul's help all the
Hi George,
> Takahiro, you can confirm this by printing the value of data when the signal is
> raised.
It's in the trace.
0x07fede74
#2 0x0282aff4 (store + 0x540) (uid=(unsigned long *)
0x0118a128,scope=8:'\b',key=(char *) 0x0106a0a8
"opal.local.ldr",data=(void *) 0x
Kawashima-san,
This is interesting :-)
proc is in the stack and has type orte_process_name_t
with
typedef uint32_t orte_jobid_t;
typedef uint32_t orte_vpid_t;
struct orte_process_name_t {
orte_jobid_t jobid; /**< Job number */
orte_vpid_t vpid; /**< Process id - equivalent to
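As a small sketch of why such a struct can end up misaligned for 64-bit access, assume a layout like the one quoted above: two uint32_t members give the struct only the alignment requirement of uint32_t, so an instance on the stack may legally sit at an address that is not a multiple of 8. The names example_name_t, is_aligned and name_can_alias_u64 are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t example_jobid_t;
typedef uint32_t example_vpid_t;

/* Hypothetical struct with the same shape as the one quoted above. */
typedef struct {
    example_jobid_t jobid;
    example_vpid_t  vpid;
} example_name_t;

/* The struct is 8 bytes wide, which is what makes the cast tempting. */
static_assert(sizeof(example_name_t) == sizeof(uint64_t),
              "name must be exactly 8 bytes");

/* Return nonzero when ptr is a multiple of alignment (a power of two). */
static int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr % alignment) == 0;
}

/* Return nonzero when this particular name happens to be placed at an
 * address that also satisfies the alignment of uint64_t. */
static int name_can_alias_u64(const example_name_t *name)
{
    return is_aligned(name, alignof(uint64_t));
}
```

Printing the result of `name_can_alias_u64(&proc)` next to the failing call is one way to confirm whether a given stack instance is the misaligned one.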
I have an extremely vague recollection about a similar issue in the
datatype engine: on the SPARC architecture, 64-bit integers must be
aligned on a 64-bit boundary or you get a bus error.
Takahiro, you can confirm this by printing the value of data when the signal is
raised.
George.
On Fri, Au
Hi,
> > >>> I have installed openmpi-1.8.2rc2 with gcc-4.9.0 on Solaris
> > >>> 10 Sparc and I receive a bus error, if I run a small program.
I've finally reproduced the bus error in my SPARC environment.
#0 0x00db4740 (__waitpid_nocancel + 0x44)
(0x200,0x0,0x0,0xa0,0xf80100064af0,0