ggouaillardet -> ggouaillardet
On 2014/09/10 19:46, Jeff Squyres (jsquyres) wrote:
> As the next step of the planned migration to Github, I need to know:
>
> - Your Github ID (so that you can be added to the new OMPI git repo)
> - Your SVN ID (so that I can map SVN->Github IDs, and therefore map T
Ralph,
things got worse indeed :-(
now a simple hello world involving two hosts hangs in mpi_init.
there is still a race condition: if task a calls fence long after task b,
then task b will never leave the fence
i'll try to debug this ...
Cheers,
Gilles
On 2014/09/11 2:36, Ralph Castain wro
Ralph,
the root cause is when the second orted/mpirun runs rcd_finalize_coll,
it does not invoke pmix_server_release
because allgather_stub was not previously invoked since the fence
was not yet entered.
/* in rcd_finalize_coll, coll->cbfunc is NULL */
the attached patch is likely not the rig
Rolf --
I'll be ready to discuss a concrete plan and timeline to migrate to Github next
Tuesday.
Can you please add me to Tuesday's agenda?
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
On Sep 11, 2014, at 8:15 AM, Jeff Squyres (jsquyres)
<jsquy...@cisco.com> wrote:
Rolf --
I'll be ready to discuss a concrete plan and timeline to migrate to Github next
Tuesday.
Can you please add me to Tuesday's agenda?
--
Jeff Sq
I re-ran the test, just to ensure I had the line numbers right (I have some
local edits in my SVN copy):
-
mca_base_var.c:681:18: runtime error: member access within misaligned address
0x2b338409 for type 'mca_base_var_storage_t', which requires 8 byte
alignment
-
This is referring
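For readers unfamiliar with this class of sanitizer report: the address in the message above ends in 9, so it cannot be a multiple of 8. A minimal sketch of how such an address can be checked, and how an 8-byte value can be read from it without a misaligned access, follows. The helper names is_aligned and load_u64_unaligned are hypothetical and are not part of mca_base_var.c; the pointer-to-integer cast assumes a flat address space, as on common platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if p is a multiple of `alignment`.
 * `alignment` must be a non-zero power of two. */
int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: read a 64-bit value from an address that may be
 * misaligned. memcpy copies byte by byte as far as the language is
 * concerned, so no misaligned member access takes place. */
uint64_t load_u64_unaligned(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Whether the report above points at a real bug or at a false positive is exactly what the thread is discussing; the sketch only illustrates what "requires 8 byte alignment" means.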
I'm not convinced it is a bug in clang, Jeff - we know that Siegmar has been
getting segfaults in the mca var code, though it isn't clear if those are
alignment issues or not (looked like them, but can't say with certainty). May
just need to ask him to run the current trunk and see if the proble
Yeah, that's not the right fix, I'm afraid. I've made the direct component the
default again until I have time to dig into this deeper.
On Sep 11, 2014, at 4:02 AM, Gilles Gouaillardet wrote:
> Ralph,
>
> the root cause is when the second orted/mpirun runs rcd_finalize_coll,
> it does not inv
naughtont -> naughtont3
Thanks,
--tjn
_
Thomas Naughton naught...@ornl.gov
Research Associate (865) 576-4184
On Wed, 10 Sep 2014, Jeff Squyres (js
I'm sure that is not what you meant to do...
the assignment to NULL should occur AFTER the free()...
On Thu, Sep 11, 2014 at 4:30 PM, wrote:
> Author: hppritcha (Howard Pritchard)
> Date: 2014-09-11 16:30:40 EDT (Thu, 11 Sep 2014)
> New Revision: 32711
> URL: https://svn.open-mpi.org/trac/ompi/c
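The ordering issue raised in this reply can be illustrated with a small sketch. The type holder_t and the functions holder_set and holder_clear are hypothetical and are not taken from r32711 or from the cray pmix component; they only show why the pointer must be passed to free() before it is overwritten with NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a heap-allocated string (not an Open MPI type). */
typedef struct {
    char *name;
} holder_t;

/* Store a private copy of src in h->name, releasing any previous value.
 * Returns 0 on success, -1 if allocation fails. */
int holder_set(holder_t *h, const char *src)
{
    assert(h != NULL && src != NULL);
    size_t len = strlen(src) + 1;
    char *copy = malloc(len);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, src, len);
    free(h->name);          /* free(NULL) is a no-op */
    h->name = copy;
    return 0;
}

/* Release the string and reset the pointer. */
void holder_clear(holder_t *h)
{
    assert(h != NULL);
    /* Wrong order: `h->name = NULL; free(h->name);` calls free(NULL),
     * which does nothing, so the buffer is leaked. */
    free(h->name);          /* release the buffer first ...            */
    h->name = NULL;         /* ... then drop the now-dangling pointer  */
}
```

With the correct order, calling holder_clear twice is harmless, because the second call sees a NULL pointer.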
thanks, it was bad cut/paste
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Tim Mattox
Sent: Thursday, September 11, 2014 2:54 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r32711 -
trunk/opal/mca/pmix/cray
I'm sure that is not what you meant to
Ralph,
you are right, this was definitely not the right fix (at least with 4
nodes or more)
i finally understood what is going wrong here:
to make it simple, the allgather recursive doubling algo is not
implemented with
MPI_Recv(...,peer,...) like functions but with
MPI_Recv(...,MPI_ANY_SOURCE,..
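As background for the message above, here is a sketch of the textbook recursive doubling schedule, in which each participant exchanges with one specific peer per round. It is a plain simulation with no MPI or ORTE calls; rd_peer, rd_allgather_sim and MAX_RANKS are hypothetical names, and the sketch assumes a power-of-two number of participants. It does not reproduce the rcd implementation discussed in the thread.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_RANKS 32

/* Peer of `rank` in a given round: flip bit `round` of the rank. */
int rd_peer(int rank, int round)
{
    assert(round >= 0 && round < 31);
    return rank ^ (1 << round);
}

/* Simulate an allgather over `nranks` participants (power of two,
 * at most MAX_RANKS). have[r] is a bitmask of the contributions held
 * by rank r. Returns the number of rounds, or -1 on invalid input. */
int rd_allgather_sim(int nranks, uint32_t have[])
{
    if (nranks <= 0 || nranks > MAX_RANKS || (nranks & (nranks - 1)) != 0) {
        return -1;
    }
    for (int r = 0; r < nranks; r++) {
        have[r] = (uint32_t)1 << r;     /* each rank starts with its own data */
    }
    int rounds = 0;
    for (int dist = 1; dist < nranks; dist <<= 1, rounds++) {
        uint32_t next[MAX_RANKS];
        for (int r = 0; r < nranks; r++) {
            /* exchange with the one peer for this round and merge */
            next[r] = have[r] | have[r ^ dist];
        }
        for (int r = 0; r < nranks; r++) {
            have[r] = next[r];
        }
    }
    return rounds;
}
```

In this schedule every round has exactly one expected sender. A receive that accepts any source cannot, by itself, tell a message for the current round from one sent early by a later-round peer, which is presumably why the distinction drawn in the message matters.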
The design is supposed to be that each node knows precisely how many daemons
are involved in each collective, and who is going to talk to them. The
signature contains the info required to ensure the receiver knows which
collective this message relates to, and just happens to also allow them to