Re: [OMPI devel] Need to know your Github ID

2014-09-11 Thread Gilles Gouaillardet
ggouaillardet -> ggouaillardet On 2014/09/10 19:46, Jeff Squyres (jsquyres) wrote: > As the next step of the planned migration to Github, I need to know: > > - Your Github ID (so that you can be added to the new OMPI git repo) > - Your SVN ID (so that I can map SVN->Github IDs, and therefore map T

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
Ralph, things got worse indeed :-( Now a simple hello world involving two hosts hangs in MPI_Init. There is still a race condition: if task a calls fence long after task b, then task b will never leave the fence. I'll try to debug this... Cheers, Gilles On 2014/09/11 2:36, Ralph Castain wro

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
Ralph, the root cause is that when the second orted/mpirun runs rcd_finalize_coll, it does not invoke pmix_server_release, because allgather_stub was not previously invoked since the fence was not yet entered. /* in rcd_finalize_coll, coll->cbfunc is NULL */ The attached patch is likely not the rig
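Gilles's diagnosis describes an ordering problem: the distributed part of the collective can finish (rcd_finalize_coll) before the local fence has registered its callback, so coll->cbfunc is still NULL and the release is never issued. One common way to make such code order-independent is to record whichever event happens first and fire the release when the second one arrives. The sketch below uses hypothetical names (`coll_t`, `coll_enter_fence()`, `coll_algo_complete()`, `example_release()`); it is not the grpcomm/rcd code, nor the change Ralph eventually made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical release callback, standing in for something like
 * pmix_server_release; not the actual OMPI signature. */
typedef void (*release_fn_t)(void *cbdata);

/* Hypothetical collective tracker, not the real grpcomm structure. */
typedef struct {
    release_fn_t cbfunc;   /* set when the local fence is entered */
    void *cbdata;
    bool algo_done;        /* set when the distributed algorithm finishes */
    bool released;         /* ensures the callback fires at most once */
} coll_t;

/* Fire the callback only once both conditions hold. */
static void try_release(coll_t *coll)
{
    if (coll->cbfunc != NULL && coll->algo_done && !coll->released) {
        coll->released = true;
        coll->cbfunc(coll->cbdata);
    }
}

/* Local processes entered the fence (the role allgather_stub plays). */
void coll_enter_fence(coll_t *coll, release_fn_t cbfunc, void *cbdata)
{
    assert(cbfunc != NULL);
    coll->cbfunc = cbfunc;
    coll->cbdata = cbdata;
    try_release(coll);
}

/* Distributed algorithm completed (the role rcd_finalize_coll plays).
 * If the fence has not been entered yet, completion is remembered
 * instead of being dropped. */
void coll_algo_complete(coll_t *coll)
{
    coll->algo_done = true;
    try_release(coll);
}

/* Example callback: counts how many times the release happened. */
void example_release(void *cbdata)
{
    int *count = cbdata;
    (*count)++;
}
```

With this shape it no longer matters whether a daemon finishes the algorithm before or after its local procs call fence; the `released` flag keeps a late duplicate completion from releasing twice.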

[OMPI devel] Github migration plan

2014-09-11 Thread Jeff Squyres (jsquyres)
Rolf -- I'll be ready to discuss a concrete plan and timeline to migrate to Github next Tuesday. Can you please add me to Tuesday's agenda? -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/

Re: [OMPI devel] Github migration plan

2014-09-11 Thread Jeff Squyres (jsquyres)
On Sep 11, 2014, at 8:15 AM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote: Rolf -- I'll be ready to discuss a concrete plan and timeline to migrate to Github next Tuesday. Can you please add me to Tuesday's agenda? -- Jeff Sq

Re: [OMPI devel] clang alignment warnings

2014-09-11 Thread Jeff Squyres (jsquyres)
I re-ran the test, just to ensure I had the line numbers right (I have some local edits in my SVN copy): - mca_base_var.c:681:18: runtime error: member access within misaligned address 0x2b338409 for type 'mca_base_var_storage_t', which requires 8 byte alignment - This is referring
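The snippet is cut off before Jeff's analysis, so the sketch below only illustrates the general class of problem that clang's `-fsanitize=alignment` check (part of `-fsanitize=undefined`) reports: accessing a member through a pointer that does not meet the type's alignment. `storage_t` is a hypothetical stand-in for a union like `mca_base_var_storage_t`; the real definition may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a storage union; its alignment is that of its
 * strictest member (8 bytes on typical 64-bit platforms). */
typedef union {
    int intval;
    int64_t llval;
    double dblval;
    char *stringval;
} storage_t;

/* Unsafe pattern (shown only as a comment): if base + offset is not
 * aligned for storage_t, this member access is undefined behavior and is
 * what the sanitizer flags at run time:
 *
 *     return ((const storage_t *)(base + offset))->llval;
 */

/* Safe pattern: copy the bytes into a properly aligned local first. */
int64_t read_ll_safe(const unsigned char *base, size_t offset)
{
    storage_t tmp;
    assert(base != NULL);
    memcpy(&tmp, base + offset, sizeof tmp);
    return tmp.llval;
}

/* Report whether a pointer satisfies storage_t's alignment (C11). */
int is_storage_aligned(const void *p)
{
    return ((uintptr_t)p % _Alignof(storage_t)) == 0;
}
```

Whether the warning points to a real bug or, as Ralph suggests, is related to Siegmar's segfaults depends on how the misaligned address at mca_base_var.c:681 is produced, which the truncated message does not show.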

Re: [OMPI devel] clang alignment warnings

2014-09-11 Thread Ralph Castain
I'm not convinced it is a bug in clang, Jeff - we know that Siegmar has been getting segfaults in the mca var code, though it isn't clear if those are alignment issues or not (looked like them, but can't say with certainty). May just need to ask him to run the current trunk and see if the proble

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Ralph Castain
Yeah, that's not the right fix, I'm afraid. I've made the direct component the default again until I have time to dig into this deeper. On Sep 11, 2014, at 4:02 AM, Gilles Gouaillardet wrote: > Ralph, > > the root cause is when the second orted/mpirun runs rcd_finalize_coll, > it does not inv

Re: [OMPI devel] Need to know your Github ID

2014-09-11 Thread Thomas Naughton
naughtont -> naughtont3 Thanks, --tjn Thomas Naughton naught...@ornl.gov Research Associate (865) 576-4184 On Wed, 10 Sep 2014, Jeff Squyres (js

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r32711 - trunk/opal/mca/pmix/cray

2014-09-11 Thread Tim Mattox
I'm sure that is not what you meant to do... the assignment to NULL should occur AFTER the free()... On Thu, Sep 11, 2014 at 4:30 PM, wrote: > Author: hppritcha (Howard Pritchard) > Date: 2014-09-11 16:30:40 EDT (Thu, 11 Sep 2014) > New Revision: 32711 > URL: https://svn.open-mpi.org/trac/ompi/c
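The r32711 diff is cut off here, so the exact lines are not shown; the sketch below only illustrates the ordering Tim describes, using a hypothetical helper `release_string()`. Clearing the pointer first discards the only reference, so the following `free()` receives NULL, does nothing, and the block leaks.

```c
#include <assert.h>
#include <stdlib.h>

/* Wrong order (for contrast):
 *     ptr = NULL;
 *     free(ptr);     -- free(NULL) is a no-op, the memory leaks
 */

/* Correct order: release the memory, then clear the pointer so later code
 * cannot use or double-free the dangling value. */
void release_string(char **ptr)
{
    assert(ptr != NULL);
    free(*ptr);
    *ptr = NULL;
}
```

Calling the helper a second time is harmless, because `free(NULL)` is defined to do nothing.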

Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r32711 - trunk/opal/mca/pmix/cray

2014-09-11 Thread Pritchard Jr., Howard
Thanks, it was a bad cut/paste. From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Tim Mattox Sent: Thursday, September 11, 2014 2:54 PM To: Open MPI Developers Subject: Re: [OMPI devel] [OMPI svn-full] svn:open-mpi r32711 - trunk/opal/mca/pmix/cray I'm sure that is not what you meant to

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Gilles Gouaillardet
Ralph, you are right, this was definitely not the right fix (at least with 4 nodes or more). I finally understood what is going wrong here: to make it simple, the allgather recursive doubling algo is not implemented with MPI_Recv(...,peer,...) like functions but with MPI_Recv(...,MPI_ANY_SOURCE,..
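For contrast with the MPI_ANY_SOURCE-style receive Gilles describes, here is a minimal sequential simulation of recursive-doubling allgather in which every round exchanges with one deterministic partner, `rank ^ 2^round`. Because each rank knows exactly whom it should hear from in each round, a message meant for a later round cannot be consumed as the current one. This is a plain-C sketch (no MPI, no ORTE) assuming a power-of-two number of participants; `rd_partner()` and `rd_allgather_sim()` are hypothetical names, not the grpcomm/rcd code.

```c
#include <assert.h>
#include <stdint.h>

/* Partner of `rank` in round `round` (participants must be a power of two). */
int rd_partner(int rank, int round)
{
    return rank ^ (1 << round);
}

/* have[i] is a bitmask of the contributions rank i currently holds.
 * Each round, every rank "receives" only from its designated partner and
 * merges that partner's data. Returns the number of rounds (log2 nprocs). */
int rd_allgather_sim(uint32_t *have, int nprocs)
{
    assert(nprocs > 0 && nprocs <= 32 && (nprocs & (nprocs - 1)) == 0);
    uint32_t next[32];
    int rounds = 0;
    for (int dist = 1; dist < nprocs; dist <<= 1, rounds++) {
        for (int i = 0; i < nprocs; i++) {
            int peer = i ^ dist;               /* same as rd_partner(i, rounds) */
            next[i] = have[i] | have[peer];    /* merge the partner's data only */
        }
        for (int i = 0; i < nprocs; i++) {
            have[i] = next[i];
        }
    }
    return rounds;
}
```

In a message-passing version, the same partner computation would drive a receive from that specific peer (plus a tag or round number), rather than accepting whichever message arrives first.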

Re: [OMPI devel] race condition in grpcomm/rcd

2014-09-11 Thread Ralph Castain
The design is supposed to be that each node knows precisely how many daemons are involved in each collective, and who is going to talk to them. The signature contains the info required to ensure the receiver knows which collective this message relates to, and just happens to also allow them to
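Ralph's description, in which the signature tells the receiver which collective a message belongs to and each node knows how many daemons take part, can be illustrated with a small tracker that counts contributions per signature. The sketch assumes a simplified signature (a numeric id plus the participant count); the real grpcomm signature is different, and `coll_sig_t`, `lookup_or_create()` and `record_contribution()` are hypothetical. Creating the tracker on the first message also covers the case where a peer's data arrives before the local fence is entered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_COLLS 8

/* Hypothetical, simplified signature: which collective, and how many
 * daemons are expected to contribute to it. */
typedef struct {
    unsigned id;
    int ndaemons;
} coll_sig_t;

typedef struct {
    bool in_use;
    coll_sig_t sig;
    int nrecvd;      /* contributions received so far */
} tracker_t;

static tracker_t trackers[MAX_COLLS];

/* Find the tracker for sig, creating it on the first message seen for that
 * collective. Returns NULL only if every slot is taken. */
tracker_t *lookup_or_create(const coll_sig_t *sig)
{
    tracker_t *free_slot = NULL;
    for (int i = 0; i < MAX_COLLS; i++) {
        if (trackers[i].in_use && trackers[i].sig.id == sig->id) {
            return &trackers[i];
        }
        if (!trackers[i].in_use && free_slot == NULL) {
            free_slot = &trackers[i];
        }
    }
    if (free_slot != NULL) {
        free_slot->in_use = true;
        free_slot->sig = *sig;
        free_slot->nrecvd = 0;
    }
    return free_slot;
}

/* Record one contribution; returns true once all expected contributions
 * for that collective have arrived. */
bool record_contribution(const coll_sig_t *sig)
{
    tracker_t *t = lookup_or_create(sig);
    assert(t != NULL);
    t->nrecvd++;
    return t->nrecvd == t->sig.ndaemons;
}
```

Interleaved messages for different collectives are kept apart by the signature, so completing one does not disturb the count for another.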