[OMPI devel] 1.8.7 and 1.10 problems

2015-07-24 Thread Ralph Castain
Hi folks, I have bad news. The 1.8.7 tarball is incorrect: I grabbed the wrong one, and it is missing several commits. As if that isn’t enough, I’ve been informed that we also missed moving some critical fixes in the MPI_Finalize area over to 1.8, so users of PSM are getting segfaults. This

Re: [OMPI devel] 1.10.0rc2

2015-07-24 Thread Paul Hargrove
I admit to having lost track of the discussion split among the various PRs and this email thread. I have the following three systems to test on: #1) ofi is the only mtl component which can build. #2) Both the ofi and portals4 mtl components build. #3) Both the psm and mxm mtl components build. I hav

Re: [OMPI devel] 1.10.0rc2

2015-07-24 Thread Paul Hargrove
Howard, Not sure if the "--mca mtl_base_verbose 10" output is still needed, but I've attached it in case it is. -Paul On Fri, Jul 24, 2015 at 7:26 AM, Howard Pritchard wrote: > Paul > > Could you rerun with --mca mtl_base_verbose 10 added to cmd line and send > output? > > Howard > > -

Re: [OMPI devel] 1.8.7 release tarball versus v1.8.7 tag in ompi-release repo

2015-07-24 Thread Ralph Castain
Hmmm...the most likely cause is that I generated the tag late - not immediately upon release. I tried to get the sha correct, but probably missed it. It’s possible that other changes came in afterwards, but I can take a look and see. The oob connection patch sounds strange, and I thought we had

[OMPI devel] 1.8.7 release tarball versus v1.8.7 tag in ompi-release repo

2015-07-24 Thread Lisandro Dalcin
Why do the contents of the 1.8.7 release tarball and the v1.8.7 tag in the ompi-release repo differ? Any chance this was a mistake and the release tarball was generated from the wrong tree? Of course I do not care about VERSION, but there are two files related to RMA that are different. The release ta

[OMPI devel] mca_pml_cm_component_init

2015-07-24 Thread Howard Pritchard
Hi folks, should we do something better than what is currently done in the mca_pml_cm_component_init method around lines 158-162? That's what's causing a bunch of problems right now in 1.10. I'd like to see a better approach taken in v2.x. Howard

Re: [OMPI devel] 1.10.0rc2

2015-07-24 Thread Howard Pritchard
Hi Jeff, Nathan and I think this is generic to all the mtls and is masked by the code in the cm select method that ups the priority of the mtl. We'd see this behavior for all mtls if this priority-upping code weren't there and we fell back to ob1. Howard 2015-07-24 9:12 GMT-06:00 Jeff Squyres
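To make the masking argument concrete, here is a minimal C sketch of priority-based selection in which every candidate's init runs, the highest priority wins, and the losers are finalized. The `component_t` type, `select_component()`, and the priority values are hypothetical illustrations, not Open MPI's mca_base selection code. The point is that a losing component (here, cm losing to ob1) has still run its init, so any side effect of that init has to be undone in its finalize.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical component record; not Open MPI's mca_base structures. */
typedef struct {
    const char *name;
    int priority;    /* higher value wins selection */
    int initialized; /* set to 1 once init has run */
    int finalized;   /* set to 1 once finalize has run */
} component_t;

/* Run init on every candidate, keep the highest-priority one, and
 * finalize all the others. Returns the winner's index, or -1 if n == 0. */
static int select_component(component_t *c, size_t n)
{
    assert(n == 0 || c != NULL);
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        c[i].initialized = 1; /* init runs for every candidate, winner or not */
        if (best < 0 || c[i].priority > c[best].priority) {
            best = (int)i;
        }
    }
    for (size_t i = 0; i < n; i++) {
        if ((int)i != best) {
            /* A losing component is finalized; in a DSO build it may then be unloaded. */
            c[i].finalized = 1;
        }
    }
    return best;
}
```

With cm given a lower (hypothetical) priority than ob1, cm ends up both initialized and finalized, which is the path that exposes an MTL init side effect once the priority boost in the cm select method is not in play.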

[OMPI devel] malloc(0) warning with 1.8.7

2015-07-24 Thread Lisandro Dalcin
Using a debug build of 1.8.7, I'm still getting this malloc(0) warning: malloc debug: Request for 0 bytes (coll_libnbc_ireduce_scatter_block.c, 67) The simple code below should reproduce it: $ cat ireduce_scatter_block.c #include <mpi.h> int main(int argc, char *argv[]) { MPI_Request request; MPI_I
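The warning points at a zero-byte allocation in coll_libnbc_ireduce_scatter_block.c, line 67; the snippet above does not show how it was eventually fixed. A common pattern for keeping such requests away from malloc is to skip the allocation when the computed size is zero. The helper below, `alloc_tmpbuf`, is hypothetical and only illustrates that pattern; it is not the libnbc code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not the libnbc code): allocate count * extent bytes
 * for a temporary buffer, but never pass a 0-byte request to malloc.
 * On return, *ok is 1 on success (including the empty case), 0 on failure. */
static void *alloc_tmpbuf(size_t count, size_t extent, int *ok)
{
    assert(ok != NULL);
    if (count == 0 || extent == 0) {
        *ok = 1;
        return NULL; /* empty buffer: nothing to allocate, and free(NULL) is safe */
    }
    if (count > SIZE_MAX / extent) {
        *ok = 0;     /* count * extent would overflow size_t */
        return NULL;
    }
    void *p = malloc(count * extent);
    *ok = (p != NULL);
    return p;
}
```

Reporting success through `*ok` rather than through the returned pointer lets a caller tell "nothing to allocate" apart from "allocation failed", since both return NULL.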

Re: [OMPI devel] 1.10.0rc2

2015-07-24 Thread Jeff Squyres (jsquyres)
I think Ralph answered this question: if you register a progress function but then get your component unloaded without un-registering the progress function... kaboom. > On Jul 24, 2015, at 10:37 AM, Howard Pritchard wrote: > > Jeff > > I was wrong about this. all the mtls except for portals

Re: [OMPI devel] 1.10.0rc2

2015-07-24 Thread Howard Pritchard
Jeff, I was wrong about this. All the mtls except for portals4 register with opal progress in their comp init. I don't see how this is a problem, though, as base select only invokes comp init on the selected mtl. Howard -- sent from my smart phone so no good type. Howard On Jul 24, 2015

Re: [OMPI devel] 1.10.0rc2

2015-07-24 Thread Ralph Castain
Glancing at the code, I believe I see the problem. The OFI MTL component registers an opal progress function during init, but the CM PML is not the one ultimately selected. Thus, the CM PML has its finalize called and is unloaded. During finalize, CM closes the MTL framework. This in turn calls
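The failure mode described here comes down to an unbalanced register/unregister pair. The C sketch below uses a hypothetical fixed-size progress registry, loosely modeled on the idea behind opal_progress_register()/opal_progress_unregister() but not their actual implementation, plus a hypothetical MTL whose init registers a callback and whose finalize removes it. If finalize skipped the unregister step and the component's code were then unloaded, the next progress pass would call through a dangling function pointer.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLBACKS 8

/* Hypothetical progress registry; not the real opal_progress code. */
typedef int (*progress_cb_t)(void);

static progress_cb_t callbacks[MAX_CALLBACKS];
static size_t num_callbacks = 0;

static int progress_register(progress_cb_t cb)
{
    assert(cb != NULL);
    if (num_callbacks == MAX_CALLBACKS) return -1; /* registry full */
    callbacks[num_callbacks++] = cb;
    return 0;
}

static int progress_unregister(progress_cb_t cb)
{
    for (size_t i = 0; i < num_callbacks; i++) {
        if (callbacks[i] == cb) {
            /* shift the tail down to keep the array dense */
            for (size_t j = i + 1; j < num_callbacks; j++) callbacks[j - 1] = callbacks[j];
            num_callbacks--;
            return 0;
        }
    }
    return -1; /* cb was not registered */
}

/* Call every registered callback; return the total events they report. */
static int progress_run(void)
{
    int events = 0;
    for (size_t i = 0; i < num_callbacks; i++) events += callbacks[i]();
    return events;
}

/* Hypothetical MTL: init registers a callback, and finalize must remove it
 * before the component's code can safely be unloaded. */
static int mtl_progress(void) { return 1; }
static int mtl_init(void)     { return progress_register(mtl_progress); }
static int mtl_finalize(void) { return progress_unregister(mtl_progress); }
```

Deferring registration until the enclosing PML is known to be selected would be another option; this sketch only shows the balanced-pair approach, where every finalize path undoes what init registered.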

Re: [OMPI devel] 1.10.0rc2

2015-07-24 Thread Howard Pritchard
Paul, could you rerun with --mca mtl_base_verbose 10 added to the cmd line and send the output? Howard -- sent from my smart phone so no good type. Howard On Jul 23, 2015 6:06 PM, "Paul Hargrove" wrote: > Yohann, > > With PR409 as it stands right now (commit 6daef310) I see no change to the >

Re: [OMPI devel] 1.10.0rc2

2015-07-24 Thread Jeff Squyres (jsquyres)
Yohann -- Can you have a look? > On Jul 24, 2015, at 10:15 AM, Howard Pritchard wrote: > > looks like the ofi mtl is being naughty. it's the only mtl which registers with > opal progress in its component init method. > > -- > > sent from my smart phone so no good type. > > Howard > > On J

Re: [OMPI devel] 1.10.0rc2

2015-07-24 Thread Howard Pritchard
Looks like the ofi mtl is being naughty. It's the only mtl which registers with opal progress in its component init method. -- sent from my smart phone so no good type. Howard On Jul 23, 2015 7:03 PM, "Ralph Castain" wrote: > It looks like one of the MTL components is registering a progress ca