Re: [OMPI devel] [OMPI svn] svn:open-mpi r30571 - trunk/ompi/runtime

2014-02-06 Thread George Bosilca
This commit is unnecessary. The call to delete_proc is already there, a few lines above your own patch. It was introduced on Jan 26 2014 with commit https://svn.open-mpi.org/trac/ompi/changeset/30430. George. On Feb 6, 2014, at 09:38, svn-commit-mai...@open-mpi.org wrote: > Author: miked (M

[OMPI devel] C/R and orte_oob

2014-02-06 Thread Adrian Reber
When I initially made the C/R code compile again I made the following change: diff --git a/orte/mca/rml/oob/rml_oob_component.c b/orte/mca/rml/oob/rml_oob_component.c index f0b22fc..90ed086 100644 --- a/orte/mca/rml/oob/rml_oob_component.c +++ b/orte/mca/rml/oob/rml_oob_component.c @@ -185,8 +185,7 @

Re: [OMPI devel] [OMPI svn] svn:open-mpi r30571 - trunk/ompi/runtime

2014-02-06 Thread Mike Dubman
Thanks. We ported it from our internal 1.7.x tree, where I think it is not present. We will check it. On Thu, Feb 6, 2014 at 2:40 PM, George Bosilca wrote: > This commit is unnecessary. The call to delete_proc is already there, few > lines above your own patch. It was introduced on Jan 26 2014 with co

Re: [OMPI devel] [OMPI svn] svn:open-mpi r30571 - trunk/ompi/runtime

2014-02-06 Thread Mike Dubman
It seems that similar code is not in the v1.7 tree. On Thu, Feb 6, 2014 at 2:40 PM, George Bosilca wrote: > This commit is unnecessary. The call to delete_proc is already there, few > lines above your own patch. It was introduced on Jan 26 2014 with commit > https://svn.open-mpi.org/trac/ompi/chang

Re: [OMPI devel] [OMPI svn] svn:open-mpi r30571 - trunk/ompi/runtime

2014-02-06 Thread Ralph Castain
Okay, so let's revert this commit and instead CMR over the one that includes the required code. On Feb 6, 2014, at 9:16 AM, Mike Dubman wrote: > It seems that similar code is not in v1.7 tree. > > > On Thu, Feb 6, 2014 at 2:40 PM, George Bosilca wrote: > This commit is unnecessary. The call

Re: [OMPI devel] [OMPI svn] svn:open-mpi r30571 - trunk/ompi/runtime

2014-02-06 Thread Joshua Ladd
It's been CMRed, but scheduled for 1.7.5 https://svn.open-mpi.org/trac/ompi/ticket/4185 From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Mike Dubman Sent: Thursday, February 06, 2014 12:17 PM To: Open MPI Developers Subject: Re: [OMPI devel] [OMPI svn] svn:open-mpi r30571 - trunk/ompi

Re: [OMPI devel] [OMPI svn] svn:open-mpi r30571 - trunk/ompi/runtime

2014-02-06 Thread Ralph Castain
Kewl - I'll add it in the next wave. Meantime, we can revert this one Thanks! Ralph On Feb 6, 2014, at 9:18 AM, Joshua Ladd wrote: > It’s been CMRed, but scheduled for 1.7.5 > > https://svn.open-mpi.org/trac/ompi/ticket/4185 > > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of M

Re: [OMPI devel] mpirun oddity w/ PBS on an SGI UV

2014-02-06 Thread Paul Hargrove
Ralph, It worked on my second try, when I spelled it "ras_tm_smp" :-) Thanks, -Paul On Wed, Feb 5, 2014 at 11:59 AM, Paul Hargrove wrote: > Ralph, > > I will try to build tonight's trunk tarball and then test a run tomorrow. > Please ping me if I don't post my results by Thu evening (PST). >

Re: [OMPI devel] mpirun oddity w/ PBS on an SGI UV

2014-02-06 Thread Ralph Castain
crud - sorry about that! old man can't even remember his own param names... sigh Thanks for checking it Ralph On Feb 6, 2014, at 9:47 AM, Paul Hargrove wrote: > Ralph, > > It worked on my second try, when I spelled it "ras_tm_smp" :-) > > Thanks, > -Paul > > > > On Wed, Feb 5, 2014 at 11:59

Re: [OMPI devel] C/R and orte_oob

2014-02-06 Thread Ralph Castain
The only reason I can think of for an OOB ft-event would be to tell the OOB to stop sending any messages. You would need to push that into the event library and use a callback event to let you know when it was done. Of course, once you did that, the OOB would no longer be available to, for exam

Re: [OMPI devel] C/R and orte_oob

2014-02-06 Thread Adrian Reber
Josh explained to me a few days ago that after a checkpoint has been received, TCP should no longer be used, so as not to lose any messages. The communication happens over named pipes and therefore (I think) OOB ft_event() is used to quiet anything besides the pipes. This all seems to work but I was ju

Re: [OMPI devel] C/R and orte_oob

2014-02-06 Thread Ralph Castain
On Feb 6, 2014, at 2:16 PM, Adrian Reber wrote: > Josh explained it to me a few days ago, that after a checkpoint has been > received TCP should no longer be used to not lose any messages. The > communication happens over named pipes and therefore (I think) OOB > ft_event() is used to quiet anyt

[OMPI devel] singleton appears to be broken

2014-02-06 Thread George Bosilca
A singleton hello_world asserts with the following output: Warning :: opal_list_remove_item - the item 0x1211fc0 is not on the list 0x7f2cd9161ae0 hello: ../../../../ompi/orte/mca/rml/base/rml_base_msg_handlers.c:75: orte_rml_base_post_recv: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) ==

Re: [OMPI devel] singleton appears to be broken

2014-02-06 Thread Jeff Squyres (jsquyres)
I'm unable to replicate on Linux/RHEL/64 bit with a trunk build. How did you configure? Here's my configure: ./configure --prefix=/home/jsquyres/bogus --disable-vt --enable-mpirun-prefix-by-default --disable-mpi-fortran Does this happen with every run? On Feb 6, 2014, at 6:53 PM, George Bos

Re: [OMPI devel] singleton appears to be broken

2014-02-06 Thread Ralph Castain
Works for me on Mac and Linux/Centos6.2 as well On Feb 6, 2014, at 4:00 PM, Jeff Squyres (jsquyres) wrote: > I'm unable to replicate on Linux/RHEL/64 bit with a trunk build. How did you > configure? Here's my configure: > > ./configure --prefix=/home/jsquyres/bogus --disable-vt > --enable-

Re: [OMPI devel] singleton appears to be broken

2014-02-06 Thread Ralph Castain
Oh, should have noted: that's on both trunk and 1.7.4 On Feb 6, 2014, at 4:10 PM, Ralph Castain wrote: > Works for me on Mac and Linux/Centos6.2 as well > > > On Feb 6, 2014, at 4:00 PM, Jeff Squyres (jsquyres) > wrote: > >> I'm unable to replicate on Linux/RHEL/64 bit with a trunk build.

Re: [OMPI devel] singleton appears to be broken

2014-02-06 Thread George Bosilca
A rather long configure line: ./configure --enable-picky --enable-debug --enable-coverage --disable-heterogeneous --enable-visibility --enable-contrib-no-build=vt --enable-mpirun-prefix-by-default --disable-mpi-cxx --with-cma --enable-static --enable-mca-no-build=plm-tm,ess-tm,ras-tm,plm-tm,ras-slurm,

Re: [OMPI devel] singleton appears to be broken

2014-02-06 Thread George Bosilca
Out of 150 runs I could reproduce it once. When it failed I got exactly the same assert: hello: ../../../../ompi/orte/mca/rml/base/rml_base_msg_handlers.c:75: orte_rml_base_post_recv: Assertion `((0xdeafbeedULL << 32) + 0xdeafbeedULL) == ((opal_object_t *) (recv))->obj_magic_id' failed. A quic

Re: [OMPI devel] singleton appears to be broken

2014-02-06 Thread Ralph Castain
Interesting - does it happen in finalize, or in the middle of execution? On Feb 6, 2014, at 5:57 PM, George Bosilca wrote: > Out of 150 runs I could reproduce it once. When it failed I got exactly the > same assert: > > hello: ../../../../ompi/orte/mca/rml/base/rml_base_msg_handlers.c:75: >

[OMPI devel] Bcol/mcol violations

2014-02-06 Thread Ralph Castain
As many of you will have noticed, I have been struggling most of the evening with breakage on the trunk. This was initiated by adding .ompi_ignore to the coll/ml component, but the root cause of the problem is a blatant disregard for OMPI design rules in the bcol framework. Component-level heade