[OMPI devel] RFC: revamp btl rdma interface

2014-11-04 Thread Nathan Hjelm
What: Completely revamp the BTL RDMA interface (btl_put, btl_get) to better match what is needed for MPI one-sided. Why: I am preparing to push an enhanced MPI-3 one-sided component that makes use of network RDMA and atomic operations to provide a fast, truly one-sided implementation. Before I ca

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Jeff Squyres (jsquyres)
Ah, gotcha. On Nov 4, 2014, at 5:41 PM, Steve Wise wrote: > Correct: I don't see the bug in the 1.8.4rc1 release. > > > On 11/4/2014 4:33 PM, Nathan Hjelm wrote: >> Looks like there is no issue in 1.8.4 except for the message coalescing >> bug. Ralph, Howard, and I agree that disabling messag

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
Correct: I don't see the bug in the 1.8.4rc1 release. On 11/4/2014 4:33 PM, Nathan Hjelm wrote: Looks like there is no issue in 1.8.4 except for the message coalescing bug. Ralph, Howard, and I agree that disabling message coalescing for 1.8.4 is the safest way forward. We can back-port the re

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
Looks like there is no issue in 1.8.4 except for the message coalescing bug. Ralph, Howard, and I agree that disabling message coalescing for 1.8.4 is the safest way forward. We can back-port the real fix for an eventual 1.8.5. Message rates no longer seem to care about message coalescing in the o

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
There is one other bug fix to address the message coalescing bug. The rest is the BTL RDMA revamp. If there is a need I can probably pull those out and apply them to master sooner than SC. -Nathan On Tue, Nov 04, 2014 at 10:11:26PM +, Jeff Squyres (jsquyres) wrote: > It sounds like this fix

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Jeff Squyres (jsquyres)
That sounds fine, but I think Steve's point is that he is being bitten by this bug now, so it would probably be good to even include this one particular fix in 1.8.4. On Nov 4, 2014, at 5:24 PM, Nathan Hjelm wrote: > Going to put the RFC out today with a timeout of about 2 weeks. This > will

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
Going to put the RFC out today with a timeout of about 2 weeks. This will give me some time to talk with other Open MPI developers face-to-face at SC14. If the RFC fails I will still bring that and a couple of other fixes into the master. -Nathan On Tue, Nov 04, 2014 at 04:06:45PM -0600, Steve W

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Jeff Squyres (jsquyres)
It sounds like this fix should be merged in soon. Nathan: are your other changes bug fixes, or part of your BTL revamp branch? On Nov 4, 2014, at 5:06 PM, Steve Wise wrote: > Ok, sounds like I should let you continue the good work! :) When do you plan > to merge this into ompi proper? > >

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
Ok, sounds like I should let you continue the good work! :) When do you plan to merge this into ompi proper? On 11/4/2014 3:58 PM, Nathan Hjelm wrote: That certainly addresses part of the problem. I am working on a complete revamp of the btl RDMA interface. It contains this fix: https://gith

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
That certainly addresses part of the problem. I am working on a complete revamp of the btl RDMA interface. It contains this fix: https://github.com/hjelmn/ompi/commit/66fa429e306beb9fca59da0a4554e9b98d788316 -Nathan On Tue, Nov 04, 2014 at 03:27:23PM -0600, Steve Wise wrote: > I found the bug.

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
I'll issue a pull request for this and the other change I'm making. On 11/4/2014 3:27 PM, Steve Wise wrote: I found the bug. Here is the fix: [root@stevo1 openib]# git diff diff --git a/opal/mca/btl/openib/btl_openib_component.c b/opal/mca/btl/openib/btl_openib_component.c index d876e21..8a

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
I found the bug. Here is the fix: [root@stevo1 openib]# git diff diff --git a/opal/mca/btl/openib/btl_openib_component.c b/opal/mca/btl/openib/btl_openib_component.c index d876e21..8a5ea82 100644 --- a/opal/mca/btl/openib/btl_openib_component.c +++ b/opal/mca/btl/openib/btl_openib_component.c

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Nathan Hjelm
I have run into the issue as well. I will open a pull request for 1.8.4 as part of a patch fixing the coalescing issues. -Nathan On Tue, Nov 04, 2014 at 02:50:30PM -0600, Steve Wise wrote: > On 11/4/2014 2:09 PM, Steve Wise wrote: > >Hi, > > > >I'm running ompi top-o-tree from github and seeing a

Re: [OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
On 11/4/2014 2:09 PM, Steve Wise wrote: Hi, I'm running ompi top-o-tree from github and seeing an openib btl issue where the qp/srq configuration is incorrect for the given device id. This works fine in 1.8.4rc1, but I see the problem in top-of-tree. A simple 2 node IMB-MPI1 pingpong fails

[OMPI devel] openib choosing the wrong queue settings

2014-11-04 Thread Steve Wise
Hi, I'm running ompi top-o-tree from github and seeing an openib btl issue where the qp/srq configuration is incorrect for the given device id. This works fine in 1.8.4rc1, but I see the problem in top-of-tree. A simple 2 node IMB-MPI1 pingpong fails to get the ranks setup. I see this logg

[OMPI devel] Open MPI Developers Face to Face Q1 2015 (updated doodle poll link)

2014-11-04 Thread Howard Pritchard
Hi Folks, Per request to have a yes/yesifneedbe/no poll, and limitation of doodle to change options, a new doodle poll for deciding on the date for the next developers f2f is at: https://doodle.com/zzaupgxge9y6medu There is also a wiki page for the meeting: https://github.com/open-mpi/ompi/wiki

Re: [OMPI devel] thread-tests hang

2014-11-04 Thread Ralph Castain
That would be correct - we restored some configure flags that are required to make multi-thread programs work. Jeff can probably provide more info. > On Nov 4, 2014, at 9:15 AM, Alina Sklarevich > wrote: > > Hi, > > We observe a hang when running the multi-threading support test "latency.c"

[OMPI devel] OpenMPI Developers Face to Face Q1 2015 poll

2014-11-04 Thread Howard Pritchard
Hi OMPI folks, We're planning to hold another developers face to face in Q1 2015. Currently, we're thinking of holding the face to face either the last week of January, or one of the first two weeks of February. The format will be similar to the previous f2f in Chicago - start on Monday afternoon

[OMPI devel] thread-tests hang

2014-11-04 Thread Alina Sklarevich
Hi, We observe a hang when running the multi-threading support test "latency.c" (attached to this report), which uses MPI_THREAD_MULTIPLE. The hang happens immediately at the beginning of the test and is reproduced in the v1.8 release branch. The command line to reproduce the behavior is: $ mpir

Re: [OMPI devel] OMPI 1.8.4rc1 issues

2014-11-04 Thread Ralph Castain
> On Nov 4, 2014, at 12:44 AM, Gilles Gouaillardet > wrote: > > Ralph, > > On 2014/11/04 1:54, Ralph Castain wrote: >> Hi folks >> >> Looking at the over-the-weekend MTT reports plus at least one comment on the >> list, we have the following issues to address: >> >> * many-to-one continues

[OMPI devel] Request for a Open MPI SotU BoF slot for VampirTrace

2014-11-04 Thread Bert Wesarg
All, the TU Dresden would like to talk a little bit about the current state of VampirTrace in Open MPI, its successor Score-P [1] and the future of the collaboration at the SC'14 BoF. I think a 5min talk to present the basic idea for Score-P project would be great to have, following an open d

Re: [OMPI devel] OMPI 1.8.4rc1 issues

2014-11-04 Thread Gilles Gouaillardet
Ralph, On 2014/11/04 1:54, Ralph Castain wrote: > Hi folks > > Looking at the over-the-weekend MTT reports plus at least one comment on the > list, we have the following issues to address: > > * many-to-one continues to fail. Shall I just assume this is an unfixable > problem or a bad test and i

Re: [OMPI devel] osu_mbw_mr error

2014-11-04 Thread Joshua Ladd
Thanks, Nathan. After a bit more investigation yesterday, this was our conclusion too; that it is a longstanding bug in OpenIB BTL we just happened to start triggering the broken flow with some recent changes made to the default max_lmc parameter. Let us know if you need anything from our end. Jos

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Ralph Castain
Ah, okay - thanks for clarifying that! > On Nov 3, 2014, at 9:12 PM, Gilles Gouaillardet > wrote: > > That works too since pthread is mandatory now > (i previously made a RFC and removing the --with-threads configure option is > in my todo list) > > On 2014/11/04 14:10, Ralph Castain wrote: >

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Gilles Gouaillardet
That works too, since pthread is mandatory now (I previously made an RFC, and removing the --with-threads configure option is on my todo list). On 2014/11/04 14:10, Ralph Castain wrote: > Curious - why put it under condition of pthread config? I just added it to > the "if solaris" section - i.e., add

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Ralph Castain
Curious - why put it under condition of pthread config? I just added it to the “if solaris” section - i.e., add the flag if we are under solaris, regardless of someone asking for thread support. Since we require that libevent be thread-enabled, it seemed safer to always ensure those flags are se

Re: [OMPI devel] [1.8.4rc1] REGRESSION on Solaris-11/x86 with two subnets

2014-11-04 Thread Gilles Gouaillardet
Ralph, FYI, attached is the patch I am working on (still testing ...). aa207ad2f3de5b649e5439d06dca90d86f5a82c2 should be reverted then. Cheers, Gilles On 2014/11/04 13:56, Paul Hargrove wrote: > Ralph, > > You will see from the message I sent a moment ago that -D_REENTRANT on > Solaris a