[OMPI devel] Please test: v1.10.1rc3

2015-10-28 Thread Ralph Castain
Last call - please give this version a whirl. http://www.open-mpi.org/software/ompi/v1.10/

Re: [OMPI devel] PMIX deadlock

2015-10-28 Thread Ralph Castain
Should have also clarified: the prior fixes are indeed in the current master. > On Oct 28, 2015, at 12:42 AM, Ralph Castain wrote: > > Nope - I was wrong. The correction on the client side consisted of attempting > to timeout if the blocking recv failed. We then modified the blocking > send/re

Re: [OMPI devel] PMIX deadlock

2015-10-28 Thread Ralph Castain
Nope - I was wrong. The correction on the client side consisted of attempting to time out if the blocking recv failed. We then modified the blocking send/recv so they would handle errors. So that problem occurred -after- the server had correctly called accept. The listener code is in opal/mca/pm
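
The client-side correction described in the message above (time out when a blocking recv fails, and make the blocking send/recv handle errors) can be illustrated generically. The sketch below is not the PMIx code. It is a small Python example, and the helper name `recv_exact` and its parameters are hypothetical, chosen only for illustration.

```python
import socket


def recv_exact(sock, nbytes, timeout_s):
    """Read exactly nbytes from sock (hypothetical helper, not a PMIx API).

    Raises TimeoutError if no data arrives within timeout_s on any single
    recv call, and ConnectionError if the peer closes before nbytes arrive.
    """
    # A timeout turns an indefinitely blocking recv into one that can fail.
    sock.settimeout(timeout_s)
    chunks = []
    remaining = nbytes
    while remaining > 0:
        try:
            chunk = sock.recv(remaining)
        except socket.timeout as exc:
            # Report the stall instead of hanging forever.
            raise TimeoutError("recv timed out after %.3f s" % timeout_s) from exc
        if not chunk:
            # An empty read means the peer closed the connection.
            raise ConnectionError("peer closed with %d bytes missing" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The caller decides what to do on `TimeoutError` or `ConnectionError` (retry, abort, or report), which is the point of making the blocking path fail visibly rather than stall.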

Re: [OMPI devel] PMIX deadlock

2015-10-28 Thread Ralph Castain
Looking at the code, it appears that a fix was committed for this problem, and that we correctly resolved the issue found by Paul. The problem is that the fix didn’t get upstreamed, and so it was lost the next time we refreshed PMIx. Sigh. Let me try to recreate the fix and have you take a gande

Re: [OMPI devel] PMIX deadlock

2015-10-28 Thread Ralph Castain
Here is the discussion - afraid it is fairly lengthy. Ignore the hwloc references in it as that was a separate issue: http://www.open-mpi.org/community/lists/devel/2015/09/18074.php It definitely sounds like the same issue creepi

Re: [OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2921-gb603307

2015-10-28 Thread George Bosilca
This is puzzling. I cannot reproduce it either, not even with a fresh clone. Let's assume for now that this was a false alarm. George. On Wed, Oct 28, 2015 at 2:01 AM, Gilles Gouaillardet wrote: > George, > > as i wrote, i cannot reproduce the issue so i just had to guess. > my best guess is the wron

Re: [OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2921-gb603307

2015-10-28 Thread Gilles Gouaillardet
George, as I wrote, I cannot reproduce the issue, so I just had to guess. My best guess is that the wrong pmix_server.h is #include'd, so pmix_common.h is not #include'd at all. I checked the include path: cd opal/mca/pmix/pmix1xx/pmix && make clean && make -n src/server/pmix_server_get.lo if yo

Re: [OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2921-gb603307

2015-10-28 Thread George Bosilca
Gilles, Supposing that pmix_common.h had already been included, adding it again should have changed nothing. I don't know which one is picked up, but now there is at least one pmix_common.h to be included. If you look carefully you will notice that the pmix_server.h includes pmix/pmix_

Re: [OMPI devel] PMIX deadlock

2015-10-28 Thread George Bosilca
Interesting. Do you have a pointer to the commit (and/or to the discussion)? I looked at the PMIX code, and I have identified a few issues, but unfortunately none of them seem to fix the problem for good. However, now I need more than 1000 runs to get a deadlock (instead of a few tens). Looking with

Re: [OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2921-gb603307

2015-10-28 Thread Gilles Gouaillardet
George, pmix_common.h is #include'd by pmix_server.h. Well, more precisely, pmix_common.h is #include'd by opal/mca/pmix/pmix1xx/pmix/include/pmix_server.h, and there are 3 pmix_server.h files in total: find . -name pmix_server.h ./opal/mca/pmix/pmix1xx/pmix/include/pmix_server.h ./opal/mca/pmix/pmix_server.h ./orte/o
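
The concern raised in the message above, that several headers share one file name and the include search order decides which one is used, can be sketched as follows. This is a simplified model written in Python, not the behaviour of any particular compiler: it only walks an ordered list of `-I`-style directories and ignores the extra rule that quoted includes usually search the including file's own directory first. The function names are hypothetical.

```python
import os


def find_same_named_files(root, name):
    """Return every path under root whose file name equals name.

    Roughly what `find . -name <name>` reports.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if name in filenames:
            hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def first_on_include_path(include_dirs, name):
    """Return the first include_dirs entry that contains name, or None.

    Simplified model of an ordered include search: the first match wins,
    so reordering include_dirs can silently select a different header.
    """
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Running `find_same_named_files(".", "pmix_server.h")` from the top of a source tree lists the candidates, and `first_on_include_path` shows which one a given directory order would select under this simplified model.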

Re: [OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2921-gb603307

2015-10-28 Thread George Bosilca
Interesting, I wonder how your compiler learns the definition of PMIX_ERR_SILENT without pmix_common.h. I just pushed a fix. George. On Wed, Oct 28, 2015 at 12:43 AM, Gilles Gouaillardet wrote: > George, > > i am unable to reproduce the issue. > if build still breaks for you, c

Re: [OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2921-gb603307

2015-10-28 Thread Gilles Gouaillardet
George, I am unable to reproduce the issue. If the build still breaks for you, could you send me your configure command line? Cheers, Gilles On 10/28/2015 1:04 PM, Gilles Gouaillardet wrote: George, PMIX_ERR_SILENT is defined in opal/mca/pmix/pmix1xx/pmix/include/pmix/pmix_common.h.in i ll

Re: [OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2921-gb603307

2015-10-28 Thread Gilles Gouaillardet
George, PMIX_ERR_SILENT is defined in opal/mca/pmix/pmix1xx/pmix/include/pmix/pmix_common.h.in. I'll have a look at it now. Cheers, Gilles On 10/28/2015 12:02 PM, George Bosilca wrote: We get a nice compiler complaint: ../../../../../../ompi/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_s