[OMPI devel] Fwd: [OMPI commits] Git: open-mpi/ompi branch master updated. dev-2921-gb603307

2015-10-27 Thread George Bosilca
We get a nice compiler complaint: ../../../../../../ompi/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_get.c: In function 'pmix_server_get': ../../../../../../ompi/opal/mca/pmix/pmix1xx/pmix/src/server/pmix_server_get.c:131: error: 'PMIX_ERR_SILENT' undeclared (first use in this function) ../.

Re: [OMPI devel] Open MPI autogen.pl in tarball

2015-10-27 Thread Jeff Squyres (jsquyres)
On Oct 27, 2015, at 4:46 PM, Gilles Gouaillardet wrote: > > my 0.02 US$ ... > > - autogen.pl was recently used with v1.10 on a PowerPC Little Endian arch > (that was mandatory since the libtool we use to generate the v1.10 series does not > yet support PPC LE) True. But we fixed that; it'll be in 1

Re: [OMPI devel] master build fails

2015-10-27 Thread Gilles Gouaillardet
FWIW, before Jeff fixed that, the build was successful on my RHEL7 box (stdio.h is included from verbs_exp.h, which is included from verbs.h) but failed on my RHEL6 box (verbs.h does *not* include stdio.h), so there was some room for Jenkins not to fail. Cheers, Gilles On 10/27/2015 9:17 PM, Jeff Squy
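
A quick way to check which case a given box is in (the header paths are the usual RHEL locations and are an assumption, not taken from the thread):

  grep -n 'stdio\.h' /usr/include/infiniband/verbs.h /usr/include/infiniband/verbs_exp.h 2>/dev/null

If neither header mentions stdio.h, any source file that relies on the transitive include for printf() and friends will fail to compile on that box.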

Re: [OMPI devel] PMIX deadlock

2015-10-27 Thread Ralph Castain
Hmmm…this looks like it might be that problem we previously saw where the blocking recv hangs in a proc when the blocking send tries to send before the domain socket is actually ready, and so the send fails on the other end. As I recall, it was something to do with the socket options - and then P

Re: [OMPI devel] Open MPI autogen.pl in tarball

2015-10-27 Thread Gilles Gouaillardet
Jeff and all, my 0.02 US$ ... - autogen.pl was recently used with v1.10 on a PowerPC Little Endian arch (that was mandatory since the libtool we use to generate the v1.10 series does not yet support PPC LE) - if we remove (from the tarball) autogen.pl, should we also remove configure.ac? and w
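
For reference, the PPC LE workflow being described looks roughly like this (the tarball name and install prefix are placeholders):

  tar xf openmpi-1.10.x.tar.bz2 && cd openmpi-1.10.x
  ./autogen.pl                 # regenerate configure with the locally installed,
                               # newer autotools/libtool that understand ppc64le
  ./configure --prefix=$HOME/ompi && make -j 8 install

This is only possible because autogen.pl (and configure.ac etc.) currently ship in the tarball.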

Re: [OMPI devel] PMIX deadlock

2015-10-27 Thread George Bosilca
It appears the branch solves the problem at least partially. I asked one of my students to hammer it pretty badly, and he reported that the deadlocks still occur. He also graciously provided some stacktraces: #0 0x7f4bd5274aed in nanosleep () from /lib64/libc.so.6 #1 0x7f4bd52a9c94 in usle

Re: [OMPI devel] Segv in MTT

2015-10-27 Thread Ralph Castain
I found the problem - fix coming shortly. > On Oct 27, 2015, at 12:49 PM, Ralph Castain wrote: > > I’m seeing similar failures in the master from several collectives. Looking > at the stack, here is what I see on all of them: > > (gdb) where > #0 0x7fe49931a5d7 in raise () from /usr/lib6

Re: [OMPI devel] Open MPI autogen.pl in tarball

2015-10-27 Thread Paul Hargrove
Maybe not relevant, but... In the GASNet and Berkeley UPC projects we include our analogue of autogen.sh in tarballs, too. Because of this our analogue of MTT is able to exercise it across many versions of the autotools. This *has* actually allowed us to learn of problems in our configury before d

[OMPI devel] Open MPI autogen.pl in tarball

2015-10-27 Thread Jeff Squyres (jsquyres)
Yo Brian Barrett: cast your brain into the WayBack(tm) machine... Do you remember why we include autogen.pl in distribution tarballs? My recollection is: 1. It was handy for OMPI developers to "make dist" in an SVN checkout to take a tarball over to a back-end machine where you couldn't do an SVN

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-27 Thread Ralph Castain
Good to hear - thanks! > On Oct 27, 2015, at 11:37 AM, Mark Santcroos > wrote: > > >> On 24 Oct 2015, at 7:54 , Mark Santcroos wrote: >> Will test it on real systems once it hits master. > > FYI: It's been holding up pretty well on real deployment too! > __

Re: [OMPI devel] Segv in MTT

2015-10-27 Thread Ralph Castain
I’m seeing similar failures in the master from several collectives. Looking at the stack, here is what I see on all of them: (gdb) where #0 0x7fe49931a5d7 in raise () from /usr/lib64/libc.so.6 #1 0x7fe49931be08 in abort () from /usr/lib64/libc.so.6 #2 0x7fe49935ae07 in __libc_messa

[OMPI devel] Segv in MTT

2015-10-27 Thread Ralph Castain
Anyone have an idea of what this is all about? >> Command: mpirun --hostfile /home/common/hosts -np 16 --prefix >> /home/common/openmpi/build/foobar/ collective/alltoall_in_place Elapsed: 00:00:00 0.00u 0.00s Test: alltoall_in_place, np=16, variant=1: Passed *** Error in `collect

Re: [OMPI devel] orte-dvm / orte-submit race condition

2015-10-27 Thread Mark Santcroos
> On 24 Oct 2015, at 7:54 , Mark Santcroos wrote: > Will test it on real systems once it hits master. FYI: It's been holding up pretty well on real deployment too!

Re: [OMPI devel] PMIX deadlock

2015-10-27 Thread Ralph Castain
I haven’t been able to replicate this when using the branch in this PR: https://github.com/open-mpi/ompi/pull/1073 Would you mind giving it a try? It fixes some other race conditions and might pick this one up too. > On Oct 27, 2015, at 10:04 AM, R

Re: [OMPI devel] PMIX deadlock

2015-10-27 Thread Ralph Castain
Okay, I’ll take a look - I’ve been chasing a race condition that might be related > On Oct 27, 2015, at 9:54 AM, George Bosilca wrote: > > No, it's using 2 nodes. > George. > > > On Tue, Oct 27, 2015 at 12:35 PM, Ralph Castain > wrote: > Is this on a single node?

Re: [OMPI devel] PMIX deadlock

2015-10-27 Thread George Bosilca
No, it's using 2 nodes. George. On Tue, Oct 27, 2015 at 12:35 PM, Ralph Castain wrote: > Is this on a single node? > > On Oct 27, 2015, at 9:25 AM, George Bosilca wrote: > > I get intermittent deadlocks with the latest trunk. The smallest reproducer > is a shell for loop around a small (2 pro

Re: [OMPI devel] PMIX deadlock

2015-10-27 Thread Ralph Castain
Is this on a single node? > On Oct 27, 2015, at 9:25 AM, George Bosilca wrote: > > I get intermittent deadlocks with the latest trunk. The smallest reproducer is > a shell for loop around a small (2 processes) short (20 seconds) MPI > application. After a few tens of iterations the MPI_Init will

[OMPI devel] PMIX deadlock

2015-10-27 Thread George Bosilca
I get intermittent deadlocks with the latest trunk. The smallest reproducer is a shell for loop around a small (2 processes) short (20 seconds) MPI application. After a few tens of iterations the MPI_Init will deadlock with the following backtrace: #0 0x7fa94b5d9aed in nanosleep () from /lib64/l
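
A sketch of the kind of reproducer described above (the binary name and iteration count are placeholders, not the actual test George used):

  for i in $(seq 1 100); do echo "iteration $i"; mpirun -np 2 ./short_mpi_test; done

The loop eventually stops making progress when one iteration hangs in MPI_Init.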

Re: [OMPI devel] Compile only one framework/component

2015-10-27 Thread Jeff Squyres (jsquyres)
On Oct 27, 2015, at 5:48 AM, Federico Reghenzani wrote: > > Oh good, thank you all and sorry for the banal question. No worries -- don't fret about asking banal questions here. We try to document as best we can, but there's a lot of "tribal knowledge" that is in the heads of the Open MPI dev

Re: [OMPI devel] Compile only one framework/component

2015-10-27 Thread Federico Reghenzani
Oh good, thank you all and sorry for the banal question. __ Federico Reghenzani M.Eng. Student @ Politecnico di Milano Computer Science and Engineering 2015-10-27 13:38 GMT+01:00 Jeff Squyres (jsquyres) : > In addition to what others said, check out this wiki page -- it offers > some insight i

Re: [OMPI devel] Checkpoint/restart + migration

2015-10-27 Thread Gianmario Pozzi
Thank you guys, your help is really appreciated! We'll keep in touch for further information. Gianmario On 23 Oct 2015 at 12:44, "Jeff Squyres (jsquyres)" wrote: > On Oct 22, 2015, at 7:17 AM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com> wrote: > > > > Gianmario, > > > > there was c/

Re: [OMPI devel] Compile only one framework/component

2015-10-27 Thread Jeff Squyres (jsquyres)
In addition to what others said, check out this wiki page -- it offers some insight in this kind of stuff: https://github.com/open-mpi/ompi/wiki/devel-CreateComponent > On Oct 27, 2015, at 5:11 AM, Federico Reghenzani > wrote: > > Is there any option for `make` to start compilation only

Re: [OMPI devel] Compile only one framework/component

2015-10-27 Thread Gilles Gouaillardet
Federico, in order to build one component, just cd into the component directory (in the build directory if you are using VPATH) and run make (install). Components and frameworks depend on other frameworks, so it is generally safer to run make from the top build directory. Cheers, Gilles On Tuesday, O
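
A sketch of that workflow, assuming a VPATH build tree under ./build and a change confined to the coll/tuned component (paths are illustrative):

  cd build/ompi/mca/coll/tuned   # the component's directory inside the build tree
  make install                   # rebuild and install just this component
  # then re-run the application; no full-tree make needed

As noted above, this only works when nothing outside that component has changed; otherwise run make from the top of the build tree.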

Re: [OMPI devel] Compile only one framework/component

2015-10-27 Thread Ralph Castain
Note that I’m assuming that all changes are in the component whose directory you are doing the “make” in… > On Oct 27, 2015, at 5:16 AM, Ralph Castain wrote: > > Not at the framework level, but you can cd into any component directory and > do a “make install” and then just execute again. > >

Re: [OMPI devel] master build fails

2015-10-27 Thread Jeff Squyres (jsquyres)
Blah. It passed Jenkins. :-\ Looks like a simple missing <stdio.h>. I'll fix. > On Oct 27, 2015, at 5:07 AM, Howard Pritchard wrote: > > Hi Folks, > > Looks like master can't build any more, at least not on cray with > --enable-picky option: > > -- make all -j 8 result_stderr --- > keyval_lex.c:

Re: [OMPI devel] master build fails

2015-10-27 Thread Ralph Castain
Looks like you’re just missing a required header that Cray has in a different place - master builds fine on my box > On Oct 27, 2015, at 5:07 AM, Howard Pritchard wrote: > > Hi Folks, > > Looks like master can't build any more, at least not on cray with > --enable-picky option: > > -- make a

Re: [OMPI devel] Compile only one framework/component

2015-10-27 Thread Ralph Castain
Not at the framework level, but you can cd into any component directory and do a “make install” and then just execute again. > On Oct 27, 2015, at 5:11 AM, Federico Reghenzani > wrote: > > Is there any option for `make` to start compilation only for one framework or > component? Even if ther

[OMPI devel] Compile only one framework/component

2015-10-27 Thread Federico Reghenzani
Is there any option for `make` to start compilation only for one framework or component? Even if there are no modifications, a simple make takes ~24 sec on my machine to check all ompi directories (I know this is not a big time, but it's a bit tedious during development) Cheers, Federico __ Federi

[OMPI devel] master build fails

2015-10-27 Thread Howard Pritchard
Hi Folks, Looks like master can't build any more, at least not on cray with --enable-picky option: -- make all -j 8 result_stderr --- keyval_lex.c: In function 'yy_get_next_buffer': keyval_lex.c:751:18: warning: comparison between signed and unsigned integer expressions [-Wsign-compare] for (