[OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
As mentioned in the weekly conference call, I am seeing some strange errors when using the openib BTL. I have narrowed down the changeset that broke things to the ORTE async code. https://svn.open-mpi.org/trac/ompi/changeset/29058 (and https://svn.open-mpi.org/trac/ompi/changeset/29061 which

Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-09-03 Thread Jeff Squyres (jsquyres)
Hmm. Are you building Open MPI in a special way? I ask because I'm unable to replicate the issue -- I've run your test (and a C equivalent) a few hundred times now: [jsquyres@savbu-usnic-a mpi]$ which gfortran /usr/bin/gfortran [jsquyres@savbu-usnic-a mpi]$ gfortran --version GNU Fortran

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Ralph Castain
Are you all the way up to the current trunk? There have been a few typo fixes since the original commit. I'm not familiar with the OOB connect code in openib. The OOB itself isn't using free list, so I suspect it is something up in the OOB connect code itself. I'll take a look and see if someth

Re: [OMPI devel] NO LT_DLADVISE - CANNOT LOAD LIBOMPI JAVA BINDINGS

2013-09-03 Thread Jeff Squyres (jsquyres)
On Sep 2, 2013, at 1:53 AM, Bibrak Qamar wrote: > Yes you are right, it does distribute the ltdl in the source library. But > isn't it installed by default when OpenMPI is installed? It certainly should. But it's part of libopen-pal.so -- not a standalone libltdl.so. If you're running your o

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
Yes, it fails on the current trunk (r29112). That is what started me on the journey to figure out when things went wrong. It was working up until r29058. From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain Sent: Tuesday, September 03, 2013 2:49 PM To: Open MPI Developers

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Ralph Castain
Dang - I just finished running it on odin without a problem. Are you seeing this with a debug or optimized build? On Sep 3, 2013, at 12:16 PM, Rolf vandeVaart wrote: > Yes, it fails on the current trunk (r29112). That is what started me on the > journey to figure out when things went wrong.

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Ralph Castain
Also, send me your test code - maybe that is required to trigger it On Sep 3, 2013, at 12:19 PM, Ralph Castain wrote: > Dang - I just finished running it on odin without a problem. Are you seeing > this with a debug or optimized build? > > > On Sep 3, 2013, at 12:16 PM, Rolf vandeVaart wrote

Re: [OMPI devel] GNU Automake 1.14 released

2013-09-03 Thread Jeff Squyres (jsquyres)
How about sym linking the source file? Then you would only need a single Makefile.am; you can use different flags depending on which source file you compile. While somewhat gross, it's not totally disgusting, and it should work to the same effect...? On Aug 30, 2013, at 4:16 AM, Bert Wesarg

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
I am running a debug build. Here is my configure line: ../configure --enable-debug --enable-shared --disable-static --prefix=/home/rolf/ompi-trunk-29061/64 --with-wrapper-ldflags='-Wl,-rpath,${prefix}/lib' --disable-vt --enable-orterun-prefix-by-default -disable-io-romio --enable-picky The

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Ralph Castain
Sigh - I cannot get it to fail. I've tried up to np=16 without getting a single hiccup. Try a fresh checkout - let's make sure you don't have some old cruft lying around. On Sep 3, 2013, at 12:26 PM, Rolf vandeVaart wrote: > I am running a debug build. Here is my configure line: > > ../co

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
I just retried and I still get errors with the latest trunk (r29112). If I back up to r29057, then everything is fine. In addition, I can reproduce this on two different clusters. Can you try running the entire intel test suite and see if that works? Maybe a different test will fail for you.
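
The narrowing described above (r29057 is fine, r29112 fails) amounts to a bisection over revision numbers. A minimal sketch, assuming each revision is either consistently good or consistently bad and that breakage, once introduced, persists; `first_bad_revision` and the `is_good` callback are hypothetical names, standing in for "check out revision r, build, and run the failing test":

```python
def first_bad_revision(good, bad, is_good):
    """Return the first failing revision in the range (good, bad].

    Assumes `good` passes, `bad` fails, and that every revision after
    the first failure also fails. `is_good` is a hypothetical callback
    that builds and tests a single revision, returning True on success.
    """
    if good >= bad:
        raise ValueError("expected good < bad")
    # Invariant: `good` is known to pass and `bad` is known to fail.
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_good(mid):
            good = mid
        else:
            bad = mid
    return bad
```

Called with the two revisions from the message (29057 and 29112) and a real build-and-test callback, this would need about six builds rather than one per revision.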

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
Correction: That line below should be: gmake run FILE=p2p_c From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart Sent: Tuesday, September 03, 2013 4:50 PM To: Open MPI Developers Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes I just retried and I sti

Re: [OMPI devel] openib BTL problems with ORTE async changes

2013-09-03 Thread Rolf vandeVaart
Between a few off-list emails, Ralph was able to reproduce this problem on odin when he forced the use of the oob connection code in the openib BTL. I have created a ticket to track this issue. Not sure what we will do with this issue. https://svn.open-mpi.org/trac/ompi/ticket/3746 From: deve

Re: [OMPI devel] GNU Automake 1.14 released

2013-09-03 Thread Jeff Squyres (jsquyres)
On Sep 3, 2013, at 6:45 PM, Fabrício Zimmerer Murta wrote: > I think autotools has a concept of disallowing symlinks as it seems symlinks > can't be done in a portable way, and the goal of autotools is making projects > portable. > > Well, if the autotools user feels like using symlinks, then

Re: [OMPI devel] GNU Automake 1.14 released

2013-09-03 Thread Ralph Castain
I still don't see an issue with just detecting the version of automake being used, and setting a conditional that indicates whether or not to explicitly include the subdir. Seems like a pretty trivial solution. On Sep 3, 2013, at 3:49 PM, "Jeff Squyres (jsquyres)" wrote: > On Sep 3, 2013
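
The version check suggested above could look roughly like the following. This is only a sketch of the comparison logic, not how the Open MPI build system does it: the function names are hypothetical, the 1.14 threshold is assumed from the thread subject, and which side of the threshold needs the explicit subdir is left to the caller.

```python
import re


def parse_automake_version(version_output):
    """Extract the version tuple from the text printed by `automake --version`.

    Only the first line is inspected, e.g. "automake (GNU automake) 1.14".
    """
    lines = version_output.splitlines()
    first_line = lines[0] if lines else ""
    match = re.search(r"(\d+(?:\.\d+)+)", first_line)
    if match is None:
        raise ValueError("no version number found in automake output")
    return tuple(int(part) for part in match.group(1).split("."))


def automake_at_least(version_output, threshold=(1, 14)):
    """Return True if the detected version is >= threshold.

    The (1, 14) default is an assumption taken from the thread subject.
    """
    return parse_automake_version(version_output) >= threshold
```

The boolean result is what would drive the conditional; tuple comparison handles versions such as 1.13.4 versus 1.14 correctly, which a plain string comparison would not guarantee.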

Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-03 Thread Christopher Samuel
On 03/09/13 10:56, Ralph Castain wrote: > Yeah - --with-pmi= Actually I found that just --with-pmi=/usr/local/slurm/latest worked. :-) I've got some initial numbers for 64 cores, as I mentioned the system I found this on initially is so busy at the

Re: [OMPI devel] Possible OMPI 1.6.5 bug? SEGV in malloc.c

2013-09-03 Thread Christopher Samuel
On 04/09/13 04:47, Jeff Squyres (jsquyres) wrote: > Hmm. Are you building Open MPI in a special way? I ask because I'm > unable to replicate the issue -- I've run your test (and a C > equivalent) a few hundred times now: I don't think we do anythin

Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-03 Thread Ralph Castain
Your code is obviously doing something much more than just launching and wiring up, so it is difficult to assess the difference in speed between 1.6.5 and 1.7.3 - my guess is that it has to do with changes in the MPI transport layer and nothing to do with PMI or not. Likewise, I can't imagine a

Re: [OMPI devel] Open-MPI build of NAMD launched from srun over 20% slowed than with mpirun

2013-09-03 Thread Christopher Samuel
On 04/09/13 11:29, Ralph Castain wrote: > Your code is obviously doing something much more than just > launching and wiring up, so it is difficult to assess the > difference in speed between 1.6.5 and 1.7.3 - my guess is that it > has to do with chang