As mentioned in the weekly conference call, I am seeing some strange errors
when using the openib BTL. I have narrowed down the changeset that broke
things to the ORTE async code.
https://svn.open-mpi.org/trac/ompi/changeset/29058 (and
https://svn.open-mpi.org/trac/ompi/changeset/29061 which
Hmm. Are you building Open MPI in a special way? I ask because I'm unable to
replicate the issue -- I've run your test (and a C equivalent) a few hundred
times now:
[jsquyres@savbu-usnic-a mpi]$ which gfortran
/usr/bin/gfortran
[jsquyres@savbu-usnic-a mpi]$ gfortran --version
GNU Fortran
Are you all the way up to the current trunk? There have been a few typo fixes
since the original commit.
I'm not familiar with the OOB connect code in openib. The OOB itself isn't
using a free list, so I suspect the problem is in the OOB connect code
itself. I'll take a look and see if someth
On Sep 2, 2013, at 1:53 AM, Bibrak Qamar wrote:
> Yes you are right, it does distribute the ltdl in the source library. But
> isn't it installed by default when OpenMPI is installed?
It certainly should. But it's part of libopen-pal.so -- not a standalone
libltdl.so.
If you're running your o
Yes, it fails on the current trunk (r29112). That is what started me on the
journey to figure out when things went wrong. It was working up until r29058.
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
Sent: Tuesday, September 03, 2013 2:49 PM
To: Open MPI Developers
Dang - I just finished running it on odin without a problem. Are you seeing
this with a debug or optimized build?
On Sep 3, 2013, at 12:16 PM, Rolf vandeVaart wrote:
> Yes, it fails on the current trunk (r29112). That is what started me on the
> journey to figure out when things went wrong.
Also, send me your test code - maybe that is required to trigger it.
On Sep 3, 2013, at 12:19 PM, Ralph Castain wrote:
> Dang - I just finished running it on odin without a problem. Are you seeing
> this with a debug or optimized build?
>
>
> On Sep 3, 2013, at 12:16 PM, Rolf vandeVaart wrote
How about symlinking the source file? Then you would only need a single
Makefile.am, and you can use different flags depending on which source file
you compile.
While somewhat gross, it's not totally disgusting, and it should work to the
same effect...?
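A minimal sketch of the symlink idea, assuming a POSIX shell with mktemp available: one real source file, a symlink that gives it a second name, and compile flags chosen by file name. The names foo.c, foo_alt.c, cflags_for and the VARIANT macro are hypothetical, and the script only prints the compile commands so it runs without a compiler.

```sh
#!/bin/sh
# Hypothetical layout: foo.c is the real source, foo_alt.c is a symlink to it.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'int variant(void) { return VARIANT; }\n' > foo.c
ln -s foo.c foo_alt.c   # second name for the same source file

# Pick flags based on which name is being compiled (hypothetical macros).
cflags_for() {
  case $1 in
    foo.c)     printf '%s\n' "-DVARIANT=0" ;;
    foo_alt.c) printf '%s\n' "-DVARIANT=1" ;;
  esac
}

# Print, rather than run, the compile commands so no compiler is required.
for src in foo.c foo_alt.c; do
  echo "cc $(cflags_for "$src") -c $src -o ${src%.c}.o"
done
```

In a real Makefile.am the two names would more likely be listed under different targets, each with its own per-target CPPFLAGS, rather than selected by a shell function.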
On Aug 30, 2013, at 4:16 AM, Bert Wesarg
I am running a debug build. Here is my configure line:
../configure --enable-debug --enable-shared --disable-static \
  --prefix=/home/rolf/ompi-trunk-29061/64 \
  --with-wrapper-ldflags='-Wl,-rpath,${prefix}/lib' --disable-vt \
  --enable-orterun-prefix-by-default -disable-io-romio --enable-picky
The
Sigh - I cannot get it to fail. I've tried up to np=16 without getting a single
hiccup.
Try a fresh checkout - let's make sure you don't have some old cruft lying
around.
On Sep 3, 2013, at 12:26 PM, Rolf vandeVaart wrote:
> I am running a debug build. Here is my configure line:
>
> ../co
I just retried and I still get errors with the latest trunk (r29112). If I
back up to r29057, everything is fine. In addition, I can reproduce this
on two different clusters.
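Narrowing a regression down to one changeset, as described in this thread, amounts to a bisection over SVN revisions. A minimal sketch, assuming a POSIX shell and that the failure persists once introduced; is_good is a hypothetical stand-in that here simply encodes the reported result (r29057 good, r29058 bad) so the script runs on its own.

```sh
#!/bin/sh
# Hypothetical predicate: in real use it would run `svn update -r "$1"`,
# rebuild, run the failing test, and return 0 only if the test passes.
is_good() {
  [ "$1" -lt 29058 ]   # stand-in only: mirrors the thread's r29058 result
}

# bisect GOOD BAD: print the first bad revision, given that GOOD passes
# and BAD fails.
bisect() {
  good=$1
  bad=$2
  while [ $((bad - good)) -gt 1 ]; do
    mid=$(( (good + bad) / 2 ))
    if is_good "$mid"; then good=$mid; else bad=$mid; fi
  done
  echo "$bad"
}

bisect 29057 29112
```

Each step halves the remaining range, so going from r29057..r29112 to a single revision takes about six rebuild-and-test cycles instead of dozens.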
Can you try running the entire intel test suite and see if that works? Maybe a
different test will fail for you.
Correction: That line below should be:
gmake run FILE=p2p_c
From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Rolf vandeVaart
Sent: Tuesday, September 03, 2013 4:50 PM
To: Open MPI Developers
Subject: Re: [OMPI devel] openib BTL problems with ORTE async changes
I just retried and I sti
Between a few off-list emails, Ralph was able to reproduce this problem on odin
when he forced the use of the oob connection code in the openib BTL.
I have created a ticket to track this issue. I'm not sure yet what we will do
about it.
https://svn.open-mpi.org/trac/ompi/ticket/3746
From: deve
On Sep 3, 2013, at 6:45 PM, Fabrício Zimmerer Murta wrote:
> I think autotools has a concept of disallowing symlinks as it seems symlinks
> can't be done in a portable way, and the goal of autotools is making projects
> portable.
>
> Well, if the autotools user feels like using symlinks, then
I still don't see an issue with just detecting the version of automake being
used and setting a conditional that indicates whether or not to explicitly
include the subdir. Seems like a pretty trivial solution.
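A rough sketch of the version check, assuming a POSIX shell and purely numeric dotted versions. The helper name version_ge, the 1.14 threshold, and which side of the threshold needs the explicit subdir are all assumptions for illustration; in configure.ac the resulting yes/no could feed an AM_CONDITIONAL.

```sh
#!/bin/sh
# version_ge A B: succeed if dotted version A >= dotted version B.
# Missing components count as 0; non-numeric components are not handled.
version_ge() {
  a=$1
  b=$2
  while [ -n "$a" ] || [ -n "$b" ]; do
    ai=${a%%.*}; bi=${b%%.*}           # leading component of each
    ai=${ai:-0}; bi=${bi:-0}
    if [ "$ai" -gt "$bi" ]; then return 0; fi
    if [ "$ai" -lt "$bi" ]; then return 1; fi
    case $a in *.*) a=${a#*.} ;; *) a= ;; esac   # drop compared component
    case $b in *.*) b=${b#*.} ;; *) b= ;; esac
  done
  return 0
}

# First line of `automake --version` looks like
# "automake (GNU automake) 1.13.4"; keep only the last word.
am_version=$(automake --version 2>/dev/null | sed -n '1s/.* //p')

# Assumed threshold: 1.14 (hypothetical, for illustration only).
if [ -n "$am_version" ] && version_ge "$am_version" 1.14; then
  explicit_subdir=yes
else
  explicit_subdir=no
fi
echo "automake ${am_version:-not found}: explicit subdir = $explicit_subdir"
```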
On Sep 3, 2013, at 3:49 PM, "Jeff Squyres (jsquyres)"
wrote:
> On Sep 3, 2013
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
On 03/09/13 10:56, Ralph Castain wrote:
> Yeah - --with-pmi=
Actually I found that just --with-pmi=/usr/local/slurm/latest worked. :-)
I've got some initial numbers for 64 cores; as I mentioned, the system
I found this on initially is so busy at the
On 04/09/13 04:47, Jeff Squyres (jsquyres) wrote:
> Hmm. Are you building Open MPI in a special way? I ask because I'm
> unable to replicate the issue -- I've run your test (and a C
> equivalent) a few hundred times now:
I don't think we do anythin
Your code is obviously doing something much more than just launching and wiring
up, so it is difficult to assess the difference in speed between 1.6.5 and
1.7.3 - my guess is that it has to do with changes in the MPI transport layer
and nothing to do with PMI or not.
Likewise, I can't imagine a
On 04/09/13 11:29, Ralph Castain wrote:
> Your code is obviously doing something much more than just
> launching and wiring up, so it is difficult to assess the
> difference in speed between 1.6.5 and 1.7.3 - my guess is that it
> has to do with chang