Re: [OMPI devel] 1.10.0rc5 - mx problem when compiled statically

2015-08-24 Thread Paul Hargrove
Ralph, I will try that and expect it will fix exactly those two tests. However, that leaves 11 other undefined references in WRAPPER_EXTRA_LDFLAGS. Five of those I already know will cause test failures as shown in my previous email. -Paul On Sun, Aug 23, 2015 at 8:50 PM, Ralph Castain wrote:

Re: [OMPI devel] 1.10.0rc5 - mx problem when compiled statically

2015-08-24 Thread Paul Hargrove
Ralph, As I expected your fixes to ompi/debuggers/Makefile.am only fixed the tests in that directory. I am still left with make[4]: Entering directory `/home/phargrov/OMPI/openmpi-1.10.0rc5-linux-x86-mx-static/BLD/test/datatype' PASS: opal_datatype_test FAIL: checksum FAIL: position FAIL: positio

Re: [OMPI devel] 1.10.0rc5 - mx problem when compiled statically

2015-08-24 Thread Ralph Castain
Indeed, you are absolutely correct! I will fix all of them, but wanted first to ensure that was the correct fix. Thanks Paul! > On Aug 23, 2015, at 10:44 PM, Paul Hargrove wrote: > > Ralph, > > As I expected your fixes to ompi/debuggers/Makefile.am only fixed the tests > in that directory. >

[OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Gilles Gouaillardet
Folks, I recently installed libnl3-devel rpm on my centos 7 box, reconfigured and recompiled ompi, and ompi_info now crashes. It seems the root cause is an obscure conflict between libnl and libnl3. libnl is indirectly required by the common_verbs mca (OFED libraries do need it) and libnl3 is req

Re: [OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Jeff Squyres (jsquyres)
It is definitely true that if both libnl v1 and libnl v3 (also known as "libnl3", even though libnl v1 is known as "libnl") are present in the same process, Random Bad Things will happen. This is due to unfortunate choices that the netlink library authors and/or packagers made. From what I ha

Re: [OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Gilles Gouaillardet
iirc, librdmacm uses libnl. I am not sure if handling this at run time is even possible. Why not handle this at configure time? E.g., if a component known to use libnl is built, then make sure no component uses libnl3. On Monday, August 24, 2015, Jeff Squyres (jsquyres) wrote: > It is definitely
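
For illustration only, the build-time check proposed in this message could be sketched in Python as below. This is not Open MPI's configure logic. The soname patterns (libnl.so.N for libnl v1, libnl-3.so.N and libnl-*-3.so.N for libnl3) are assumptions that vary by distribution, the function names are hypothetical, and the per-component library lists are assumed to have been gathered beforehand (for example from ldd output).

```python
import re

# Assumed soname patterns; real names differ between distributions.
LIBNL1_RE = re.compile(r"^libnl\.so(\.\d+)*$")
LIBNL3_RE = re.compile(r"^libnl(-[a-z]+)?-3\.so(\.\d+)*$")


def classify(soname):
    """Return "libnl", "libnl3", or None for an unrelated library."""
    if LIBNL1_RE.match(soname):
        return "libnl"
    if LIBNL3_RE.match(soname):
        return "libnl3"
    return None


def find_libnl_conflict(component_libs):
    """Report components that would pull libnl and libnl3 into one process.

    component_libs maps a component name to the sonames it depends on
    (direct and indirect). Returns None when at most one libnl flavor is
    used, otherwise a dict listing the users of each flavor.
    """
    users = {"libnl": [], "libnl3": []}
    for component, libs in component_libs.items():
        kinds = {classify(lib) for lib in libs}
        for kind in ("libnl", "libnl3"):
            if kind in kinds:
                users[kind].append(component)
    if users["libnl"] and users["libnl3"]:
        return {kind: sorted(names) for kind, names in users.items()}
    return None


if __name__ == "__main__":
    # Hypothetical dependency lists for the two components named in the thread.
    deps = {
        "common_verbs": ["libibverbs.so.1", "libnl.so.1"],
        "reachable_netlink": ["libnl-3.so.200", "libnl-route-3.so.200"],
    }
    print(find_libnl_conflict(deps))
```

A check of this kind only sees the libraries it is told about, so it depends on knowing what downstream packages were actually linked against.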

Re: [OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Gilles Gouaillardet
fwiw, in my environment, libnl is loaded before libnl3. The crash occurs in the libnl3 initializer, which is invoked when dlopen'ing mca_reachable_netlink.so. It is very strange since some initialized static structs (same name, different type and value in both libraries) are incorrectly initialized (or a

Re: [OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Gilles Gouaillardet
A first step could be adding a --disable-libnl3 option to configure, which means components should not even try to use libnl3. Makes sense? On Monday, August 24, 2015, Gilles Gouaillardet < gilles.gouaillar...@gmail.com> wrote: > iirc, librdmacm uses libnl > > I am not sure if handling this at r

Re: [OMPI devel] 1.10.0rc5 - mx problem when compiled statically

2015-08-24 Thread Ralph Castain
Okay - one (hopefully) very last time!!! rc6 has now been posted, including all these changes > On Aug 24, 2015, at 1:05 AM, Ralph Castain wrote: > > Indeed, you are absolutely correct! I will fix all of them, but wanted first > to ensure that was the correct fix. > > Thanks Paul! > >> On

Re: [OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Jeff Squyres (jsquyres)
On Aug 24, 2015, at 9:31 AM, Gilles Gouaillardet wrote: > > iirc, librdmacm uses libnl > > I am not sure if handling this at run time is even possible > > why not handle this at configure time ? > e.g. if a component known to use libnl is built, then make sure no component > uses libnl3 How

Re: [OMPI devel] reachable_netlink mca, libnl and libnl3

2015-08-24 Thread Jeff Squyres (jsquyres)
This is a losing battle. We can't keep an up-to-date table in our configury of what downstream packages were compiled with what versions of libnl, not only because it would quickly become out of date, but also because the downstream package may be variable (e.g., libfabric, as I cited in http:

Re: [OMPI devel] 1.10.0rc5 - mx problem when compiled statically

2015-08-24 Thread Paul Hargrove
My testing is underway... Will report anything of significance. -Paul On Mon, Aug 24, 2015 at 6:58 AM, Ralph Castain wrote: > Okay - one (hopefully) very last time!!! > > rc6 has now been posted, including all these changes > > > > On Aug 24, 2015, at 1:05 AM, Ralph Castain wrote: > > Indeed,

[OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Paul Hargrove
Sorry to yet again be the bearer of bad news. I am now configuring with --prefix=[...] --enable-debug --with-libfabric=/opt/libfabric-1.0.0 --with-mx=/opt/mx2g --disable-dlopen This is like the previous configuration that caused problems, but with "--disable-dlopen" instead of "--enable-static -

Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Ralph Castain
You know, if it weren’t for the impact it would have on our users, I’d almost say that if Mellanox doesn’t care enough to ensure this works, then maybe we should just release and see if someone actually does care? I’ll try again later today if/when I have time. Otherwise, I’ll raise it at tomorr

Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Paul Hargrove
Ralph mx = Myricom (not Mellanox, which is mxm). So, there is probably nobody to fix anything specific to the MX support. Thus if this newly reported problem is (as I am going to guess) in config/ompi_check_mx.m4 then it may go unfixed. You say you and I are the only ones to care, and I think we

[OMPI devel] 1.10.0rc6 - Solaris "make check" problem (regression vs. rc5)

2015-08-24 Thread Paul Hargrove
This is from testing the Studio compilers on a Solaris-11.1/amd64 platform, in a configuration that passed my testing of rc5. I have configured with --prefix=[...] --enable-debug CC=cc CXX=CC FC=f90 --with-verbs \ CXXFLAGS='-library=stlport4' --with-wrapper-cxxflags='-library=stlport4' Note that

Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Ralph Castain
Ah, my bad - thanks for correcting me! I’ll have to ask folks tomorrow if we care about Myricom at this point. > On Aug 24, 2015, at 10:52 AM, Paul Hargrove wrote: > > Ralph > > mx = Myricom (not Mellanox, which is mxm). > So, there is probably nobody to fix anything specific to the MX suppo

Re: [OMPI devel] 1.10.0rc6 - Solaris "make check" problem (regression vs. rc5)

2015-08-24 Thread Ralph Castain
So you think this will resolve the problem: diff --git a/ompi/debuggers/Makefile.am b/ompi/debuggers/Makefile.am index 93a3046..069c3e6 100644 --- a/ompi/debuggers/Makefile.am +++ b/ompi/debuggers/Makefile.am @@ -44,14 +44,14 @@ headers = \ # Simple checks to ensure that the DSOs are functional

Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Paul Hargrove
On Mon, Aug 24, 2015 at 10:52 AM, Paul Hargrove wrote: > Thus if this newly reported problem is (as I am going to guess) > in config/ompi_check_mx.m4 then it may go unfixed. > You say you and I are the only ones to care, and I think we both care for > reasons related to software quality rather th

Re: [OMPI devel] 1.10.0rc6 - Solaris "make check" problem (regression vs. rc5)

2015-08-24 Thread Paul Hargrove
Ralph, Yes, I suspect that would resolve the problem. However, based on my conclusions presented in an email sent a few minutes ago (and expanded upon below), I think you should either *revert* or *remove* all of those *_LDFLAGS settings. These variables were *empty* prior to our work that led

Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Jeff Squyres (jsquyres)
FWIW, we have had verbal agreement in the past that the v1.8 series was the last one to contain MX support. I think it would be fine for all MX-related components to disappear from v1.10. Don't forget that Myricom as an HPC company no longer exists. > On Aug 24, 2015, at 2:34 PM, Paul Hargrov

Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Nathan Hjelm
+1 On Mon, Aug 24, 2015 at 07:08:02PM +, Jeff Squyres (jsquyres) wrote: > FWIW, we have had verbal agreement in the past that the v1.8 series was the > last one to contain MX support. I think it would be fine for all MX-related > components to disappear from v1.10. > > Don't forget that M

[OMPI devel] esslingen MTT?

2015-08-24 Thread Jeff Squyres (jsquyres)
Who runs the esslingen MTT? You're getting some build failures on master that I don't understand: - make[3]: Entering directory '/home/adrian/mtt-scratch/mpi-install/FDvh/src/openmpi-dev-2350-geb25c00/ompi/mpi/fortran/mpif-h/profile' GENERATE psizeof_f.f90 FC psizeof_f.lo Usage: /h

[OMPI devel] v1.10.0rc7

2015-08-24 Thread Ralph Castain
Yet another step in the apparently never-ending quest to release v1.10.0… http://www.open-mpi.org/software/ompi/v1.10/ Please check it out Ralph

Re: [OMPI devel] 1.10.0rc6 - slightly different mx problem

2015-08-24 Thread Christopher Samuel
On 25/08/15 05:08, Jeff Squyres (jsquyres) wrote: > FWIW, we have had verbal agreement in the past that the v1.8 series > was the last one to contain MX support. I think it would be fine for > all MX-related components to disappear from v1.10. > > Don't forget that Myricom as an HPC company no l