Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Jeff Squyres (jsquyres)
Ignore me; I read your email wrong. You have "btl = ^usnic" commented out, and a line above it saying "if you need PSM2, then uncomment these...". Makes perfect sense. Sorry for the noise. > On Sep 4, 2015, at 12:00 PM, Jeff Squyres (jsquyres) > wrote: > > Michael: Wait, why are you disabl

Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Jeff Squyres (jsquyres)
Michael: Wait, why are you disabling usnic? Please don't penalize usNIC because of Intel's PSM issues. > On Sep 4, 2015, at 9:29 AM, Ralph Castain wrote: > > Umm…why would USNIC depend on libpsm_infinipath?? Jeff or Dave - is that true? > > > >> On Sep 4, 2015, at 5:57 AM, Michal Schmidt

Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Jeff Squyres (jsquyres)
om: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph >>> Castain >>> Sent: Thursday, September 3, 2015 4:44 PM >>> To: Open MPI Developers >>> Subject: Re: [OMPI devel] 1.10.0 issue >>> >>> Yes, it actually is rather easy to do. I ca

Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Ralph Castain
sage- >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph >> Castain >> Sent: Thursday, September 3, 2015 4:44 PM >> To: Open MPI Developers >> Subject: Re: [OMPI devel] 1.10.0 issue >> >> Yes, it actually is rather easy to do. I can check

Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Friedley, Andrew
M2 MTL. Andrew > -Original Message- > From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph > Castain > Sent: Thursday, September 3, 2015 4:44 PM > To: Open MPI Developers > Subject: Re: [OMPI devel] 1.10.0 issue > > Yes, it actually is rather easy to d

Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Ralph Castain
A…thanks! > On Sep 4, 2015, at 6:52 AM, Michal Schmidt wrote: > > On 09/04/2015 03:29 PM, Ralph Castain wrote: >> Umm…why would USNIC depend on libpsm_infinipath?? Jeff or Dave - is that >> true? > > Indirectly, via libfabric. > > Michal > > > __

Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Michal Schmidt
On 09/04/2015 03:29 PM, Ralph Castain wrote: > Umm…why would USNIC depend on libpsm_infinipath?? Jeff or Dave - is that true? Indirectly, via libfabric. Michal

Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Ralph Castain
Umm…why would USNIC depend on libpsm_infinipath?? Jeff or Dave - is that true? > On Sep 4, 2015, at 5:57 AM, Michal Schmidt wrote: > > On 09/03/2015 03:47 PM, Ralph Castain wrote: >> I guess I didn’t make it clear in my prior comment, so let me try >> again. I understand about dlopen and the f

Re: [OMPI devel] 1.10.0 issue

2015-09-04 Thread Michal Schmidt
On 09/03/2015 03:47 PM, Ralph Castain wrote: > I guess I didn’t make it clear in my prior comment, so let me try > again. I understand about dlopen and the fix that George proposed - > we had internally discussed this as well. However, the questions that > raises are: > > 1. how does the distro (M

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Ralph Castain
Yes, it actually is rather easy to do. I can check, but I think that should happen now (unless psm2 was set to auto-build if the lib was detected). Regardless, we can always have RH et al simply build with --enable-mca-no-build=mtl-psm2 and that will solve the problem. Please keep us posted - an
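
For packagers scripting their builds, the option named above can be composed from framework/component pairs. This is a minimal sketch, not part of Open MPI: the helper name `mca_no_build_flag` is hypothetical, and it assumes the option takes a comma-separated list of `framework-component` entries, as in the `mtl-psm2` value quoted in the message.

```python
from typing import Iterable, Tuple


def mca_no_build_flag(components: Iterable[Tuple[str, str]]) -> str:
    """Render (framework, component) pairs as one --enable-mca-no-build option.

    Hypothetical helper for a packaging script; it only builds the string.
    """
    pairs = [f"{framework}-{component}" for framework, component in components]
    if not pairs:
        raise ValueError("at least one framework/component pair is required")
    return "--enable-mca-no-build=" + ",".join(pairs)


# The case discussed in the message: leave the PSM2 MTL out of the build.
print(mca_no_build_flag([("mtl", "psm2")]))
```

The resulting string would be passed to `./configure` along with the distro's usual options.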

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Friedley, Andrew
Hi Ralph & crew, I'm representing the Intel PSM team to Open MPI. They're aware of the problem and have seen the comments on both this thread and in OFI, and are working on solving the issue within PSM2. Current estimate is that it will take 3-4 weeks. If it comes to removing the PSM2 MTL fro

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Howard Pritchard
I vote for Ralph's proposal. 2015-09-03 10:05 GMT-06:00 Ralph Castain : > As we discussed on the phone, I prefer the bullet #3 approach - ask RedHat > to build/distribute 1.10.0 without PSM2 support, and let Intel provide a > PSM2-enabled variant via their current proprietary distribution channel

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Ralph Castain
As we discussed on the phone, I prefer the bullet #3 approach - ask RedHat to build/distribute 1.10.0 without PSM2 support, and let Intel provide a PSM2-enabled variant via their current proprietary distribution channel until they can provide a “clean” solution to the community. If that hasn’t

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Jeff Squyres (jsquyres)
Ralph and I just chatted about this on the phone. I think I understand his position better now. Just to be clear/put some context in this conversation: 1. PSM (aka "PSM1") supports TrueScale Intel networks 2. PSM2 supports OmniScale Intel networks -- The following three solutions are more

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Gilles Gouaillardet
Ralph, if I correctly read between the lines of your second point, omnipath (PSM2) is working out of the box. I am not sure this is the case, and/or my extrapolation might be incorrect. if I understood correctly, psm2 is a new feature. from a distro point of view, that could be a new package (kno

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread hppritcha
Hi Jeff, to answer your question: I too find the PSM 1/2 situation weird and a real mess. Back to IB verbs? Howard Sent from my iPhone > On 03.09.2015 at 06:55, Jeff Squyres (jsquyres) wrote: > > I agree with what George says. > > AFAIK, Red Hat builds Open MPI support for dlopen, so the config

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Jeff Squyres (jsquyres)
On Sep 3, 2015, at 9:30 AM, Gilles Gouaillardet wrote: > > on second thought, wouldn't it be better to simply disable both PSM and PSM2 > in openmpi, > and let libfabric handle these conflicts ? There are two reasons: 1. Intel still wants to use their PSM and PSM2 MTLs. 2. The publicly-released

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Ralph Castain
I guess I didn’t make it clear in my prior comment, so let me try again. I understand about dlopen and the fix that George proposed - we had internally discussed this as well. However, the questions that raises are: 1. how does the distro (Michal) decide which PSM module to disable by default i

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Gilles Gouaillardet
Jeff, on second thought, wouldn't it be better to simply disable both PSM and PSM2 in openmpi, and let libfabric handle these conflicts? Does that make any sense? Cheers, Gilles On Thursday, September 3, 2015, Jeff Squyres (jsquyres) wrote: > I agree with what George says. > > AFAIK, Red Ha

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Jeff Squyres (jsquyres)
I agree with what George says. AFAIK, Red Hat builds Open MPI support for dlopen, so the config file option is probably suitable. However, I have to admit that I resent the fact that PSM's poor upgrade path design is forcing both the Open MPI and libfabric communities to have similar confusing

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Gilles Gouaillardet
Michael, if a solution with two packages is acceptable, then another, simpler option is to configure openmpi for PSM with --without-psm2, and openmpi for PSM2 with --without-psm. This is safe for --disable-dlopen or --enable-static, and you do not need to tweak the conf files. Cheers, Gilles
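
The two-package option can be written down as a small table of configure flags. This is only a sketch under stated assumptions: the package names `openmpi-psm` and `openmpi-psm2` and the helper `configure_command` are hypothetical; the two `--without-*` flags are the ones quoted in the message.

```python
from typing import Dict, List, Sequence

# Hypothetical package names; each variant disables the *other* PSM flavour.
PACKAGE_FLAGS: Dict[str, List[str]] = {
    "openmpi-psm": ["--without-psm2"],
    "openmpi-psm2": ["--without-psm"],
}


def configure_command(package: str, extra: Sequence[str] = ()) -> List[str]:
    """Return the configure argv for one package variant."""
    return ["./configure", *PACKAGE_FLAGS[package], *extra]


for name in PACKAGE_FLAGS:
    print(name, "->", " ".join(configure_command(name)))
```

Because the unwanted library is excluded at configure time, this does not rely on dlopen or on runtime conf files, which matches the point made in the message.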

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread George Bosilca
Hi Michael, I might have missed some context when proposing this solution. As Gilles suggested if you build Open MPI without support for dlopen (configure option --disable-dlopen) this simple solution will not work because the symbol conflict issue is generated deep inside the constructors of the

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Michal Schmidt
[I apologize for not threading the email properly. I was not subscribed before and found the conversation in the web archive.] Hello, I am the one who discovered the PSM vs. PSM2 library conflict and proposed the temporary workaround of having two builds of the openmpi package. George Bosilca wr

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread George Bosilca
On Thu, Sep 3, 2015 at 12:49 AM, Ralph Castain wrote: > George, I think you misunderstand the difference between the two modules. > PSM supports one type of fabric, and PSM2 supports a different one. They > are not interchangeable. > Ralph, what these two modules do is irrelevant. My point is th

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Ralph Castain
George, I think you misunderstand the difference between the two modules. PSM supports one type of fabric, and PSM2 supports a different one. They are not interchangeable. I agree with your second point. If you have a way of resolving it, I would welcome hearing it. So far, the problems have be

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Gilles Gouaillardet
George, about your third point: some libraries do work in their constructors, so "mtl = ^psm" might also not work if OMPI was configure'd with --disable-dlopen. As far as I know, --disable-dlopen is quite popular (and --disable-shared --enable-static is not so much). Cheers, Gilles On 9/3/

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread George Bosilca
I might have missed something here but: 1. I bet that, and I'm certainly using a lower bound here, 99.9% of our users will not even notice the issue between PSM and PSM2. 2. If there is anything that might negatively impact us as a community, it is the recurrent screw-ups with our own releases. For

Re: [OMPI devel] 1.10.0 issue

2015-09-03 Thread Gilles Gouaillardet
Ralph, on one hand, I do not have a strong opinion about keeping PSM2 in the v1.10 series; on the other hand, I feel confused by this explanation ... if PSM2 is simply removed, only one version of ompi can be released, but there is no way to support PSM2 at all. how is this better than giving th

Re: [OMPI devel] 1.10.0 issue

2015-09-02 Thread Ralph Castain
I’m afraid that won’t solve the problem - the distro will still feel the need to release -two- versions of OMPI, one with PSM and one with PSM2. Ordinarily, I wouldn’t care - but this creates user confusion and reflects on us as a community. > On Sep 2, 2015, at 6:50 PM, Gilles Gouaillardet w

Re: [OMPI devel] 1.10.0 issue

2015-09-02 Thread Gilles Gouaillardet
Ralph, what about automatically *not* building PSM2 if PSM is built and PSM2 is not explicitly required ? /* in order to be future proof, we could even do that only if we detect a symbol conflict */ we could abort if ompi is configure'd with both --with-psm and --with-psm2, or simply do nothin
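
The selection rule proposed in this message can be sketched as a small decision function. This is an illustration, not Open MPI's configure logic: the function name and parameters are hypothetical, only the "abort" variant is modelled (the message is truncated before the alternative is spelled out), and the handling of an explicitly requested PSM2 alongside an auto-detected PSM is an assumption the message does not state.

```python
from typing import Set


def select_psm_mtls(psm_detected: bool, psm2_detected: bool,
                    psm_requested: bool = False,
                    psm2_requested: bool = False) -> Set[str]:
    """Decide which PSM MTLs to build.

    *_detected: the library was found on the build host.
    *_requested: the user passed --with-psm / --with-psm2 explicitly.
    """
    if psm_requested and psm2_requested:
        # The "abort" variant from the message.
        raise ValueError("both --with-psm and --with-psm2 were given")

    build_psm, build_psm2 = psm_detected, psm2_detected
    if build_psm and build_psm2:
        if psm2_requested:
            # Assumption (not in the message): an explicit PSM2 request wins.
            build_psm = False
        else:
            # Rule from the message: skip PSM2 unless explicitly required.
            build_psm2 = False

    return {name for name, on in (("psm", build_psm), ("psm2", build_psm2)) if on}
```

With both libraries detected and no explicit request, only `psm` is selected; with only PSM2 present, `psm2` is still built.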