Just FYI: I believe Nathan misspoke. The new capability is in 1.8.4, which I hope to release next Friday (Nov 7th).
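[Editor's note: for concreteness, once a build has been configured with more than one single-copy mechanism, picking one at runtime is just an MCA parameter, as discussed in the thread below. A sketch only -- the install paths, process count, and application name are placeholders; the configure flags and parameter names are the ones quoted in this thread:]

```shell
# Build time: enable whichever single-copy mechanisms are installed
# (the /opt paths are placeholders for your local installs).
./configure --with-knem=/opt/knem --with-xpmem=/opt/xpmem --with-cma

# Run time: pick one mechanism explicitly via the vader btl parameter
# discussed in this thread (e.g. xpmem, cma, or knem).
mpirun --mca btl_vader_single_copy_mechanism xpmem -np 16 ./my_app

# Tuning: raise the eager/rendezvous threshold from its 4 kB default,
# e.g. to the 32 kB the INRIA site suggests for the sm+knem counterpart.
mpirun --mca btl_vader_eager_limit 32768 -np 16 ./my_app
```

[The same parameters can also be set persistently in an MCA parameter file instead of on each command line.]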
> On Oct 30, 2014, at 4:24 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
> Hi Nathan
>
> Thank you very much for addressing this problem.
>
> I read your notes on Jeff's blog about vader, and that clarified many
> things that were obscure to me when I first started this thread whining
> that knem was not working in OMPI 1.8.3. Thank you also for writing
> that blog post, and for sending the link to it. That was very helpful
> indeed.
>
> As your closing comments on the blog post point out, and your IMB
> benchmark graphs of pingpong/latency & sendrecv/bandwidth show,
> vader+xpmem outperforms the other combinations of
> btl+memory_copy_mechanism for intra-node communication.
>
> For the benefit of pedestrian OpenMPI users like me:
>
> 1) What is the status of xpmem in the Linux world at this point?
> [Proprietary (SGI?) / open source, part of the Linux kernel (which),
> part of standard distributions (which)?]
>
> 2) Any recommendation for the values of the various vader btl
> parameters? [There are 12 of them in OMPI 1.8.3! That is a real
> challenge to get right.]
>
> Which values did you use in your benchmarks? Defaults? Other?
>
> In particular, is there an optimal value for the eager/rendezvous
> threshold (btl_vader_eager_limit, default=4kB)? [The INRIA web site
> suggests 32kB for the sm+knem counterpart (btl_sm_eager_limit,
> default=4kB).]
>
> 3) Did I understand it right that the upcoming OpenMPI 1.8.5 can be
> configured with more than one memory copy mechanism altogether
> (e.g. --with-knem and --with-cma and --with-xpmem), and then select one
> of them at runtime with the btl_vader_single_copy_mechanism parameter?
> Or must OMPI be configured with only one memory copy mechanism?
>
> Many thanks,
> Gus Correa
>
>
> On 10/30/2014 05:44 PM, Nathan Hjelm wrote:
>> I want to close the loop on this issue. 1.8.5 will address it in
>> several ways:
>>
>> - knem support in btl/sm has been fixed.
>>   A sanity check was disabling knem during component registration.
>>   I wrote the sanity check before the 1.7 release and didn't intend
>>   this side-effect.
>>
>> - vader now supports xpmem, cma, and knem. The best available
>>   single-copy mechanism will be used. If multiple single-copy
>>   mechanisms are available you can select which one you want to use
>>   at runtime.
>>
>> More about the vader btl can be found here:
>> http://blogs.cisco.com/performance/the-vader-shared-memory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy/
>>
>> -Nathan Hjelm
>> HPC-5, LANL
>>
>> On Fri, Oct 17, 2014 at 01:02:23PM -0700, Ralph Castain wrote:
>>> On Oct 17, 2014, at 12:06 PM, Gus Correa <g...@ldeo.columbia.edu>
>>> wrote:
>>> Hi Jeff
>>>
>>> Many thanks for looking into this and filing a bug report at 11:16PM!
>>>
>>> Thanks to Aurelien, Ralph and Nathan for their help and
>>> clarifications also.
>>>
>>> **
>>>
>>> Related suggestion:
>>>
>>> Add a note to the FAQ explaining that in OMPI 1.8 the new (default)
>>> btl is vader (and what it is).
>>>
>>> It was a real surprise to me. If Aurelien Bouteiller hadn't told me
>>> about vader, I might never have realized it even existed.
>>>
>>> That could be part of one of the already existing FAQs explaining
>>> how to select the btl.
>>>
>>> **
>>>
>>> Doubts (btl in OMPI 1.8):
>>>
>>> I still don't clearly understand the meaning and scope of vader
>>> being a "default btl".
>>>
>>> We mean that it has a higher priority than the other shared memory
>>> implementation, and so it will be used for intra-node messaging by
>>> default.
>>>
>>> What is the scope of this default: intra-node btl only, perhaps?
>>>
>>> Yes - strictly intra-node
>>>
>>> Was there a default btl before vader, and which?
>>>
>>> The "sm" btl was the default shared memory transport before vader
>>>
>>> Is vader the intra-node default only (i.e.
>>> replaces sm by default),
>>>
>>> Yes
>>>
>>> or does it somehow extend beyond node boundaries, and replace (or
>>> bring in) network btls (openib, tcp, etc.)?
>>>
>>> Nope - just intra-node
>>>
>>> If I am running on several nodes, and want to use openib, not tcp,
>>> and, say, use vader, what is the right syntax?
>>>
>>> * nothing (OMPI will figure it out ... but what if you have
>>>   IB, Ethernet, Myrinet, OpenGM altogether?)
>>>
>>> If you have higher-speed connections, we will pick the fastest for
>>> inter-node messaging as the "default" since we expect you would want
>>> the fastest possible transport.
>>>
>>> * -mca btl openib (and vader will come along automatically)
>>>
>>> Among the ones you show, these would indeed be the likely choices
>>> (openib and vader)
>>>
>>> * -mca btl openib,self (and vader will come along automatically)
>>>
>>> The "self" btl is *always* active as the loopback transport
>>>
>>> * -mca btl openib,self,vader (because vader is default only for
>>>   1-node jobs)
>>> * something else (or several alternatives)
>>>
>>> Whatever happened to the "self" btl in this new context?
>>> Gone? Still there?
>>>
>>> Many thanks,
>>> Gus Correa
>>>
>>> On 10/16/2014 11:16 PM, Jeff Squyres (jsquyres) wrote:
>>>
>>> On Oct 16, 2014, at 1:35 PM, Gus Correa <g...@ldeo.columbia.edu>
>>> wrote:
>>>
>>> and on the MCA parameter file:
>>>
>>> btl_sm_use_knem = 1
>>>
>>> I think the logic enforcing this MCA param got broken when we
>>> revamped the MCA param system. :-(
>>>
>>> I am scratching my head to understand why a parameter with such a
>>> suggestive name ("btl_sm_have_knem_support"), so similar to the
>>> OMPI_BTL_SM_HAVE_KNEM cpp macro, somehow vanished from ompi_info in
>>> OMPI 1.8.3.
>>>
>>> It looks like this MCA param was also dropped when we revamped the
>>> MCA system. Doh! :-(
>>>
>>> There's some deep mojo going on that is somehow causing knem to not
>>> be used; I'm too tired to understand the logic right now.
>>> I just opened https://github.com/open-mpi/ompi/issues/239 to track
>>> this issue -- feel free to subscribe to the issue to get updates.
>>>
>>> _______________________________________________
>>> users mailing list
>>> us...@open-mpi.org
>>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/10/25532.php
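[Editor's postscript: the btl selection syntax Ralph describes above, written out as command lines. A sketch under the thread's own assumptions -- the application name and process count are placeholders:]

```shell
# Multi-node job over InfiniBand: name openib for inter-node traffic.
# vader handles intra-node messaging and "self" is always active for
# loopback, but listing them explicitly does no harm.
mpirun --mca btl openib,vader,self -np 32 ./my_app

# Or specify nothing and let OMPI pick the fastest available transport
# for each pair of processes.
mpirun -np 32 ./my_app
```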