Just FYI: I believe Nathan misspoke. The new capability is in 1.8.4, which
I hope to release next Friday (Nov 7th)

> On Oct 30, 2014, at 4:24 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> 
> Hi Nathan
> 
> Thank you very much for addressing this problem.
> 
> I read your notes on Jeff's blog about vader,
> and that clarified many things that were obscure to me
> when I first started this thread
> whining that knem was not working in OMPI 1.8.3.
> Thank you also for writing that blog post,
> and for sending the link to it.
> That was very helpful indeed.
> 
> As your closing comments on the blog post point out,
> and your IMB benchmark graphs of pingpong/latency &
> sendrecv/bandwidth show,
> vader+xpmem outperforms the other combinations
> of btl+memory_copy_mechanism for intra-node communication.
> 
> For the benefit of pedestrian OpenMPI users like me:
> 
> 1) What is the status of xpmem in the Linux world at this point?
> [Proprietary (SGI?) / open source? Part of the Linux kernel (which version)?
> Part of standard distributions (which ones)?]
> 
> 2) Any recommendation for the values of the
> various vader btl parameters?
> [There are 12 of them in OMPI 1.8.3!
> That is a real challenge to get right.]
> 
> Which values did you use in your benchmarks?
> Defaults?
> Other?
> 
> In particular, is there an optimal value for the eager/rendezvous
> threshold (btl_vader_eager_limit, default=4kB)?
> [The INRIA web site suggests 32kB for the sm+knem counterpart
> (btl_sm_eager_limit, default=4kB).]
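> 
> [Aside for other pedestrian users: the full list of vader parameters
> and their current defaults can be dumped with ompi_info, and any of
> them can be overridden at run time, e.g. (the program name and the
> 32kB value below are only illustrative, not a recommendation):
>   ompi_info --param btl vader --level 9
>   mpirun -mca btl_vader_eager_limit 32768 -np 16 ./my_app ]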
> 
> 3) Did I understand right that the upcoming OpenMPI 1.8.5
> can be configured with more than one memory copy mechanism at once
> (e.g. --with-knem and --with-cma and --with-xpmem),
> with one of them then selected at runtime via the
> btl_vader_single_copy_mechanism parameter?
> Or must OMPI be configured with only one memory copy mechanism?
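> 
> [If that is the case, I imagine the build-and-select steps would look
> something like this (the installation paths here are hypothetical):
>   ./configure --with-knem=/opt/knem --with-cma --with-xpmem=/opt/xpmem
>   mpirun -mca btl_vader_single_copy_mechanism knem -np 16 ./my_app ]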
> 
> Many thanks,
> Gus Correa
> 
> 
> On 10/30/2014 05:44 PM, Nathan Hjelm wrote:
>> I want to close the loop on this issue. 1.8.5 will address it in several
>> ways:
>> 
>>  - knem support in btl/sm has been fixed. A sanity check was disabling
>>    knem during component registration. I wrote the sanity check before
>>    the 1.7 release and didn't intend this side-effect.
>> 
>>  - vader now supports xpmem, cma, and knem. The best available
>>    single-copy mechanism will be used. If multiple single-copy
>>    mechanisms are available, you can select which one you want to use
>>    at runtime, as in the example below.
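>> 
>>    For example, on a build that includes all three, a specific
>>    mechanism can be forced at runtime (mechanism names match the
>>    three listed above):
>> 
>>      mpirun -mca btl_vader_single_copy_mechanism knem ./a.out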
>> 
>> More about the vader btl can be found here:
>> http://blogs.cisco.com/performance/the-vader-shared-memory-transport-in-open-mpi-now-featuring-3-flavors-of-zero-copy/
>> 
>> -Nathan Hjelm
>> HPC-5, LANL
>> 
>> On Fri, Oct 17, 2014 at 01:02:23PM -0700, Ralph Castain wrote:
>>>      On Oct 17, 2014, at 12:06 PM, Gus Correa <g...@ldeo.columbia.edu> 
>>> wrote:
>>>      Hi Jeff
>>> 
>>>      Many thanks for looking into this and filing a bug report at 11:16PM!
>>> 
>>>      Thanks to Aurelien, Ralph and Nathan for their help and clarifications
>>>      also.
>>> 
>>>      **
>>> 
>>>      Related suggestion:
>>> 
>>>      Add a note to the FAQ explaining that in OMPI 1.8
>>>      the new (default) btl is vader (and what it is).
>>> 
>>>      It was a real surprise to me.
>>>      If Aurelien Bouteiller didn't tell me about vader,
>>>      I might have never realized it even existed.
>>> 
>>>      That could be part of one of the already existent FAQs
>>>      explaining how to select the btl.
>>> 
>>>      **
>>> 
>>>      Doubts (btl in OMPI 1.8):
>>> 
>>>      I still don't understand clearly the meaning and scope of vader
>>>      being a "default btl".
>>> 
>>>    We mean that it has a higher priority than the other shared memory
>>>    implementation, and so it will be used for intra-node messaging by
>>>    default.
>>> 
>>>      What is the scope of this default: intra-node btl only, perhaps?
>>> 
>>>    Yes - strictly intra-node
>>> 
>>>      Was there a default btl before vader, and which?
>>> 
>>>    The "sm" btl was the default shared memory transport before vader
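>>> 
>>>    If you want the old behavior back, you can still request sm
>>>    explicitly at runtime:
>>> 
>>>      mpirun -mca btl sm,self ./a.out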
>>> 
>>>      Is vader the intra-node default only (i.e. replaces sm by default),
>>> 
>>>    Yes
>>> 
>>>      or does it somehow extend beyond node boundaries, and replaces (or
>>>      brings in) network btls (openib,tcp,etc) ?
>>> 
>>>    Nope - just intra-node
>>> 
>>>      If I am running on several nodes, and want to use openib, not tcp,
>>>      and, say, use vader, what is the right syntax?
>>> 
>>>      * nothing (OMPI will figure it out ... but what if you have
>>>      IB, Ethernet, Myrinet, OpenGM all together?)
>>> 
>>>    If you have higher-speed connections, we will pick the fastest for
>>>    inter-node messaging as the "default" since we expect you would want the
>>>    fastest possible transport.
>>> 
>>>      * -mca btl openib (and vader will come along automatically)
>>> 
>>>    Among the ones you show, this would indeed be the likely choices (openib
>>>    and vader)
>>> 
>>>      * -mca btl openib,self (and vader will come along automatically)
>>> 
>>>    The "self" btl is *always* active as the loopback transport
>>> 
>>>      * -mca btl openib,self,vader (because vader is default only for 1-node
>>>      jobs)
>>>      * something else (or several alternatives)
>>> 
>>>      Whatever happened to the "self" btl in this new context?
>>>      Gone? Still there?
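>>> 
>>>    Putting these answers together, an explicit multi-node selection
>>>    would look like this (the host names are just placeholders):
>>> 
>>>      mpirun -mca btl openib,vader,self -host node1,node2 -np 32 ./a.out
>>> 
>>>    though simply omitting -mca btl and letting Open MPI choose the
>>>    fastest available transports is usually just as good.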
>>> 
>>>      Many thanks,
>>>      Gus Correa
>>> 
>>>      On 10/16/2014 11:16 PM, Jeff Squyres (jsquyres) wrote:
>>> 
>>>        On Oct 16, 2014, at 1:35 PM, Gus Correa <g...@ldeo.columbia.edu> 
>>> wrote:
>>> 
>>>          and on the MCA parameter file:
>>> 
>>>          btl_sm_use_knem = 1
>>> 
>>>        I think the logic enforcing this MCA param got broken when we 
>>> revamped
>>>        the MCA param system.  :-(
>>> 
>>>          I am scratching my head to understand why a parameter with such a
>>>          suggestive name ("btl_sm_have_knem_support"),
>>>          so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
>>>          somehow vanished from ompi_info in OMPI 1.8.3.
>>> 
>>>        It looks like this MCA param was also dropped when we revamped the 
>>> MCA
>>>        system.  Doh!  :-(
>>> 
>>>        There's some deep mojo going on that is somehow causing knem to not 
>>> be
>>>        used; I'm too tired to understand the logic right now.  I just opened
>>>        https://github.com/open-mpi/ompi/issues/239 to track this issue --
>>>        feel free to subscribe to the issue to get updates.
>>> 