FWIW: vader is the default in 1.8
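
If the goal is to exercise knem through the sm BTL, sm has to be selected
explicitly, because the btl_sm_* parameters quoted below belong to the sm
component only. A minimal sketch that composes such a command line follows.
The benchmark path ./IMB-MPI1 and the rank count of 2 are hypothetical
placeholders, and the sketch is untested against 1.8.3:

```shell
#!/bin/sh
# Sketch: compose an mpirun command line that selects the sm BTL explicitly
# instead of letting 1.8 pick vader.
# BENCH and NP are hypothetical placeholders; adjust them for your setup.
BENCH="${BENCH:-./IMB-MPI1}"
NP="${NP:-2}"

build_cmd() {
    # "sm,self" restricts BTL selection to shared memory plus loopback,
    # so vader is not eligible. The btl_sm_* values are the ones given
    # in the original post.
    echo "mpirun -np $NP" \
         "-mca btl sm,self" \
         "-mca btl_sm_use_knem 1" \
         "-mca btl_sm_eager_limit 32768" \
         "-mca btl_sm_knem_dma_min 1048576" \
         "$BENCH"
}

CMD="$(build_cmd)"
echo "$CMD"
# To actually launch the job, uncomment the next line:
# $CMD
```

The script only prints the command by default, so it can be inspected before
anything is launched.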

On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller <boute...@icl.utk.edu> wrote:

> Are you sure you are not using the vader BTL?
> 
> Setting mca_btl_base_verbose and/or sm_verbose should spit out some knem 
> initialization info. 
> 
> The Linux CMA system (which ships with most 3.1x Linux kernels) has similar
> features, and is also supported in sm.
> 
> Aurelien
> --
>          ~~~ Aurélien Bouteiller, Ph.D. ~~~
>             ~ Research Scientist @ ICL ~
> The University of Tennessee, Innovative Computing Laboratory
> 1122 Volunteer Blvd, suite 309, Knoxville, TN 37996
> tel: +1 (865) 974-9375       fax: +1 (865) 974-8296
> https://icl.cs.utk.edu/~bouteill/
> 
> 
> 
> 
> On Oct 16, 2014, at 16:35, Gus Correa <g...@ldeo.columbia.edu> wrote:
> 
>> Dear Open MPI developers
>> 
>> Well, I just can't keep my promises for too long ...
>> So, here I am pestering you again, although this time
>> it is not a request for more documentation.
>> Hopefully it is something more legit.
>> 
>> I am having trouble using knem with Open MPI 1.8.3,
>> and need your help.
>> 
>> I configured Open MPI 1.8.3 with knem.
>> I had done the same with some builds of Open MPI 1.6.5 before.
>> 
>> When I build and launch the Intel MPI benchmarks (IMB)
>> with Open MPI 1.6.5,
>> 'cat /dev/knem'
>> starts showing non-zero-and-growing statistics right away.
>> 
>> However, when I build and launch IMB with Open MPI 1.8.3,
>> /dev/knem shows only zeros,
>> no statistics growing, nothing.
>> Knem just seems to be completely asleep.
>> 
>> So, my conclusion is that somehow knem is not working with OMPI 1.8.3,
>> at least not for me.
>> 
>> ***
>> 
>> The runtime environment related to knem is set up the
>> same way on both OMPI releases.
>> I tried setting it up both on the command line:
>> 
>> -mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576
>> 
>> and on the MCA parameter file:
>> 
>> btl_sm_use_knem = 1
>> btl_sm_eager_limit = 32768
>> btl_sm_knem_dma_min = 1048576
>> 
>> and the behavior is the same (i.e., knem is active in 1.6.5,
>> but doesn't seem to be used by 1.8.3, as indicated by the
>> /dev/knem statistics.)
>> 
>> ***
>> 
>> When I 'grep -i knem config.log', both 1.6.5 and 1.8.3 builds show:
>> 
>> #define OMPI_BTL_SM_HAVE_KNEM 1
>> 
>> suggesting that both configurations picked up knem correctly.
>> 
>> On the other hand, when I do 'ompi_info --all --all |grep knem',
>> OMPI 1.6.5 shows "btl_sm_have_knem_support":
>> 
>> 'MCA btl: information "btl_sm_have_knem_support" (value: <1>, data source: 
>> default value)  Whether this component supports the knem Linux kernel module 
>> or not'
>> 
>> By contrast, in OMPI 1.8.3 ompi_info doesn't show this particular item 
>> ("btl_sm_have_knem_support"),
>> although the *other* 'btl sm knem' items are there,
>> namely "btl_sm_use_knem","btl_sm_knem_dma_min", 
>> "btl_sm_knem_max_simultaneous".
>> 
>> I am scratching my head to understand why a parameter with such a
>> suggestive name ("btl_sm_have_knem_support"),
>> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
>> somehow vanished from ompi_info in OMPI 1.8.3.
>> 
>> ***
>> 
>> Questions:
>> 
>> - Am I doing something totally wrong,
>> perhaps with the knem runtime environment?
>> 
>> - Was knem somehow phased out in 1.8.3?
>> 
>> - Could there be a bad interaction with other runtime parameters that
>> somehow is knocking out knem in 1.8.3?
>> (FYI, besides knem, I'm just excluding the tcp btl, binding to core, and 
>> reporting the bindings, which is exactly what I do on 1.6.5,
>> although the runtime parameter syntax has changed.)
>> 
>> - Is knem inadvertently not being activated at runtime in OMPI 1.8.3?
>> (i.e. a bug)
>> 
>> - Is there a way to increase verbosity to detect if knem is being
>> used by OMPI?
>> That would certainly help to check what is going on.
>> I tried '-mca btl_base_verbose 30' but there was no trace of knem
>> in stderr/stdout of either 1.6.5 or 1.8.3.
>> So, the evidence I have that knem is
>> active in 1.6.5 but not in 1.8.3 comes only from the statistics in
>> /dev/knem.
>> 
>> ***
>> 
>> 
>> Thank you,
>> Gus Correa
>> 
>> ***
>> 
>> PS - As an aside, I also have some questions on the knem setup,
>> which I mostly copied from the knem web site
>> (hopefully Brice Goglin is listening ...):
>> 
>> - Is 32768 in 'btl_sm_eager_limit 32768' a good number,
>> or should it be larger/smaller/something else?
>> [OK, I know I should benchmark it, but exploring the whole parameter
>> space takes long, so why not ask?]
>> 
>> - Is it worth using 'btl_sm_knem_dma_min 1048576'?
>> [I think I read somewhere that this dma engine offload
>> is an Intel thing, not AMD.]
>> 
>> - How about btl_sm_knem_max_simultaneous?
>> That one is not mentioned on the knem web site.
>> Should I leave it at its default of zero or set it to 1? 2? 4? Something else?
>> 
>> 
>> Thanks again,
>> Gus Correa
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2014/10/25511.php
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25512.php
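
The 'cat /dev/knem' check described in the quoted message can be scripted so
that it does not depend on verbose output. The sketch below compares raw
snapshots taken before and after a run and assumes nothing about the layout
of the statistics. The function name knem_changed is made up for
illustration:

```shell
#!/bin/sh
# Sketch: report whether the text read from /dev/knem changed across a run.
# It compares raw snapshots and assumes nothing about the statistics layout.

knem_changed() {
    # $1, $2: files holding 'cat /dev/knem' output from before and after.
    if cmp -s "$1" "$2"; then
        echo "unchanged"
    else
        echo "changed"
    fi
}

# Only attempt the live check when the device is readable and a command
# to run was given as arguments (for example, an mpirun command line).
if [ -r /dev/knem ] && [ "$#" -gt 0 ]; then
    before="$(mktemp)"
    after="$(mktemp)"
    cat /dev/knem > "$before"
    "$@"
    cat /dev/knem > "$after"
    echo "knem statistics: $(knem_changed "$before" "$after")"
    rm -f "$before" "$after"
fi
```

Saved under a name of your choice and invoked with the mpirun command line
as its arguments, it prints "changed" when the statistics moved during the
run. Other jobs using knem on the same node at the same time would also
change the statistics, so the check is only meaningful on an otherwise idle
node.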
