Are you sure you are not using the vader BTL?

Setting mca_btl_base_verbose and/or sm_verbose should spit out some knem 
initialization info. 
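
As a rough sketch of how one might raise the verbosity and look for knem messages, the helper below builds an mpirun command line and filters captured output. It rests on assumptions, not confirmed settings: the launcher is called mpirun, the sm BTL is forced with "--mca btl sm,self" (so vader is not picked instead), and btl_base_verbose is the framework-level verbosity parameter. The function names and the program path are hypothetical; check ompi_info for the exact parameter names in your build.

```python
def build_mpirun_command(program, np=2, verbose_level=100):
    """Return an mpirun argv list that forces the sm BTL and raises BTL verbosity.

    Assumptions (verify against 'ompi_info --all' for your build):
    - '--mca btl sm,self' selects the sm BTL instead of vader
    - 'btl_base_verbose' is the BTL framework verbosity parameter
    """
    return [
        "mpirun", "-np", str(np),
        "--mca", "btl", "sm,self",
        "--mca", "btl_base_verbose", str(verbose_level),
        program,
    ]


def knem_lines(output):
    """Return the lines of captured output that mention knem (case-insensitive)."""
    return [line for line in output.splitlines() if "knem" in line.lower()]
```

The returned list can be passed to subprocess.run(..., capture_output=True, text=True), and the combined stdout/stderr fed to knem_lines() to see whether any knem initialization messages appear at all.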

The Linux CMA (Cross Memory Attach) mechanism, which ships with most 3.1x Linux 
kernels, has similar features and is also supported in sm.

Aurelien
--
          ~~~ Aurélien Bouteiller, Ph.D. ~~~
             ~ Research Scientist @ ICL ~
The University of Tennessee, Innovative Computing Laboratory
1122 Volunteer Blvd, suite 309, Knoxville, TN 37996
tel: +1 (865) 974-9375       fax: +1 (865) 974-8296
https://icl.cs.utk.edu/~bouteill/




On 16 Oct 2014, at 16:35, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Dear Open MPI developers
> 
> Well, I just can't keep my promises for too long ...
> So, here I am pestering you again, although this time
> it is not a request for more documentation.
> Hopefully it is something more legit.
> 
> I am having trouble using knem with Open MPI 1.8.3,
> and need your help.
> 
> I configured Open MPI 1.8.3 with knem.
> I had done the same with some builds of Open MPI 1.6.5 before.
> 
> When I build and launch the Intel MPI benchmarks (IMB)
> with Open MPI 1.6.5,
> 'cat /dev/knem'
> starts showing non-zero-and-growing statistics right away.
> 
> However, when I build and launch IMB with Open MPI 1.8.3,
> /dev/knem shows only zeros,
> no statistics growing, nothing.
> Knem just seems to be completely asleep.
> 
> So, my conclusion is that somehow knem is not working with OMPI 1.8.3,
> at least not for me.
> 
> ***
> 
> The runtime environment related to knem is set up the
> same way on both OMPI releases.
> I tried setting it up both on the command line:
> 
> -mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576
> 
> and in the MCA parameter file:
> 
> btl_sm_use_knem = 1
> btl_sm_eager_limit = 32768
> btl_sm_knem_dma_min = 1048576
> 
> and the behavior is the same (i.e., knem is active in 1.6.5,
> but doesn't seem to be used by 1.8.3, as indicated by the
> /dev/knem statistics.)
> 
> ***
> 
> When I 'grep -i knem config.log', both 1.6.5 and 1.8.3 builds show:
> 
> #define OMPI_BTL_SM_HAVE_KNEM 1
> 
> suggesting that both configurations picked up knem correctly.
> 
> On the other hand, when I do 'ompi_info --all --all |grep knem',
> OMPI 1.6.5 shows "btl_sm_have_knem_support":
> 
> 'MCA btl: information "btl_sm_have_knem_support" (value: <1>, data source: 
> default value)  Whether this component supports the knem Linux kernel module 
> or not'
> 
> By contrast, in OMPI 1.8.3 ompi_info doesn't show this particular item 
> ("btl_sm_have_knem_support"),
> although the *other* 'btl sm knem' items are there,
> namely "btl_sm_use_knem","btl_sm_knem_dma_min", 
> "btl_sm_knem_max_simultaneous".
> 
> I am scratching my head to understand why a parameter with such a
> suggestive name ("btl_sm_have_knem_support"),
> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
> somehow vanished from ompi_info in OMPI 1.8.3.
> 
> ***
> 
> Questions:
> 
> - Am I doing something totally wrong,
> perhaps with the knem runtime environment?
> 
> - Was knem somehow phased out in 1.8.3?
> 
> - Could there be a bad interaction with other runtime parameters that
> somehow is knocking out knem in 1.8.3?
> (FYI, besides knem, I'm just excluding the tcp btl, binding to core, and 
> reporting the bindings, which is exactly what I do on 1.6.5,
> although the runtime parameter syntax has changed.)
> 
> - Is knem inadvertently not being activated at runtime in OMPI 1.8.3?
> (i.e. a bug)
> 
> - Is there a way to increase verbosity to detect if knem is being
> used by OMPI?
> That would certainly help to check what is going on.
> I tried '-mca btl_base_verbose 30' but there was no trace of knem
> in stderr/stdout of either 1.6.5 or 1.8.3.
> So, the evidence I have that knem is
> active in 1.6.5 but not in 1.8.3 comes only from the statistics in
> /dev/knem.
> 
> ***
> 
> 
> Thank you,
> Gus Correa
> 
> ***
> 
> PS - As an aside, I also have some questions on the knem setup,
> which I mostly copied from the knem web site
> (hopefully Brice Goglin is listening ...):
> 
> - Is 32768 in 'btl_sm_eager_limit 32768' a good number,
> or should it be larger/smaller/something else?
> [OK, I know I should benchmark it, but exploring the whole parameter
> space takes a long time, so why not ask?]
> 
> - Is it worth using 'btl_sm_knem_dma_min 1048576'?
> [I think I read somewhere that this dma engine offload
> is an Intel thing, not AMD.]
> 
> - How about btl_sm_knem_max_simultaneous?
> That one is not mentioned on the knem web site.
> Should I leave it default to zero or set it to 1? 2? 4? Something else?
> 
> 
> Thanks again,
> Gus Correa
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2014/10/25511.php
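
The check described in the quoted message (watching whether the /dev/knem statistics grow during a run) can be made less manual with a small script. This is only a sketch under a stated assumption: that 'cat /dev/knem' prints its counters one per line as "<label> : <integer>". The function names are hypothetical, and the pattern may need adjusting for your knem version.

```python
import re

# Assumed counter line format: "<label> : <integer>". Lines that do not
# match (version banner, flags, etc.) are ignored.
COUNTER_RE = re.compile(r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<value>\d+)\s*$")


def parse_counters(text):
    """Extract {label: integer} pairs from a statistics dump."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group("label")] = int(match.group("value"))
    return counters


def counters_that_grew(before, after):
    """Return {label: increase} for every counter larger in 'after' than in 'before'."""
    old = parse_counters(before)
    new = parse_counters(after)
    return {
        label: value - old.get(label, 0)
        for label, value in new.items()
        if value > old.get(label, 0)
    }


def read_knem_stats(path="/dev/knem"):
    """Read the current statistics text from the knem device file."""
    with open(path) as stats_file:
        return stats_file.read()
```

Take one snapshot with read_knem_stats() before launching the benchmark and another afterwards. An empty result from counters_that_grew() matches the "only zeros" behavior reported for 1.8.3, while a non-empty result indicates that knem handled at least some requests.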
