FWIW: vader is the default in 1.8

On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller <boute...@icl.utk.edu> wrote:

> Are you sure you are not using the vader BTL?
>
> Setting mca_btl_base_verbose and/or sm_verbose should spit out some knem
> initialization info.
>
> The CMA linux system (that ships with most 3.1x linux kernels) has similar
> features, and is also supported in sm.
>
> Aurelien
> --
> ~~~ Aurélien Bouteiller, Ph.D. ~~~
> ~ Research Scientist @ ICL ~
> The University of Tennessee, Innovative Computing Laboratory
> 1122 Volunteer Blvd, suite 309, Knoxville, TN 37996
> tel: +1 (865) 974-9375 fax: +1 (865) 974-8296
> https://icl.cs.utk.edu/~bouteill/
>
> On Oct 16, 2014, at 16:35, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
>> Dear Open MPI developers
>>
>> Well, I just can't keep my promises for too long ...
>> So, here I am pestering you again, although this time
>> it is not a request for more documentation.
>> Hopefully it is something more legit.
>>
>> I am having trouble using knem with Open MPI 1.8.3,
>> and need your help.
>>
>> I configured Open MPI 1.8.3 with knem.
>> I had done the same with some builds of Open MPI 1.6.5 before.
>>
>> When I build and launch the Intel MPI benchmarks (IMB)
>> with Open MPI 1.6.5,
>> 'cat /dev/knem'
>> starts showing non-zero-and-growing statistics right away.
>>
>> However, when I build and launch IMB with Open MPI 1.8.3,
>> /dev/knem shows only zeros,
>> no statistics growing, nothing.
>> Knem just seems to be completely asleep.
>>
>> So, my conclusion is that somehow knem is not working with OMPI 1.8.3,
>> at least not for me.
>>
>> ***
>>
>> The runtime environment related to knem is set up the
>> same way on both OMPI releases.
>> I tried setting it up both on the command line:
>>
>> -mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576
>>
>> and on the MCA parameter file:
>>
>> btl_sm_use_knem = 1
>> btl_sm_eager_limit = 32768
>> btl_sm_knem_dma_min = 1048576
>>
>> and the behavior is the same (i.e., knem is active in 1.6.5,
>> but doesn't seem to be used by 1.8.3, as indicated by the
>> /dev/knem statistics.)
>>
>> ***
>>
>> When I 'grep -i knem config.log', both 1.6.5 and 1.8.3 builds show:
>>
>> #define OMPI_BTL_SM_HAVE_KNEM 1
>>
>> suggesting that both configurations picked up knem correctly.
>>
>> On the other hand, when I do 'ompi_info --all --all |grep knem',
>> OMPI 1.6.5 shows "btl_sm_have_knem_support":
>>
>> 'MCA btl: information "btl_sm_have_knem_support" (value: <1>, data source:
>> default value) Whether this component supports the knem Linux kernel module
>> or not'
>>
>> By contrast, in OMPI 1.8.3 ompi_info doesn't show this particular item
>> ("btl_sm_have_knem_support"),
>> although the *other* 'btl sm knem' items are there,
>> namely "btl_sm_use_knem", "btl_sm_knem_dma_min",
>> "btl_sm_knem_max_simultaneous".
>>
>> I am scratching my head to understand why a parameter with such a
>> suggestive name ("btl_sm_have_knem_support"),
>> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
>> somehow vanished from ompi_info in OMPI 1.8.3.
>>
>> ***
>>
>> Questions:
>>
>> - Am I doing something totally wrong,
>> perhaps with the knem runtime environment?
>>
>> - Was knem somehow phased out in 1.8.3?
>>
>> - Could there be a bad interaction with other runtime parameters that
>> somehow is knocking out knem in 1.8.3?
>> (FYI, besides knem, I'm just excluding the tcp btl, binding to core, and
>> reporting the bindings, which is exactly what I do on 1.6.5,
>> although the runtime parameter syntax has changed.)
>>
>> - Is knem inadvertently not being activated at runtime in OMPI 1.8.3?
>> (i.e. a bug)
>>
>> - Is there a way to increase verbosity to detect if knem is being
>> used by OMPI?
>> That would certainly help to check what is going on.
>> I tried '-mca btl_base_verbose 30' but there was no trace of knem
>> in stderr/stdout of either 1.6.5 or 1.8.3.
>> So, the evidence I have that knem is
>> active in 1.6.5 but not in 1.8.3 comes only from the statistics in
>> /dev/knem.
>>
>> ***
>>
>>
>> Thank you,
>> Gus Correa
>>
>> ***
>>
>> PS - As an aside, I also have some questions on the knem setup,
>> which I mostly copied from the knem web site
>> (hopefully Brice Goglin is listening ...):
>>
>> - Is 32768 in 'btl_sm_eager_limit 32768' a good number,
>> or should it be larger/smaller/something else?
>> [OK, I know I should benchmark it, but exploring the whole parameter
>> space takes long, so why not ask?]
>>
>> - Is it worth using 'btl_sm_knem_dma_min 1048576'?
>> [I think I read somewhere that this dma engine offload
>> is an Intel thing, not AMD.]
>>
>> - How about btl_sm_knem_max_simultaneous?
>> That one is not mentioned in the knem web site.
>> Should I leave it default to zero or set it to 1? 2? 4? Something else?
>>
>>
>> Thanks again,
>> Gus Correa
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2014/10/25511.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/10/25512.php
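
A minimal sketch of one way to check the point made at the top of the thread, that vader is the default shared-memory BTL in 1.8. It selects the sm BTL explicitly and compares the /dev/knem counters before and after a run. The NP and IMB_BIN values, the knem_cmd helper, the choice of the PingPong benchmark, and the snapshot file names are illustrative assumptions, not taken from the original posts. The btl_sm_* values are the ones Gus quoted. Whether this restores knem activity on a given 1.8.3 build is something to verify, not a given.

```shell
#!/bin/sh
# Sketch only: NP, IMB_BIN, knem_cmd and the snapshot file names are
# hypothetical; adjust them to the local setup.
NP="${NP:-2}"
IMB_BIN="${IMB_BIN:-./IMB-MPI1}"

knem_cmd() {
    # Listing "sm,self" restricts the run to those BTLs, so neither vader
    # nor tcp is eligible. The btl_sm_* values are those quoted in the thread.
    echo mpirun -np "$NP" \
        -mca btl sm,self \
        -mca btl_sm_use_knem 1 \
        -mca btl_sm_eager_limit 32768 \
        -mca btl_sm_knem_dma_min 1048576 \
        "$IMB_BIN" PingPong
}

CMD="$(knem_cmd)"
echo "$CMD"

# Run only when mpirun, the benchmark binary and the knem device all exist.
if command -v mpirun >/dev/null 2>&1 && [ -x "$IMB_BIN" ] && [ -r /dev/knem ]; then
    cat /dev/knem > knem_before.txt
    $CMD
    cat /dev/knem > knem_after.txt
    # Counters that differ between the snapshots suggest knem was used.
    diff knem_before.txt knem_after.txt || true
fi
```

If the counters stay at zero even with sm selected explicitly, the problem is probably not the BTL selection and the build itself would be the next thing to look at.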