Are you sure you are not using the vader BTL? Setting mca_btl_base_verbose and/or sm_verbose should print some knem initialization info.
The Linux CMA system (which ships with most 3.1x Linux kernels) has similar features, and is also supported in sm.

Aurelien
--
~~~ Aurélien Bouteiller, Ph.D. ~~~
~ Research Scientist @ ICL ~
The University of Tennessee, Innovative Computing Laboratory
1122 Volunteer Blvd, suite 309, Knoxville, TN 37996
tel: +1 (865) 974-9375 fax: +1 (865) 974-8296
https://icl.cs.utk.edu/~bouteill/

On Oct 16, 2014, at 16:35, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Dear Open MPI developers
>
> Well, I just can't keep my promises for too long ...
> So, here I am pestering you again, although this time
> it is not a request for more documentation.
> Hopefully it is something more legit.
>
> I am having trouble using knem with Open MPI 1.8.3,
> and need your help.
>
> I configured Open MPI 1.8.3 with knem.
> I had done the same with some builds of Open MPI 1.6.5 before.
>
> When I build and launch the Intel MPI benchmarks (IMB)
> with Open MPI 1.6.5,
> 'cat /dev/knem'
> starts showing non-zero-and-growing statistics right away.
>
> However, when I build and launch IMB with Open MPI 1.8.3,
> /dev/knem shows only zeros,
> no statistics growing, nothing.
> Knem just seems to be completely asleep.
>
> So, my conclusion is that somehow knem is not working with OMPI 1.8.3,
> at least not for me.
>
> ***
>
> The runtime environment related to knem is set up the
> same way on both OMPI releases.
> I tried setting it up both on the command line:
>
> -mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576
>
> and in the MCA parameter file:
>
> btl_sm_use_knem = 1
> btl_sm_eager_limit = 32768
> btl_sm_knem_dma_min = 1048576
>
> and the behavior is the same (i.e., knem is active in 1.6.5,
> but doesn't seem to be used by 1.8.3, as indicated by the
> /dev/knem statistics.)
>
> ***
>
> When I 'grep -i knem config.log', both 1.6.5 and 1.8.3 builds show:
>
> #define OMPI_BTL_SM_HAVE_KNEM 1
>
> suggesting that both configurations picked up knem correctly.
>
> On the other hand, when I do 'ompi_info --all --all |grep knem',
> OMPI 1.6.5 shows "btl_sm_have_knem_support":
>
> 'MCA btl: information "btl_sm_have_knem_support" (value: <1>, data source:
> default value) Whether this component supports the knem Linux kernel module
> or not'
>
> By contrast, in OMPI 1.8.3 ompi_info doesn't show this particular item
> ("btl_sm_have_knem_support"),
> although the *other* 'btl sm knem' items are there,
> namely "btl_sm_use_knem", "btl_sm_knem_dma_min",
> "btl_sm_knem_max_simultaneous".
>
> I am scratching my head to understand why a parameter with such a
> suggestive name ("btl_sm_have_knem_support"),
> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
> somehow vanished from ompi_info in OMPI 1.8.3.
>
> ***
>
> Questions:
>
> - Am I doing something totally wrong,
> perhaps with the knem runtime environment?
>
> - Was knem somehow phased out in 1.8.3?
>
> - Could there be a bad interaction with other runtime parameters that
> somehow is knocking out knem in 1.8.3?
> (FYI, besides knem, I'm just excluding the tcp btl, binding to core, and
> reporting the bindings, which is exactly what I do on 1.6.5,
> although the runtime parameter syntax has changed.)
>
> - Is knem inadvertently not being activated at runtime in OMPI 1.8.3?
> (i.e. a bug)
>
> - Is there a way to increase verbosity to detect if knem is being
> used by OMPI?
> That would certainly help to check what is going on.
> I tried '-mca btl_base_verbose 30' but there was no trace of knem
> in stderr/stdout of either 1.6.5 or 1.8.3.
> So, the evidence I have that knem is
> active in 1.6.5 but not in 1.8.3 comes only from the statistics in
> /dev/knem.
>
> ***
>
>
> Thank you,
> Gus Correa
>
> ***
>
> PS - As an aside, I also have some questions on the knem setup,
> which I mostly copied from the knem web site
> (hopefully Brice Goglin is listening ...):
>
> - Is 32768 in 'btl_sm_eager_limit 32768' a good number,
> or should it be larger/smaller/something else?
> [OK, I know I should benchmark it, but exploring the whole parameter
> space takes long, so why not ask? ]
>
> - Is it worth using 'btl_sm_knem_dma_min 1048576'?
> [I think I read somewhere that this dma engine offload
> is an Intel thing, not AMD.]
>
> - How about btl_sm_knem_max_simultaneous?
> That one is not mentioned on the knem web site.
> Should I leave it at its default of zero or set it to 1? 2? 4? Something else?
>
>
> Thanks again,
> Gus Correa
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/10/25511.php
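
One way to follow up on the vader question at the top of this reply is to check which BTL components list knem-related parameters in a given build. What follows is a minimal Python sketch and not part of the original thread. The helper names knem_params and report are hypothetical, and the regular expression assumes parameter names appear in double quotes, as in the ompi_info line quoted in the message.

```python
import re
import subprocess

# Matches quoted MCA parameter names such as "btl_sm_use_knem".
# Group 1 is the full parameter name, group 2 the BTL component (e.g. "sm").
# Assumption: ompi_info prints parameter names in double quotes.
KNEM_PARAM = re.compile(r'"(btl_([a-z0-9]+)_[a-z0-9_]*knem[a-z0-9_]*)"')


def knem_params(ompi_info_text):
    """Return {component: sorted list of knem-related parameter names}."""
    found = {}
    for name, component in KNEM_PARAM.findall(ompi_info_text):
        found.setdefault(component, set()).add(name)
    return {comp: sorted(names) for comp, names in found.items()}


def report():
    """Run 'ompi_info --all' and print knem-related parameters per component."""
    out = subprocess.run(
        ["ompi_info", "--all"], capture_output=True, text=True, check=True
    ).stdout
    for comp, names in sorted(knem_params(out).items()):
        print(comp, names)
```

Calling report() under each installation (1.6.5 and 1.8.3) gives two listings that can be compared side by side. A missing entry only means that no parameter with "knem" in its name was listed for that component; it does not by itself show whether knem is used at run time, so the /dev/knem statistics remain the more direct evidence.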