And vader doesn't support knem at this time. It probably never will, because of the existence of CMA.
-Nathan

On Thu, Oct 16, 2014 at 01:49:09PM -0700, Ralph Castain wrote:
> FWIW: vader is the default in 1.8
>
> On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller <boute...@icl.utk.edu> wrote:
>
> > Are you sure you are not using the vader BTL?
> >
> > Setting mca_btl_base_verbose and/or sm_verbose should spit out some knem
> > initialization info.
> >
> > The CMA Linux system (that ships with most 3.1x Linux kernels) has similar
> > features, and is also supported in sm.
> >
> > Aurelien
> > --
> > ~~~ Aurélien Bouteiller, Ph.D. ~~~
> > ~ Research Scientist @ ICL ~
> > The University of Tennessee, Innovative Computing Laboratory
> > 1122 Volunteer Blvd, suite 309, Knoxville, TN 37996
> > tel: +1 (865) 974-9375 fax: +1 (865) 974-8296
> > https://icl.cs.utk.edu/~bouteill/
> >
> >
> > On Oct 16, 2014, at 4:35 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> >
> >> Dear Open MPI developers
> >>
> >> Well, I just can't keep my promises for too long ...
> >> So, here I am pestering you again, although this time
> >> it is not a request for more documentation.
> >> Hopefully it is something more legit.
> >>
> >> I am having trouble using knem with Open MPI 1.8.3,
> >> and need your help.
> >>
> >> I configured Open MPI 1.8.3 with knem.
> >> I had done the same with some builds of Open MPI 1.6.5 before.
> >>
> >> When I build and launch the Intel MPI benchmarks (IMB)
> >> with Open MPI 1.6.5,
> >> 'cat /dev/knem'
> >> starts showing non-zero-and-growing statistics right away.
> >>
> >> However, when I build and launch IMB with Open MPI 1.8.3,
> >> /dev/knem shows only zeros,
> >> no statistics growing, nothing.
> >> Knem just seems to be completely asleep.
> >>
> >> So, my conclusion is that somehow knem is not working with OMPI 1.8.3,
> >> at least not for me.
> >>
> >> ***
> >>
> >> The runtime environment related to knem is set up the
> >> same way on both OMPI releases.
> >> I tried setting it up both on the command line:
> >>
> >> -mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576
> >>
> >> and on the MCA parameter file:
> >>
> >> btl_sm_use_knem = 1
> >> btl_sm_eager_limit = 32768
> >> btl_sm_knem_dma_min = 1048576
> >>
> >> and the behavior is the same (i.e., knem is active in 1.6.5,
> >> but doesn't seem to be used by 1.8.3, as indicated by the
> >> /dev/knem statistics.)
> >>
> >> ***
> >>
> >> When I 'grep -i knem config.log', both 1.6.5 and 1.8.3 builds show:
> >>
> >> #define OMPI_BTL_SM_HAVE_KNEM 1
> >>
> >> suggesting that both configurations picked up knem correctly.
> >>
> >> On the other hand, when I do 'ompi_info --all --all |grep knem',
> >> OMPI 1.6.5 shows "btl_sm_have_knem_support":
> >>
> >> 'MCA btl: information "btl_sm_have_knem_support" (value: <1>, data source:
> >> default value) Whether this component supports the knem Linux kernel
> >> module or not'
> >>
> >> By contrast, in OMPI 1.8.3 ompi_info doesn't show this particular item
> >> ("btl_sm_have_knem_support"),
> >> although the *other* 'btl sm knem' items are there,
> >> namely "btl_sm_use_knem","btl_sm_knem_dma_min",
> >> "btl_sm_knem_max_simultaneous".
> >>
> >> I am scratching my head to understand why a parameter with such a
> >> suggestive name ("btl_sm_have_knem_support"),
> >> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
> >> somehow vanished from ompi_info in OMPI 1.8.3.
> >>
> >> ***
> >>
> >> Questions:
> >>
> >> - Am I doing something totally wrong,
> >> perhaps with the knem runtime environment?
> >>
> >> - Was knem somehow phased out in 1.8.3?
> >>
> >> - Could there be a bad interaction with other runtime parameters that
> >> somehow is knocking out knem in 1.8.3?
> >> (FYI, besides knem, I'm just excluding the tcp btl, binding to core, and
> >> reporting the bindings, which is exactly what I do on 1.6.5,
> >> although the runtime parameter syntax has changed.)
> >>
> >> - Is knem inadvertently not being activated at runtime in OMPI 1.8.3?
> >> (i.e. a bug)
> >>
> >> - Is there a way to increase verbosity to detect if knem is being
> >> used by OMPI?
> >> That would certainly help to check what is going on.
> >> I tried '-mca btl_base_verbose 30' but there was no trace of knem
> >> in stderr/stdout of either 1.6.5 or 1.8.3.
> >> So, the evidence I have that knem is
> >> active in 1.6.5 but not in 1.8.3 comes only from the statistics in
> >> /dev/knem.
> >>
> >> ***
> >>
> >>
> >> Thank you,
> >> Gus Correa
> >>
> >> ***
> >>
> >> PS - As an aside, I also have some questions on the knem setup,
> >> which I mostly copied from the knem web site
> >> (hopefully Brice Goglin is listening ...):
> >>
> >> - Is 32768 in 'btl_sm_eager_limit 32768' a good number,
> >> or should it be larger/smaller/something else?
> >> [OK, I know I should benchmark it, but exploring the whole parameter
> >> space takes long, so why not ask?]
> >>
> >> - Is it worth using 'btl_sm_knem_dma_min 1048576'?
> >> [I think I read somewhere that this dma engine offload
> >> is an Intel thing, not AMD.]
> >>
> >> - How about btl_sm_knem_max_simultaneous?
> >> That one is not mentioned on the knem web site.
> >> Should I leave it default to zero or set it to 1? 2? 4? Something else?
> >>
> >>
> >> Thanks again,
> >> Gus Correa
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> >> Link to this post:
> >> http://www.open-mpi.org/community/lists/users/2014/10/25511.php
> >
> > _______________________________________________
> > users mailing list
> > us...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post:
> > http://www.open-mpi.org/community/lists/users/2014/10/25512.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/10/25513.php
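
The check Gus describes in the quoted message (compare 'cat /dev/knem' before and after a run) can be scripted. The following is only a minimal sketch, not anything that ships with Open MPI or knem: the function names are made up for illustration, and it compares the raw text of the statistics file rather than parsing it, since the thread does not show the file's format. It also assumes nothing else on the node is using knem while the command runs.

```python
import subprocess
from pathlib import Path


def read_stats(path="/dev/knem"):
    """Return the raw statistics text, i.e. what 'cat /dev/knem' would print."""
    return Path(path).read_text()


def knem_stats_changed(command, path="/dev/knem"):
    """Run `command` (a list of arguments) and report whether the
    statistics text differs before and after the run.

    Hypothetical helper: no parsing is attempted, only a text comparison.
    """
    before = read_stats(path)
    subprocess.run(command, check=True)  # raises if the command fails
    after = read_stats(path)
    return before != after
```

One possible use, assuming an IMB binary named ./IMB-MPI1 in the current directory, is to call knem_stats_changed(["mpirun", "-np", "2", "--mca", "btl", "self,sm", "./IMB-MPI1"]). Listing sm explicitly keeps the vader BTL out of the picture, which matters here because vader is the default in 1.8 and does not use knem.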