On 10/16/2014 05:28 PM, Nathan Hjelm wrote:
And it doesn't support knem at this time. Probably never will because of
the existence of CMA.

-Nathan


Thanks, Nathan

But for the benefit of mere mortals like me
who don't share the dark or the bright side of the force,
and just need to keep their MPI applications running in production mode,
hopefully with Open MPI 1.8,
can somebody explain more clearly what "vader" is about?

Thank you,
Gus Correa


On Thu, Oct 16, 2014 at 01:49:09PM -0700, Ralph Castain wrote:
FWIW: vader is the default in 1.8

On Oct 16, 2014, at 1:40 PM, Aurélien Bouteiller <boute...@icl.utk.edu> wrote:

Are you sure you are not using the vader BTL?

Setting mca_btl_base_verbose and/or sm_verbose should spit out some knem 
initialization info.

The Linux CMA (Cross Memory Attach) mechanism, which ships with most 3.1x Linux
kernels, has similar features and is also supported in sm.

Aurelien
--
          ~~~ Aurélien Bouteiller, Ph.D. ~~~
             ~ Research Scientist @ ICL ~
The University of Tennessee, Innovative Computing Laboratory
1122 Volunteer Blvd, suite 309, Knoxville, TN 37996
tel: +1 (865) 974-9375       fax: +1 (865) 974-8296
https://icl.cs.utk.edu/~bouteill/




On Oct 16, 2014, at 16:35, Gus Correa <g...@ldeo.columbia.edu> wrote:

Dear Open MPI developers

Well, I just can't keep my promises for too long ...
So, here I am pestering you again, although this time
it is not a request for more documentation.
Hopefully it is something more legit.

I am having trouble using knem with Open MPI 1.8.3,
and need your help.

I configured Open MPI 1.8.3 with knem.
I had done the same with some builds of Open MPI 1.6.5 before.

When I build and launch the Intel MPI benchmarks (IMB)
with Open MPI 1.6.5,
'cat /dev/knem'
starts showing non-zero-and-growing statistics right away.

However, when I build and launch IMB with Open MPI 1.8.3,
/dev/knem shows only zeros,
no statistics growing, nothing.
Knem just seems to be completely asleep.

So, my conclusion is that somehow knem is not working with OMPI 1.8.3,
at least not for me.

***

The runtime environment related to knem is set up the
same way on both OMPI releases.
I tried setting it up both on the command line:

-mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576

and on the MCA parameter file:

btl_sm_use_knem = 1
btl_sm_eager_limit = 32768
btl_sm_knem_dma_min = 1048576

and the behavior is the same (i.e., knem is active in 1.6.5,
but doesn't seem to be used by 1.8.3, as indicated by the
/dev/knem statistics.)
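For reference, the command-line and parameter-file forms are meant to be equivalent: each "name = value" line in the file corresponds to a "-mca name value" argument to mpirun (and to an OMPI_MCA_<name> environment variable). The small Python sketch below is a hypothetical helper, not part of Open MPI; it assumes the plain "name = value" file format with "#" comments shown above and simply prints both equivalent forms, which makes it easy to compare what each release is actually being given.

```python
# Hypothetical helper (not part of Open MPI): translate the "name = value"
# lines of an MCA parameter file into equivalent mpirun "-mca" arguments
# and OMPI_MCA_* environment variables.

def parse_mca_params(text):
    """Return {name: value} from 'name = value' lines; '#' starts a comment."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue  # skip blank or malformed lines
        name, value = (part.strip() for part in line.split("=", 1))
        params[name] = value
    return params

def to_mpirun_args(params):
    """Build the '-mca name value' argument list for mpirun."""
    args = []
    for name, value in params.items():
        args += ["-mca", name, value]
    return args

def to_env(params):
    """Build the equivalent OMPI_MCA_<name> environment variables."""
    return {"OMPI_MCA_" + name: value for name, value in params.items()}

conf = """\
btl_sm_use_knem = 1
btl_sm_eager_limit = 32768
btl_sm_knem_dma_min = 1048576
"""

params = parse_mca_params(conf)
# prints: -mca btl_sm_use_knem 1 -mca btl_sm_eager_limit 32768 -mca btl_sm_knem_dma_min 1048576
print(" ".join(to_mpirun_args(params)))
print(to_env(params))
```

Note that the file version above sets btl_sm_use_knem explicitly, while the command-line version quoted earlier does not.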

***

When I 'grep -i knem config.log', both 1.6.5 and 1.8.3 builds show:

#define OMPI_BTL_SM_HAVE_KNEM 1

suggesting that both configurations picked up knem correctly.

On the other hand, when I do 'ompi_info --all --all |grep knem',
OMPI 1.6.5 shows "btl_sm_have_knem_support":

'MCA btl: information "btl_sm_have_knem_support" (value: <1>, data source: default value)  Whether this component supports the knem Linux kernel module or not'

By contrast, in OMPI 1.8.3 ompi_info doesn't show this particular item 
("btl_sm_have_knem_support"),
although the *other* 'btl sm knem' items are there,
namely "btl_sm_use_knem","btl_sm_knem_dma_min", "btl_sm_knem_max_simultaneous".

I am scratching my head to understand why a parameter with such a
suggestive name ("btl_sm_have_knem_support"),
so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
somehow vanished from ompi_info in OMPI 1.8.3.

***

Questions:

- Am I doing something totally wrong,
perhaps with the knem runtime environment?

- Was knem somehow phased out in 1.8.3?

- Could there be a bad interaction with other runtime parameters that
somehow is knocking out knem in 1.8.3?
(FYI, besides knem, I'm just excluding the tcp btl, binding to core, and 
reporting the bindings, which is exactly what I do on 1.6.5,
although the runtime parameter syntax has changed.)

- Is knem inadvertently not being activated at runtime in OMPI 1.8.3
(i.e., a bug)?

- Is there a way to increase verbosity to detect if knem is being
used by OMPI?
That would certainly help to check what is going on.
I tried '-mca btl_base_verbose 30', but there was no trace of knem
in stderr/stdout of either 1.6.5 or 1.8.3.
So, the evidence I have that knem is
active in 1.6.5 but not in 1.8.3 comes only from the statistics in
/dev/knem.

***


Thank you,
Gus Correa

***

PS - As an aside, I also have some questions on the knem setup,
which I mostly copied from the knem web site
(hopefully Brice Goglin is listening ...):

- Is 32768 in 'btl_sm_eager_limit 32768' a good number,
or should it be larger/smaller/something else?
[OK, I know I should benchmark it, but exploring the whole parameter
space takes a long time, so why not ask?]

- Is it worth using 'btl_sm_knem_dma_min 1048576'?
[I think I read somewhere that this DMA engine offload
is an Intel feature, not available on AMD.]

- How about btl_sm_knem_max_simultaneous?
That one is not mentioned on the knem web site.
Should I leave it at the default of zero or set it to 1? 2? 4? Something else?


Thanks again,
Gus Correa
_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/10/25511.php

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/10/25512.php

_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/10/25513.php


_______________________________________________
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/10/25515.php
