Thank you, Gilles. The reason for digging into intra-node optimizations is
that we've implemented several machine learning applications in OpenMPI
(Java binding), but found collective communication to be a bottleneck,
especially when the number of procs per node is high. I've implemented a
shared memory layer within Java (
https://www.researchgate.net/publication/291695433_SPIDAL_Java_High_Performance_Data_Analytics_with_Java_and_MPI_on_Large_Multicore_HPC_Clusters),
which solved this, but it would be nice to have this built-in.

I'll look at the send/recv implementations as well.

Regards,
Saliya

On Thu, Jun 30, 2016 at 10:02 AM, Gilles Gouaillardet <
gilles.gouaillar...@gmail.com> wrote:

> currently, coll/tuned is not topology aware.
> this is something interesting, and everyone is invited to contribute.
> coll/ml is topology aware, but it is kind of unmaintained now.
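
To picture why topology awareness matters here, the following is a rough sketch in Python, not Open MPI code. It counts how many messages cross node boundaries for a generic binomial-tree broadcast rooted at rank 0 (not necessarily the algorithm coll/tuned would pick) versus a two-level scheme that sends once per remote node and finishes inside each node. The function names and the block placement of ranks (rank // ppn) are assumptions made for illustration only.

```python
def node_of(rank, ppn):
    """Node index under assumed block placement: ranks 0..ppn-1 on node 0, etc."""
    return rank // ppn


def flat_binomial_internode(nprocs, ppn):
    """Count inter-node messages in a binomial-tree broadcast rooted at rank 0."""
    crossing = 0
    step = 1
    while step < nprocs:
        # Every rank below `step` already holds the data and sends to rank + step.
        for src in range(step):
            dst = src + step
            if dst < nprocs and node_of(src, ppn) != node_of(dst, ppn):
                crossing += 1
        step *= 2
    return crossing


def two_level_internode(nprocs, ppn):
    """One inter-node message per remote node; the rest stays inside each node."""
    nnodes = (nprocs + ppn - 1) // ppn
    return nnodes - 1


if __name__ == "__main__":
    # 2 nodes with 24 procs each (hypothetical sizes)
    print(flat_binomial_internode(48, 24), two_level_internode(48, 24))
```

In this toy model the flat tree sends many copies of the same buffer across the network once the tree outgrows the first node, while the two-level scheme sends one copy per remote node. The exact counts depend heavily on how ranks are mapped to nodes.
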
>
> send/recv involves two abstraction layers:
> pml, and then the interconnect transport.
> typically, pml/ob1 is used, and it uses a btl (btl/tcp, btl/vader,
> btl/openib, ...)
> an important exception is infinipath, which uses pml/cm and then mtl/psm
> (and libfabric, but I do not know the details...)
>
> Cheers,
>
> Gilles
>
> On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:
>
>> OK, I am beginning to see how it works now. One question I still have is,
>> in the case of a multi-node communicator it seems coll/tuned (or something
>> not coll/sm) will be the one used, so do they do any optimizations to
>> reduce communication within a node?
>>
>> Also where can I find the p2p send recv modules?
>>
>> Thank you
>> the Bcast in coll/sm
>>
>> coll modules have priority
>> (see ompi_info --all)
>>
>> for a given function (e.g. bcast) the module which implements it and has
>> the highest priority is used.
>> note a module can disqualify itself on a given communicator (e.g. coll/sm
>> on an inter-node communicator).
>> by default, coll/tuned is very likely used. this module is a bit special
>> since it selects a given algorithm based on communicator and message size.
>>
>> if you give a high priority to coll/sm, then it will be used for single
>> node intra communicators, assuming coll/sm implements all
>> collective primitives.
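
The selection rule described above can be pictured with a toy model. This is only a sketch in Python, not Open MPI code: the Comm and Module classes, the select() helper, and the priority numbers are all made up for illustration, and the real framework does considerably more when it queries components.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional


@dataclass
class Comm:
    num_nodes: int          # number of nodes the communicator spans
    is_inter: bool = False  # True for an inter communicator


@dataclass
class Module:
    name: str
    priority: int                    # hypothetical value, higher wins
    functions: FrozenSet[str]        # collectives this module implements
    accepts: Callable[[Comm], bool]  # False means the module disqualifies itself


def select(modules: List[Module], comm: Comm, function: str) -> Optional[str]:
    """Return the name of the highest-priority module that implements
    `function` and did not disqualify itself on `comm`."""
    candidates = [m for m in modules
                  if function in m.functions and m.accepts(comm)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: m.priority).name


def sm_accepts(comm: Comm) -> bool:
    # Mirrors the rule in this thread: single node, intra communicator only.
    return not comm.is_inter and comm.num_nodes == 1


COLLS = frozenset({"bcast", "allreduce"})
tuned = Module("tuned", 30, COLLS, lambda comm: True)
default_modules = [tuned, Module("sm", 0, COLLS, sm_accepts)]
boosted_modules = [tuned, Module("sm", 100, COLLS, sm_accepts)]

if __name__ == "__main__":
    print(select(default_modules, Comm(num_nodes=1), "bcast"))
    print(select(boosted_modules, Comm(num_nodes=1), "bcast"))
    print(select(boosted_modules, Comm(num_nodes=4), "bcast"))
```

Under these assumed priorities, a single-node communicator gets the sm module only once its priority is raised above tuned, and a multi-node communicator falls back to tuned because sm disqualifies itself.
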
>>
>> Cheers,
>>
>> Gilles
>>
>> On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>>> Thank you, Gilles.
>>>
>>> What is the bcast I should look for? In general, how do I know which
>>> module was used for which communication - can I print this info?
>>> On Jun 30, 2016 3:19 AM, "Gilles Gouaillardet" <gil...@rist.or.jp>
>>> wrote:
>>>
>>>> 1) is correct. coll/sm is disqualified if the communicator is an inter
>>>> communicator or the communicator spans several nodes.
>>>>
>>>> you can have a look at the source code, and you will note that bcast
>>>> does not use send/recv. instead, it uses shared memory, so hopefully, it
>>>> is faster than other modules.
>>>>
>>>>
>>>> Cheers,
>>>>
>>>>
>>>> Gilles
>>>> On 6/30/2016 3:04 PM, Saliya Ekanayake wrote:
>>>>
>>>> Hi,
>>>>
>>>> Looking at the *ompi/mca/coll/sm/coll_sm_module.c* it seems this
>>>> module will be used only if the calling communicator solely groups
>>>> processes within a node. I've got two questions here.
>>>>
>>>> 1. So is my understanding correct that for something like
>>>> MPI_COMM_WORLD, where the world has multiple processes per node across many
>>>> nodes, this module will not be used?
>>>>
>>>> 2. If 1 is correct, then are there any shared memory optimizations that
>>>> happen when a collective like bcast or allreduce is called, so that
>>>> communicating within a node is done efficiently through memory?
>>>>
>>>> Thank you,
>>>> Saliya
>>>>
>>>>
>>>> --
>>>> Saliya Ekanayake
>>>> Ph.D. Candidate | Research Assistant
>>>> School of Informatics and Computing | Digital Science Center
>>>> Indiana University, Bloomington
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> users mailing list
>>>> us...@open-mpi.org
>>>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>>>> Link to this post: 
>>>> http://www.open-mpi.org/community/lists/users/2016/06/29564.php
>>>>
>>>>
>>>
>>
>
>



-- 
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
