you might want to give coll/ml a try
mpirun --mca coll_ml_priority 100 ...
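
(if you want to double check which coll component actually gets selected at run
time, iirc something like
mpirun --mca coll_base_verbose 10 ...
prints some information about which coll components are queried and selected, and
ompi_info --all | grep coll_ml
lists the coll/ml parameters -- I am writing these names from memory, so please
verify them against the ompi_info output)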

Cheers,

Gilles

On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:

> Thank you, Gilles. The reason for digging into intra-node optimizations is
> that we've implemented several machine learning applications with Open MPI's
> Java bindings, but found collective communication to be a bottleneck,
> especially when the number of procs per node is high. I've implemented a
> shared memory layer within Java (
> https://www.researchgate.net/publication/291695433_SPIDAL_Java_High_Performance_Data_Analytics_with_Java_and_MPI_on_Large_Multicore_HPC_Clusters),
> which solved this, but it would be nice to have this built in.
>
> I'll look at the send/recv implementations as well.
>
> Regards,
> Saliya
>
> On Thu, Jun 30, 2016 at 10:02 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com> wrote:
>
>> currently, coll/tuned is not topology aware.
>> this is something interesting, and everyone is invited to contribute.
>> coll/ml is topology aware, but it is kind of unmaintained now.
>>
>> send/recv involves two abstraction layers:
>> the pml, and then the interconnect transport.
>> typically, pml/ob1 is used, and it uses a btl (btl/tcp, btl/vader,
>> btl/openib, ...)
>> an important exception is InfiniPath, which uses pml/cm and then mtl/psm
>> (and libfabric, but I do not know the details...)
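>>
>> fwiw, if you want to force a specific path for testing, something along these
>> lines should work (component names from memory, so please double check them
>> with ompi_info):
>> mpirun --mca pml ob1 --mca btl vader,self,tcp ...
>> mpirun --mca pml cm --mca mtl psm ...
>> the first one forces pml/ob1 with the vader (shared memory), self and tcp
>> btls, the second one pml/cm with mtl/psm.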
>>
>> Cheers,
>>
>> Gilles
>>
>> On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>>> OK, I am beginning to see how it works now. One question I still have
>>> is, in the case of a multi-node communicator it seems coll/tuned (or
>>> something other than coll/sm) will be the one used, so do they do any
>>> optimizations to reduce communication within a node?
>>>
>>> Also, where can I find the p2p send/recv modules?
>>>
>>> Thank you
>>>
>>> the Bcast in coll/sm
>>>
>>> coll modules have priorities
>>> (see ompi_info --all)
>>>
>>> for a given function (e.g. bcast), the module which implements it and has
>>> the highest priority is used.
>>> note a module can disqualify itself on a given communicator (e.g.
>>> coll/sm on an inter-node communicator).
>>> by default, coll/tuned is very likely used. this module is a bit special
>>> since it selects a given algorithm based on communicator and message size.
>>>
>>> if you give a high priority to coll/sm, then it will be used for
>>> single-node intra-communicators, assuming coll/sm implements all the
>>> collective primitives.
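>>>
>>> for example, the default priorities can be listed with
>>> ompi_info --all | grep priority
>>> and coll/sm can be given a higher priority than coll/tuned with something like
>>> mpirun --mca coll_sm_priority 100 ...
>>> (parameter names from memory, so please confirm them in the ompi_info output)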
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:
>>>
>>>> Thank you, Gilles.
>>>>
>>>> What is the bcast I should look for? In general, how do I know which
>>>> module was used for which communication - can I print this info?
>>>> On Jun 30, 2016 3:19 AM, "Gilles Gouaillardet" <gil...@rist.or.jp>
>>>> wrote:
>>>>
>>>>> 1) is correct. coll/sm is disqualified if the communicator is an inter
>>>>> communicator or if the communicator spans several nodes.
>>>>>
>>>>> you can have a look at the source code, and you will note that bcast
>>>>> does not use send/recv. instead, it uses shared memory, so hopefully it
>>>>> is faster than the other modules.
>>>>>
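>>>>> just to illustrate the idea (this is only a rough sketch, not the coll/sm
>>>>> code itself), a node-local broadcast through memory can be written with
>>>>> MPI-3 shared windows: the root writes once into a shared segment and the
>>>>> other ranks on the node simply read it, no send/recv involved:
>>>>>
>>>>> #include <mpi.h>
>>>>> #include <stdio.h>
>>>>> #include <string.h>
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>     MPI_Init(&argc, &argv);
>>>>>
>>>>>     /* communicator that only spans the ranks on this node */
>>>>>     MPI_Comm node;
>>>>>     MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
>>>>>                         MPI_INFO_NULL, &node);
>>>>>     int nrank;
>>>>>     MPI_Comm_rank(node, &nrank);
>>>>>
>>>>>     /* rank 0 on the node owns the shared segment, the others attach to it */
>>>>>     char *buf;
>>>>>     MPI_Win win;
>>>>>     MPI_Win_allocate_shared(nrank == 0 ? 64 : 0, 1, MPI_INFO_NULL,
>>>>>                             node, &buf, &win);
>>>>>     if (nrank != 0) {
>>>>>         MPI_Aint size; int disp;
>>>>>         MPI_Win_shared_query(win, 0, &size, &disp, &buf);
>>>>>     }
>>>>>
>>>>>     MPI_Win_fence(0, win);
>>>>>     if (nrank == 0)                    /* the "bcast" is a single write... */
>>>>>         strcpy(buf, "hello through shared memory");
>>>>>     MPI_Win_fence(0, win);             /* ...the other ranks just read it */
>>>>>     printf("rank %d read: %s\n", nrank, buf);
>>>>>
>>>>>     MPI_Win_free(&win);
>>>>>     MPI_Comm_free(&node);
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }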
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>> Gilles
>>>>> On 6/30/2016 3:04 PM, Saliya Ekanayake wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Looking at *ompi/mca/coll/sm/coll_sm_module.c*, it seems this
>>>>> module will be used only if the calling communicator solely groups
>>>>> processes within a node. I've got two questions here.
>>>>>
>>>>> 1. So is my understanding correct that for something like
>>>>> MPI_COMM_WORLD, where the world spans multiple processes per node across
>>>>> many nodes, this module will not be used?
>>>>>
>>>>> 2. If 1 is correct, then are there any shared memory optimizations
>>>>> that happen when a collective like bcast or allreduce is called, so that
>>>>> communication within a node is done efficiently through memory?
>>>>>
>>>>> Thank you,
>>>>> Saliya
>>>>>
>>>>>
>>>>> --
>>>>> Saliya Ekanayake
>>>>> Ph.D. Candidate | Research Assistant
>>>>> School of Informatics and Computing | Digital Science Center
>>>>> Indiana University, Bloomington
>>>>>
>>>>>
>>>>
>>>
>>
>>
>
>
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
>
>
