I actually wouldn't advise ml. It *was* being developed as a joint project between ORNL and Mellanox, and I think that code eventually grew into what is now Mellanox's "hcoll" library.
As such, ml reflects a kind of midpoint from before hcoll was hardened into a real product. It has some known bugs which are unlikely to be fixed. ORNL/Mellanox: please correct me if I'm wrong...

> On Jun 30, 2016, at 11:47 AM, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> OK, that's good. I'll try that.
>
> So, is ml something that is not being developed now? Is there any documentation on this component?
>
> Thank you,
> Saliya
>
> On Thu, Jun 30, 2016 at 11:01 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> you might want to give coll/ml a try:
>
> mpirun --mca coll_ml_priority 100 ...
>
> Cheers,
>
> Gilles
>
> On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> Thank you, Gilles. The reason for digging into intra-node optimizations is that we've implemented several machine learning applications with Open MPI's Java bindings, but found collective communication to be a bottleneck, especially when the number of procs per node is high. I've implemented a shared memory layer within Java (https://www.researchgate.net/publication/291695433_SPIDAL_Java_High_Performance_Data_Analytics_with_Java_and_MPI_on_Large_Multicore_HPC_Clusters), which solved this, but it would be nice to have this built in.
>
> I'll look at the send/recv implementations as well.
>
> Regards,
> Saliya
>
> On Thu, Jun 30, 2016 at 10:02 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>
> Currently, coll/tuned is not topology aware. This is something interesting, and everyone is invited to contribute. coll/ml is topology aware, but it is kind of unmaintained now.
>
> send/recv involves two abstraction layers: the pml, and then the interconnect transport. Typically, pml/ob1 is used, and it uses a btl (btl/tcp, btl/vader, btl/openib, ...). An important exception is InfiniPath, which uses pml/cm and then mtl/psm (and libfabric, but I do not know the details...).
>
> Cheers,
>
> Gilles
>
> On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> OK, I am beginning to see how it works now. One question I still have: in the case of a multi-node communicator, it seems coll/tuned (or something other than coll/sm) will be the one used, so do those modules do any optimizations to reduce communication within a node?
>
> Also, where can I find the p2p send/recv modules?
>
> Thank you
>
> the Bcast in coll/sm.
>
> coll modules have priority (see ompi_info --all).
>
> For a given function (e.g. bcast), the module which implements it and has the highest priority is used. Note that a module can disqualify itself on a given communicator (e.g. coll/sm on an inter-node communicator). By default, coll/tuned is very likely used. This module is a bit special, since it selects a given algorithm based on communicator and message size.
>
> If you give a high priority to coll/sm, then it will be used for single-node intra communicators, assuming coll/sm implements all collective primitives.
>
> Cheers,
>
> Gilles
>
> On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:
>
> Thank you, Gilles.
>
> What is the bcast I should look for? In general, how do I know which module was used for which communication - can I print this info?
>
> On Jun 30, 2016 3:19 AM, "Gilles Gouaillardet" <gil...@rist.or.jp> wrote:
>
> 1) is correct. coll/sm is disqualified if the communicator is an inter communicator or the communicator spans several nodes.
>
> You can have a look at the source code, and you will note that bcast does not use send/recv; instead, it uses shared memory, so hopefully it is faster than other modules.
>
> Cheers,
>
> Gilles
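To make the shared-memory path above concrete, here is a minimal sketch of a node-local broadcast. It uses only standard MPI-3 calls: MPI_Comm_split_type with MPI_COMM_TYPE_SHARED yields exactly the kind of single-node intra communicator on which coll/sm is not disqualified. The coll_sm_priority parameter shown in the comment is the coll/sm analogue of the coll_ml_priority example above; check ompi_info --all to confirm it exists in your build.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Comm node_comm;
        int node_rank;
        double buf = 0.0;

        MPI_Init(&argc, &argv);

        /* Group only the processes that share this node: a single-node
           intra communicator, i.e. one that coll/sm does not disqualify. */
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);
        MPI_Comm_rank(node_comm, &node_rank);

        if (node_rank == 0)
            buf = 42.0;

        /* When run with something like
           "mpirun --mca coll_sm_priority 100 ./a.out", this bcast can be
           served by coll/sm's shared-memory implementation. */
        MPI_Bcast(&buf, 1, MPI_DOUBLE, 0, node_comm);
        printf("node rank %d got %.1f\n", node_rank, buf);

        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }

As for printing which module was actually used: raising the coll framework's verbosity (for example, --mca coll_base_verbose 9, assuming your build exposes that parameter) prints component selection details at communicator creation time.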
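Since coll/tuned does no topology-aware optimization today, the intra-node shortcut Saliya asks about (question 2 in the original post quoted below) has to be built by hand on top of the same split. Here is a rough sketch, purely illustrative and not how any Open MPI coll component is actually implemented, with a hypothetical helper name: reduce inside each node over shared memory, combine the per-node partial results across one leader per node, then broadcast back within each node.

    #include <mpi.h>

    /* Hypothetical helper: a two-level sum-allreduce over "comm".
       Step 1 stays inside each node, step 2 crosses nodes once per
       node, step 3 stays inside each node again. */
    static void hier_allreduce_sum(const double *in, double *out, int n,
                                   MPI_Comm comm)
    {
        MPI_Comm node_comm, leader_comm;
        int node_rank;

        /* Ranks sharing a node form node_comm. */
        MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL,
                            &node_comm);
        MPI_Comm_rank(node_comm, &node_rank);

        /* Node rank 0 becomes that node's leader; everyone else passes
           MPI_UNDEFINED and gets MPI_COMM_NULL back. */
        MPI_Comm_split(comm, node_rank == 0 ? 0 : MPI_UNDEFINED, 0,
                       &leader_comm);

        /* Step 1: per-node partial sums, intra-node traffic only. */
        MPI_Reduce(in, out, n, MPI_DOUBLE, MPI_SUM, 0, node_comm);

        /* Step 2: combine partial sums across node leaders. */
        if (leader_comm != MPI_COMM_NULL) {
            MPI_Allreduce(MPI_IN_PLACE, out, n, MPI_DOUBLE, MPI_SUM,
                          leader_comm);
            MPI_Comm_free(&leader_comm);
        }

        /* Step 3: every rank gets the final result from its node leader. */
        MPI_Bcast(out, n, MPI_DOUBLE, 0, node_comm);
        MPI_Comm_free(&node_comm);
    }

This mirrors the kind of intra-node shortcut Saliya's Java shared-memory layer provides; whether it beats coll/tuned's flat algorithms depends on message size and the number of procs per node.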
> On 6/30/2016 3:04 PM, Saliya Ekanayake wrote:
>
>> Hi,
>>
>> Looking at ompi/mca/coll/sm/coll_sm_module.c, it seems this module will be used only if the calling communicator solely groups processes within a node. I've got two questions here.
>>
>> 1. Is my understanding correct that for something like MPI_COMM_WORLD, where the world spans multiple processes per node across many nodes, this module will not be used?
>>
>> 2. If 1 is correct, are there any shared memory optimizations that happen when a collective like bcast or allreduce is called, so that communication within a node is done efficiently through memory?
>>
>> Thank you,
>> Saliya
>>
>> --
>> Saliya Ekanayake
>> Ph.D. Candidate | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington

--
Jeff Squyres
jsquy...@cisco.com