you might want to give coll/ml a try:

    mpirun --mca coll_ml_priority 100 ...
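to double-check which coll component actually gets selected, something
along these lines should work (./your_app is a placeholder here, and it is
worth verifying the exact MCA parameter names with ompi_info on your build):

    ompi_info --all | grep "coll_.*_priority"
    mpirun --mca coll_ml_priority 100 --mca coll_base_verbose 10 ./your_app

the first command lists the priority each coll component reports, and the
verbose run should print which component wins on each communicator.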
Cheers,

Gilles

On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:

> Thank you, Gilles. The reason for digging into intra-node optimizations is
> that we've implemented several machine learning applications with Open MPI
> (Java bindings), but found collective communication to be a bottleneck,
> especially when the number of procs per node is high. I've implemented a
> shared memory layer within Java (
> https://www.researchgate.net/publication/291695433_SPIDAL_Java_High_Performance_Data_Analytics_with_Java_and_MPI_on_Large_Multicore_HPC_Clusters),
> which solved this, but it would be nice to have this built-in.
>
> I'll look at the send/recv implementations as well.
>
> Regards,
> Saliya
>
> On Thu, Jun 30, 2016 at 10:02 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>
>> currently, coll/tuned is not topology aware.
>> this is something interesting, and everyone is invited to contribute.
>> coll/ml is topology aware, but it is kind of unmaintained now.
>>
>> send/recv involves two abstraction layers:
>> the pml, and then the interconnect transport.
>> typically, pml/ob1 is used, and it uses a btl (btl/tcp, btl/vader,
>> btl/openib, ...)
>> an important exception is InfiniPath, which uses pml/cm and then mtl/psm
>> (and libfabric, but I do not know the details...)
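>>
>> if you want to see (or control) which transport is in play, you can pin
>> the pml and btl explicitly, for example (this assumes ob1 and the vader
>> shared-memory btl are built in your install, and ./your_app is again a
>> placeholder):
>>
>>   mpirun --mca pml ob1 --mca btl vader,self,tcp ./your_app
>>   mpirun --mca pml ob1 --mca btl_base_verbose 10 ./your_app
>>
>> the verbose run should show which btl is chosen for each pair of peers.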
>>
>> Cheers,
>>
>> Gilles
>>
>> On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:
>>
>>> OK, I am beginning to see how it works now. One question I still have
>>> is: in the case of a multi-node communicator it seems coll/tuned (or
>>> something not coll/sm) will be the one used, so do they do any
>>> optimizations to reduce communication within a node?
>>>
>>> Also, where can I find the p2p send/recv modules?
>>>
>>> Thank you
>>>
>>> Gilles Gouaillardet wrote:
>>>
>>> the Bcast in coll/sm
>>>
>>> coll modules have priority
>>> (see ompi_info --all)
>>>
>>> for a given function (e.g. bcast), the module which implements it and
>>> has the highest priority is used.
>>> note a module can disqualify itself on a given communicator (e.g.
>>> coll/sm on an inter-node communicator).
>>> by default, coll/tuned is very likely used. this module is a bit special
>>> since it selects a given algorithm based on communicator and message
>>> size.
>>>
>>> if you give a high priority to coll/sm, then it will be used for
>>> single-node intra communicators, assuming coll/sm implements all
>>> collective primitives.
>>>
>>> Cheers,
>>>
>>> Gilles
>>>
>>> On Thursday, June 30, 2016, Saliya Ekanayake <esal...@gmail.com> wrote:
>>>
>>>> Thank you, Gilles.
>>>>
>>>> What is the bcast I should look for? In general, how do I know which
>>>> module was used for which communication - can I print this info?
>>>>
>>>> On Jun 30, 2016 3:19 AM, "Gilles Gouaillardet" <gil...@rist.or.jp>
>>>> wrote:
>>>>
>>>>> 1) is correct. coll/sm is disqualified if the communicator is an
>>>>> inter-communicator or the communicator spans several nodes.
>>>>>
>>>>> you can have a look at the source code, and you will note that bcast
>>>>> does not use send/recv. instead, it uses shared memory, so hopefully
>>>>> it is faster than other modules.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Gilles
>>>>>
>>>>> On 6/30/2016 3:04 PM, Saliya Ekanayake wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Looking at *ompi/mca/coll/sm/coll_sm_module.c*, it seems this module
>>>>> will be used only if the calling communicator solely groups processes
>>>>> within a node. I've got two questions here.
>>>>>
>>>>> 1. So is my understanding correct that for something like
>>>>> MPI_COMM_WORLD, where the world spans multiple processes per node
>>>>> across many nodes, this module will not be used?
>>>>>
>>>>> 2. If 1 is correct, then are there any shared memory optimizations
>>>>> that happen when a collective like bcast or allreduce is called, so
>>>>> that communicating within a node is done efficiently through memory?
>>>>>
>>>>> Thank you,
>>>>> Saliya
>>>>>
>>>>> --
>>>>> Saliya Ekanayake
>>>>> Ph.D. Candidate | Research Assistant
>>>>> School of Informatics and Computing | Digital Science Center
>>>>> Indiana University, Bloomington
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
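P.S. regarding question 2) at the start of the thread: until a coll module
does this for you, you can build the node-aware hierarchy by hand with
MPI-3's MPI_Comm_split_type. Below is a minimal C sketch of that two-level
broadcast pattern - illustrative only, it is not how coll/tuned or coll/sm
are implemented internally:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int world_rank, node_rank;
        MPI_Comm node_comm, leader_comm;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

        /* group the ranks that share a node (MPI-3) */
        MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                            MPI_INFO_NULL, &node_comm);
        MPI_Comm_rank(node_comm, &node_rank);

        /* one leader per node; everyone else gets MPI_COMM_NULL */
        MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                       world_rank, &leader_comm);

        int value = (world_rank == 0) ? 42 : -1;

        /* step 1: broadcast across nodes, leaders only (inter-node traffic) */
        if (leader_comm != MPI_COMM_NULL)
            MPI_Bcast(&value, 1, MPI_INT, 0, leader_comm);

        /* step 2: broadcast within each node; on a single-node intra
         * communicator like node_comm, coll/sm can qualify */
        MPI_Bcast(&value, 1, MPI_INT, 0, node_comm);

        printf("rank %d got %d\n", world_rank, value);

        if (leader_comm != MPI_COMM_NULL)
            MPI_Comm_free(&leader_comm);
        MPI_Comm_free(&node_comm);
        MPI_Finalize();
        return 0;
    }

note each node's leader is rank 0 of node_comm by construction, so after
step 1 the root of step 2 already holds the value, and the intra-node step
can go through shared memory if you raise coll_sm_priority as discussed
above.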