Hi Timur, I don't think this is an apples-to-apples comparison.
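To make the difference concrete, here is a minimal sketch (not taken from either of the programs under discussion) of two ways to complete a put phase in OpenSHMEM. The function names, the buffer layout, and the target list are assumptions made up for illustration; only shmem_putmem(), shmem_quiet(), and shmem_barrier_all() are actual OpenSHMEM calls. It needs an OpenSHMEM implementation to compile, and the library must already be initialized by the caller.

```c
#include <stddef.h>
#include <shmem.h>

/* Hypothetical helper: write `len` bytes from `src` into slot `me` of the
 * symmetric buffer `sym_dst` on every PE listed in `targets`. */
static void put_to_targets(char *sym_dst, const char *src, size_t len,
                           const int *targets, int ntargets, int me)
{
    for (int i = 0; i < ntargets; i++)
        shmem_putmem(sym_dst + (size_t)me * len, src, len, targets[i]);
}

/* Variant A: complete the puts with shmem_quiet(). This is the closer
 * counterpart to waiting on isend requests, and it is not collective. */
void exchange_with_quiet(char *sym_dst, const char *src, size_t len,
                         const int *targets, int ntargets, int me)
{
    put_to_targets(sym_dst, src, len, targets, ntargets, me);
    /* Returns once all puts issued by this PE are complete at the targets.
     * The targets are not notified; they still need some synchronization
     * before reading the data. */
    shmem_quiet();
}

/* Variant B: complete the puts with shmem_barrier_all(). This is a
 * collective call, so every PE has to take part in it. */
void exchange_with_barrier(char *sym_dst, const char *src, size_t len,
                           const int *targets, int ntargets, int me)
{
    put_to_targets(sym_dst, src, len, targets, ntargets, me);
    /* Completes outstanding puts and synchronizes all PEs. */
    shmem_barrier_all();
}
```

Variant A only waits for this PE's own operations, while variant B additionally waits for every other PE to arrive, which is the extra cost that a plain wait on local requests does not pay.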
In the OpenSHMEM world, MPI_Waitall would map to shmem_quiet(). Even with this mapping, shmem_quiet() has *stronger* completion semantics than MPI_Waitall: quiet guarantees that the data has been delivered to remote memory, while MPI_Waitall provides no such guarantee for isend operations. shmem_barrier_all() is a collective operation with an embedded shmem_quiet(), so it will not scale the same way as MPI_Waitall.

For more details, please see section 8.7.3 of the OpenSHMEM specification:
http://bongo.cs.uh.edu/site/sites/default/site_files/openshmem-specification-1.1.pdf

I hope this helps.

Pavel (Pasha) Shamis
---
Computer Science Research Group
Computer Science and Math Division
Oak Ridge National Laboratory

On Aug 29, 2014, at 5:26 AM, Timur Ismagilov <tismagi...@mail.ru> wrote:

> Hello!
>
> What parameters can I tune to increase the performance (scalability) of my
> app (all-to-all pattern with message size = constant/nnodes)?
> I can read this FAQ for MPI, but does it also apply to SHMEM?
>
> I have 2 programs doing the same thing (with the same input): each node
> sends messages (message size = constant/nnodes) to a random set of nodes
> (the same set in prg1 and prg2):
>
> • prg1: with mpi_isend, mpi_irecv and mpi_waitall
> • prg2: with shmem_put and shmem_barrier_all
>
> On 1, 2, 4, 8, 16 and 32 nodes they have the same performance (scalability).
> On 64, 128 and 256 nodes the shmem program stops scaling, but on 512 nodes
> the shmem program gets much better performance than mpi.
>
> nodes   prg1          prg2
>         (perf unit)   (perf unit)
>     1      30            30
>     2      50            53
>     4      75            85
>     8     110           130
>    16     180           200
>    32     310           350
>    64     500           400 (strange)
>   128     830           400 (strange)
>   256    1350           600 (strange)
>   512    1770          2350 (wow!)
>
> In scalable shmem (ompi 1.6.5?) I get the same scalability in these programs.
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2014/08/25185.php
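
For reference, the quoted figures can be put side by side as a SHMEM/MPI ratio per node count. This is only a rough sketch: the array and function names are made up for illustration, the numbers are copied from the quoted table, and it assumes that a larger "perf unit" value means better performance.

```c
#include <stddef.h>

#define NUM_RUNS 10

/* Figures copied from the quoted table (prg1 = MPI, prg2 = SHMEM). */
static const int node_counts[NUM_RUNS] =
    {1, 2, 4, 8, 16, 32, 64, 128, 256, 512};
static const double mpi_perf[NUM_RUNS] =
    {30, 50, 75, 110, 180, 310, 500, 830, 1350, 1770};
static const double shmem_perf[NUM_RUNS] =
    {30, 53, 85, 130, 200, 350, 400, 400, 600, 2350};

/* Ratio of SHMEM to MPI performance for the run at index `run`.
 * Values below 1.0 mean the SHMEM program trails the MPI one. */
double shmem_to_mpi_ratio(size_t run)
{
    return shmem_perf[run] / mpi_perf[run];
}

/* Smallest node count at which the SHMEM figure falls below the MPI
 * figure, or -1 if that never happens in the table. */
int first_node_count_where_shmem_trails(void)
{
    for (size_t i = 0; i < NUM_RUNS; i++) {
        if (shmem_to_mpi_ratio(i) < 1.0)
            return node_counts[i];
    }
    return -1;
}
```

With these numbers the ratio stays at or slightly above 1.0 up to 32 nodes, drops below 1.0 from 64 to 256 nodes, and goes back above 1.0 on 512 nodes.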