> So, you mean that it guarantees the value received after the bcast call is
> consistent with value sent from root, but it doesn't have to wait till all
> the ranks have received it?
>
> this is what i believe, double checking the standard might not hurt though
> ...
>
No collective other than a barrier is required to have barrier semantics,
although some effectively have them because of data dependencies for non-zero
counts (allgather, alltoall, allreduce). Reduce, Bcast, gather, and scatter
should never have barrier semantics and should not synchronize more than the
explicit data dependencies require. The send-only ranks may return long before
the recv-only ranks do, particularly when the messages go via an eager
protocol.

One can imagine barrier as a 1-byte allreduce, but there are more efficient
implementations. Allreduce should never be faster than Bcast, as Gilles
explained. There's a nice paper on self-consistent performance of MPI
implementations that has lots of details.

Jeff

--
Jeff Hammond
jeff.scie...@gmail.com
http://jeffhammond.github.io/
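
The data-dependency argument above can be sketched as a toy model. This is
not MPI code and does not describe any particular implementation: it only
records, for a hypothetical 4-rank communicator, which ranks each rank must
wait for before it can have its result. The names (deps, bcast_deps,
reduce_deps, allreduce_deps, is_synchronizing, may_return_alone) are made up
for illustration, and non-zero counts are assumed throughout.

```c
#include <assert.h>
#include <stdbool.h>

#define NRANKS 4 /* hypothetical communicator size, chosen for illustration */

/* needs[i][j] is true when rank i cannot complete the collective before
 * rank j has entered it. Every rank trivially needs itself. */
typedef struct {
    bool needs[NRANKS][NRANKS];
} deps;

/* Bcast: every rank needs the root's data, and nothing else. */
deps bcast_deps(int root) {
    assert(root >= 0 && root < NRANKS);
    deps d;
    for (int i = 0; i < NRANKS; i++)
        for (int j = 0; j < NRANKS; j++)
            d.needs[i][j] = (i == j) || (j == root);
    return d;
}

/* Reduce: only the root needs every rank's contribution. */
deps reduce_deps(int root) {
    assert(root >= 0 && root < NRANKS);
    deps d;
    for (int i = 0; i < NRANKS; i++)
        for (int j = 0; j < NRANKS; j++)
            d.needs[i][j] = (i == j) || (i == root);
    return d;
}

/* Allreduce: every rank needs every rank's contribution. */
deps allreduce_deps(void) {
    deps d;
    for (int i = 0; i < NRANKS; i++)
        for (int j = 0; j < NRANKS; j++)
            d.needs[i][j] = true;
    return d;
}

/* Barrier-like in this model: no rank can finish until all have entered. */
bool is_synchronizing(const deps *d) {
    for (int i = 0; i < NRANKS; i++)
        for (int j = 0; j < NRANKS; j++)
            if (!d->needs[i][j])
                return false;
    return true;
}

/* True when the given rank depends on no other rank, so the data
 * dependencies alone allow it to return before anyone else has entered. */
bool may_return_alone(const deps *d, int rank) {
    assert(rank >= 0 && rank < NRANKS);
    for (int j = 0; j < NRANKS; j++)
        if (j != rank && d->needs[rank][j])
            return false;
    return true;
}
```

In this model only the allreduce matrix is fully populated, so only allreduce
comes out as synchronizing. The bcast root and the non-root ranks of a reduce
(the send-only ranks) depend on nobody else. The model captures only the
minimum ordering the data flow forces; a real implementation may still wait
longer, for example when a message does not go via an eager protocol.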