Hi Nathan, Joseph,
Thank you for your quick answers. I also noticed bad performance of
MPI_Get when there are displacements in the datatype, not necessarily
padding. So I'll keep in mind to declare the padding in my MPI_Datatype
so that MPI is allowed to copy it, making the whole set of data contiguous.
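For what it's worth, here is a minimal sketch of what I mean by declaring the padding; the struct, field names, and layout are purely illustrative, not from my benchmark. The padding bytes are described explicitly as MPI_BYTE, so the resulting datatype has no holes:

    #include <mpi.h>
    #include <stddef.h>

    /* Illustrative struct only; the real layout depends on the ABI. On
       common 64-bit ABIs there are 6 padding bytes between 'tag' and
       'value'. */
    typedef struct {
        short  tag;
        double value;
    } elem_t;

    MPI_Datatype make_gapless_type(void)
    {
        MPI_Aint pad_start = offsetof(elem_t, tag) + (MPI_Aint)sizeof(short);
        MPI_Aint pad_len   = (MPI_Aint)offsetof(elem_t, value) - pad_start;

        int          blocklens[3] = { 1, (int)pad_len, 1 };
        MPI_Aint     disps[3]     = { offsetof(elem_t, tag), pad_start,
                                      offsetof(elem_t, value) };
        MPI_Datatype types[3]     = { MPI_SHORT, MPI_BYTE, MPI_DOUBLE };
        MPI_Datatype full;

        /* Declaring the padding as MPI_BYTE leaves no gaps in the type,
           so MPI may treat each element as one contiguous region. */
        MPI_Type_create_struct(3, blocklens, disps, types, &full);
        MPI_Type_commit(&full);
        return full;
    }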
That is exactly the issue. Part of the reason I have argued against MPI_SHORT_INT
usage in RMA is that even though it is padded due to type alignment, we are still not
allowed to operate on the bits between the short and the int. We can correct that one
in the standard by adding the same language.
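To make the padding concrete: MPI_SHORT_INT describes a short followed by an int, and on common ABIs that leaves two padding bytes between the members. A small sketch (the field names are made up, and the printed values are what typical 64-bit ABIs produce, not something the C standard guarantees):

    #include <stdio.h>
    #include <stddef.h>

    /* Layout corresponding to MPI_SHORT_INT: a short followed by an int. */
    struct short_int { short value; int index; };

    int main(void)
    {
        printf("sizeof  = %zu\n", sizeof(struct short_int));              /* usually 8 */
        printf("offset  = %zu\n", offsetof(struct short_int, index));     /* usually 4 */
        /* Bytes 2 and 3 are alignment padding: they exist in memory, but an
           implementation is not allowed to read or write them when the
           datatype used is MPI_SHORT_INT. */
        return 0;
    }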
Hi Antoine,
That's an interesting result. I believe the problem with datatypes with
gaps is that MPI is not allowed to touch the gaps. My guess is that for
the RMA version of the benchmark the implementation either has to revert
back to an active message, packing the data at the target and sending it
back, or has to issue one transfer per contiguous region.
Yes. This is absolutely normal. When you give MPI non-contiguous data it has to
break it down into one operation per contiguous region. If you have a non-RDMA
network this can lead to very poor performance. With RDMA networks it will also
be much slower than a contiguous get, but with lower overhead.
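A minimal sketch of the two cases (window setup, buffer sizes, and counts are assumed and illustrative): the strided type below describes many one-element contiguous regions, so most implementations issue roughly one transfer per region or fall back to pack/unpack at the target, whereas the second get reads the same number of elements as a single contiguous region:

    #include <mpi.h>

    /* 'origin', 'win', and 'target' are assumed to be set up elsewhere with
       buffers/windows large enough for both accesses. */
    void strided_vs_contiguous_get(double *origin, MPI_Win win, int target)
    {
        enum { COUNT = 1024, BLOCK = 1, STRIDE = 4 };
        MPI_Datatype strided;

        /* Every 4th double: COUNT contiguous regions of one element each. */
        MPI_Type_vector(COUNT, BLOCK, STRIDE, MPI_DOUBLE, &strided);
        MPI_Type_commit(&strided);

        MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);

        /* Gapped get: typically one transfer per contiguous region,
           or an active-message pack/unpack at the target. */
        MPI_Get(origin, 1, strided, target, 0, 1, strided, win);

        /* Same element count, one contiguous region: typically a single
           RDMA read, and much faster. */
        MPI_Get(origin, COUNT, MPI_DOUBLE, target, 0, COUNT, MPI_DOUBLE, win);

        MPI_Win_unlock(target, win);
        MPI_Type_free(&strided);
    }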