Yes, this is absolutely normal. When you give MPI non-contiguous data it has to 
break it down into one operation per contiguous region. If you have a non-RDMA 
network this can lead to very poor performance. With RDMA networks it will still 
be much slower than a contiguous get, but with lower overhead per network operation.

-Nathan

> On Mar 30, 2023, at 10:43 AM, Antoine Motte via users 
> <users@lists.open-mpi.org> wrote:
> 
> 
> Hello everyone,
> 
> I recently had to code an MPI application where I send std::vector contents 
> in a distributed environment. In order to try different approaches I coded 
> both 1-sided and 2-sided point-to-point communication schemes: the first one 
> uses an MPI_Win window with MPI_Get, the second one uses MPI_Sendrecv.
> 
> I had a hard time figuring out why my implementation with MPI_Get was between 
> 10 and 100 times slower, and I finally found out that MPI_Get is abnormally 
> slow when one sends custom datatypes that include padding.
> 
> Attached is a short example where I send a struct {double, int} (12 bytes of 
> data + 4 bytes of padding) vs. a struct {double, int, int} (16 bytes of data, 
> 0 bytes of padding) with both MPI_Sendrecv and MPI_Get. I got these results:
> 
> mpirun -np 4 ./compareGetWithSendRecv 
> {double, int} SendRecv : 0.0303547 s
> {double, int} Get : 1.9196 s
> {double, int, int} SendRecv : 0.0164659 s
> {double, int, int} Get : 0.0147757 s
> 
> I ran it with both Open MPI 4.1.2 and Intel MPI 2021.6 and got the same 
> results.
> 
> Is this result normal? Is there any solution other than adding garbage at the 
> end of the struct or at the end of the MPI_Datatype to avoid the padding?
> 
> Regards,
> 
> Antoine Motte
> 
> <compareGetWithSendRecv.cpp>
