Hello,

I'm using a custom datatype created through MPI_Type_create_struct() to
send data with a dynamic structure to another process on the same node over
shared memory, and noticed it's much slower than expected.

I ran a profile, and it looks like it's not using CMA zero-copy, falling
back to using opal_generic_simple_pack()/opal_generic_simple_unpack().
Simpler datatypes do seem to use zero-copy, using mca_btl_vader_get_cma(),
so I don't think it's a configuration or system issue.

I suspect it's because the struct datatype is not contiguous, i.e. the
blocks of the struct have gaps between them.
Is anyone able to confirm whether zero-copy with an MPI struct requires a
contiguous data structure, and whether it has other requirements, such as
the displacements being in ascending order or the blocks having homogeneous
datatypes/lengths?
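
To make "contiguous" concrete, here is a minimal sketch in plain C (no MPI
calls) of the property I have in mind: each block starts exactly where the
previous one ends, with displacements in ascending order. The struct
`sample` and the helpers `blocks_are_contiguous()` and
`sample_is_contiguous()` are made up for illustration. This is only my
assumption about what might matter, not Open MPI's actual check.

```c
#include <stddef.h>   /* size_t, offsetof */

/* Hypothetical example struct: on typical ABIs the compiler inserts
 * padding after 'tag' so that 'value' is aligned, which leaves a gap. */
struct sample {
    char   tag;
    double value;
    int    count[2];
};

/* Returns 1 if every block starts exactly where the previous block ends
 * (ascending displacements, no gaps, no overlap), and 0 otherwise.
 * disp[i] is the byte displacement of block i, nbytes[i] its size. */
int blocks_are_contiguous(int n, const size_t disp[], const size_t nbytes[])
{
    for (int i = 1; i < n; i++) {
        if (disp[i] != disp[i - 1] + nbytes[i - 1])
            return 0;
    }
    return 1;
}

/* Applies the check to the member layout of struct sample, using the
 * same displacements one would pass to MPI_Type_create_struct(). */
int sample_is_contiguous(void)
{
    const size_t disp[3] = {
        offsetof(struct sample, tag),
        offsetof(struct sample, value),
        offsetof(struct sample, count)
    };
    const size_t nbytes[3] = {
        sizeof(char),
        sizeof(double),
        2 * sizeof(int)
    };
    return blocks_are_contiguous(3, disp, nbytes);
}
```

On common platforms sample_is_contiguous() reports a gap, because of the
alignment padding between `tag` and `value`. My datatype has gaps of this
kind between its blocks.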

I'm using Open MPI 4.1.6.

Thanks,
Pascal Boeschoten
