Hello,

I'm using a custom datatype created through MPI_Type_create_struct() to send data with a dynamic structure to another process on the same node over shared memory, and I've noticed that it's much slower than expected.
I ran a profile, and it looks like the transfer is not using CMA zero-copy: it falls back to opal_generic_simple_pack()/opal_generic_simple_unpack(). Simpler datatypes do seem to use zero-copy, via mca_btl_vader_get_cma(), so I don't think it's a configuration or system issue.

I suspect it's because the struct datatype is not contiguous, i.e. the blocks of the struct have gaps between them. Can anyone confirm whether zero-copy with an MPI struct requires a contiguous data layout, and whether it has other requirements, such as the displacements being in ascending order or the blocks having homogeneous datatypes/lengths?

I'm using Open MPI 4.1.6.

Thanks,
Pascal Boeschoten
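
P.S. To make precise what I mean by "contiguous" above, here is a minimal sketch in plain C. The helper blocks_are_contiguous() is hypothetical (my own illustration, not an Open MPI or MPI function), and I'm not claiming it matches whatever check Open MPI performs internally. It takes each block's byte displacement and byte length, i.e. the displacement passed to MPI_Type_create_struct() and the block length multiplied by the size of the block's datatype, and reports whether the blocks are in ascending order with no gaps between them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of MPI or Open MPI).
 * disp[i]  : byte displacement of block i
 * bytes[i] : byte length of block i (block length * datatype size)
 * Returns 1 if every block starts exactly where the previous one ends,
 * which implies ascending displacements and no gaps; returns 0 otherwise. */
static int blocks_are_contiguous(const ptrdiff_t *disp,
                                 const size_t *bytes,
                                 int n)
{
    assert(n == 0 || (disp != NULL && bytes != NULL));

    for (int i = 1; i < n; i++) {
        /* Byte offset just past the end of the previous block. */
        ptrdiff_t prev_end = disp[i - 1] + (ptrdiff_t)bytes[i - 1];
        if (disp[i] != prev_end)
            return 0;   /* gap, overlap, or out-of-order block */
    }
    return 1;
}
```

For example, blocks at displacements {0, 4, 12} with byte lengths {4, 8, 4} count as contiguous by this definition, while displacements {0, 8} with byte lengths {1, 8} do not, because 7 bytes are skipped between the two blocks. My datatype is of the second kind.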