Thanks again Damian,
I think the root cause is that we call mca_topo_base_neighbor_count() instead of
ompi_comm_size() here.
It seems the implicit assumption is that one would only call
MPI_Ineighbor_alltoallw() on a cartesian communicator ... which is
obviously wrong: it is legit to call MPI_Ialltoallw(), even on a cartesian
communicator.
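To illustrate the mismatch with a standalone program (this is only a sketch of
the counting issue, not the Open MPI internal code): on a cartesian communicator
the neighborhood used by the neighbor collectives has 2*ndims entries, which can
be larger than the communicator size that the MPI_Ialltoallw() type arrays are
sized for, e.g. with a single rank and one periodic dimension:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* 1-D periodic cartesian topology over all ranks */
    int dims[1]    = { world_size };
    int periods[1] = { 1 };
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 0, &cart);

    int cart_size, ndims;
    MPI_Comm_size(cart, &cart_size);
    MPI_Cartdim_get(cart, &ndims);

    /* The MPI_Ialltoallw type arrays have cart_size entries, but a cartesian
     * neighborhood has 2*ndims entries: with "mpirun -np 1" that is 1 vs 2,
     * so indexing the type arrays by the neighbor count reads past their end. */
    printf("communicator size = %d, cartesian neighbors = %d\n",
           cart_size, 2 * ndims);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}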
Hello,
I made an example that triggers the issue. I had to get a little creative with
how to trigger the crash, since it does not appear if the memory allocated for
the send and recv types happens to be set to 0 (although valgrind still reports
an invalid read).
Communicator is an intra-communicator.
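Roughly, the program looks like the following (a simplified sketch rather than
the exact reproducer; the real one also arranges for the freshly allocated type
arrays to sit in non-zeroed memory, since zero-filled memory hides the crash):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int size;
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *scounts = calloc(size, sizeof(int));
    int *rcounts = calloc(size, sizeof(int));
    int *sdispls = calloc(size, sizeof(int));
    int *rdispls = calloc(size, sizeof(int));
    int *sbuf    = calloc(size, sizeof(int));
    int *rbuf    = calloc(size, sizeof(int));

    /* In the actual reproducer these arrays come from memory that is NOT
     * zero-filled; when they happen to be zeroed the corruption stays silent. */
    MPI_Datatype *stypes = malloc(size * sizeof(MPI_Datatype));
    MPI_Datatype *rtypes = malloc(size * sizeof(MPI_Datatype));

    for (int i = 0; i < size; i++) {
        scounts[i] = rcounts[i] = 1;
        sdispls[i] = rdispls[i] = i * (int)sizeof(int);
        /* committed derived datatypes so the retain/release path is exercised */
        MPI_Type_contiguous(1, MPI_INT, &stypes[i]);
        MPI_Type_commit(&stypes[i]);
        MPI_Type_contiguous(1, MPI_INT, &rtypes[i]);
        MPI_Type_commit(&rtypes[i]);
    }

    MPI_Request req;
    MPI_Ialltoallw(sbuf, scounts, sdispls, stypes,
                   rbuf, rcounts, rdispls, rtypes,
                   MPI_COMM_WORLD, &req);
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    for (int i = 0; i < size; i++) {
        MPI_Type_free(&stypes[i]);
        MPI_Type_free(&rtypes[i]);
    }
    free(stypes); free(rtypes);
    free(scounts); free(rcounts); free(sdispls); free(rdispls);
    free(sbuf); free(rbuf);

    MPI_Finalize();
    return 0;
}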
Damian,
As Gilles indicated, an example would be great. Meanwhile, since you already
have access to the root cause with a debugger, can you check which branch of
the if on the communicator type is taken in the
ompi_coll_base_retain_datatypes_w function? What is the
communicator type? Intra or inter?
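If it helps, the communicator type can also be printed directly from the
application with standard MPI calls (a quick sketch; call it with whatever
communicator you pass to MPI_Ialltoallw):

#include <mpi.h>
#include <stdio.h>

/* Print whether 'comm' is an inter- or intra-communicator and whether it
 * carries a topology (cartesian, graph, distributed graph, or none). */
static void describe_comm(MPI_Comm comm)
{
    int is_inter, topo;
    MPI_Comm_test_inter(comm, &is_inter);
    MPI_Topo_test(comm, &topo);
    printf("%s-communicator, topology = %s\n",
           is_inter ? "inter" : "intra",
           topo == MPI_CART       ? "cartesian" :
           topo == MPI_GRAPH      ? "graph" :
           topo == MPI_DIST_GRAPH ? "dist graph" : "none");
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    describe_comm(MPI_COMM_WORLD);   /* replace with the communicator you use */
    MPI_Finalize();
    return 0;
}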
Damian,
Thanks for the report!
Could you please trim your program and share it so I can have a look?
Cheers,
Gilles
On Wed, May 4, 2022 at 10:27 PM Damian Marek via devel <
devel@lists.open-mpi.org> wrote:
> Hello,
>
> I have been getting intermittent memory corruptions and segmentation
> faults while using Ialltoallw in OpenMPI v4.0.3.
Hello,
I have been getting intermittent memory corruptions and segmentation faults
while using Ialltoallw in OpenMPI v4.0.3. Valgrind also reports an invalid read
in the "ompi_coll_base_retain_datatypes_w" function defined in
"coll_base_util.c".
Running with a debug build of ompi, an assertion failure is also triggered.
We discussed this on the OMPI call yesterday, but I am not sure what the
consequences of only supporting building Open MPI main and v5.x against PMIx
v4.x would be (vs. also supporting building Open MPI main+v5.x against PMIx
v3.2.x, or even PMIx v3.x).
I think we might need to talk to the PM