Kind of sounds to me like they are using the wrong proc when receiving. Here is
an example of what a modex receive should look like:
https://github.com/open-mpi/ompi/blob/main/opal/mca/btl/ugni/btl_ugni_endpoint.c#L44
-Nathan
On Aug 3, 2022, at 11:29 AM,
"Jeff Squyres (jsquyres) via devel"
Glad you solved the first issue!
With respect to debugging, if you don't have a parallel debugger, you can do
something like this:
https://www.open-mpi.org/faq/?category=debugging#serial-debuggers
If you haven't done so already, I highly suggest configuring Open MPI with
"CFLAGS=-g -O0".
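For reference, here is a minimal sketch of the serial-debugger approach: each
process prints its PID and hostname, then spins until you attach a debugger
(e.g., "gdb -p <pid>") and set the flag by hand. The helper name
wait_for_debugger and its max_polls / poll_seconds parameters are made up for
this illustration; they are not part of Open MPI.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not an Open MPI API): block until a debugger sets
 * `attached` to a non-zero value.
 * max_polls <= 0 waits indefinitely; a positive value bounds the wait.
 * Returns 1 if released by a debugger, 0 if the poll limit was reached. */
static int wait_for_debugger(int max_polls, unsigned poll_seconds)
{
    /* volatile so the compiler re-reads the flag on every iteration,
     * which lets "set var attached = 1" in gdb end the loop. */
    volatile int attached = 0;
    char hostname[256];

    if (gethostname(hostname, sizeof(hostname)) != 0) {
        snprintf(hostname, sizeof(hostname), "unknown");
    }
    hostname[sizeof(hostname) - 1] = '\0';

    printf("PID %d on %s ready for attach\n", (int) getpid(), hostname);
    fflush(stdout);

    for (int polls = 0; !attached; ++polls) {
        if (max_polls > 0 && polls >= max_polls) {
            return 0;   /* gave up waiting */
        }
        sleep(poll_seconds);
    }
    return 1;
}
```

One way to use it: call wait_for_debugger(0, 5) early in the code path you
want to inspect, attach to the printed PID, go up the stack to this frame, and
run "set var attached = 1" before continuing. Building with -g -O0 keeps the
variable visible to the debugger.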
As
Thank you for the answer. Actually, I think I solved that problem some
days ago. Basically (if I understand correctly), MPI "adds" in some sense
a header to the data sent (please correct me if I'm wrong), which ob1
then uses to match the arriving data with the MPI_Recv posted by
the user.
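To make the matching idea concrete, here is a simplified sketch. It is not the
real ob1 header layout: the struct names, field widths, and the SKETCH_ANY_*
constants are assumptions made up for this illustration. The idea is only that
the sender prepends a small header (communicator context, source rank, tag,
sequence number) and the receiver compares it against the posted receives,
honoring wildcards.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG. */
#define SKETCH_ANY_SOURCE (-1)
#define SKETCH_ANY_TAG    (-1)

/* Simplified match header carried in front of the user data
 * (illustrative only; not the actual ob1 definition). */
typedef struct {
    uint16_t ctx;   /* communicator context id */
    int32_t  src;   /* rank of the sender */
    int32_t  tag;   /* user tag */
    uint16_t seq;   /* per-peer sequence number, used for ordering */
} sketch_match_hdr_t;

/* What the receiver remembers about a posted receive. */
typedef struct {
    uint16_t ctx;
    int32_t  src;   /* may be SKETCH_ANY_SOURCE */
    int32_t  tag;   /* may be SKETCH_ANY_TAG */
} sketch_posted_recv_t;

/* True if the incoming header satisfies the posted receive. */
static bool sketch_hdr_matches(const sketch_match_hdr_t *hdr,
                               const sketch_posted_recv_t *recv)
{
    if (hdr->ctx != recv->ctx) {
        return false;   /* different communicator */
    }
    if (recv->src != SKETCH_ANY_SOURCE && recv->src != hdr->src) {
        return false;
    }
    if (recv->tag != SKETCH_ANY_TAG && recv->tag != hdr->tag) {
        return false;
    }
    return true;
}

/* Return the index of the first posted receive (in posting order) that
 * matches the header, or -1 if the message is unexpected. */
static int sketch_find_match(const sketch_match_hdr_t *hdr,
                             const sketch_posted_recv_t *recvs, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (sketch_hdr_matches(hdr, &recvs[i])) {
            return (int) i;
        }
    }
    return -1;
}
```

In this sketch, a message that matches nothing returns -1; a real
implementation would typically keep such a message on an unexpected-message
queue until a matching receive is posted. This also suggests why a BTL has to
deliver the header bytes intact along with the payload.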
Sorry for the huge delay in replies -- it's summer / vacation season, and I
think we (as a community) are a little behind in answering some of these
emails. :-(
It's been quite a while since I have been in the depths of BTL internals; I'm
afraid I don't remember the details offhand.
When I
Sorry for the delay in replies -- it's summer / vacation season, and I think we
(as a community) are a little behind in answering some of these emails. :-(
It's hard to say for any given machine, but a bunch of different hardware
factors can come into play, such as:
- L1, L2, L3 cache sizes
-