Hi,

A common source of sudden deadlocks at larger scale is a change of send behavior from buffered (eager) to synchronous (rendezvous) mode once message sizes or counts grow. You can test whether your application deadlocks at smaller scale by replacing all standard sends with synchronous sends (e.g., add `#define MPI_Send MPI_Ssend` and `#define MPI_Isend MPI_Issend` after the include of the MPI header). An application with a correct communication pattern should run without deadlock when all sends are synchronous. To check for other deadlock patterns in your application, you can use tools like MUST [1] or TotalView.
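The substitution Joachim describes can be collected into a small header fragment (a sketch; the file name is illustrative, and it must be included after mpi.h in every translation unit, e.g. via the compiler's -include flag):

```c
/* force_ssend.h -- compile-time switch to force synchronous sends.
 * Include this AFTER mpi.h. If the program now deadlocks at small
 * scale, the communication pattern was relying on eager buffering
 * and is incorrect per the MPI standard. */
#include <mpi.h>

#define MPI_Send  MPI_Ssend   /* blocking send  -> synchronous blocking send */
#define MPI_Isend MPI_Issend  /* nonblocking send -> synchronous nonblocking send */
```

The macro trick works because MPI_Ssend and MPI_Issend take exactly the same arguments as MPI_Send and MPI_Isend; only the completion semantics change (the send cannot complete until the matching receive has started).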
Best, Joachim

[1] https://itc.rwth-aachen.de/must/
________________________________
From: users <users-boun...@lists.open-mpi.org> on behalf of George Bosilca via users <users@lists.open-mpi.org>
Sent: Sunday, September 11, 2022 10:40:42 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: George Bosilca <bosi...@icl.utk.edu>
Subject: Re: [OMPI users] Subcommunicator communications do not complete intermittently

Assuming a correct implementation, the described communication pattern should work seamlessly. Would it be possible to either share a reproducer or provide the execution stack by attaching a debugger to the deadlocked application, so we can see the state of the different processes? I wonder whether all processes eventually join the gather on comm_world, or whether some of them are stuck in some orthogonal collective communication pattern.

George

On Fri, Sep 9, 2022, 21:24 Niranda Perera via users <users@lists.open-mpi.org<mailto:users@lists.open-mpi.org>> wrote:

Hi all,

I have the following use case. I have N MPI ranks in the global communicator, and I split it into two: the first part being rank 0, and the other being all ranks 1 to N-1. Rank 0 acts as a master and ranks [1, N-1] act as workers. I use rank 0 to broadcast (blocking) a set of values to ranks [1, N-1] over comm_world. Rank 0 then immediately calls a gather (blocking) over comm_world and busy-waits for results. Once the broadcast is received by the workers, they call a method foo(args, local_comm). Inside foo, the workers communicate with each other using the subcommunicator, and each produces N-1 results, which are sent to rank 0 as gather responses over comm_world. Inside foo there are multiple iterations, collectives, send-receives, etc.

This seems to work okay with smaller parallelism and smaller foo tasks. But when the parallelism increases (e.g., 64 ... 512), only a single iteration completes inside foo. Subsequent iterations seem to hang. Is this an anti-pattern in MPI?
Should I use nonblocking MPI_Igather and MPI_Ibcast instead of the blocking calls? Any help is greatly appreciated.

--
Niranda Perera
https://niranda.dev/
@n1r44<https://twitter.com/N1R44>
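For reference, the pattern Niranda describes can be sketched roughly as below (a minimal sketch, not the original code: foo, the payload sizes, and the data values are placeholders; the real application does many iterations and point-to-point exchanges inside foo):

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Placeholder for the worker computation; in the real application this
 * runs multiple iterations of collectives and send/receives on local_comm. */
static double foo(double arg, MPI_Comm local_comm)
{
    double sum = 0.0;
    MPI_Allreduce(&arg, &sum, 1, MPI_DOUBLE, MPI_SUM, local_comm);
    return sum;
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split comm_world: color 0 = master (rank 0), color 1 = workers. */
    int color = (rank == 0) ? 0 : 1;
    MPI_Comm local_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &local_comm);

    /* Master broadcasts arguments to everyone over comm_world. */
    double arg = (rank == 0) ? 42.0 : 0.0;
    MPI_Bcast(&arg, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Workers compute on the subcommunicator; the master contributes a
     * dummy value so the gather on comm_world matches on every rank. */
    double result = (rank == 0) ? 0.0 : foo(arg, local_comm);

    double *results = NULL;
    if (rank == 0)
        results = malloc((size_t)size * sizeof *results);
    MPI_Gather(&result, 1, MPI_DOUBLE, results, 1, MPI_DOUBLE, 0,
               MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 1; i < size; i++)
            printf("worker %d -> %g\n", i, results[i]);
        free(results);
    }

    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}
```

Note the gather over comm_world is a collective on the full communicator, so every rank (including the master) must call it; if any worker is still blocked inside a collective on local_comm, the MPI_Gather cannot complete, which matches the hang George and Joachim are probing for.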