Adam -

There are a couple of theoretical limits on how many requests you can have 
outstanding (at some point, you will run the host out of memory).  However, 
those failures would surface when posting the MPI_Isend or MPI_Irecv, not 
during MPI_Waitall (see the sketch below).  2.1.0 is pretty old; the first 
step in further debugging is to upgrade to one of the recent releases 
(3.1.3 or 4.0.0) and verify that the bug still exists.
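
For concreteness, here is a minimal self-contained sketch of that pattern 
(a ring exchange; NCHUNKS, CHUNK_ELEMS, and the neighbor choice are made-up 
stand-ins, not your actual code).  It sets MPI_ERRORS_RETURN and checks 
every return code, so an out-of-resources failure would be reported at the 
MPI_Isend/MPI_Irecv call site rather than inside MPI_Waitall:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NCHUNKS     450    /* hypothetical per-rank request count */
#define CHUNK_ELEMS 1024   /* hypothetical chunk size, in floats  */

static void check(int rc, const char *what, int rank)
{
    if (rc != MPI_SUCCESS) {
        fprintf(stderr, "rank %d: %s failed (rc=%d)\n", rank, what, rc);
        MPI_Abort(MPI_COMM_WORLD, rc);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Report errors as return codes instead of aborting, so a failed
     * post is visible at the call site. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    float *sendbuf = calloc((size_t)NCHUNKS * CHUNK_ELEMS, sizeof *sendbuf);
    float *recvbuf = calloc((size_t)NCHUNKS * CHUNK_ELEMS, sizeof *recvbuf);
    MPI_Request reqs[2 * NCHUNKS];

    /* Post every receive and send up front, then wait on all of them
     * at once, i.e. the same shape as the failing code. */
    for (int i = 0; i < NCHUNKS; i++)
        check(MPI_Irecv(recvbuf + (size_t)i * CHUNK_ELEMS, CHUNK_ELEMS,
                        MPI_FLOAT, left, i, MPI_COMM_WORLD, &reqs[i]),
              "MPI_Irecv", rank);
    for (int i = 0; i < NCHUNKS; i++)
        check(MPI_Isend(sendbuf + (size_t)i * CHUNK_ELEMS, CHUNK_ELEMS,
                        MPI_FLOAT, right, i, MPI_COMM_WORLD,
                        &reqs[NCHUNKS + i]),
              "MPI_Isend", rank);
    check(MPI_Waitall(2 * NCHUNKS, reqs, MPI_STATUSES_IGNORE),
          "MPI_Waitall", rank);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

If a sketch like this runs cleanly on your 17 ranks under the newer 
release while your application still crashes, that points back at the 
indexing; if it crashes too, it is a reportable bug.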

Brian

> On Dec 16, 2018, at 6:52 AM, Adam Sylvester <op8...@gmail.com> wrote:
> 
> I'm running OpenMPI 2.1.0 on RHEL 7 using TCP communication.  For the 
> specific run that's crashing on me, I'm running with 17 ranks (on 17 
> different physical machines).  I've got a stage in my application where ranks 
> need to transfer chunks of data where the size of each chunk is trivial (on 
> the order of 100 MB) compared to the overall imagery.  However, the chunks 
> are spread out across many buffers in a way that makes the indexing 
> complicated (and the memory is not all within a single buffer)... the 
> simplest way to express the data movement in code is by a large number of 
> MPI_Isend() and MPI_Irecv() calls followed, of course, by an eventual 
> MPI_Waitall().  This works fine for many cases, but I've run into a case now 
> where the chunks are imbalanced such that a few ranks have a total of ~450 
> MPI_Request objects (I do a single MPI_Waitall() with all requests at once) 
> and the remaining ranks have < 10 MPI_Requests.  In this scenario, I get a 
> seg fault inside PMPI_Waitall().
> 
> Is there an implementation limit as to how many asynchronous requests are 
> allowed?  Is there a way this can be queried either via a #define value or 
> runtime call?  I probably won't go this route, but when initially compiling 
> OpenMPI, is there a configure option to increase it?
> 
> I've done a fair amount of debugging and am pretty confident this is where 
> the error is occurring as opposed to indexing out of bounds somewhere, but if 
> there is no such limit in OpenMPI, that would be useful to know too.
> 
> Thanks.
> -Adam
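
P.S. If you ever want to bound how many requests are in flight at once 
(whether or not that turns out to be the issue here), a common pattern is 
to pre-post all of the receives and then issue the sends in fixed-size 
windows.  A rough sketch, reusing the definitions from the sketch above 
(the WINDOW value is arbitrary):

#define WINDOW 64   /* arbitrary cap on in-flight send requests */

/* Pre-post every receive; because the matching receives are already
 * posted on every rank, completing each send window cannot deadlock. */
MPI_Request recv_reqs[NCHUNKS];
for (int i = 0; i < NCHUNKS; i++)
    check(MPI_Irecv(recvbuf + (size_t)i * CHUNK_ELEMS, CHUNK_ELEMS,
                    MPI_FLOAT, left, i, MPI_COMM_WORLD, &recv_reqs[i]),
          "MPI_Irecv", rank);

/* Issue the sends WINDOW at a time, completing each window before
 * posting the next. */
for (int base = 0; base < NCHUNKS; base += WINDOW) {
    int n = NCHUNKS - base < WINDOW ? NCHUNKS - base : WINDOW;
    MPI_Request win[WINDOW];
    for (int j = 0; j < n; j++)
        check(MPI_Isend(sendbuf + (size_t)(base + j) * CHUNK_ELEMS,
                        CHUNK_ELEMS, MPI_FLOAT, right, base + j,
                        MPI_COMM_WORLD, &win[j]),
              "MPI_Isend", rank);
    check(MPI_Waitall(n, win, MPI_STATUSES_IGNORE), "MPI_Waitall", rank);
}
check(MPI_Waitall(NCHUNKS, recv_reqs, MPI_STATUSES_IGNORE),
      "MPI_Waitall", rank);

This only caps the send side (the receives all stay posted), so it is a 
partial bound, but it keeps one giant MPI_Waitall from being the place 
where everything piles up.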