Re: [OMPI users] Deadlock in MPI_File_write_all on Infiniband

2009-10-13 Thread Dorian Krause
Hi Edgar, this sounds reasonable. Looking at the program in the debugger, I can see that 15 of the 16 processes wait in PMPI_Allreduce, whereas the remaining one is in PMPI_Wait. Also, the program works with MVAPICH, and I would guess the ADIO source tree is more or less the same (correct me if I'm wrong)?

Re: [OMPI users] Deadlock in MPI_File_write_all on Infiniband

2009-10-12 Thread Edgar Gabriel
I am wondering whether this is really due to the usage of File_write_all. We had a bug in the 1.3 series (which will be fixed in 1.3.4) where we lost message segments and thus could deadlock in Comm_dup if there was communication occurring *right after* the Comm_dup. File_open executes a ...
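For illustration, a minimal sketch of the pattern Edgar describes (this is not code from the thread): a Comm_dup followed immediately by communication on the parent communicator. The Allreduce here is only a stand-in for "communication right after the dup".

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm dup;
        int rank, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Comm_dup is collective and exchanges messages internally. */
        MPI_Comm_dup(MPI_COMM_WORLD, &dup);

        /* Communication issued right after the dup is what reportedly
           triggered the lost-segment deadlock in the 1.3 series. */
        MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        MPI_Comm_free(&dup);
        MPI_Finalize();
        return 0;
    }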

[OMPI users] Deadlock in MPI_File_write_all on Infiniband

2009-10-12 Thread Dorian Krause
Dear list, the attached program deadlocks in MPI_File_write_all when run with 16 processes on two 8-core nodes of an Infiniband cluster. It runs fine when I a) use tcp or b) replace MPI_File_write_all by MPI_File_write. I'm using Open MPI v1.3.2 (but I checked that the problem also occurs ...
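The attachment is not reproduced in the archive; the following is only a sketch of the kind of program described, assuming each of the 16 ranks writes one contiguous block to a shared file (the filename "testfile" and the block size are made up for illustration):

    #include <mpi.h>

    #define N 1024   /* ints per rank; arbitrary block size */

    int main(int argc, char **argv)
    {
        MPI_File fh;
        int rank, i, buf[N];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (i = 0; i < N; i++)
            buf[i] = rank;

        /* File_open is collective and (in ROMIO-based implementations)
           involves a communicator duplication internally, which is what
           ties it to the Comm_dup bug Edgar mentions. */
        MPI_File_open(MPI_COMM_WORLD, "testfile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY,
                      MPI_INFO_NULL, &fh);

        /* The collective write is where the hang was observed over
           Infiniband; plain MPI_File_write reportedly works. */
        MPI_File_seek(fh, (MPI_Offset)rank * N * sizeof(int),
                      MPI_SEEK_SET);
        MPI_File_write_all(fh, buf, N, MPI_INT, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }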