I have a weird problem that shows up when I use LAM or OpenMPI, but not MPICH.
I have a parallelized code working on a really large matrix. It partitions the matrix column-wise and ships the pieces off to processors, so any given processor is working on a matrix with the same number of rows as the original but a reduced number of columns. As part of the algorithm, each processor needs to send a single column vector from its own matrix to the adjacent processor, and vice versa.

I have found that depending on the number of rows of the matrix (i.e., the size of the vector being sent with MPI_Send/MPI_Recv), the simulation will hang. It only runs properly once I reduce this dimension below a certain maximum. I have also found that this magic number differs depending on the system I am using, e.g. my home quad-core box versus a remote cluster.

As I mentioned, I have not had this issue with MPICH. I would like to understand why it is happening rather than just defect over to MPICH to get by. Any help would be appreciated!

zach
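In case it helps, here is a minimal sketch of the kind of exchange I'm doing (simplified from my actual code; the names N, column, and recv_col are placeholders, and my real code derives the neighbor ranks from the column partitioning):

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int N = 100000;  /* number of rows; the hang appears once N is large */
    double *column   = malloc(N * sizeof(double));
    double *recv_col = malloc(N * sizeof(double));
    for (int i = 0; i < N; i++) column[i] = (double)rank;

    /* Adjacent processors in the column partitioning. */
    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    /* Every rank does a blocking send to its neighbor, then a
     * blocking receive from the other neighbor. Small N works;
     * past some system-dependent size it hangs. */
    MPI_Send(column, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD);
    MPI_Recv(recv_col, N, MPI_DOUBLE, left, 0, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    free(column);
    free(recv_col);
    MPI_Finalize();
    return 0;
}
```

(This needs to be built with mpicc and launched with mpirun on at least two processes to show the behavior.)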