I have a weird problem that shows up when I use LAM/MPI or Open MPI, but not MPICH.

I have a parallelized code working on a really large matrix. It
partitions the matrix column-wise and ships the pieces off to the
processors, so any given processor is working on a matrix with the
same number of rows as the original but a reduced number of columns.
As part of the algorithm, each processor needs to send a single column
vector from its own matrix to the adjacent processor, and vice versa.
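
For what it's worth, the exchange is done with plain blocking calls. A minimal sketch of the pattern (NROWS, the tag, and the even/odd neighbor pairing are made up for illustration and are not my actual code):

```c
#include <mpi.h>
#include <stdlib.h>

#define NROWS 20000   /* rows of the matrix = length of the column vector */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *mine   = calloc(NROWS, sizeof(double));
    double *theirs = calloc(NROWS, sizeof(double));

    /* Pair each rank with its adjacent rank; both sides first send
     * their boundary column, then receive the neighbor's. */
    int neighbor = (rank % 2 == 0) ? rank + 1 : rank - 1;
    if (neighbor >= 0 && neighbor < size) {
        MPI_Send(mine,   NROWS, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD);
        MPI_Recv(theirs, NROWS, MPI_DOUBLE, neighbor, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    free(mine);
    free(theirs);
    MPI_Finalize();
    return 0;
}
```

With a small NROWS this runs fine; past some system-dependent size, the two MPI_Send calls block against each other and the whole thing hangs.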

I have found that, depending on the number of rows of the matrix (or,
equivalently, the size of the vector being sent with MPI_Send and
MPI_Recv), the simulation will hang. Only when I reduce this dimension
below a certain maximum does the simulation run properly. I have also
found that this magic number differs depending on the system I am
using, e.g. my quad-core box at home versus a remote cluster.

As I mentioned, I have not had this issue with MPICH. I would like to
understand why it is happening rather than just defecting to MPICH to
get by.

Any help would be appreciated!
zach
