Yes, this does sound like the classic "assuming MPI buffering" case. Check out this magazine column that I wrote a long time ago about this topic:

It's #1 on the top 10 list of All-Time Favorite Evils to Avoid in Parallel. :-)

One comment on Mattijs's email: please don't use bsend. Bsend is evil. :-)

On Jun 13, 2008, at 5:27 AM, Mattijs Janssens wrote:

Sounds like a typical deadlock situation. All processors are waiting for one

Not a specialist, but from what I know, if the messages are small enough they'll be offloaded to kernel/hardware and there is no deadlock. That's why it might
work for small messages and/or certain MPI implementations.

- come up with a global communication schedule so that whenever one processor
sends, the receiver is receiving.
- use MPI_Bsend. Might be slower.
- use MPI_Isend, MPI_Irecv (but then you'll have to make sure the buffers stay
valid for the duration of the communication)
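
A minimal sketch of the first option, assuming the processors form a simple chain of ranks 0..size-1 and each rank swaps one vector with each adjacent rank. The helper names (exchange_partner, sends_first, schedule_is_matched) are hypothetical and only illustrate the idea; they are not part of MPI or of the original code.

```c
#include <assert.h>

/* Hypothetical helpers for a 1-D chain of ranks 0..size-1 in which every
 * rank swaps one vector with each adjacent rank using blocking calls. */

/* Partner of `rank` in phase 0 or 1, or -1 if it sits this phase out.
 * Phase 0 pairs (0,1),(2,3),...; phase 1 pairs (1,2),(3,4),... */
int exchange_partner(int rank, int size, int phase)
{
    assert(phase == 0 || phase == 1);
    int partner = ((rank + phase) % 2 == 0) ? rank + 1 : rank - 1;
    return (partner >= 0 && partner < size) ? partner : -1;
}

/* Within a pair the lower rank sends first and then receives; the higher
 * rank receives first and then sends, so each send meets a waiting receive. */
int sends_first(int rank, int partner)
{
    return rank < partner;
}

/* Returns 1 if, in both phases, each rank's partner points back at it and
 * exactly one side of every pair sends first; returns 0 otherwise. */
int schedule_is_matched(int size)
{
    for (int phase = 0; phase < 2; ++phase) {
        for (int rank = 0; rank < size; ++rank) {
            int partner = exchange_partner(rank, size, phase);
            if (partner < 0)
                continue; /* idle in this phase */
            if (exchange_partner(partner, size, phase) != rank)
                return 0;
            if (sends_first(rank, partner) == sends_first(partner, rank))
                return 0;
        }
    }
    return 1;
}
```

In a real code, each rank would loop over the two phases, look up its partner, and call MPI_Send followed by MPI_Recv when sends_first() is true, or MPI_Recv followed by MPI_Send otherwise. That way no pair ever has both sides blocked in a send at the same time, whatever the message size.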

On Friday 13 June 2008 01:55, zach wrote:
I have a weird problem that shows up when I use LAM or OpenMPI but not

I have a parallelized code working on a really large matrix. It
partitions the matrix column-wise and ships the pieces off to the
processors, so any given processor is working on a matrix with the same
number of rows as the original but a reduced number of columns. As part
of the algorithm, each processor needs to send a single column vector
from its own matrix to the adjacent processor, and vice versa.

I have found that, depending on the number of rows of the matrix (that
is, the size of the vector being sent using MPI_Send and MPI_Recv), the
simulation will hang.
Only when I reduce this dimension below a certain maximum does the
sim run properly. I have also found that this magic number differs depending on the system I am using, e.g. my home quad-core box or remote

As I mentioned, I have not had this issue with MPICH. I would like to
understand why it is happening rather than just defect over to MPICH
to get by.

Any help would be appreciated!
users mailing list


Jeff Squyres
Cisco Systems
