Vincent Loechner wrote:
Bill,
  
A rather stable production code that has worked with various versions of MPI
on various architectures started hanging with gcc-4.4.2 and openmpi 1.3.33
    
This is probably the following bug:
https://svn.open-mpi.org/trac/ompi/ticket/2043

Until a fix is available, try adding this option to mpirun:
-mca btl_sm_num_fifos 5

Bill, I noticed you updated the ticket.  Thank you.  I've been working on this in earnest.  Something odd is going on as far as the "memory model" goes:  values written to the shared-memory FIFOs behave unexpectedly.  For example, a FIFO slot that was initialized to be free, and "should be" still free, looks occupied when a writer checks it, yet reads as empty immediately afterward, even though "presumably" no one has accessed that location in between.

I almost have a stand-alone program (C only, no OMPI infrastructure) that demonstrates the problem, but I'm not quite there.  Once I do, either it will become evident to me what's wrong, or I'll be able to show other people more easily why I think something is wrong.  At this point, I really have no idea whether the problem is in GCC 4.4.x or OMPI 1.3.x.
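
To make the "free slot" check concrete, here is a minimal sketch of the kind of FIFO I mean.  This is NOT the OMPI code and not my stand-alone reproducer: the names (fifo_t, SLOT_FREE, FIFO_SIZE) are made up for illustration, it assumes one writer and one reader, and it uses C11 atomics with acquire/release ordering, which is an assumption of the sketch rather than what OMPI 1.3.x does.  In a real shared-memory setup the structure would live in a segment mapped by both processes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define FIFO_SIZE 8                /* hypothetical number of slots */
#define SLOT_FREE ((intptr_t)-1)   /* hypothetical sentinel: slot is empty */

typedef struct {
    _Atomic intptr_t slot[FIFO_SIZE];
    unsigned head;   /* next slot to write; used only by the writer */
    unsigned tail;   /* next slot to read;  used only by the reader */
} fifo_t;

/* Mark every slot as free before either side uses the FIFO. */
static void fifo_init(fifo_t *f)
{
    for (unsigned i = 0; i < FIFO_SIZE; i++)
        atomic_init(&f->slot[i], SLOT_FREE);
    f->head = 0;
    f->tail = 0;
}

/* Writer side: returns 0 on success, -1 if the next slot looks occupied. */
static int fifo_write(fifo_t *f, intptr_t value)
{
    assert(value != SLOT_FREE);    /* the sentinel itself cannot be queued */

    /* Acquire pairs with the reader's release when it frees the slot. */
    if (atomic_load_explicit(&f->slot[f->head], memory_order_acquire)
            != SLOT_FREE)
        return -1;

    /* Release publishes the value to the reader. */
    atomic_store_explicit(&f->slot[f->head], value, memory_order_release);
    f->head = (f->head + 1) % FIFO_SIZE;
    return 0;
}

/* Reader side: returns the queued value, or SLOT_FREE if nothing is queued. */
static intptr_t fifo_read(fifo_t *f)
{
    intptr_t v = atomic_load_explicit(&f->slot[f->tail],
                                      memory_order_acquire);
    if (v == SLOT_FREE)
        return SLOT_FREE;

    /* Hand the slot back to the writer. */
    atomic_store_explicit(&f->slot[f->tail], SLOT_FREE, memory_order_release);
    f->tail = (f->tail + 1) % FIFO_SIZE;
    return v;
}
```

The symptom described above corresponds to the check in fifo_write() seeing something other than SLOT_FREE in a slot that nobody has written since fifo_init().  In a sketch like this, that should only be possible if loads or stores are reordered, or not made visible, in a way the code does not expect.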
