Follow-up:  I misread the code, so I now think mpi_iprobe is probably not 
being used in this case; I'll have to pin the blame somewhere else.  -fPIC 
definitely fixes the problem: I tried removing -mcmodel=medium and the 
executable still worked.  Our usual communication pattern is mpi_irecv, 
mpi_isend, mpi_waitall; perhaps there is something unhealthy in how we use 
those calls.
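
In case it helps, here is roughly what that pattern looks like (a minimal 
sketch of my own; the subroutine name, buffer names, and counts are 
illustrative, not our actual code):

    ! Post all receives, then the matching sends, then wait for
    ! everything to complete before touching the buffers.
    subroutine exchange(nn, nbrs, sbuf, rbuf, ncount)
      use mpi
      implicit none
      integer, intent(in)  :: nn, ncount
      integer, intent(in)  :: nbrs(nn)
      real(8), intent(in)  :: sbuf(ncount, nn)
      real(8), intent(out) :: rbuf(ncount, nn)
      integer :: i, ierr, reqs(2*nn), stats(MPI_STATUS_SIZE, 2*nn)

      do i = 1, nn
         call MPI_Irecv(rbuf(1,i), ncount, MPI_DOUBLE_PRECISION, &
                        nbrs(i), 0, MPI_COMM_WORLD, reqs(i), ierr)
      end do
      do i = 1, nn
         call MPI_Isend(sbuf(1,i), ncount, MPI_DOUBLE_PRECISION, &
                        nbrs(i), 0, MPI_COMM_WORLD, reqs(nn+i), ierr)
      end do
      call MPI_Waitall(2*nn, reqs, stats, ierr)
    end subroutine exchange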

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Blosch, Edwin L
Sent: Wednesday, September 21, 2011 10:44 AM
To: Open MPI Users
Subject: EXTERNAL: [OMPI users] Question about compiling with fPIC

Follow-up to a mislabeled thread:  "How could OpenMPI (or MVAPICH) affect 
floating-point results?"

I have found a solution to my problem, but I would like to understand the 
underlying issue better.

To rehash: an Intel-compiled executable linked with MVAPICH runs fine; the 
same executable linked with OpenMPI fails.  The earliest symptom I could see 
was a strange difference in the numerical values of quantities that should be 
unaffected by MPI calls.  Tim's advice guided me to suspect memory corruption; 
Eugene's advice guided me to explore the detailed differences in compilation.

I observed that the MVAPICH mpif90 wrapper adds -fPIC.

I tried adding -fPIC and -mcmodel=medium to the compilation of the 
OpenMPI-linked executable, and now it works fine.  I haven't tried without 
-mcmodel=medium, but my guess is that -fPIC is what did the trick.
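
For the record, the compile and link lines that now work look something like 
this (the file names and the -O2 level are illustrative; only -fPIC and 
-mcmodel=medium were added to our usual flags):

    mpif90 -O2 -fPIC -mcmodel=medium -c flow_solver.f90
    mpif90 -O2 -fPIC -mcmodel=medium -o flow_solver flow_solver.o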

Does anyone know why compiling with -fPIC has helped?  Does it suggest an 
application problem or an OpenMPI problem?

To note: This is an InfiniBand-based cluster.  The application does pretty 
basic MPI-1 operations: send, recv, bcast, reduce, allreduce, gather, isend, 
irecv, waitall.  There is one task that uses iprobe with MPI_ANY_TAG, but that 
task is only involved in certain cases (including this one); conversely, cases 
that do not call iprobe have not yet been observed to crash.  From this I am 
deducing that iprobe is the problem.
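
The probe loop in that task looks roughly like this (a sketch of my own; the 
use of MPI_ANY_SOURCE and the receive handling are illustrative, not the 
actual code):

    ! Poll for a pending message; if one is there, size the buffer
    ! from the probed status and post the matching receive.
    subroutine poll_messages()
      use mpi
      implicit none
      logical :: flag
      integer :: status(MPI_STATUS_SIZE), ierr, n
      real(8), allocatable :: buf(:)

      call MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &
                      flag, status, ierr)
      if (flag) then
         call MPI_Get_count(status, MPI_DOUBLE_PRECISION, n, ierr)
         allocate(buf(n))
         call MPI_Recv(buf, n, MPI_DOUBLE_PRECISION, status(MPI_SOURCE), &
                       status(MPI_TAG), MPI_COMM_WORLD, &
                       MPI_STATUS_IGNORE, ierr)
      end if
    end subroutine poll_messages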

Thanks,

Ed

-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Blosch, Edwin L
Sent: Tuesday, September 20, 2011 11:46 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect 
floating-point results?

Thank you for this explanation.  I will assume that my problem here is some 
kind of memory corruption.


-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Tim Prince
Sent: Tuesday, September 20, 2011 10:36 AM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect 
floating-point results?

On 9/20/2011 10:50 AM, Blosch, Edwin L wrote:

> It appears that some side effect of linkage is able to change a 
> compute-only routine's answers.
>
> I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind 
> of corruption may be going on.
>

Those intrinsics have direct instruction-set translations, which shouldn't 
vary from -O1 on up, shouldn't change with linkage options, and shouldn't 
be affected by MPI or by the insertion of WRITEs.
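
For instance, a compute-only kernel like this one (my illustration, not the 
actual application code) uses all four of those intrinsics and should give 
bit-identical results with or without -fPIC, whichever MPI library is linked:

    ! sqrt, max, and abs compile to direct hardware instructions;
    ! tiny() folds to a compile-time constant.
    pure function safe_rms(x, n) result(s)
      implicit none
      integer, intent(in) :: n
      real(8), intent(in) :: x(n)
      real(8) :: s
      integer :: i
      s = 0.0d0
      do i = 1, n
         s = s + abs(x(i))**2
      end do
      s = sqrt(max(s/n, tiny(s)))
    end function safe_rms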

-- 
Tim Prince