We started having a problem with OpenMPI beginning with version 1.3.2: the
program output is unpredictable and may be correct, garbage, or NaNs. The
output is the solution vector of a linear system solved with ScaLAPACK. We are
using the Intel Fortran compiler (version 11.1) and the GNU compiler (version
4.1.2) on Gentoo Linux. So far, the problem appears only for N x N matrices
with N of about 10,000 or more, running on about 64 or more processors. The
problem still occurs with OpenMPI 1.4.1.

We build the ScaLAPACK and BLACS libraries locally and use the LAPACK and BLAS 
libraries supplied by Intel.

We wrote a test program to demonstrate the problem. Each process builds its
own portion of the matrix (no communication is needed), and the matrix is then
factored and solved. The solution vector is gathered from all processes and
written to a file by the master process. The program and the associated
OpenMPI information (ompi_info --all) are available at:

http://www.em-stuff.com/files/files.tar.gz
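
For readers wondering how each process can build its part of the matrix with
no communication: in ScaLAPACK each process owns a block-cyclic slice of the
global matrix, so it only needs the global indices of its own rows and
columns. The Python below is a hypothetical illustration, not the code in
files.tar.gz. The function names are made up; they mirror ScaLAPACK's
INDXG2P, INDXG2L, INDXL2G and NUMROC tools but assume 0-based indices, a
source process of 0, and a brute-force count in place of NUMROC's closed form.

```python
# Hypothetical illustration of ScaLAPACK's block-cyclic layout (0-based
# indices, source process 0). Not taken from the test program.

def owner(ig, nb, nprocs, src=0):
    """Process coordinate that owns global index ig (cf. INDXG2P)."""
    return (src + ig // nb) % nprocs


def local_index(ig, nb, nprocs):
    """Local index of global index ig on its owning process (cf. INDXG2L)."""
    return (ig // (nb * nprocs)) * nb + ig % nb


def global_index(il, myproc, nb, nprocs, src=0):
    """Global index of local index il on process myproc (cf. INDXL2G)."""
    return (il // nb) * nb * nprocs + ((myproc - src) % nprocs) * nb + il % nb


def local_length(n, myproc, nb, nprocs, src=0):
    """How many of the n global indices myproc owns (brute-force NUMROC)."""
    return sum(1 for ig in range(n) if owner(ig, nb, nprocs, src) == myproc)


def fill_local(n, nb, prow, pcol, nprow, npcol, entry):
    """Build the local piece of an n x n matrix on grid process (prow, pcol).

    entry(i, j) returns global element A[i][j]; each process evaluates it only
    for the rows and columns it owns, so no communication is required.
    """
    mloc = local_length(n, prow, nb, nprow)
    nloc = local_length(n, pcol, nb, npcol)
    return [
        [entry(global_index(il, prow, nb, nprow), global_index(jl, pcol, nb, npcol))
         for jl in range(nloc)]
        for il in range(mloc)
    ]


if __name__ == "__main__":
    # 5 x 5 matrix, 2 x 2 blocks, 2 x 2 process grid: every global entry
    # should appear on exactly one process.
    n = 5
    pieces = [fill_local(n, 2, pr, pc, 2, 2, lambda i, j: i * n + j)
              for pr in range(2) for pc in range(2)]
    flat = sorted(v for piece in pieces for row in piece for v in row)
    print(flat == list(range(n * n)))  # True
```

Because the owner and local-index maps are pure arithmetic, each process can
decide independently which entries to compute.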

The file "compile" in the "test" directory builds the executable; edit it to
point to the libraries on your local machine. Output produced with OpenMPI
1.3.1 and 1.4.1 is in the "output" directory for reference.
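
To compare a new run against the reference files in the "output" directory, a
small checker can count non-finite values and report the largest relative
difference. This is a minimal sketch that assumes each output file holds one
solution component per line, with the value as the last whitespace-separated
token (possibly with a Fortran D exponent). The real file format in the
archive may differ, and the function names, tolerance, and file names in the
usage comment are hypothetical.

```python
import math
import sys


def parse_line(line):
    """Return the last token of a line as a float, or None if not numeric.

    Assumes one solution component per line; Fortran 'D' exponents are accepted.
    """
    tokens = line.split()
    if not tokens:
        return None
    try:
        return float(tokens[-1].replace("D", "E").replace("d", "e"))
    except ValueError:
        return None


def read_values(path):
    """Collect every numeric value in a solution file."""
    with open(path) as fh:
        return [v for v in map(parse_line, fh) if v is not None]


def compare_vectors(ref, test, rtol=1e-8):
    """Count non-finite entries in test and entries differing from ref by > rtol."""
    if len(ref) != len(test):
        raise ValueError(f"length mismatch: {len(ref)} vs {len(test)}")
    nonfinite = sum(1 for t in test if not math.isfinite(t))
    max_rel, bad = 0.0, 0
    for r, t in zip(ref, test):
        if not (math.isfinite(r) and math.isfinite(t)):
            continue  # skip NaN/Inf pairs; non-finite test values counted above
        rel = abs(t - r) / max(abs(r), 1e-300)
        max_rel = max(max_rel, rel)
        if rel > rtol:
            bad += 1
    return {"n": len(ref), "nonfinite": nonfinite,
            "max_rel": max_rel, "mismatched": bad}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python check_solution.py ref.dat new.dat
    print(compare_vectors(read_values(sys.argv[1]), read_values(sys.argv[2])))
```

A run that returns junk shows up as a large "mismatched" count, and a run that
returns NaNs shows up in "nonfinite".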

We would appreciate any help.

Thanks,
Nathan
