On Thu, Aug 18, 2011 at 08:46:46AM -0700, Tom Rosmond wrote:
> We have a large fortran application designed to run doing IO with either
> mpi_io or fortran direct access. On a linux workstation (16 AMD cores)
> running openmpi 1.5.3 and Intel fortran 12.0 we are having trouble with
> random failures with the mpi_io option which do not occur with
> conventional fortran direct access. We are using ext3 file systems, and
> I have seen some references hinting of similar problems with the
> ext3/mpiio combination. The application with the mpi_io option runs
> flawlessly on Cray architectures with Lustre file systems, so we are
> also suspicious of the ext3/mpiio combination. Does anyone else have
> experience with this combination that could shed some light on the
> problem, and hopefully some suggested solutions?

I'm glad to hear you're having success with mpi-io on Cray/Lustre. That
platform was a bit touchy for a while, but has gotten better over the
last two years.

My first guess would be that your linux workstation does not implement a
"strict enough" file system lock. ROMIO relies on "fcntl" locks to
provide exclusive access to files at some points in the code.

Does your application use collective I/O? Since you can swap fortran and
mpi-io so easily, it sounds like maybe you do not. If there's a way to
make collective MPI-IO calls, that will eliminate many of the fcntl lock
calls.

Do you use MPI datatypes to describe either a file view or the
application data? Access patterns that are noncontiguous in memory
and/or noncontiguous in the file will also trigger fcntl lock calls.
You can use an MPI-IO hint to disable data sieving, at a potentially
disastrous performance cost.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
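
One way to check the "strict enough" lock guess on a given file system is a small probe, sketched below in C. It is not part of ROMIO or Open MPI; the function name `probe_fcntl_lock` and its return codes are made up for illustration. The parent takes a whole-file fcntl write lock, then a forked child opens the same file and asks for a conflicting lock, which a correctly behaving file system should refuse. This only tests basic exclusion between two processes on one node, so a pass does not prove that ROMIO's own locking pattern is safe.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not a ROMIO or Open MPI function.
 * Returns  0 if a second process is refused a conflicting write lock,
 *          1 if the conflicting lock was (wrongly) granted,
 *         -1 if the probe itself could not be set up. */
int probe_fcntl_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    /* Parent: exclusive (write) lock over the whole file. */
    struct flock lk = {0};
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;                     /* 0 means "to end of file" */
    if (fcntl(fd, F_SETLK, &lk) == -1) {
        close(fd);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fd);
        return -1;
    }

    if (pid == 0) {
        /* Child: fcntl locks are not inherited across fork(), so this
         * is a genuine second locker competing with the parent. */
        int cfd = open(path, O_RDWR);
        if (cfd < 0)
            _exit(2);

        struct flock clk = {0};
        clk.l_type = F_WRLCK;
        clk.l_whence = SEEK_SET;

        int rc = fcntl(cfd, F_SETLK, &clk);   /* non-blocking attempt */
        if (rc == -1 && (errno == EACCES || errno == EAGAIN))
            _exit(0);                 /* refused: lock is enforced */
        _exit(rc == 0 ? 1 : 2);       /* granted, or unexpected error */
    }

    /* Parent: keep holding the lock until the child has reported. */
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        close(fd);
        return -1;
    }
    close(fd);                        /* releases the parent's lock */

    if (!WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

To use it, call `probe_fcntl_lock()` from a small `main` with a path on the file system in question (the ext3 volume here) and print the result. For the data sieving hint mentioned above, the ROMIO hint keys are `romio_ds_read` and `romio_ds_write`; setting them to `disable` with `MPI_Info_set` on the info object passed to `MPI_File_open` is the usual way to apply them.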