On Mon, 2011-08-22 at 10:23 -0500, Rob Latham wrote:
> On Thu, Aug 18, 2011 at 08:46:46AM -0700, Tom Rosmond wrote:
> > We have a large fortran application designed to do its IO with either
> > mpi_io or fortran direct access.  On a linux workstation (16 AMD cores)
> > running openmpi 1.5.3 and Intel fortran 12.0 we are having trouble with
> > random failures with the mpi_io option which do not occur with
> > conventional fortran direct access.  We are using ext3 file systems, and
> > I have seen some references hinting at similar problems with the
> > ext3/mpiio combination.  The application with the mpi_io option runs
> > flawlessly on Cray architectures with Lustre file systems, so we are
> > also suspicious of the ext3/mpiio combination.  Does anyone else have
> > experience with this combination that could shed some light on the
> > problem, and hopefully some suggested solutions?
> 
> I'm glad to hear you're having success with mpi-io on Cray/Lustre.
> That platform was a bit touchy for a while, but has gotten better over
> the last two years.
> 
> My first guess would be that your linux workstation does not implement
> a "strict enough" file system lock.  ROMIO relies on the "fcntl" locks
> to provide exclusive access to files at some points in the code.  
> 
> Does your application use collective I/O?  It sounds like if you can
> swap fortran and mpi-io so easily that maybe you do not.  If there's
> a way to make collective MPI-IO calls, that will eliminate many of the
> fcntl lock calls.  
> 
Rob,

Yes, we are using collective I/O (mpi_file_write_at_all,
mpi_file_read_at_all).  The swapping between fortran and mpi-io is just
a matter of branches in the code at strategic locations.  Although the
mpi-io files are readable with fortran direct access, we don't read
them that way from within the application because the data organization
in the files is different.
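
For what it's worth, the collective pattern we use boils down to
something like the stand-alone sketch below.  This is only an
illustration, not our production code: the file name 'testfile.dat',
the element count, and the rank-based offsets are invented values.

  program collective_write_sketch
    use mpi
    implicit none
    integer, parameter :: n = 1000        ! local element count (illustrative)
    integer :: ierr, rank, fh
    integer :: status(MPI_STATUS_SIZE)
    integer(kind=MPI_OFFSET_KIND) :: offset
    real(8) :: buf(n)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    buf = dble(rank)

    call MPI_File_open(MPI_COMM_WORLD, 'testfile.dat', &
                       MPI_MODE_WRONLY + MPI_MODE_CREATE, &
                       MPI_INFO_NULL, fh, ierr)

    ! Each rank writes its block at a rank-dependent byte offset
    ! (the default file view is in bytes); the _at_all variant makes
    ! the write collective.
    offset = int(rank, MPI_OFFSET_KIND) * n * 8
    call MPI_File_write_at_all(fh, offset, buf, n, &
                               MPI_DOUBLE_PRECISION, status, ierr)

    call MPI_File_close(fh, ierr)
    call MPI_Finalize(ierr)
  end program collective_write_sketch

Compiled with mpif90 and run on a few ranks, each rank's block lands
contiguously in the file, one after another.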

> Do you use MPI datatypes to describe either a file view or the
> application data?   These noncontiguous in memory and/or noncontiguous
> in file access patterns will also trigger fcntl lock calls.  You can
> use an MPI-IO hint to disable data sieving, at a potentially
> disastrous performance cost. 

Yes, we use an 'mpi_type_indexed' datatype to describe the data
organization.  
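
For completeness, here is roughly how the indexed filetype and the
data-sieving hint Rob mentions fit together.  Again, this is only a
sketch with invented block lengths, displacements, and file name, and
I'm assuming the ROMIO hints 'romio_ds_read' and 'romio_ds_write' are
the intended way to disable data sieving.

  program indexed_view_sketch
    use mpi
    implicit none
    integer, parameter :: nblocks = 4, blen = 10   ! invented layout
    integer :: ierr, rank, nprocs, fh, info, filetype, i
    integer :: blocklens(nblocks), displs(nblocks)
    integer :: status(MPI_STATUS_SIZE)
    integer(kind=MPI_OFFSET_KIND) :: zero_disp
    real(8) :: buf(nblocks*blen)

    call MPI_Init(ierr)
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
    call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

    ! Noncontiguous file layout: each rank owns nblocks blocks of
    ! blen doubles, interleaved across ranks (displacements are in
    ! units of the old type, i.e. doubles).
    blocklens = blen
    do i = 1, nblocks
       displs(i) = (i-1)*blen*nprocs + rank*blen
    end do
    call MPI_Type_indexed(nblocks, blocklens, displs, &
                          MPI_DOUBLE_PRECISION, filetype, ierr)
    call MPI_Type_commit(filetype, ierr)

    ! ROMIO hints that turn off data sieving for reads and writes.
    call MPI_Info_create(info, ierr)
    call MPI_Info_set(info, 'romio_ds_read', 'disable', ierr)
    call MPI_Info_set(info, 'romio_ds_write', 'disable', ierr)

    call MPI_File_open(MPI_COMM_WORLD, 'testfile.dat', &
                       MPI_MODE_WRONLY + MPI_MODE_CREATE, info, fh, ierr)
    zero_disp = 0
    call MPI_File_set_view(fh, zero_disp, MPI_DOUBLE_PRECISION, &
                           filetype, 'native', info, ierr)

    buf = dble(rank)
    call MPI_File_write_all(fh, buf, nblocks*blen, &
                            MPI_DOUBLE_PRECISION, status, ierr)

    call MPI_File_close(fh, ierr)
    call MPI_Type_free(filetype, ierr)
    call MPI_Finalize(ierr)
  end program indexed_view_sketch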

Any thoughts about the XFS vs EXT3 question?

Thanks for the help

T. Rosmond


