On Sat, Aug 16, 2008 at 08:05:14AM -0400, Jeff Squyres wrote:
> On Aug 13, 2008, at 7:06 PM, Yvan Fournier wrote:
>
>> I seem to have encountered a bug in MPI-IO, in which
>> MPI_File_get_position_shared hangs when called by multiple processes
>> in a communicator. It can be illustrated by the following simple test
>> case, in which a file is simply created with C IO, and opened with
>> MPI-IO. (defining or undefining MY_MPI_IO_BUG on line 5
>> enables/disables the bug). From the MPI2 documentation, It seems that
>> all processes should be able to call MPI_File_get_position_shared,
>> but if more than one process uses it, it fails. Setting the shared
>> pointer helps, but this should not be necessary, and the code still
>> hangs (in more complete code, after writing data).
>>
>> I encounter the same problem with Open MPI 1.2.6 and MPICH2 1.0.7, so
>> I may have misread the documentation, but I suspect a ROMIO bug.
>
> Bummer. :-(
>
> It would be best to report this directly to the ROMIO maintainers via
> romio-ma...@mcs.anl.gov. They lurk on this list, but they may not be
> paying attention to every mail.

Hi, that would be me, and yup, as you can see I don't check in too
often.

Just to wrap this up, I'm glad you found workarounds. Shared file
pointers have a certain seductive quality about them, but they are a
pain in the neck to implement in the library. You will almost assuredly
scale to larger numbers of processors and achieve higher I/O bandwidth
if you do just a little bit of work. Keep track of file offsets on your
own and, instead of doing independent I/O, use the collective routines
MPI_File_read_at_all or MPI_File_write_at_all.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
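
For illustration, here is a minimal sketch of the "keep track of file
offsets on your own" approach. It is not taken from the original test
case. The helper name rank_offset, the HAVE_MPI build macro, the file
name out.dat, and the per-rank element counts are all assumptions made
up for this example. Each rank computes its starting byte offset as an
exclusive prefix sum over the element counts of the lower ranks, then
all ranks call MPI_File_write_at_all with explicit offsets, so the
shared file pointer is never touched.

```c
#include <stddef.h>

/* Byte offset at which `rank` starts writing, given the element counts
 * of all ranks: an exclusive prefix sum over the lower ranks.
 * (Hypothetical helper, not part of MPI.) */
long long rank_offset(const int *counts, int rank, size_t elem_size)
{
    long long off = 0;
    for (int r = 0; r < rank; r++)
        off += (long long)counts[r] * (long long)elem_size;
    return off;
}

#ifdef HAVE_MPI   /* assumed macro: build with mpicc -DHAVE_MPI */
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank writes a different number of ints (illustrative data). */
    int n = rank + 1;
    int *buf = malloc(n * sizeof *buf);
    for (int i = 0; i < n; i++)
        buf[i] = rank;

    /* Every rank learns every count, then computes its own offset. */
    int *counts = malloc(size * sizeof *counts);
    MPI_Allgather(&n, 1, MPI_INT, counts, 1, MPI_INT, MPI_COMM_WORLD);
    MPI_Offset off = (MPI_Offset)rank_offset(counts, rank, sizeof(int));

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write at an explicit byte offset (default file view). */
    MPI_File_write_at_all(fh, off, buf, n, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    free(counts);
    MPI_Finalize();
    return 0;
}
#endif
```

With four ranks writing 1, 2, 3 and 4 ints of 4 bytes each, the offsets
come out as 0, 4, 12 and 24 bytes. For large process counts, MPI_Exscan
on the local byte count would likely be a better fit than gathering
every count on every rank; the gather is used here only to keep the
arithmetic visible.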