On Sat, Aug 16, 2008 at 08:05:14AM -0400, Jeff Squyres wrote:
> On Aug 13, 2008, at 7:06 PM, Yvan Fournier wrote:
>
>> I seem to have encountered a bug in MPI-IO, in which
>> MPI_File_get_position_shared hangs when called by multiple processes
>> in a communicator. It can be illustrated by the following simple test
>> case, in which a file is simply created with C IO, and opened with
>> MPI-IO. (defining or undefining MY_MPI_IO_BUG on line 5
>> enables/disables the bug). From the MPI2 documentation, it seems that
>> all processes should be able to call MPI_File_get_position_shared,
>> but if more than one process uses it, it fails. Setting the shared
>> pointer helps, but this should not be necessary, and the code still
>> hangs (in more complete code, after writing data).
>>
>> I encounter the same problem with Open MPI 1.2.6 and MPICH2 1.0.7, so
>> I may have misread the documentation, but I suspect a ROMIO bug.
>
> Bummer.  :-(
>
> It would be best to report this directly to the ROMIO maintainers via 
> romio-ma...@mcs.anl.gov.  They lurk on this list, but they may not be 
> paying attention to every mail.

Hi, that would be me, and yup, as you can see I don't check in too
often.  

Just to wrap this up, I'm glad you found workarounds.  Shared file
pointers have a certain seductive quality about them, but they are a
pain in the neck to implement in the library.  

You will almost assuredly scale to larger numbers of processors and
achieve higher I/O bandwidth if you do just a little bit of work:
keep track of file offsets on your own and, instead of doing
independent I/O, use the collective routines MPI_File_read_at_all or
MPI_File_write_at_all.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division    A215 0178 EA2D B059 8CDF
Argonne National Lab, IL USA                 B29D F333 664A 4280 315B
