On Wed, May 22, 2013 at 12:23:36PM -0400, Eric Chamberland wrote:
> On 05/22/2013 11:33 AM, Tom Rosmond wrote:
> >Thanks for the confirmation of the MPIIO problem.  Interestingly, we
> >have the same problem when using MPIIO in INTEL MPI.  So something
> >fundamental seems to be wrong.
> >
> 
> I think but I am not sure that it is because the MPI I/O (ROMIO)
> code is the same for all distributions...
> 
> It has been written by Rob Latham.

Hello!  Rajeev wrote it when he was in grad school, then he passed the
torch to Rob Ross when he was a post-doc at Argonne, and now I've been
the caretaker for the last mumble-mumble years.  (Now if I could only
find some other sucker...)

Tom, Eric:  I have recently fixed this bug for some cases.   I don't
know when OpenMPI will re-sync with ROMIO (it's getting harder and
harder to keep ROMIO as "the standalone MPI-IO implementation"), but it
should be straightforward to pick up that change.

(it's this one:
http://git.mpich.org/mpich.git/blobdiff/2de997d9b3e94bad01d5f46d76f163d71e2bd7bd..7d44307f269cae96118beb19760221aff99bd74a:/src/mpi/romio/adio/common/ad_read.c)


The function signatures that ROMIO implements do indeed take an
"integer count of some datatype", but one can still use that to say
"write a billion doubles".
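To put numbers on that (a sketch assuming a 32-bit C int and 8-byte doubles, as on common MPI platforms): the element count handed to a call such as MPI_File_write stays under INT_MAX even though the number of bytes behind it does not.

```python
import struct

# Assumption: C "int" is 32 bits, as on common MPI platforms.
INT32_MAX = 2**31 - 1

count = 1_000_000_000             # "a billion doubles": the count passed to the write call
elem_size = struct.calcsize("d")  # size of a C double in bytes (8 on common platforms)
nbytes = count * elem_size        # total bytes the library must actually move

print(f"count  = {count:,}  fits in int: {count <= INT32_MAX}")
print(f"nbytes = {nbytes:,}  fits in int: {nbytes <= INT32_MAX}")
```

The count fits comfortably, but the byte total is about 8 GB, well past INT32_MAX (about 2.1 billion), so the library has to move the data without ever storing that total in an int.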

ROMIO handles this internally by issuing as many write(2) system
calls as it takes to complete the request.
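A minimal sketch of that idea in Python (not ROMIO's actual code, which is C): keep calling os.write() until every byte has gone out, capping each call at a chunk size. The names write_all and MAX_CHUNK, and the 1 MiB cap, are hypothetical and chosen only for illustration; real systems impose their own per-call limits (on Linux, a single write(2) transfers at most 0x7ffff000 bytes), and write(2) may also return a short count, which the loop simply picks up on the next iteration.

```python
import os
import tempfile

# Hypothetical per-call cap, chosen only for illustration.
MAX_CHUNK = 1 << 20  # 1 MiB

def write_all(fd, data, max_chunk=MAX_CHUNK):
    """Write every byte of `data` to file descriptor `fd`.

    Issues as many os.write() calls as needed: each call is capped at
    `max_chunk` bytes, and a short write leaves the remainder for the
    next iteration. Returns the number of calls made.
    """
    view = memoryview(data)
    calls = 0
    while len(view) > 0:
        written = os.write(fd, view[:max_chunk])  # may be fewer bytes than requested
        calls += 1
        view = view[written:]                     # drop what was written, keep the rest
    return calls


if __name__ == "__main__":
    payload = bytes(3 * MAX_CHUNK + 123)  # a bit more than three chunks of zeros
    fd, path = tempfile.mkstemp()
    try:
        ncalls = write_all(fd, payload)
        size = os.fstat(fd).st_size
        print(f"{size} bytes written in {ncalls} write calls")
    finally:
        os.close(fd)
        os.remove(path)
```

The caller asks for one logical write; the loop turns it into however many system calls the size and the per-call cap require.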

If you try to get fancy and make a struct of three thousand
megabyte-sized MPI_CONTIG types, MPICH will blow up.  I haven't tested
against OpenMPI. 
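A back-of-the-envelope check of why that case differs from the billion doubles: the count is now just 1, but the size of the datatype itself is about 3 GB, which no longer fits in a 32-bit int. MPI_Type_size, for example, reports its result as an int (MPI-3 added MPI_Type_size_x, which returns an MPI_Count, for such large types). Whether an int-sized type size is exactly where MPICH fails is an assumption here, not something the message says; "megabyte" is also read as 2**20 bytes for the arithmetic.

```python
# Assumption: C "int" is 32 bits.
INT32_MAX = 2**31 - 1
MB = 1 << 20    # assumption: "megabyte" read as 2**20 bytes

nblocks = 3000              # members in the hypothetical struct type
type_size = nblocks * MB    # bytes described by ONE instance of the struct
count = 1                   # write a single instance of that type

print(f"count     = {count:,}  fits in int: {count <= INT32_MAX}")
print(f"type size = {type_size:,}  fits in int: {type_size <= INT32_MAX}")
```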

But basic types should be OK now.

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
