Rob Latham wrote:
If the other processors need header data, perhaps rank 0 can broadcast it to everyone else?
That's what happens when *reading* the file back to continue number-crunching. When writing, rank 0 is likewise the only one to write the header, and then each rank writes its own part of the matrices.
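To make that layout concrete, here is a minimal sketch in plain Python. It is not the actual code and involves no MPI: the ranks are simulated by a loop, and ordinary seek/write calls stand in for what would be positioned writes such as MPI_File_write_at in MPI-IO. The header format (two 64-bit integers for rows and columns), the equal-sized row blocks per rank, and all function names are assumptions made for illustration only.

```python
import os
import struct
import tempfile

# Assumed header: two little-endian 64-bit integers (total rows, cols).
HEADER = struct.Struct("<qq")
ITEM_SIZE = struct.calcsize("<d")  # one matrix element stored as a double


def block_offset(rank, rows_per_rank, cols):
    """Byte offset of the block owned by `rank`, placed right after the header."""
    return HEADER.size + rank * rows_per_rank * cols * ITEM_SIZE


def write_file(path, nranks, rows_per_rank, cols):
    """Simulate the write phase: one header, then one block per rank."""
    with open(path, "wb") as f:
        for rank in range(nranks):  # stands in for the parallel ranks
            if rank == 0:
                # Only rank 0 writes the header, at offset 0.
                f.seek(0)
                f.write(HEADER.pack(nranks * rows_per_rank, cols))
            # Every rank writes its own rows at its own offset.
            values = [float(rank)] * (rows_per_rank * cols)
            f.seek(block_offset(rank, rows_per_rank, cols))
            f.write(struct.pack("<%dd" % len(values), *values))


def read_block(path, rank, nranks):
    """Simulate the read phase for a single rank."""
    with open(path, "rb") as f:
        # In MPI, rank 0 would read this header and broadcast it to the others.
        rows, cols = HEADER.unpack(f.read(HEADER.size))
        rows_per_rank = rows // nranks
        count = rows_per_rank * cols
        f.seek(block_offset(rank, rows_per_rank, cols))
        values = list(struct.unpack("<%dd" % count, f.read(count * ITEM_SIZE)))
    return rows, cols, values


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "matrix.bin")
        write_file(path, nranks=4, rows_per_rank=2, cols=3)
        print(read_block(path, rank=2, nranks=4))
```

The only piece every rank has to agree on is block_offset: once the header size and the per-rank block size are known, each rank can compute its own position in the file independently.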
Is it OK to mention MPICH2 on this list? I did prototype some MPI extensions that allowed ROMIO to do true async I/O (at least as far as the underlying operating system supports it). If you really need to experiment with async I/O, I'd love to hear your experiences.
I don't think I really need to, and I can't say I have any experience with it either. It would have made a great demo of a real-world implementation and its performance, but the convenience of the APIs themselves is good enough for me. I'll try to see how much split-collective operations gain me, performance-wise.
Thanks