Hi,

A couple of comments. First, if you use MPI_File_write_at, this is usually not considered collective I/O, even if it is executed by multiple processes; MPI_File_write_at_all would be collective I/O.
Second, MPI I/O cannot do 'magic'; it is bound by the hardware you provide. If a single process is already able to saturate the bandwidth of your file system and hardware, you will not see performance improvements from multiple processes (with minor exceptions, perhaps, due to caching effects, but those only apply to smaller problem sizes; the larger the amount of data you try to write, the smaller the caching effects become in file I/O). So the first question you have to answer is: what is the sustained bandwidth of your hardware, and are you able to saturate it already with a single process? If you are using a single hard drive (or even 2 or 3 hard drives in a RAID 0 configuration), this is almost certainly the case.

Lastly, the configuration parameters of your tests also play a major role. As a general rule, the more data you are able to provide per file I/O call, the better the performance will be; 1 MB of data per call is probably on the smaller side. Internally, the ompio implementation of MPI I/O breaks large individual I/O operations (e.g. MPI_File_write_at) into chunks of 512 MB for performance reasons, and large collective I/O operations (e.g. MPI_File_write_at_all) into chunks of 32 MB. This gives you some hints about the quantities of data you would have to use for performance reasons.

Along the same lines, one final comment: you say you did 1000 writes of 1 MB each. For a single process that is about 1 GB of data. Depending on how much main memory your PC has, this amount of data can still be cached on modern systems, and you might have an unrealistically high bandwidth value for the 1-process case you are comparing against (it depends a bit on what your benchmark does, and whether you force flushing the data to disk inside your measurement loop).

Hope this gives you some pointers on where to start looking.
Thanks
Edgar

From: users <users-boun...@lists.open-mpi.org> On Behalf Of Dong-In Kang via users
Sent: Monday, April 6, 2020 7:14 AM
To: users@lists.open-mpi.org
Cc: Dong-In Kang <dik...@gmail.com>
Subject: [OMPI users] Slow collective MPI File IO

Hi,

I am running an MPI program where N processes write to a single file on a single shared memory machine. I'm using OpenMPI v4.0.2. Each MPI process writes a 1 MB chunk of data 1K times sequentially, and there is no overlap in the file between any two MPI processes. I ran the program for -np = {1, 2, 4, 8}. I am seeing that the speed of the collective write to a file for -np = {2, 4, 8} never exceeds the speed of -np = {1}. I did the experiment with a few different file systems {local disk, ram disk, Lustre FS}, and for all of them I see similar results: the speed of collective write to a single shared file never exceeds the speed of the single MPI process case. Any tips or suggestions?

I used the MPI_File_write_at() routine with the proper offset for each MPI process. (I also tried the MPI_File_write_at_all() routine, which makes the performance worse as np gets bigger.) Before writing, MPI_Barrier() is used. The start time is taken right after MPI_Barrier() using MPI_Wtime(); the end time is taken right after another MPI_Barrier(). The speed of the collective write is calculated as (total data amount written to the file) / (time between the first MPI_Barrier() and the second MPI_Barrier()).

Any idea how to increase the speed?

Thanks,
David