Hi,
I am running an MPI program in which N processes write to a single shared file on one shared-memory machine, using Open MPI v4.0.2. Each MPI process writes a 1 MB chunk of data 1K times, sequentially, and no two processes' regions of the file overlap. I ran the program with -np = {1, 2, 4, 8} and found that the write bandwidth for -np = {2, 4, 8} never exceeds that of -np = 1. I repeated the experiment on several file systems (local disk, RAM disk, Lustre); all show similar results: the speed of the collective write to a single shared file never exceeds the single-MPI-process case.

I use the MPI_File_write_at() routine with the proper offset for each MPI process. (I also tried the MPI_File_write_at_all() routine, which makes performance worse as np gets bigger.) Before writing, MPI_Barrier() is called; the start time is taken right after it using MPI_Wtime(). The end time is taken right after another MPI_Barrier(). The write bandwidth is calculated as (total data written to the file) / (time between the first MPI_Barrier() and the second MPI_Barrier()).

Any tips or suggestions on how to increase the speed?

Thanks,
David