Hi,

I'm running the IOR benchmark on a large shared-memory machine with a Lustre
file system.
I configured IOR to use one independent file per process so that aggregate
bandwidth is maximized.
I run N MPI processes, where N is less than the number of cores per socket.
When I place all N processes on a single socket, write performance scales
with N.
However, when I spread them across N sockets (one MPI process per socket),
performance does not scale; it stays flat beyond 4 MPI processes.
I expected the spread-out placement to scale as well as the single-socket
case, but it does not.
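For concreteness, the two placements can be reproduced with something like the
following (Open MPI binding syntax assumed; the block/transfer sizes and the
output path are placeholders, not my exact settings):

```shell
# File-per-process mode (-F), POSIX API, write-only (-w).
# -b is the per-process block size, -t the transfer size (placeholders here).

# Case 1: all ranks packed onto one socket -- write bandwidth scales.
mpirun -np 8 --map-by core --bind-to core \
    ior -a POSIX -F -w -b 1g -t 1m -o /lustre/testdir/testfile

# Case 2: one rank per socket -- bandwidth flattens beyond ~4 ranks.
mpirun -np 8 --map-by socket --bind-to socket \
    ior -a POSIX -F -w -b 1g -t 1m -o /lustre/testdir/testfile
```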

Since each MPI process writes to its own independent file, I would expect no
file locking among the processes, yet there seems to be some. Is there any
way to avoid that locking or overhead? It may not actually be a file-locking
issue; I just don't know the exact cause of the poor performance.

Any help will be appreciated.

David
