Thank you for your suggestion.
I have more information, a possible explanation, and more questions.
It looks like NUMA plays a big role here.
In summary, it looks like the synchronization overhead of MPI file I/O
among sockets is much higher than the overhead among processes within
a socket.

I hope this reply goes through correctly; I previously had a problem with replies.
Anyhow, thank you for the advice. It turns out NUMA was disabled in the BIOS.
All other nodes showed 2 NUMA nodes, but node125 showed only 1. I found this
by diffing the lscpu output of node125 against that of another node. Afte