Re: [OMPI users] file/process write speed is not scalable

Patrick Bégou via users Tue, 14 Apr 2020 04:33:03 -0700

Hi David,

Could you specify which version of Open MPI you are using?
I also have some parallel I/O trouble with one code, but I have not
investigated it yet.
Thanks


Patrick

On 13/04/2020 at 17:11, Dong-In Kang via users wrote:
>
>  Thank you for your suggestion.
> I am more concerned about the poor performance of the one MPI
> process per socket case.
> That model fits my real workload better.
> The performance that I see is a lot worse than what the underlying
> hardware can support.
> The best case (all MPI processes in a single socket) is pretty good,
> at more than 80% of the underlying hardware's speed.
> However, the one MPI process per socket model achieves only 30% of
> what I get with all MPI processes in a single socket.
> Both are doing the same thing: independent file writes.
> I used all the OSTs available.
>
> As a reference point, I did the same test on ramdisk.
> For both cases, the performance scales very well, and the two
> results are close.
>
> There seems to be extra overhead when multiple sockets are used for
> independent file I/O with Lustre.
> I don't know what causes that overhead.
>
> Thanks,
> David
>
>
> On Thu, Apr 9, 2020 at 11:07 PM Gilles Gouaillardet via users
> <[email protected] <mailto:[email protected]>> wrote:
>
>     Note there could be some NUMA-IO effect, so I suggest you compare
>     running every MPI task on socket 0 to running every MPI task on
>     socket 1 and so on, and then compare that to running one MPI task
>     per socket.
>
>     Also, what performance do you measure?
>     - Is this something in line with the filesystem/network expectation?
>     - Or is this much higher (and in this case, you are benchmarking
>     the i/o cache)?
>
>     FWIW, I usually write files whose cumulative size is four times the
>     node memory to avoid local caching effects
>     (if you have a lot of RAM, that might take a while ...)
>
>     Keep in mind Lustre is also sensitive to the file layout.
>     If you write one file per task, you likely want to use all the
>     available OSTs, but no striping.
>     If you want to write into a single file with 1MB blocks per MPI task,
>     you likely want to stripe with 1MB blocks,
>     and use the same number of OSTs as MPI tasks (so each MPI task ends
>     up writing to its own OST)
>
>     Cheers,
>
>     Gilles
>
>     On Fri, Apr 10, 2020 at 6:41 AM Dong-In Kang via users
>     <[email protected] <mailto:[email protected]>> wrote:
>     >
>     > Hi,
>     >
>     > I'm running the IOR benchmark on a big shared memory machine
>     with a Lustre file system.
>     > I set up IOR to use one independent file per process so that the
>     aggregated bandwidth is maximized.
>     > I ran N MPI processes where N < # of cores in a socket.
>     > When I put those N MPI processes on a single socket, its write
>     performance is scalable.
>     > However, when I put those N MPI processes on N sockets (so, 1
>     MPI process/socket),
>     > its performance does not scale, and stays the same for more than
>     4 MPI processes.
>     > I expected it would be as scalable as the case of N processes on
>     a single socket.
>     > But, it is not.
>     >
>     > I think if each MPI process writes to its own independent file,
>     there should not be any file locking among MPI processes. However,
>     there seems to be some. Is there any way to avoid that locking or
>     overhead? It may not be a file lock issue, but I don't know the
>     exact reason for the poor performance.
>     >
>     > Any help will be appreciated.
>     >
>     > David
>
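
Gilles's rule of thumb quoted above (write files whose combined size is
four times the node memory) comes down to simple arithmetic. Below is a
minimal Python sketch of it, assuming one file per MPI task. The function
name, the 1 MiB default transfer size and the rounding up to whole
transfers are illustrative assumptions, not IOR options and not something
stated in the thread.

```python
def bytes_per_task(node_memory_bytes: int, num_tasks: int,
                   transfer_bytes: int = 1 << 20, factor: int = 4) -> int:
    """Per-task file size such that all files together are at least
    `factor` times the node memory (assumed rule of thumb from the thread),
    rounded up to a whole number of transfers."""
    if node_memory_bytes <= 0 or num_tasks <= 0 or transfer_bytes <= 0:
        raise ValueError("sizes and task count must be positive")
    total = factor * node_memory_bytes
    # Integer ceiling division, so no task is given less than its share.
    per_task = -(-total // num_tasks)
    # Round up to a multiple of the transfer size.
    transfers = -(-per_task // transfer_bytes)
    return transfers * transfer_bytes


if __name__ == "__main__":
    GiB = 1 << 30
    # Hypothetical node: 256 GiB of RAM, 8 MPI tasks writing one file each.
    size = bytes_per_task(256 * GiB, 8)
    print(f"{size // GiB} GiB per task")
```

With the hypothetical numbers above, each of the 8 tasks has to write
128 GiB, which is why such a run "might take a while".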

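The single shared file advice quoted above (1MB blocks per task, 1MB
stripes, as many OSTs as tasks) can be checked with a small model. This is
only a sketch under two assumptions that are not from the thread: stripes
are laid out round-robin across the file's stripe objects, and rank r
writes its k-th block at offset (k * num_tasks + r) * block_size. The
returned numbers are stripe positions within the file's layout, not real
OST numbers, and both function names are made up for the example.

```python
def stripe_index(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Position (0-based) in the file's layout of the stripe object that
    holds this byte, assuming plain round-robin striping."""
    return (offset // stripe_size) % stripe_count


def stripes_touched_by_rank(rank: int, num_tasks: int, blocks_per_task: int,
                            block_size: int, stripe_size: int,
                            stripe_count: int) -> set:
    """Stripe positions one rank writes to, assuming an interleaved layout
    where rank r writes block k at offset (k * num_tasks + r) * block_size."""
    touched = set()
    for k in range(blocks_per_task):
        start = (k * num_tasks + rank) * block_size
        # A block can span several stripes when it is larger than a stripe.
        for off in range(start, start + block_size, stripe_size):
            touched.add(stripe_index(off, stripe_size, stripe_count))
        # Also count the stripe that holds the last byte of the block.
        touched.add(stripe_index(start + block_size - 1,
                                 stripe_size, stripe_count))
    return touched


if __name__ == "__main__":
    MiB = 1 << 20
    for count in (4, 3):
        print(f"stripe_count={count}")
        for rank in range(4):
            hit = stripes_touched_by_rank(rank, 4, 8, MiB, MiB, count)
            print(f"  rank {rank} -> {sorted(hit)}")
```

In this model, 4 tasks on 4 stripes each stay on a single stripe object,
while 4 tasks on 3 stripes all end up writing to every stripe object.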