Joshua,

George previously explained that you are limited by the size of your level X cache.

That means you might get optimal performance for a given message size, say when everything fits in the L2 cache.

When you increase the message size, the L2 cache is too small and you have to move to the L3 cache, which is obviously slower, hence the drop in performance.

So send/recv of the same small message twice might be faster than send/recv of one message twice as large, just because of the cache size.
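
For reference, a minimal sketch for checking those cache sizes on Linux, assuming glibc's _SC_LEVEL* sysconf names (hwloc's lstopo reports the same information):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* glibc-specific queries; values are in bytes,
       and 0 means the level is unknown on this system */
    printf("L1d: %ld KB\n", sysconf(_SC_LEVEL1_DCACHE_SIZE) / 1024);
    printf("L2:  %ld KB\n", sysconf(_SC_LEVEL2_CACHE_SIZE) / 1024);
    printf("L3:  %ld KB\n", sysconf(_SC_LEVEL3_CACHE_SIZE) / 1024);
    return 0;
}

The message size where the bandwidth drops should line up with roughly half of one of these levels, since the send and receive buffers compete for the same cache.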


Cheers,


Gilles


On 3/21/2017 9:11 AM, Joshua Mora wrote:
I don't want to push it up.
I just want to sustain the same bandwidth when sending at that optimal size. I'd like to see a constant bw from that size and above, not a significant drop when I cross a certain msg size.

------ Original Message ------
Received: 05:11 PM CDT, 03/20/2017
From: George Bosilca <bosi...@icl.utk.edu>
To: Joshua Mora <joshua_m...@usa.net> Cc:
Open MPI Users <users@lists.open-mpi.org>
Subject: Re: [OMPI users] tuning sm/vader for large messages

On Mon, Mar 20, 2017 at 12:45 PM, Joshua Mora wrote:

If at a certain msg size x you achieve performance X (MB/s), and at 2x msg size or higher you achieve performance Y, with Y significantly lower than X, is it possible to have a parameter that chops messages internally to size x in order to sustain performance X rather than let it choke?

Unfortunately not. After a certain message size you hit the hardware memory
bandwidth limit, and no pipeline can help. To push it up you will need to
have a single copy instead of 2, but vader should do this by default as
long as KNEM or CMA are available on the machine.

   George.
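
For what it's worth, that chopping can be done at the application level; below is a minimal sketch, where the 64 KB chunk size stands in for the hypothetical "x" (as George notes, this cannot lift the memory-bandwidth limit once the buffers fall out of cache):

#include <stddef.h>
#include <mpi.h>

/* Send/receive a large buffer as fixed-size chunks instead of one call.
   CHUNK plays the role of the "x" size from the question. */
#define CHUNK (64 * 1024)

void send_chopped(const char *buf, size_t len, int dst, MPI_Comm comm)
{
    for (size_t off = 0; off < len; ) {
        size_t n = (len - off < CHUNK) ? (len - off) : CHUNK;
        MPI_Send(buf + off, (int)n, MPI_CHAR, dst, 0, comm);
        off += n;
    }
}

void recv_chopped(char *buf, size_t len, int src, MPI_Comm comm)
{
    for (size_t off = 0; off < len; ) {
        size_t n = (len - off < CHUNK) ? (len - off) : CHUNK;
        MPI_Recv(buf + off, (int)n, MPI_CHAR, src, 0, comm,
                 MPI_STATUS_IGNORE);
        off += n;
    }
}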



Sort of a flow control to avoid congestion? If that is possible, what would that parameter be for vader?

Other than the source code, is there any detailed documentation or study of vader-related parameters to improve the bandwidth at large message sizes? I did see some documentation for sm, but not for vader.

Thanks,
Joshua


------ Original Message ------
Received: 03:06 PM CDT, 03/17/2017
From: George Bosilca
To: Joshua Mora
Cc: Open MPI Users
Subject: Re: [OMPI users] tuning sm/vader for large messages

On Fri, Mar 17, 2017 at 3:33 PM, Joshua Mora wrote:
Thanks for the quick reply.
This test is between 2 cores that are on different CPUs, so the data has to traverse the coherent fabric (e.g. QPI, UPI, cHT). It has to go to main memory independently of cache size. Wrong assumption?
Depends on the usage pattern. Some benchmarks have options to clean/flush the cache before each round of tests.
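
One common way to do that flush by hand (a generic sketch, not any particular benchmark's code) is to stream through a scratch buffer about twice the last-level cache between timing rounds, so the message buffers start cold:

#include <stdlib.h>
#include <string.h>

static volatile char sink;

/* llc_bytes is the last-level cache size, e.g. from sysconf or lstopo */
void flush_cache(size_t llc_bytes)
{
    size_t n = 2 * llc_bytes;
    char *junk = malloc(n);
    memset(junk, 1, n);     /* streaming writes evict older cache lines */
    sink = junk[n / 2];     /* discourage the compiler from dropping it */
    free(junk);
}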


Can data be evicted from the cache and put into the cache of a second core on a different CPU without placing it first in main memory?

It would depend on the memory coherency protocol. Usually it gets marked as shared, and as a result it might not need to be pushed into main memory right away.


I am thinking more that there is a parameter that splits large messages into smaller ones at 64k or 128k?

Pipelining is not the answer to all situations. Once your messages are larger than the caches, you have already built up memory pressure (by getting outside the cache size), so the pipelining is bound by the memory bandwidth.


This seems (wrong assumption?) like the kind of parameter I would need for large messages on a NIC: coalescing data, large MTU, ...

Sure, but there are hard limits imposed by the hardware, especially with regard to intranode communications. Once you saturate the memory bus, you hit a pretty hard limit.

   George.



Joshua

------ Original Message ------
Received: 02:15 PM CDT, 03/17/2017
From: George Bosilca
To: Open MPI Users
Subject: Re: [OMPI users] tuning sm/vader for large messages

Joshua,

In shared memory the bandwidth depends on many parameters, including the process placement and the size of the different cache levels. In your particular case I guess after 128k you are outside the L2 cache (1/2 of the cache in fact) and the bandwidth will drop as the data needs to be flushed to main memory.

   George.



On Fri, Mar 17, 2017 at 1:47 PM, Joshua Mora wrote:
Hello,
I am trying to get the max bw for shared memory communications using the osu_[bw,bibw,mbw_mr] benchmarks. I am observing a peak at the ~64k/128k msg size, which then drops instead of being sustained. What parameters or Linux config do I need to add to the default Open MPI settings to get this improved? I am already using vader and knem.

See below the one-way bandwidth, with a peak at 64k.

# Size      Bandwidth (MB/s)
1                       1.02
2                       2.13
4                       4.03
8                       8.48
16                     11.90
32                     23.29
64                     47.33
128                    88.08
256                   136.77
512                   245.06
1024                  263.79
2048                  405.49
4096                 1040.46
8192                 1964.81
16384                2983.71
32768                5705.11
65536                7181.11
131072               6490.55
262144               4449.59
524288               4898.14
1048576              5324.45
2097152              5539.79
4194304              5669.76

Thanks,
Joshua


_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
