I did try to modify rsocket.c/rstream.c so that the addresses passed to
ibv_post_send are aligned with the page size, but I'm not succeeding.
Indeed it's not enough to memalign the allocated buffers to page_size:
as far as I understood, each buffer is sent in chunks, which means that
those chunks have to be page-aligned as well.
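Just to illustrate what I mean by aligning the chunks (this is only a
sketch with made-up names, not the actual rsocket send path): split the
registered buffer at page boundaries so every address handed to
ibv_post_send is page-aligned.

#include <stdint.h>
#include <unistd.h>
#include <infiniband/verbs.h>

/* Sketch: post 'len' bytes from a page-aligned, registered buffer in
 * page-sized chunks, so each posted address is page-aligned. */
static int post_aligned_chunks(struct ibv_qp *qp, struct ibv_mr *mr,
                               void *buf, size_t len)
{
    size_t page_size = sysconf(_SC_PAGESIZE);
    size_t off;

    for (off = 0; off < len; off += page_size) {
        struct ibv_sge sge = {
            .addr   = (uintptr_t) buf + off,
            .length = (uint32_t) ((off + page_size <= len) ? page_size : len - off),
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr = {
            .wr_id      = off,
            .sg_list    = &sge,
            .num_sge    = 1,
            .opcode     = IBV_WR_SEND,
            .send_flags = IBV_SEND_SIGNALED,
        };
        struct ibv_send_wr *bad_wr;

        if (ibv_post_send(qp, &wr, &bad_wr))
            return -1;
    }
    return 0;
}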
I was able to make ib_write_bw suffer from the same NUMA issue, and
this is good news.
The buffer allocation is aligned to 4K; basically the memory allocation
is done this way:
page_size = sysconf(_SC_PAGESIZE); // this is 4K on my system
buf = memalign(page_size, buffer_size);
with the same mod
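To check where such a buffer actually ends up, a diagnostic sketch of
mine (not part of ib_write_bw) queries the kernel with move_pages() and
a NULL nodes array, which only reports the node each page lives on:

#include <malloc.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <numaif.h>   /* link with -lnuma */

int main(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    size_t buffer_size = 64 * 1024;      /* arbitrary for the sketch */
    void *buf = memalign(page_size, buffer_size);
    int status = -1;

    if (!buf)
        return 1;
    memset(buf, 0, buffer_size);         /* touch it so pages are allocated */

    /* With nodes == NULL, move_pages() only queries page placement. */
    if (move_pages(0, 1, &buf, NULL, &status, 0) == 0)
        printf("first page of buf is on NUMA node %d\n", status);
    return 0;
}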
On Wed, Aug 29, 2012 at 12:51 AM, Hefty, Sean wrote:
>> I'm not sure if I have to say sorry for the noise or not, but it seems
>> that the issue was just a NUMA issue!
>
> that's good news
>
>> Maybe rsocket has to set memory affinity on the node with the IB
>> board attached,
>> before to all
I'm not sure if I have to say sorry for the noise or not, but it seems
that the issue was just a NUMA issue!
My system is a 2-node NUMA system and the IB board is attached to NODE 0.
Without setting any CPU/memory affinity, it seems the code always runs
on the worst node!
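One way to force that affinity today, before rsocket does anything
itself, is to bind both execution and memory to node 0 from the test
code; this is a sketch of mine using libnuma, not something rstream
currently does. The same effect can be had from the shell by running
the test under numactl --cpunodebind=0 --membind=0.

#include <numa.h>   /* link with -lnuma */

/* Pin execution and memory allocation to the node the HCA sits on
 * (node 0 on my system) before any buffers are allocated/registered. */
static int bind_to_hca_node(int node)
{
    if (numa_available() < 0)
        return -1;                  /* kernel without NUMA support */
    if (numa_run_on_node(node))     /* restrict the CPUs we run on */
        return -1;
    numa_set_preferred(node);       /* prefer memory from this node */
    return 0;
}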
Without affinity (I did run
> $ ./examples/rstream -s 10.30.3.2 -S all
> name      bytes  xfers  iters  total  time   Gb/sec  usec/xfer
> 16k_lat   16k    1      10k    312m   0.52s  5.06    25.93
> 24k_lat   24k    1      10k    468m   0.82s  4.79    41.08
> 32k_lat   32k    1      1
I did some experiments with rstream and I saw that with a custom test
the bandwidth-optimized setup is not performed, namely the following:
val = 0;
rs_setsockopt(rs, SOL_RDMA, RDMA_INLINE, &val, sizeof val);
I forced that setup to be performed even in a custom test by doing:
optimization =
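For reference, this is roughly where such a call could sit in a custom
client; the socket setup and error handling below are my own
assumptions, not rstream's actual code.

#include <netinet/in.h>
#include <sys/socket.h>
#include <rdma/rsocket.h>

/* Create an rsocket, force the bandwidth-oriented setup (inline data
 * off) even for a custom test, then connect to the server. */
static int connect_bw_optimized(const struct sockaddr_in *dst)
{
    int rs, val = 0;

    rs = rsocket(AF_INET, SOCK_STREAM, 0);
    if (rs < 0)
        return -1;

    rs_setsockopt(rs, SOL_RDMA, RDMA_INLINE, &val, sizeof val);

    if (rconnect(rs, (const struct sockaddr *) dst, sizeof *dst)) {
        rclose(rs);
        return -1;
    }
    return rs;
}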
> post a message receive
> rdma connection
> wait for rdma connection event
> <>
> start:
> register memory containing bytes to transfer
> wait for remote memory region addr/key (I wait for an ibv_wc)
> send data with ibv_post_send (IBV_WR_RDMA_WRITE)
> post a message receive
> wait for
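A minimal verbs sketch of the "send data with ibv_post_send
(IBV_WR_RDMA_WRITE)" step above, assuming the remote addr/rkey have
already been received; all names here are mine, not from the quoted
code.

#include <stdint.h>
#include <infiniband/verbs.h>

/* Post one RDMA WRITE of 'len' bytes from a locally registered buffer
 * to the peer's advertised memory region. */
static int post_rdma_write(struct ibv_qp *qp, struct ibv_mr *mr,
                           void *buf, uint32_t len,
                           uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t) buf,
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .opcode              = IBV_WR_RDMA_WRITE,
        .sg_list             = &sge,
        .num_sge             = 1,
        .send_flags          = IBV_SEND_SIGNALED,
        .wr.rdma.remote_addr = remote_addr,
        .wr.rdma.rkey        = rkey,
    };
    struct ibv_send_wr *bad_wr;

    return ibv_post_send(qp, &wr, &bad_wr);
}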