Re: [vpp-dev] VPP buffer pool allocation optimization guidance

2022-08-02 Thread Benoit Ganne (bganne) via lists.fd.io
> I am wondering if VPP could allow VLIB_BUFFER_POOL_PER_THREAD_CACHE_SZ
> (512) to be changed to a build-time value as is done in DPDK?

It is a #define, so you can change it - but I do not think changing it has been 
tested. Maybe we could also make it configurable through CMake.
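
For example, something like this could work (untested sketch - the guard is not 
in today's code, and how you inject the -D flag depends on your build setup):

  /* sketch: allow the constant to be overridden from the compiler flags */
  #ifndef VLIB_BUFFER_POOL_PER_THREAD_CACHE_SZ
  #define VLIB_BUFFER_POOL_PER_THREAD_CACHE_SZ 512
  #endif

and then build with -DVLIB_BUFFER_POOL_PER_THREAD_CACHE_SZ=8192 (or whatever 
value you want to test) added to the C flags.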

> Since this specific use case has 1 rx-only core and the rest are tx-only
> cores, and all rx packets are handed off to the tx cores, could there be
> any negative side effect of increasing CACHE_SZ to 8K? Is there any
> specific reason why CACHE_SZ is hard-coded to 512? I really appreciate
> your advice in this regard.

I think the main drawbacks are that you need more memory and it will lead to 
more CPU cache misses; 512 was deemed a good value for the general case.

ben




Re: [vpp-dev] VPP buffer pool allocation optimization guidance

2022-08-01 Thread PRANAB DAS
Hi Ben,

I am wondering if VPP could allow VLIB_BUFFER_POOL_PER_THREAD_CACHE_SZ (512) to 
be changed to a build-time value as is done in DPDK? The i40e NIC has a maximum 
of 4K tx/rx descriptors. If we change VLIB_BUFFER_POOL_PER_THREAD_CACHE_SZ from 
512 to 8K, the memory held in each thread's cache increases 16x, from 
512 x 2KB (buffer size) = 1MB to 8K x 2KB = 16MB. With 10 cores the extra 
overhead is about 16MB x 10 = 160MB, which I believe is not a large memory 
overhead.
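
Spelling that arithmetic out (throwaway check, assuming the default 2 KB data 
buffer size):

  #include <stdio.h>

  int main (void)
  {
    const unsigned long long buf_sz = 2048;    /* data buffer size, bytes */
    const unsigned long long old_cache = 512;  /* buffers cached per thread today */
    const unsigned long long new_cache = 8192; /* proposed per-thread cache */
    const unsigned long long threads = 10;

    /* memory a single thread's cache can hold on to, before and after */
    printf ("per thread: %llu MB -> %llu MB\n",
            (old_cache * buf_sz) >> 20, (new_cache * buf_sz) >> 20);
    /* total across all threads with the larger cache */
    printf ("%llu threads: %llu MB\n", threads,
            (threads * new_cache * buf_sz) >> 20);
    return 0;
  }

which prints 1 MB -> 16 MB per thread and 160 MB for 10 threads.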

Since this specific use case has 1 rx-only core and the rest are tx-only cores, 
and all rx packets are handed off to the tx cores, could there be any negative 
side effect of increasing CACHE_SZ to 8K? Is there any specific reason why 
CACHE_SZ is hard-coded to 512? I really appreciate your advice in this regard.

Thank you

- Pranab K Das




Re: [vpp-dev] VPP buffer pool allocation optimization guidance

2022-07-29 Thread Benoit Ganne (bganne) via lists.fd.io
> Thank you very much for your message. Is there any way to figure out the
> buffer allocation cross-CPU spin-lock contention?

I'd expect it to appear during profiling if it becomes an issue.

> Are there any test results that you could point us to?

Not that I know of.

ben




Re: [vpp-dev] VPP buffer pool allocation optimization guidance

2022-07-28 Thread PRANAB DAS
Hi Ben,

Thank you very much for your message. Is there any way to figure out the buffer 
allocation cross-CPU spin-lock contention? Are there any test results that you 
could point us to?

Thank you!

- Pranab K Das




Re: [vpp-dev] VPP buffer pool allocation optimization guidance

2022-07-28 Thread Benoit Ganne (bganne) via lists.fd.io
I agree that having all allocations on 1 thread (rx) and all de-allocations on 
other threads (tx) is not ideal.
However, if you can reserve enough memory, I suspect it might be easier to just 
*increase* the per-thread buffer cache.
The per-thread buffer cache basically batches spinlock operations. To decrease 
the spinlock contention, you want to go back to the global pool infrequently. 
If your cache is big enough to accommodate several times your rxq and txq 
sizes, that means you're going to take the spinlock only once every several 
rx/tx node calls.
Of course, the downside is you're going to waste memory and lose CPU cache 
locality, so it's a trade-off. You'll need to experiment and measure.
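
To make the batching concrete, here is a stand-alone toy sketch of the alloc 
side (made-up names and layout, not the actual vlib code - only the 
cache/spinlock interplay is the point):

  #include <pthread.h>
  #include <string.h>

  #define CACHE_SZ 512       /* stand-in for VLIB_BUFFER_POOL_PER_THREAD_CACHE_SZ */

  typedef struct {
    pthread_spinlock_t lock;   /* protects the global free list */
    unsigned *global;          /* global free list of buffer indices */
    unsigned n_global;
  } pool_t;

  typedef struct {
    unsigned cached[CACHE_SZ]; /* this thread's cache of free buffer indices */
    unsigned n_cached;
  } thread_cache_t;

  /* Serve allocations from the per-thread cache; the spinlock is only taken
   * when the cache must be refilled from the global pool.  A bigger cache
   * means fewer refills (fewer lock acquisitions) but more buffers parked
   * per thread and worse cpu cache locality. */
  static unsigned
  alloc_buffers (pool_t *p, thread_cache_t *tc, unsigned *out, unsigned n)
  {
    if (tc->n_cached < n)
      {
        pthread_spin_lock (&p->lock);
        unsigned take = CACHE_SZ - tc->n_cached;
        if (take > p->n_global)
          take = p->n_global;
        memcpy (tc->cached + tc->n_cached,
                p->global + p->n_global - take, take * sizeof (unsigned));
        p->n_global -= take;
        tc->n_cached += take;
        pthread_spin_unlock (&p->lock);
      }
    if (n > tc->n_cached)
      n = tc->n_cached;
    tc->n_cached -= n;
    memcpy (out, tc->cached + tc->n_cached, n * sizeof (unsigned));
    return n;                  /* number of buffers actually allocated */
  }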

ben




Re: [vpp-dev] VPP buffer pool allocation optimization guidance

2022-07-27 Thread PRANAB DAS
Hi all,

I am referring to vlib_buffer_pool_put ( 
https://vpp.flirble.org/stable-2110/d8/d9c/buffer__funcs_8h.html#ad7c38a82bb64d64e1f1713a42f344bab
 ), which grabs a spinlock when the tx threads return buffers to the global 
pool so that they can eventually be reused by the rx thread.
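
As I understand the behaviour (toy stand-alone sketch, made-up names, not the 
real vlib code): freed buffers are absorbed by the caller's per-thread cache 
while it has room, and otherwise go back to the global pool under the 
spinlock - which on a tx-only thread is what happens for almost every free 
batch once the cache has filled up:

  #include <pthread.h>
  #include <string.h>

  #define CACHE_SZ 512       /* stand-in for VLIB_BUFFER_POOL_PER_THREAD_CACHE_SZ */

  typedef struct {
    pthread_spinlock_t lock;   /* protects the global free list */
    unsigned *global;          /* global free list of buffer indices */
    unsigned n_global;
  } pool_t;

  typedef struct {
    unsigned cached[CACHE_SZ]; /* this thread's cache of free buffer indices */
    unsigned n_cached;
  } thread_cache_t;

  /* Toy free path: if the whole batch fits in the local cache no lock is
   * needed, otherwise the batch goes to the global pool under the spinlock.
   * A tx-only thread never drains its cache through allocations, so in
   * steady state it takes the lock on nearly every free. */
  static void
  free_buffers (pool_t *p, thread_cache_t *tc, const unsigned *bufs, unsigned n)
  {
    if (tc->n_cached + n <= CACHE_SZ)
      {
        memcpy (tc->cached + tc->n_cached, bufs, n * sizeof (unsigned));
        tc->n_cached += n;
        return;
      }
    pthread_spin_lock (&p->lock);
    memcpy (p->global + p->n_global, bufs, n * sizeof (unsigned));
    p->n_global += n;
    pthread_spin_unlock (&p->lock);
  }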

Thank you




[vpp-dev] VPP buffer pool allocation optimization guidance

2022-07-27 Thread PRANAB DAS
Hello all, (Dave, Benoit, Damjan and all others)

We have a VPP application with a single RX worker/thread that receives
all packets from a NIC and N-1 packet processing threads that are transmit
only. Basically, on the NIC we have 1 rx queue and N-1 transmit queues. Each
rx packet/buffer is handed off from the rx thread to a set of cores
(service chaining, pipelining) and each packet processing core transmits
on its own transmit queue. Some of the packet processing threads might queue
packets for seconds or minutes.

I read that in VPP buffer management a buffer has three states - available,
cached (in a worker thread), and used - and that there is a single global
buffer pool plus a per-worker cache.

Since a buffer needs to be returned to the pool once packet tx completes,
in this specific scenario (1 rx and N-1 tx threads) we would like buffers
to be returned to the rx thread so that there are always rx buffers
available to receive packets and we don't encounter rx misses on the NIC.

There is a spinlock that is used to alloc/free buffers from the global pool.
In this case, since there is no rx on the N-1 threads (they are tx only),
returning buffers to their local caches does not benefit performance. We
would like the buffers to be returned to the global pool, and in fact
directly to the buffer cache of the single rx thread. I am concerned that
as the number of tx threads grows, more buffers will be returned to the
global pool, which requires the spinlock to free them. The single rx thread
will run out of cached buffers and will attempt to allocate from the global
pool, increasing the chances of spinlock contention overall, which could
potentially hurt performance.

Do you agree with my characterization of the problem? Or do you think the
problem is not severe?

Do you have any suggestions on how we could optimize buffer allocation in
this case? There are a few goals:

   - the rx thread never runs out of rx buffers
   - buffers in the pool/caches are not left unused
   - spinlock contention when allocating/freeing buffers from the global
   pool is almost 0
   - it should scale as we increase the number of transmit threads/cores,
   e.g. 8, 10, 12, 16, 20, 24

One obvious solution I was thinking of is to reduce the size of the local
buffer cache in the transmit worker threads and increase the local buffer
cache of the single rx thread. Does VPP allow an application to set the
buffer cache size per worker/thread?

The application threads (tx threads) that queue packets are required to
enforce a maximum/threshold queue depth, and if the threshold is exceeded
the application flushes out the queued packets.

Is there any other technique we can use - for instance, after transmitting,
having the NIC return the buffers directly to the rx thread?

I really appreciate your guidance on optimizing buffer usage and reducing
spinlock contention on tx and rx across cores.

Thank you,

- Pranab K Das
